PP-DocLayoutV3 Inference Benchmark: SafeTensor vs ONNX vs PaddlePaddle

by andynoodles - opened Mar 21

I benchmarked all three inference backends for PP-DocLayoutV3 on an NVIDIA RTX 5060 Ti (16 GB) under Linux with CUDA 13.
Each backend got 5 warmup runs followed by 50 timed runs. Preprocessing is excluded from the timings, and GPU synchronization is applied on all three frameworks.
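The timing setup (warmup runs, timed runs, GPU synchronization) can be sketched as a small backend-agnostic harness. This is not the code from the linked repository. The function name `benchmark`, its parameters, and the use of the sample standard deviation for "Latency stdev" are assumptions made for illustration.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(
    infer: Callable[[], object],
    sync: Callable[[], None] = lambda: None,
    warmup: int = 5,
    runs: int = 50,
) -> Dict[str, float]:
    """Hypothetical harness (not the repository's implementation).

    `infer` runs one forward pass on already-preprocessed input;
    `sync` blocks until all queued GPU work has finished.
    """
    # Warmup: lets kernels initialize and memory allocators settle; not timed.
    for _ in range(warmup):
        infer()
    sync()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        sync()  # without this, asynchronous GPU launches would make runs look too fast
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "stdev_ms": statistics.stdev(latencies_ms),  # sample stdev (assumption)
        "fps": 1000.0 / mean_ms,  # sequential, one input at a time
    }
```

For the PyTorch backend one would pass `sync=torch.cuda.synchronize`, since CUDA kernel launches return before the GPU work completes; the equivalent call for the other frameworks is left to the reader here.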

| Metric | SafeTensor (PyTorch) | ONNX Runtime | PaddlePaddle |
|---|---|---|---|
| End-to-end latency (mean) | 41.7 ms | 55.4 ms | 64.3 ms |
| Throughput | 24.0 FPS | 18.1 FPS | 15.6 FPS |
| Latency stdev | 0.7 ms | 0.2 ms | 1.2 ms |
| RAM (total) | 2,634 MB | 3,213 MB | 3,844 MB |
| GPU memory (peak) | 1,534 MB | 2,062 MB | 1,658 MB |
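The throughput row follows directly from the mean latency (1000 / mean ms), which implies one input processed at a time; the batch size is not stated, so that reading is an assumption. A quick check of the table, and of the speedup ratios quoted below it, using only the means reported above:

```python
from typing import Dict, Tuple

# Mean end-to-end latencies (ms) copied from the results table.
MEAN_MS = {"SafeTensor (PyTorch)": 41.7, "ONNX Runtime": 55.4, "PaddlePaddle": 64.3}


def summarize(mean_ms: Dict[str, float], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Return {backend: (throughput_fps, latency_ratio_vs_baseline)}."""
    base = mean_ms[baseline]
    return {
        # FPS assumes sequential single-input inference; ratio > 1 means slower than baseline.
        name: (1000.0 / ms, ms / base)
        for name, ms in mean_ms.items()
    }


for name, (fps, ratio) in summarize(MEAN_MS, "SafeTensor (PyTorch)").items():
    print(f"{name:22s} {fps:5.1f} FPS   {ratio:.2f}x the PyTorch latency")
```

The recomputed values round to the table's 24.0, 18.1, and 15.6 FPS, and the latency ratios come out to roughly 1.33x (ONNX Runtime) and 1.54x (PaddlePaddle).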

All three backends produce 13 detections with matching labels and bounding boxes (scores differ by < 0.01).
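One way to check that two backends "match" is to pair detections one-to-one by label and box overlap, then compare scores. The sketch below uses greedy IoU matching; the `Detection` type, the (x1, y1, x2, y2) box format, and the `min_iou=0.95` threshold are hypothetical choices, and only the 0.01 score tolerance comes from the post.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


@dataclass
class Detection:  # hypothetical container, not a type from any of the three frameworks
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def outputs_agree(ref: List[Detection], other: List[Detection],
                  min_iou: float = 0.95, max_score_diff: float = 0.01) -> bool:
    """Same count, and each ref detection has a same-label partner with high IoU and close score."""
    if len(ref) != len(other):
        return False
    unmatched = list(other)
    for det in ref:
        candidates = [d for d in unmatched if d.label == det.label]
        if not candidates:
            return False
        # Greedy: take the unmatched same-label detection with the largest overlap.
        best = max(candidates, key=lambda d: iou(det.box, d.box))
        if iou(det.box, best.box) < min_iou or abs(det.score - best.score) >= max_score_diff:
            return False
        unmatched.remove(best)
    return True
```

Greedy matching is enough when outputs are nearly identical, as reported here; for outputs that diverge more, an optimal assignment (for example Hungarian matching) would be more robust.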

Key findings:

- SafeTensor/PyTorch is the fastest end-to-end: about 1.3x faster than ONNX Runtime (41.7 vs 55.4 ms) and 1.5x faster than PaddlePaddle (41.7 vs 64.3 ms), even with post-processing running outside the graph.
- ONNX Runtime has the most consistent latency (0.2 ms stdev).
- Important: the ONNX model expects mean=[0,0,0] and std=[1,1,1] (rescale by 1/255 only), NOT ImageNet normalization. Using ImageNet normalization drops the detection count from 13 to 12.
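The normalization difference is easy to get wrong, so here is a NumPy sketch of both variants. It assumes an RGB uint8 image in HWC layout and an NCHW float32 model input with batch size 1; resizing to the model's input resolution and any other model inputs are deliberately left out, and the function names are hypothetical.

```python
import numpy as np

# Standard ImageNet statistics (RGB order), shown only for contrast.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_for_onnx(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Rescale to [0, 1] only (mean=0, std=1), then HWC -> NCHW float32."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    # With mean=[0,0,0] and std=[1,1,1], (x - mean) / std leaves x unchanged.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])


def preprocess_imagenet(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """ImageNet mean/std normalization: the variant that lost a detection in this benchmark."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the last (channel) axis
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])
```

The two functions differ only in the mean/std step, which shifts and scales every input value; that shift is enough to move the ONNX model's output from 13 detections to 12.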

Full code, scripts, and methodology:
https://github.com/andynoodles/PPDocLayout-V3-Benchmark
