We have benchmarked how many samples per second different models can process on different hardware devices. Below are some results for BERT-Squad, MobileNetV2, ResNet50, SuperResolution, YOLOv4 and FastNeuralStyleTransfer.
We always used a batch size of 1, which is the relevant setting for real-time request-response applications.
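As a rough illustration of how such a throughput number can be measured, here is a minimal Python sketch using ONNX Runtime. The model path, input shape (typical for an image classifier like ResNet50) and iteration counts are placeholder assumptions for illustration, not our exact harness.

```python
import time

import numpy as np
import onnxruntime as ort

# "model.onnx" is a placeholder; point this at the model you want to measure.
sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name

# Batch size 1, matching the request-response scenario above.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up so one-time initialization cost is not counted.
for _ in range(10):
    sess.run(None, {input_name: x})

# Time repeated single-sample runs and report samples per second.
n = 100
start = time.perf_counter()
for _ in range(n):
    sess.run(None, {input_name: x})
elapsed = time.perf_counter() - start
print(f"{n / elapsed:.1f} samples/sec")
```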
Benchmarks for c5a.4xlarge, an AWS EC2 CPU compute instance. Higher is better.
Benchmarks for g4dn.xlarge, an AWS EC2 GPU instance.
Some model conversions failed, which is why some backend results are missing.
Comparison of the following similarly priced AWS EC2 instances in the us-east-1 region.
Instance Type | Device | Cost (USD/hour) |
---|---|---|
c5n.2xlarge | CPU | $0.432 |
g4dn.xlarge | GPU | $0.526 |
c6g.4xlarge | ARM | $0.544 |
c5a.4xlarge | CPU | $0.616 |
Only the performance of the best backend for each instance is shown.
A more expensive instance does not always deliver higher throughput. Also notice that the same model on the same device type (CPU) is sometimes faster with one backend and sometimes with another.
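If you want to verify this on your own hardware, a quick way is to time the same network under two backends. Below is a hedged sketch comparing eager PyTorch with ONNX Runtime on CPU; the choice of torchvision's ResNet50 and the iteration counts are illustrative assumptions.

```python
import time

import numpy as np
import onnxruntime as ort
import torch
import torchvision

def bench(fn, warmup=10, iters=100):
    """Return samples/sec for a zero-argument inference callable."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return iters / (time.perf_counter() - start)

model = torchvision.models.resnet50(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

# Backend 1: eager PyTorch on CPU.
with torch.no_grad():
    torch_sps = bench(lambda: model(x))

# Backend 2: the same network exported to ONNX, run with ONNX Runtime.
torch.onnx.export(model, x, "resnet50.onnx")
sess = ort.InferenceSession("resnet50.onnx")
inp = sess.get_inputs()[0].name
x_np = x.numpy()
onnx_sps = bench(lambda: sess.run(None, {inp: x_np}))

print(f"PyTorch: {torch_sps:.1f} samples/sec, "
      f"ONNX Runtime: {onnx_sps:.1f} samples/sec")
```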
If your application requires a large number of requests per second, the GPU instance appears to be the cheapest option. If your demands are lower, a cheaper compute instance, or a group of them, might be a better choice. See the graph below: the red dashed lines represent two applications with different requirements, and the y-axis is on a log scale.
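To make that trade-off concrete, a small back-of-the-envelope helper (our own illustration, not part of DNN-Bench) can translate a required request rate and a measured per-instance throughput into a fleet size and hourly cost. The prices come from the table above; the throughput numbers are hypothetical placeholders to be replaced with your own measurements.

```python
import math

def fleet_cost(required_rps, throughput_rps, price_per_hour):
    """How many instances are needed to sustain required_rps, and the hourly cost."""
    count = math.ceil(required_rps / throughput_rps)
    return count, count * price_per_hour

# (samples/sec, USD/hour) -- throughputs below are hypothetical placeholders.
candidates = {
    "c5n.2xlarge": (30.0, 0.432),
    "g4dn.xlarge": (200.0, 0.526),
}

for name, (sps, price) in candidates.items():
    n, cost = fleet_cost(required_rps=500, throughput_rps=sps, price_per_hour=price)
    print(f"{name}: {n} instances, ${cost:.2f}/hour")
```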
In any case, running DNN-Bench before deploying and identifying the best inference backend for your model can save you significant cost and increase your model's throughput.