
# Benchmark Analysis

We benchmarked how many samples per second different models can process on different hardware devices. Below are results for BERT-Squad, MobileNetV2, ResNet50, SuperResolution, YOLOv4, and FastNeuralStyleTransfer.

We always used a batch size of 1, which is representative of real-time request-response applications.
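For reference, a single-model throughput measurement can look roughly like the sketch below, using ONNX Runtime as one illustrative backend. The model path, input name, and input shape are placeholders, not part of the benchmark code itself:

```python
# Minimal throughput sketch: time single-sample (batch size 1) inference
# in a loop and report samples/second. "model.onnx" and the input shape
# are hypothetical placeholders -- substitute your own model.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # batch size 1

# Warm up so one-time initialization does not skew the timing.
for _ in range(10):
    session.run(None, {input_name: x})

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, {input_name: x})
elapsed = time.perf_counter() - start
print(f"{n_runs / elapsed:.1f} samples/second")
```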

## CPU

Benchmarks for c5a.4xlarge, an AWS EC2 CPU compute instance. Higher is better.

*(Charts: Bert-CPU, Mobilenet-CPU, Resnet-CPU, SuperRes-CPU, YoloV4-CPU, FastNeuralStyle-CPU)*

## GPU

Benchmarks for g4dn.xlarge, an AWS EC2 GPU instance. Higher is better.
Some model conversions failed, which is why some backend results are missing.

*(Charts: Bert-GPU, Mobilenet-GPU, Resnet-GPU, SuperRes-GPU, YoloV4-GPU, FastNeuralStyle-GPU)*

## CPU vs GPU vs ARM

Comparison of the following similarly priced AWS EC2 instances in the us-east-1 region.

| Instance Type | Device | Cost (USD/hour) |
| --- | --- | --- |
| c5n.2xlarge | CPU | $0.432 |
| g4dn.xlarge | GPU | $0.526 |
| c6g.4xlarge | ARM | $0.544 |
| c5a.4xlarge | CPU | $0.616 |

Only the performance of the best backend for each instance is shown.

A more expensive instance does not always deliver higher throughput. Also, notice that the same model on the same device type (CPU) is sometimes faster with one backend and sometimes with another.
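One way to compare similarly priced instances fairly is to normalize throughput by price. The sketch below computes the cost of one million inferences from an instance's hourly price and its measured throughput; the throughput values are hypothetical placeholders, not our measured results:

```python
# Cost per million inferences = hourly price / (throughput * 3600) * 1e6.
# Throughput values are hypothetical placeholders -- plug in your own
# measurements from the benchmark runs.
instances = {
    # name: (hourly price in USD, samples/second -- placeholder)
    "c5n.2xlarge": (0.432, 100.0),
    "g4dn.xlarge": (0.526, 400.0),
    "c6g.4xlarge": (0.544, 120.0),
    "c5a.4xlarge": (0.616, 150.0),
}

for name, (price, throughput) in instances.items():
    cost_per_million = price / (throughput * 3600) * 1e6
    print(f"{name}: ${cost_per_million:.2f} per million inferences")
```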

*(Charts: Bert-CPU-GPU-ARM, Mobilenet-CPU-GPU-ARM, Resnet-CPU-GPU-ARM, SuperRes-CPU-GPU-ARM, YoloV4-CPU-GPU-ARM, FastNeuralStyle-CPU-GPU-ARM)*

## Conclusion

If your application must serve a large number of requests per second, the GPU seems to be the cheapest option. If your throughput demands are lower, a cheaper compute instance, or a group of them, might be a better choice. See the graph below: red dashed lines represent two applications with different requirements, and the y-axis is on a log scale.

*(Chart: Bert-CPU-GPU-ARM-stacked)*
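As a rough sizing rule, the number of instances you need is the required request rate divided by a single instance's measured throughput, rounded up; hourly cost follows from that. A small sketch of this reasoning, with all throughput numbers again hypothetical placeholders:

```python
import math

def fleet_cost(required_rps, instance_rps, hourly_price):
    """Hourly cost of the smallest fleet that sustains required_rps."""
    n_instances = math.ceil(required_rps / instance_rps)
    return n_instances, n_instances * hourly_price

# Hypothetical example: which option is cheaper at 50 requests/second?
for name, rps, price in [("c5a.4xlarge", 150.0, 0.616),
                         ("g4dn.xlarge", 400.0, 0.526)]:
    n, cost = fleet_cost(50.0, rps, price)
    print(f"{name}: {n} instance(s), ${cost:.3f}/hour")
```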

In any case, running DNN-Bench before deploying and identifying the best inference backend for your model can save you substantial cost and increase your model's throughput.