We have mostly discussed the DPU in the context of edge AI applications. If you are wondering whether the DPU can also be used for data center or on-premises AI applications, the answer is yes!
We recently demonstrated the DPU on an Alveo data center acceleration card at the CVPR 2019 conference.
We showed a ResNet50 demo running on the Alveo U250 data center accelerator card with an industry-leading mixed-precision implementation. The demo used Int8/Int2 activations and Int8/ternary weights.
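To make "ternary weights" concrete, here is a minimal sketch of one common way to map floating-point weights onto the three values -1, 0 and +1 plus a single scale factor. It is only an illustration: the function name `ternarize`, the `delta_scale` parameter and the 0.7 threshold heuristic are assumptions borrowed from the general ternary-network literature, not a description of how the demo's weights were actually produced.

```python
def ternarize(weights, delta_scale=0.7):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: threshold = delta_scale * mean(|w|), a common heuristic;
    the quantization scheme used in the demo is not described in the post.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_scale * mean_abs

    # Small weights collapse to 0; the rest keep only their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]

    # The shared scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


# Hypothetical example weights, chosen only for illustration.
codes, scale = ternarize([0.9, -0.05, 0.4, -0.7, 0.02, -0.3])
print(codes, scale)
```

Each code needs at most 2 bits of storage instead of 8, which is where a mixed Int8/ternary design can save memory bandwidth.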
With the DPU design optimized for the Alveo U250 data center accelerator card, ResNet50 runs at 5100+ fps with around 3 ms latency at a batch size of 16. By comparison, at a batch size of 32, the Nvidia T4 runs ResNet50 at 4600+ fps (6.8 ms latency!).
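As a quick sanity check on how these figures relate, the sketch below estimates per-batch latency from batch size and throughput. It assumes only one batch is in flight at a time, so latency is roughly batch size divided by throughput; the function name `batch_latency_ms` is ours, and the inputs are the lower bounds quoted above rather than new measurements.

```python
def batch_latency_ms(batch_size: int, throughput_fps: float) -> float:
    """Approximate time to finish one batch, in milliseconds.

    Assumption: a single batch is processed at a time (no pipelining),
    so latency ~= batch_size / throughput.
    """
    return batch_size / throughput_fps * 1000.0


# Figures quoted in the text (lower bounds, since both are "N+ fps").
alveo_ms = batch_latency_ms(16, 5100)   # roughly 3.1 ms
t4_ms = batch_latency_ms(32, 4600)      # roughly 7.0 ms
speedup = 5100 / 4600                   # throughput ratio, roughly 1.11x

print(f"Alveo U250: {alveo_ms:.2f} ms, T4: {t4_ms:.2f} ms, "
      f"throughput ratio: {speedup:.2f}x")
```

Both estimates land close to the quoted latencies. The T4 estimate comes out slightly above 6.8 ms because 4600 fps is a lower bound; 6.8 ms at a batch size of 32 corresponds to about 4700 fps.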