
Low-latency HD Inference - a New Treatment for Myopic Vision Systems

Xilinx Employee

This is a guest post from Quenton Hall, AI System Architect for Industrial, Scientific and Medical applications. 

One of the AI demo highlights at XDF2019 in San Jose was a high-performance inference demo leveraging Alveo.  If you are familiar with Alveo and ML Suite, this might not, at first glance, seem all that novel.  What was novel, however, is that this demonstration leveraged a brand-new inference engine.  Whereas past Alveo ML inference implementations have used the xDNN engine architecture, this latest demo implements a new version of the Xilinx DPU IP, specifically optimized for the Alveo U280 and Xilinx SSIT devices.

The Alveo U280, based on Xilinx 16nm UltraScale+ fabric, integrates HBM2 memory.  This memory is directly accessible from the FPGA fabric via a high-performance hardened HBM controller.  The Xilinx HBM controller integrates 16 independent 256-bit-wide AXI slave ports.  Each of these ports can directly access ANY address within the HBM memory space, enabled by the inclusion of a 16 x 16 hardened AXI crossbar switch.  The Alveo U280 provides 8 GB of HBM2, supporting a total bandwidth of 460 GB/s (yes, that is a capital “B” for “bytes,” for those who might question…).
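As a quick sanity check on those figures, here is a back-of-the-envelope sketch using only the numbers quoted above; the per-port values are simple averages, not a claim about how the crossbar actually arbitrates traffic:

```python
# Back-of-the-envelope check on the HBM2 figures quoted in the text.
TOTAL_BW_GBPS = 460      # total HBM2 bandwidth, GB/s (capital B: bytes)
NUM_AXI_PORTS = 16       # hardened 256-bit AXI slave ports
PORT_WIDTH_BITS = 256

# Average bandwidth available per port if traffic is spread evenly
per_port_bw = TOTAL_BW_GBPS / NUM_AXI_PORTS
print(f"Average bandwidth per AXI port: {per_port_bw:.2f} GB/s")  # 28.75 GB/s

# A 256-bit port moves 32 bytes per beat; the implied average beat rate:
bytes_per_beat = PORT_WIDTH_BITS // 8
beats_per_sec = per_port_bw * 1e9 / bytes_per_beat
print(f"Implied average transfer rate: {beats_per_sec / 1e6:.0f} M beats/s per port")
```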


If you are familiar with the Xilinx DPU (Deep Learning Processing Unit) architecture, you may already know that it is a pipelined architecture, consisting of an array of heterogeneous processing elements, a DMA, scheduling logic, and a micro-coded engine whose primary task is to schedule acceleration of your compiled network graph.  One key advantage of this architecture is that it enables the developer to run inference on multiple neural networks in a TDM (time-division-multiplexed) configuration with extremely low latency.
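To make the TDM idea concrete, here is a purely illustrative round-robin sketch of how a runtime might interleave several compiled graphs on one engine. The class and names are hypothetical, not the Vitis AI API; on hardware the scheduling is handled by the DPU's micro-coded engine and runtime, not user code:

```python
# Hypothetical sketch of time-division-multiplexed (TDM) inference:
# several "networks" take turns on a single shared engine, round-robin.
from collections import deque

class TdmScheduler:
    def __init__(self, networks):
        # Compiled graphs sharing one engine; each is a callable here.
        self.queue = deque(networks)

    def step(self, frame):
        """Run one inference slot: dispatch the next network, then rotate."""
        net = self.queue[0]
        result = net(frame)
        self.queue.rotate(-1)   # round-robin to the next graph
        return net, result

# Example: a toy "detector" and "classifier" sharing the engine
sched = TdmScheduler([lambda x: ("detector", x), lambda x: ("classifier", x)])
for frame_id in range(4):
    net, out = sched.step(frame_id)   # alternates detector, classifier, ...
```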

Another key advantage of the DPU is that intermediate activations can be stored in local on-chip memory, reducing the power consumption and latency of inference.  It is not possible, however, to store all intermediate activations in local memory, so there remains a need to schedule reads and writes to external memory.  The more input channels and feature maps, and the higher the resolution used for inference, the larger the memory footprint.  Along comes HBM2…
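A rough estimate shows how quickly activations outgrow on-chip memory. The layer shapes below are illustrative (only the 1280 x 720 x 3 input follows from the text); the sketch assumes INT8 (1-byte) activations:

```python
# Rough activation-footprint estimate for one layer's output feature maps,
# assuming INT8 (1 byte per element) activations.
def activation_bytes(height, width, channels, bytes_per_elem=1):
    """Memory needed to hold one layer's output feature maps."""
    return height * width * channels * bytes_per_elem

# First-layer input for the 720p demo: 1280 x 720 x 3 (RGB)
print(activation_bytes(720, 1280, 3))    # 2,764,800 bytes, ~2.6 MiB

# A hypothetical deeper layer: reduced resolution but many more channels
print(activation_bytes(90, 160, 256))    # 3,686,400 bytes, ~3.5 MiB
```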

In this demonstration, the design team implemented two HBM-enabled DPU instances running at 250 MHz on the U280.  Each of these two instances supports peak INT8 inference performance of 4 TOPs.  The U280 used in this demonstration runs inference (MobileNetV1-SSD) on 8 streams, with an aggregate frame rate of 200 fps and 8 ms latency.  And if that were not enough, inference for each stream is at the native 720p resolution (yes, 1280 x 720 are the dimensions of the first CONV layer, for those who might question…).
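The arithmetic behind those headline numbers, using only the figures quoted above:

```python
# Sanity arithmetic on the demo figures quoted in the text.
num_dpus = 2
tops_per_dpu = 4            # peak INT8 TOPs per DPU instance
clock_hz = 250e6            # 250 MHz

streams = 8
aggregate_fps = 200

# 200 fps across 8 streams works out to 25 fps per stream
per_stream_fps = aggregate_fps / streams
print(f"Per-stream frame rate: {per_stream_fps:.0f} fps")

# 4 TOPs at 250 MHz implies the operations retired per cycle per DPU
ops_per_cycle = tops_per_dpu * 1e12 / clock_hz
print(f"Ops per cycle per DPU: {ops_per_cycle:.0f}")   # 16,000 ops/cycle
```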

Xilinx Alveo cards support the acceleration of a diverse set of workloads and are seeing widespread adoption in many applications where the inflexibility of general-purpose CPUs and GPUs is a limitation.  With the recent announcement of Vitis and Vitis AI at XDF2019, developers will be able to take advantage of open, standard APIs and frameworks to accelerate applications “out of the box,” without domain-specific knowledge or FPGA expertise as a prerequisite.


As a first look at this new paradigm, visit https://developer.xilinx.com/ and learn more about your personalized treatment for “design myopia”.

Interested in more on AI Camera Development?  Join us October 22 for a webinar on “Accelerating AI Camera Development” with Quenton Hall, AI System Architect.  Find out more and register at https://event.on24.com/wcc/r/2099987/0590AEFDCE940FE23F526E995EF8FA6E?partnerref=ism.