At XDF China in December 2019, Mr. Jiansong Zhang, a Staff Engineer at Alibaba, gave a great talk about AI Platforms and Heterogeneous Computing.
Alibaba developed a deep learning stack on top of Xilinx FPGAs, including IP, shell, runtime, driver, compiler, and models, to support a range of AI workloads.
This solution features:
- Optimized and customized hardware design
- On-chip streaming structure
- A 3D systolic-array convolution engine that makes full use of the DSP carry chain and a supertile design, running at 600 MHz
- Configurable dimensions of parallelism
- A runtime for resource allocation and task scheduling
- Software-hardware co-optimization in the compiler
- Model parsers for ONNX and TensorFlow
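To make the systolic-array idea concrete, here is a minimal Python sketch of a simplified 2D, output-stationary systolic array computing a matrix product, the operation a convolution engine commonly reduces convolutions to (for example via im2col). This is an illustration under stated assumptions, not Alibaba's design: the engine described in the talk is 3D and exploits the DSP carry chain and supertiles, none of which is modeled here, and the function name `systolic_matmul` is hypothetical. The array dimensions `m` by `n` stand in for the configurable parallelism.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle simulation of a 2D output-stationary systolic array.

    Simplified illustration (hypothetical helper, not the talk's 3D engine).
    PE (i, j) holds the accumulator for out[i][j]. Rows of `a` enter from the
    left and columns of `b` enter from the top, each skewed by one cycle per
    row/column so that matching operands meet in the correct PE.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    acc = [[0] * n for _ in range(m)]
    # Pipeline registers: the operand each PE passes right (a) and down (b).
    a_reg = [[0] * n for _ in range(m)]
    b_reg = [[0] * n for _ in range(m)]
    # k cycles of data plus (m - 1) + (n - 1) cycles for the skew to drain.
    for t in range(k + m + n - 2):
        # Visit PEs bottom-right to top-left so every PE reads its neighbours'
        # registers from the previous cycle, as clocked hardware would.
        for i in reversed(range(m)):
            for j in reversed(range(n)):
                if j == 0:
                    s = t - i  # row i of `a` is delayed by i cycles
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # column j of `b` is delayed by j cycles
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one multiply-accumulate per PE per cycle
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. Each operand is registered and handed to a neighbour every cycle, so data is reused across the array instead of being re-read from memory, which is what makes systolic arrays a good fit for FPGA fabric.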
Mr. Zhang also introduced four key use cases that demonstrate the solution's results.
Case 1: OCR (Optical Character Recognition) in Public Cloud Services
Case 2: Edge solution for Smart Retail
Case 3: Private Cloud Service
~7x TCO savings achieved by replacing CPU servers
Case 4: Speech Synthesis
Speech synthesis is an iterative task: generating one second of audio takes 16,000 iterations, one per output sample at a 16 kHz sample rate. NN-based TTS (text-to-speech) can produce speech that is indistinguishable from a human voice. Alibaba developed a Xilinx FPGA-based solution for real-time WaveNet, a state-of-the-art NN model for TTS. With customized low-latency autoregressive IP in hardware and a customized on-chip loop implemented in the compiler, they achieved a 150x speedup over a GPU implementation!
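The 16,000-iterations-per-second figure translates directly into a latency budget: each step of the autoregressive loop must finish within 1/16,000 s = 62.5 µs for generation to keep up with playback, and because every step consumes the samples produced before it, the steps cannot be parallelized across time. The Python sketch below shows that dependency and the budget arithmetic. `toy_step` is a hypothetical one-line stand-in for a WaveNet forward pass and the other helper names are illustrative, so its timings say nothing about the FPGA or GPU results quoted above.

```python
import time
from collections import deque

SAMPLE_RATE = 16_000  # samples (= autoregressive steps) per second of audio


def per_step_budget_us(sample_rate: int = SAMPLE_RATE) -> float:
    """Maximum time per step, in microseconds, for real-time generation."""
    return 1e6 / sample_rate


def generate(step, receptive_field: int, num_samples: int) -> list:
    """Autoregressive loop: each new sample is computed from the previous
    `receptive_field` samples, so step t must wait for step t - 1."""
    history = deque([0.0] * receptive_field, maxlen=receptive_field)
    out = []
    for _ in range(num_samples):
        x = step(history)
        history.append(x)  # the oldest sample falls out of the window
        out.append(x)
    return out


def toy_step(window) -> float:
    # Hypothetical stand-in for a WaveNet forward pass (not a real model).
    return 0.5 * window[-1] + 1.0


def real_time_factor(step, receptive_field: int, seconds: float = 0.1) -> float:
    """Seconds of audio produced per wall-clock second (>1 is faster than real time)."""
    n = int(SAMPLE_RATE * seconds)
    start = time.perf_counter()
    generate(step, receptive_field, n)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return seconds / elapsed


if __name__ == "__main__":
    print(per_step_budget_us())      # 62.5
    print(generate(toy_step, 2, 3))  # [1.0, 1.5, 1.75]
    print(real_time_factor(toy_step, 2))
```

Because the loop is strictly sequential, per-step overhead (kernel launches, memory round trips) adds up 16,000 times per second of audio, which is presumably why the solution described above keeps the loop on-chip and pairs it with low-latency IP.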
This is another great example of adaptable AI inference powered by Xilinx!