AI is rapidly evolving across many fronts, with tremendous advances in capability, accuracy, and efficiency achieved by Deep Neural Network (DNN) models. The state of the art is about more than model advancement, however: productizing models in an efficient and practical manner is also a key step toward widespread deployment of AI.
While floating-point precision is important during the DNN training phase, it matters far less for deployment. The bit-precision of a DNN model can be significantly reduced (quantized) to cut the compute resources required, with little (and sometimes no) impact on the model's accuracy.
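To make the idea concrete, here is a minimal sketch (not Xilinx's tool) of uniform symmetric quantization of a weight tensor to signed 8-bit integers; the function names and the choice of a max-abs threshold are illustrative assumptions, not the paper's method:

```python
import numpy as np

def quantize_int8(x, threshold):
    # Illustrative: map floats in [-threshold, threshold] onto int8 codes.
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the integer codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=1000).astype(np.float32)

# Use the max absolute value as the clipping threshold, so nothing saturates.
q, scale = quantize_int8(w, threshold=np.abs(w).max())
w_hat = dequantize(q, scale)

# With no saturation, the round-trip error is bounded by half the step size.
print(np.abs(w - w_hat).max(), scale / 2)
```

The int8 codes are what an inference engine would actually store and compute with; the float weights are only needed to derive the scale.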
Maintaining accuracy while reducing bit-precision, and therefore compute resources, is critical for ensuring AI can be broadly deployed in a cost-effective and energy-efficient manner. This matters for both the Data Center and the Edge, because the DNN models deployed to provide AI inference in real applications can be replicated thousands or even millions of times.
Quantization is one of several key areas in which Xilinx is helping to advance the state of the art. Xilinx will be presenting results from an improved quantization methodology at the MLSys 2020 (previously SysML) conference in Austin, Texas. The conference runs from March 2nd to 4th (and still has early-bird pricing, so register before Feb 4th!)
Entitled “Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks”, the paper details the trained quantization thresholds (TQT) technique, in which thresholds and weights are trained simultaneously using standard backpropagation and gradient descent. Of particular interest are the results on traditionally difficult-to-quantize networks such as MobileNets. The paper is one of 34 accepted out of 170 submissions. A pre-print of the paper is available here, and the tool developed by Xilinx (Graffitist) is available on GitHub.
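The core intuition can be sketched in a few lines. This is not Graffitist and not the paper's analytic gradient: the paper derives a straight-through gradient for the log2-domain threshold, whereas this toy uses a numerical gradient on the quantization error purely to illustrate that a clipping threshold can be learned by gradient descent (4-bit and a Gaussian tensor are assumptions chosen to make the clipping-vs-resolution trade-off visible):

```python
import numpy as np

def quantize(x, t, bits=4):
    # Uniform symmetric quantization with clipping threshold t.
    qmax = 2 ** (bits - 1) - 1
    scale = t / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def quant_mse(x, log2_t):
    # Quantization error as a function of the log2-domain threshold,
    # mirroring the paper's choice to optimize log2(t) rather than t.
    return np.mean((x - quantize(x, 2.0 ** log2_t)) ** 2)

rng = np.random.default_rng(42)
x = rng.normal(0, 1.0, size=4096)

log2_t0 = np.log2(np.abs(x).max())  # start from the naive max-abs threshold
log2_t = log2_t0
lr, eps = 0.1, 1e-3
for _ in range(300):
    # Central-difference gradient (stand-in for the paper's backprop gradient).
    g = (quant_mse(x, log2_t + eps) - quant_mse(x, log2_t - eps)) / (2 * eps)
    log2_t -= lr * g

# The trained threshold clips outliers, trading a little clipping error
# for finer resolution on the bulk of the distribution.
print(2.0 ** log2_t, np.abs(x).max())
```

At low bit-widths the learned threshold settles below the tensor's max absolute value, which is exactly why trained thresholds outperform naive min/max calibration on hard-to-quantize networks.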
Sambhav Jain will be presenting the paper and hosting a poster session. Please join the event if you can.