VectorBlox Matrix Processor IP for FPGAs accelerates image, video, other types of processing


You know what it’s like when you connect with a professor who’s really, really good at explaining things? That’s how I felt talking to Guy Lemieux, who is both the CEO of VectorBlox, an embedded supercomputing IP vendor, and a Professor of Electrical and Computer Engineering at the University of British Columbia. We met at this week’s Embedded Vision Summit in Santa Clara, California, where I got a fast education in matrix-processor IP in general and a real-time demo of the VectorBlox MXP Matrix Processor IP core. (See the video below.)

 

The VectorBlox MXP is a scalable, soft matrix coprocessor that you can drop into an FPGA to accelerate image, vision, and other tasks that require vector or matrix processing. If your system has a lot of such processing to handle and needs real-time performance, you’ve got three design paths you might take:

 

  1. HDL design using Verilog or VHDL
  2. High-level synthesis using a C-to-gates compiler like Xilinx Vivado HLS
  3. A vector coprocessor that boosts performance

Path number 1 is the traditional path taken by hardware designers since HDLs became popular at the end of the 1980s. In those early days, nascent HDL compilers weren’t that great at generating hardware; the results were commonly described as having poor QoR (Quality of Results). Many designers back then either felt or said outright, “I’ll give you my schematics when you pry them from my cold, dead hands.”

 

You don’t see many systems being designed with schematics these days. Systems have gotten far too complicated and HDLs represent a far more suitable level of abstraction. Tools change with the times.

 

First-generation high-level synthesis tools, as embodied in Synopsys’ Behavioral Compiler, met with similar resistance and didn’t go very far. However, design path number 2 has become viable as HLS compilers have improved. You can now find a growing number of testimonials to the effectiveness of such tools, like this one from NAB 2014.

 

Design path number 3 has the compelling allure of a software-based, quick-iteration design approach. Software compilation remains faster than HDL-based hardware compilation followed by placement and routing, but it depends on using a processor with appropriate matrix acceleration, which is not really the purview of the usual RISC suspects.

 

Matrix processing is exactly what the VectorBlox MXP is designed to do.

 

The VectorBlox MXP Matrix Processor is a licensable, configurable IP core that integrates with the Xilinx Vivado and ISE Design Suites. C/C++ libraries and code examples accompany the core, as does a functional simulator for debugging and regression testing.
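
To give a feel for the programming model, here’s a minimal sketch of an element-wise vector add written against the MXP’s C library. The function and macro names (vbx_sp_malloc, vbx_dma_to_vector, vbx_set_vl, the vbx operation macro, vbx_sync) reflect my reading of the VectorBlox API style and should be treated as assumptions; the headers and examples that ship with the core are the authoritative reference.

    #include <stdint.h>
    #include "vbx.h"   /* MXP C API header; name assumed from the SDK */

    /* Element-wise add of two N-element integer vectors on the MXP.
     * Operands are staged from host (AXI) memory into the MXP's on-chip
     * scratchpad by DMA, processed there, and then DMA'd back. */
    void mxp_vector_add(int32_t *host_a, int32_t *host_b, int32_t *host_c, int N)
    {
        /* Allocate three N-word vectors in the scratchpad */
        vbx_word_t *va = vbx_sp_malloc(N * sizeof(vbx_word_t));
        vbx_word_t *vb = vbx_sp_malloc(N * sizeof(vbx_word_t));
        vbx_word_t *vc = vbx_sp_malloc(N * sizeof(vbx_word_t));

        /* Queue DMA transfers of both operands into the scratchpad */
        vbx_dma_to_vector(va, host_a, N * sizeof(vbx_word_t));
        vbx_dma_to_vector(vb, host_b, N * sizeof(vbx_word_t));

        vbx_set_vl(N);                /* set the vector length */
        vbx(VVW, VADD, vc, va, vb);   /* word-wide vector-vector add */

        /* Queue the result transfer back and wait for all work to drain */
        vbx_dma_to_host(host_c, vc, N * sizeof(vbx_word_t));
        vbx_sync();

        vbx_sp_free();                /* release the scratchpad allocations */
    }

A routine like this compiles with an ordinary C toolchain, which is what gives design path number 3 its quick edit-compile-run loop.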

 

Depending on the configuration, the MXP Matrix Processor handles 1D, 2D, and 3D matrices using a range of data types, including bytes, halfwords, and words interpreted as integers, fixed-point numbers, or floating-point numbers. Here’s a block diagram of the MXP accelerator instantiated in a system:

 

 

[Figure: VectorBlox MXP Matrix Processor system block diagram]

 

 

As you can see, the MXP Matrix Processor is configured as an AXI device and simply drops into the on-chip AXI interconnect fabric so that it can communicate with a host processor and bulk AXI memory. An integrated DMA engine moves the vectors and matrices around in the system.
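
Because the DMA engine runs alongside the vector core, data movement for the next block of work can overlap computation on the current one. Here’s a rough double-buffering sketch in the same assumed C API; it presumes the MXP resolves scratchpad hazards between queued DMA transfers and vector instructions in hardware, and the TILE size is a placeholder that would have to fit the configured scratchpad.

    #include <stdint.h>
    #include "vbx.h"

    #define TILE 4096   /* pixels per tile; placeholder, must fit the scratchpad */

    /* Add a constant brightness offset to an 8-bit image, tile by tile,
     * overlapping the DMA of the next tile with work on the current one. */
    void mxp_brighten(uint8_t *src, uint8_t *dst, int num_tiles, int offset)
    {
        /* Two scratchpad buffers so DMA and compute can proceed in parallel */
        vbx_ubyte_t *buf[2];
        buf[0] = vbx_sp_malloc(TILE);
        buf[1] = vbx_sp_malloc(TILE);

        vbx_dma_to_vector(buf[0], src, TILE);          /* prefetch tile 0 */

        for (int t = 0; t < num_tiles; t++) {
            int cur = t & 1, nxt = cur ^ 1;

            if (t + 1 < num_tiles)                     /* queue the next tile's load */
                vbx_dma_to_vector(buf[nxt], src + (t + 1) * TILE, TILE);

            vbx_set_vl(TILE);
            vbx(SVBU, VADD, buf[cur], offset, buf[cur]);  /* scalar + vector, unsigned bytes */

            vbx_dma_to_host(dst + t * TILE, buf[cur], TILE);  /* queue the store */
        }

        vbx_sync();      /* wait for all queued DMA and vector work to finish */
        vbx_sp_free();
    }

The point of the pattern is simply that the host queues data movement and vector work and synchronizes once at the end; how much the overlap pays off depends on the tile size and on the bandwidth available through the AXI fabric.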

 

So how well does this work? Perhaps you’ll find these performance numbers helpful:

 

 

[Figure: VectorBlox MXP Matrix Processor performance numbers]

 

 

Note: These numbers were measured with an MXP Matrix Processor instantiated on an Avnet ZedBoard with a Zynq Z-7020 All Programmable SoC. The acceleration numbers are relative to the same C code running on the Zynq SoC’s on-chip ARM Cortex-A9 MPCore processor with NEON extensions.

 

The MXP Matrix Processor is a configurable IP core instantiated in FPGA programmable logic, so its performance scales. You get faster numbers with bigger and/or faster FPGAs and by adding ALUs and custom instructions, as shown in this graph from VectorBlox (with a Xilinx MicroBlaze soft processor serving as the performance reference):

 

 

[Figure: VectorBlox performance scaling graph]

 

 

Now, here’s that explanatory video promised in the first paragraph: