One of the great things about programmable logic is its ability to free us from the sequential world that limits software performance.
Recent years have seen the introduction and adoption of tools such as High-Level Synthesis (HLS), SDAccel and SDSoC. These tools let us use higher-level languages such as C, C++ and OpenCL to develop programmable logic-based solutions. This not only frees us from the sequential SW world, but also opens up programmable logic to new users beyond the traditional logic designer.
The Xilinx ALVEO card, announced at XDF last year, is designed to let data center applications benefit from programmable logic acceleration using the SDAccel toolchain.
From the start, ALVEO has been developed for the acceleration of data center applications, either in the cloud (e.g. Nimbix Cloud) or deployed on premises. ALVEO and its supporting frameworks have been designed to enable acceleration of applications such as machine learning, video processing and data analytics.
A range of ALVEO cards is available, from the U50 to the U200 and U280. Each one offers different logic resources, memory capacity and memory types, e.g. DDR and HBM.
ALVEO product range; the red box marks the U200, the card I have
Internally, each ALVEO card contains an FPGA and DDR / HBM memory, and connects to the host over a PCIe Gen3 x16 interface.
ALVEO block diagram
It is through this PCIe interface that we will deploy our SDAccel application, but how do we do that?
The FPGA on the ALVEO card contains a dynamically reconfigurable region, which is configured using partial reconfiguration to implement the kernel. This dynamic region connects to the PCIe end point and to other interfaces, such as the DDR / HBM and QSFP interfaces, through AXI interconnects contained within the non-dynamic region of the FPGA.
Developing applications for ALVEO and its dynamic region uses the OpenCL framework.
OpenCL is designed to support heterogeneous platforms that consist of a CPU (the host, which is typically x86 based) and several accelerators (called compute devices), which can be GPUs, DSPs, FPGAs or specialist hardware. OpenCL allows the development of portable applications that can be deployed across a range of different compute devices. In the OpenCL flow, the ALVEO card is an OpenCL compute device, and the functions it accelerates are its kernels.
The host program is often developed in C or C++ with relevant support from OpenCL APIs to manage the kernels.
The kernel application, meanwhile, is developed using the OpenCL C language, which is based on C99 and C++11. There are, however, some limitations that support portability across different device types: standard headers such as stdio.h and stdlib.h are not supported, and scalar types (e.g. char, float) are defined at fixed sizes, again to increase portability.
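To show why fixed-size scalar types matter, here is a minimal host-side sketch in plain C++. OpenCL C fixes char at 8 bits, short at 16, int at 32, long at 64 and float at 32, whereas a host `long` may be 32 or 64 bits depending on the platform. The aliases `k_char` to `k_float`, the `Sample` struct and `buffer_bytes` are hypothetical names for illustration; a real host program would normally use the types supplied by the OpenCL headers (such as `cl_int` and `cl_long`) rather than defining its own.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical aliases that mirror the sizes OpenCL C fixes for its scalars.
using k_char  = std::int8_t;   // OpenCL C char:  8 bits
using k_short = std::int16_t;  // OpenCL C short: 16 bits
using k_int   = std::int32_t;  // OpenCL C int:   32 bits
using k_long  = std::int64_t;  // OpenCL C long:  64 bits (host long may differ)
using k_float = float;         // OpenCL C float: 32 bits

static_assert(sizeof(k_float) * CHAR_BIT == 32, "float is assumed to be 32-bit");

// Hypothetical record exchanged between host and kernel. Because every
// field has a fixed width, both sides agree on its layout.
struct Sample {
    k_long  timestamp;
    k_int   id;
    k_float value;
};
static_assert(sizeof(Sample) == 16, "8 + 4 + 4 bytes, no padding expected");

// Bytes needed for a buffer holding `count` samples.
constexpr std::size_t buffer_bytes(std::size_t count) {
    return count * sizeof(Sample);
}
```

If one of the assumptions does not hold on a given host, the build fails at the `static_assert` instead of silently corrupting data at run time.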
As such, OpenCL introduces both platform and execution models.
Platform model: defines the representation of any platform, which contains a single host and several OpenCL compute devices.
Execution model: splits the application into two parts.
Host program: manages the application using OpenCL APIs.
Kernels: run on OpenCL compute devices and accelerate functions.
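To make the split between host program and kernel concrete, the sketch below mimics it in plain C++, so it builds with an ordinary compiler and needs no OpenCL runtime. The names `vadd_kernel` and `host_launch` are hypothetical, and the loop in `host_launch` only stands in for the runtime dispatching one work-item per index. A real host program would go through the OpenCL API instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Kernel": the work done by ONE work-item. In OpenCL C the index would
// come from get_global_id(0); in this emulation it is passed in explicitly.
void vadd_kernel(std::size_t gid, const int* a, const int* b, int* out) {
    out[gid] = a[gid] + b[gid];
}

// "Host": owns the data, launches the kernel over the whole index range
// and collects the result. The loop is a sequential stand-in for the
// parallel dispatch a real compute device would perform.
std::vector<int> host_launch(const std::vector<int>& a,
                             const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t gid = 0; gid < a.size(); ++gid) {
        vadd_kernel(gid, a.data(), b.data(), out.data());
    }
    return out;
}
```

Calling `host_launch({1, 2, 3}, {10, 20, 30})` adds the two vectors element by element. The kernel itself never loops over the data, which is what lets a compute device run many work-items at once.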
When it comes to compiling the application, the host application uses a compiler such as G++ or GCC, whereas the compiler for the OpenCL kernel is vendor specific.
In the Xilinx development flow using SDAccel, the host application is compiled by XCPP and the kernel by XOCC.
Transferring information between the host and kernels relies on a memory model with five different regions:
Host memory: accessible only to the host.
Global memory: accessible to both the host and the kernel. This is the main medium for transferring data between host and kernel.
Constant global memory: accessible to both the host and the kernel. However, only the host has read/write access; for kernels this region is read only.
Local memory: used by the kernel for computation and storage, not accessible to the host directly.
Private memory: used by tasks within a kernel; other tasks cannot access this memory area. Again, there is no direct host access.
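The memory regions listed above can be mimicked in plain C++ to show who touches what. This is only an emulation under stated assumptions: `scaled_sum_group`, `host_run` and `kGroupSize` are hypothetical names, the work-items run sequentially, and the copies between vectors stand in for the transfers a real OpenCL runtime would perform between host and global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGroupSize = 4;  // work-items per work-group (assumed)

// Emulates one work-group of a hypothetical "scaled sum" kernel.
void scaled_sum_group(std::size_t group,
                      const int* global_in,    // global memory: filled by the host
                      int* global_out,         // global memory: read back by the host
                      const int* const_scale)  // constant memory: read only here
{
    int local_scratch[kGroupSize];  // local memory: shared inside the work-group

    for (std::size_t lid = 0; lid < kGroupSize; ++lid) {
        // private memory: visible to this work-item only
        const int private_val = global_in[group * kGroupSize + lid];
        local_scratch[lid] = private_val * (*const_scale);
    }

    // Real OpenCL C would need barrier(CLK_LOCAL_MEM_FENCE) before reading
    // values written by other work-items; this sequential emulation does not.
    int sum = 0;
    for (std::size_t lid = 0; lid < kGroupSize; ++lid) {
        sum += local_scratch[lid];
    }
    global_out[group] = sum;
}

// Host side: host_data lives in host memory and is copied to "global" memory.
std::vector<int> host_run(const std::vector<int>& host_data, int scale) {
    assert(host_data.size() % kGroupSize == 0);
    const std::vector<int> global_in(host_data);  // host -> global copy
    std::vector<int> global_out(host_data.size() / kGroupSize);
    for (std::size_t g = 0; g < global_out.size(); ++g) {
        scaled_sum_group(g, global_in.data(), global_out.data(), &scale);
    }
    return global_out;  // global -> host copy
}
```

The host never sees `local_scratch` or `private_val`. Its only view of the computation is what it writes to and reads from the global buffers.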
The OpenCL framework therefore lets us create applications and accelerate them on our ALVEO card from a host.
I have just received an ALVEO U200 card, so over the next few weeks I will be building up a rack for it and setting it to work. I will, of course, share my journey!