We recently used Xilinx’s SDAccel development environment to compile and optimize a video-watermarking application written in OpenCL for an FPGA accelerator card. Video content providers use watermarking to brand and protect their content. Our goal was to design a watermarking application that processes high-definition (HD) video at 1080p resolution with a target throughput of 30 frames per second (fps), running on an Alpha Data ADM-PCIE-7V3 card.
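To put the 30 fps target in concrete terms, the short sketch below works out the data rate it implies. It assumes a 1920 x 1080 frame (the usual meaning of 1080p) and 3 bytes per pixel, matching the 24-bit pixel format described later in the article. The function names are illustrative only and are not part of SDAccel or OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one frame in bytes.
 * Assumption: pixels are stored back to back with no padding. */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Sustained one-way data rate, in bytes per second, needed to move
 * `fps` frames of the given size every second. */
static uint64_t stream_bytes_per_second(uint32_t width, uint32_t height,
                                        uint32_t bytes_per_pixel,
                                        uint32_t fps)
{
    assert(fps > 0);
    return frame_bytes(width, height, bytes_per_pixel) * fps;
}
```

Under these assumptions, frame_bytes(1920, 1080, 3) is 6,220,800 bytes and stream_bytes_per_second(1920, 1080, 3, 30) is 186,624,000 bytes per second in one direction. Each frame is both sent to the card and read back, so the link would carry roughly twice that figure, and the accelerator has about 33 ms to process each frame.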
The SDAccel development environment enables designers to take applications captured in OpenCL and compile them to an FPGA without requiring knowledge of the underlying FPGA implementation tools. The video-watermarking application serves as a perfect way to introduce the main optimization techniques available in SDAccel.
The main function of the video-watermarking algorithm is to overlay a logo at a specific location on a video stream. The logo used for the watermark can be either active or passive. An active logo is typically represented by a short, repeating video clip, while a passive logo is a still image. The most common technique among broadcasting companies that brand their video streams is to use a company logo as a passive watermark, so that was the aim of our example design. The application inserts a passive logo on a pixel-by-pixel level of granularity based on the operations of the following equations:
The input and output frames are two-dimensional arrays in which pixels are expressed using the YCbCr color space. In this color space, each pixel is represented in three components: Y is the luma component, Cb is the chroma blue-difference component and Cr is the chroma red-difference component. Each component is represented by an 8-bit value, resulting in a total of 24 bits per pixel.
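The pixel format and the per-pixel overlay can be sketched in plain C as follows. This is a minimal illustration, not the kernel used in the design: it assumes a binary mask in which any nonzero entry means "take the logo pixel", which may differ from the article's equations, and the packing order (Y in the top byte of the 24 bits) is likewise an assumption. The names ycbcr_t, pack_ycbcr, unpack_ycbcr and overlay_logo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One YCbCr pixel: 8 bits per component, 24 bits in total. */
typedef struct {
    uint8_t y;   /* luma */
    uint8_t cb;  /* chroma blue-difference */
    uint8_t cr;  /* chroma red-difference */
} ycbcr_t;

/* Pack a pixel into the low 24 bits of a 32-bit word.
 * Assumed layout: Y in bits 23..16, Cb in bits 15..8, Cr in bits 7..0. */
static uint32_t pack_ycbcr(ycbcr_t p)
{
    return ((uint32_t)p.y << 16) | ((uint32_t)p.cb << 8) | (uint32_t)p.cr;
}

/* Inverse of pack_ycbcr; bits 31..24 of the word are ignored. */
static ycbcr_t unpack_ycbcr(uint32_t word)
{
    ycbcr_t p;
    p.y  = (uint8_t)((word >> 16) & 0xFFu);
    p.cb = (uint8_t)((word >> 8) & 0xFFu);
    p.cr = (uint8_t)(word & 0xFFu);
    return p;
}

/* Overlay a logo onto a row-major frame, in place, with the logo's
 * top-left corner at column x0, row y0.
 * Assumed rule: where mask != 0 the logo pixel replaces the frame pixel;
 * where mask == 0 the frame pixel is left unchanged. */
static void overlay_logo(ycbcr_t *frame, size_t frame_w, size_t frame_h,
                         const ycbcr_t *logo, const uint8_t *mask,
                         size_t logo_w, size_t logo_h,
                         size_t x0, size_t y0)
{
    assert(frame != NULL && logo != NULL && mask != NULL);
    /* The logo must fit entirely inside the frame. */
    assert(x0 + logo_w <= frame_w && y0 + logo_h <= frame_h);

    for (size_t r = 0; r < logo_h; ++r) {
        for (size_t c = 0; c < logo_w; ++c) {
            size_t li = r * logo_w + c;
            if (mask[li] != 0) {
                frame[(y0 + r) * frame_w + (x0 + c)] = logo[li];
            }
        }
    }
}
```

Because each output pixel depends only on the input pixel, logo pixel and mask value at the same position, the loop iterations are independent of one another. That independence is what makes this kind of operation a natural candidate for the pipelining and parallelization an FPGA offers.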
The system on which we executed the application is shown in the figure below. It is composed of an Alpha Data ADM-PCIE-7V3 card communicating with an x86 processor over a PCIe link. In this system, the host processor retrieves the input video stream from disk and transfers it to the device global memory, which is the memory on the FPGA card that is directly accessible from the FPGA. The logo and mask are also transferred from the host to the accelerator card, but they are placed in on-chip block RAM (BRAM) to take advantage of its low latency. The code that runs on the host processor is responsible for sending a video frame to the FPGA accelerator card, launching the accelerator and then retrieving the processed frame from the card.
System Overview for the Video Watermarking Application
The optimizations necessary when creating applications like this one using SDAccel are software optimizations. Thus, these optimizations are similar to the ones required to extract performance from other processing fabrics, such as GPUs. As a result of using SDAccel, the details of getting the PCIe link to work, drivers, IP placement and interconnect became a non-issue, allowing us as designers to focus solely on the target application.
Alpha Data, the maker of the FPGA-based ADM-PCIE-7V3 PCIe accelerator card discussed in this article, has joined the OpenPOWER Foundation. The foundation is a group of technology organizations working collaboratively to build advanced server, networking, storage and acceleration technology, as well as industry-leading open source software, aimed at delivering more choice, control and flexibility to developers of next-generation hyperscale and cloud data centers.