
OpenCL code compiled with Xilinx SDAccel accelerates genome sequencing, beats CPU/GPU performance/W by 12-21x

by Xilinx Employee 02-10-2016 11:10 AM - edited 02-11-2016 11:49 AM


In a paper published at the recent SC15 conference (with an accompanying poster), Ashish Sirasao, Elliott Delaye, Ravi Sunkavalli, and Stephen Neuendorffer of Xilinx describe how they used the OpenCL language and the Xilinx SDAccel Design Environment to accelerate the Smith-Waterman local sequence-alignment algorithm, which is used in genome sequencing. Smith-Waterman performance is measured in GCUPS (billions of cell updates per second). To skip straight to the reported results: the systolic array architecture implemented for this FPGA-accelerated Smith-Waterman algorithm, instantiated in a Xilinx Virtex-7 690T FPGA on an off-the-shelf Alpha Data ADM-PCIE-7V3 PCIe card, runs:


  • 3.9x faster, with nearly 19x better performance/W, than the same algorithm running on a 12-core Intel X86 server CPU
  • 6x faster, with more than 21x better performance/W, than the same algorithm running on a 60-core Intel Xeon Phi MIC (Many Integrated Core Architecture) coprocessor
  • 30% faster, with nearly 12x better performance/W, than the same algorithm running on an nVidia Tesla K40 GPU with 2880 stream processors
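For readers new to the metric: a Smith-Waterman alignment of a query of length m against a database of length n fills an m × n score matrix, one cell update per pair of residues, so GCUPS is simply (m · n) / (run time in seconds × 10^9), and performance/W divides that by the average power draw. The Python sketch below computes both. The function names, the workload sizes, and the 25 W power figure in the example are hypothetical, chosen only for illustration, and are not numbers from the paper.

```python
def gcups(query_len: int, db_len: int, seconds: float) -> float:
    """Billions of Smith-Waterman cell updates per second.

    Aligning a query of query_len residues against db_len database residues
    fills query_len * db_len score-matrix cells.
    """
    return query_len * db_len / seconds / 1e9


def gcups_per_watt(gcups_value: float, watts: float) -> float:
    """Energy efficiency: throughput divided by average power draw."""
    return gcups_value / watts


# Hypothetical workload and power figure, for illustration only.
rate = gcups(query_len=256, db_len=100_000_000, seconds=0.5)
print(f"{rate:.1f} GCUPS, {gcups_per_watt(rate, watts=25.0):.2f} GCUPS/W")
```

With these made-up inputs the script prints 51.2 GCUPS and 2.05 GCUPS/W.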





Alpha Data ADM-PCIE-7V3 PCIe card based on a Xilinx Virtex-7 690T FPGA




Here are the Smith-Waterman performance results, taken from the SC15 poster:



[Figure: SDAccel Smith-Waterman performance results, from the SC15 poster]



Saying that these performance and performance/W results are significant is putting it mildly.


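Each cell in the Smith-Waterman score matrix depends only on its upper, left, and upper-left neighbors, so all cells along an anti-diagonal can be computed at the same time. The diagram below from the SC15 poster shows why this makes the algorithm well-suited to a highly parallel systolic-processing approach: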



[Figure: Smith-Waterman systolic processing, from the SC15 poster]
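The dependence structure behind the systolic approach can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple linear gap penalty and hypothetical scores (+2 match, -1 mismatch, -1 gap) rather than whatever scoring the paper used; the function names are also hypothetical. It fills the same score matrix in two orders, row by row and anti-diagonal by anti-diagonal, and both give the same answer because every cell needs only its upper, left, and upper-left neighbors.

```python
def cell_score(H, a, b, i, j, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman recurrence (linear gap penalty) for cell (i, j), 1-indexed.

    Each cell needs only its diagonal (i-1, j-1), upper (i-1, j) and
    left (i, j-1) neighbors; the 0 floor is what makes the alignment local.
    """
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)


def sw_row_major(a, b):
    """Reference order: fill the matrix row by row; return the best local score."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = cell_score(H, a, b, i, j)
            best = max(best, H[i][j])
    return best


def sw_wavefront(a, b):
    """Wavefront order: sweep anti-diagonals d = i + j.

    Every cell on diagonal d depends only on diagonals d-1 and d-2, so the
    inner loop has no dependences between its iterations; in hardware those
    cells can all be updated in the same step.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            H[i][j] = cell_score(H, a, b, i, j)
            best = max(best, H[i][j])
    return best


print(sw_row_major("ACACACTA", "AGCACACA"), sw_wavefront("ACACACTA", "AGCACACA"))
```

In the wavefront version, the inner loop over one anti-diagonal carries no dependences, which is the parallelism a systolic array exploits by assigning one row of the matrix to each processing element.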



Of course, large FPGAs like the Xilinx Virtex-7 690T have abundant parallel computing resources, so they are adept at implementing highly parallel compute engines such as the systolic array needed to execute the Smith-Waterman algorithm efficiently.


The authors explored their FPGA-based Smith-Waterman implementations along several dimensions. In one dimension, the experiments determined the best trade-off between the number of systolic cells per OpenCL kernel and the number of kernel instances needed to obtain maximum algorithmic performance. For this implementation, numerical analysis of the results showed that 32 systolic cells per OpenCL kernel is optimal, as shown in the diagram below (taken from the poster).



[Figure: optimal number of processing elements (systolic cells) per OpenCL kernel, from the SC15 poster]
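To make the cells-per-kernel trade-off more concrete, here is a cycle-level Python model of a generic, textbook-style linear systolic array: each processing element (PE) holds one query residue, and database residues stream through the chain, reaching each PE one cycle after its predecessor. This is an illustrative assumption, not the authors' OpenCL kernel. The names (`sw_systolic`, `num_pes`), the linear gap penalty, and the scoring values are hypothetical, and queries longer than the array are handled here by running multiple passes that hand the last PE's row of scores to the next pass.

```python
def sw_reference(a, b, match=2, mismatch=-1, gap=-1):
    """Plain row-by-row Smith-Waterman (linear gap), used as a correctness check."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if ca == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_systolic(query, db, num_pes, match=2, mismatch=-1, gap=-1):
    """Cycle-level model of a hypothetical linear systolic array of num_pes cells.

    PE k holds one query residue; database residues stream through, reaching
    PE k one cycle after PE k-1. A query longer than num_pes is processed in
    passes, and the last PE's outputs become the boundary row for the next
    pass. Returns (best_score, total_cycles).
    """
    n = len(db)
    boundary = [0] * (n + 1)   # H[row above this pass][c], c = 0..n
    best, cycles = 0, 0
    for start in range(0, len(query), num_pes):
        chunk = query[start:start + num_pes]
        p = len(chunk)
        h_left = [0] * p       # each PE's previous output, H[row][c-1]
        up_latch = [0] * p     # each PE's previous "up" input, H[row-1][c-1]
        next_boundary = [0] * (n + 1)
        for t in range(n + p - 1):
            prev = h_left[:]   # all PE registers update together on the clock edge
            for k in range(p):
                c = t - k + 1  # database column PE k works on this cycle
                if not 1 <= c <= n:
                    continue   # PE idle while the pipeline fills or drains
                up = boundary[c] if k == 0 else prev[k - 1]
                s = match if chunk[k] == db[c - 1] else mismatch
                h = max(0, up_latch[k] + s, up + gap, prev[k] + gap)
                h_left[k], up_latch[k] = h, up
                best = max(best, h)
                if k == p - 1:
                    next_boundary[c] = h
        boundary = next_boundary
        cycles += n + p - 1
    return best, cycles


q, db = "GATTACAGATTACA", "TTGATTACAGGCATTACA"
for pes in (1, 4, 16):
    print(pes, sw_systolic(q, db, pes), sw_reference(q, db))
```

In this model, each pass with p PEs costs n + p - 1 cycles, so a longer array means fewer passes over the database. The model deliberately ignores FPGA resource usage, clock frequency, and the number of parallel kernel instances, which is why it cannot by itself reproduce the paper's finding that 32 cells per kernel is optimal.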


Several more experimental dimensions are represented by performance and performance/W comparisons with the Smith-Waterman algorithm running on the 12-core Intel Xeon CPU, the 60-core Intel Xeon Phi MIC coprocessor, and the nVidia Tesla K40 GPU (as reviewed in the results table appearing a few paragraphs above).
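The two ratios quoted for each platform also imply how much more power each competing platform drew. Because performance/W is performance divided by power, the power ratio is the performance/W advantage divided by the speedup. The short Python calculation below applies that to the rounded figures from the bullet list near the top of this post ("nearly 19x" is entered as 19.0, and so on), so its outputs are approximations derived from those figures, not measured power numbers. The dictionary and function names are hypothetical.

```python
# FPGA speedup and performance/W advantage over each platform, as quoted in
# the bullet list. These are rounded figures ("nearly 19x" -> 19.0).
QUOTED = {
    "12-core Intel x86 CPU": (3.9, 19.0),
    "60-core Intel Xeon Phi": (6.0, 21.0),
    "nVidia Tesla K40 GPU": (1.3, 12.0),  # "30% faster" -> 1.3x
}


def implied_power_ratio(speedup: float, perf_per_watt_gain: float) -> float:
    """Power drawn by the other platform divided by the FPGA card's power.

    gain = (perf_fpga / P_fpga) / (perf_other / P_other)
         = speedup * (P_other / P_fpga)
    so P_other / P_fpga = gain / speedup.
    """
    return perf_per_watt_gain / speedup


for platform, (speedup, gain) in QUOTED.items():
    ratio = implied_power_ratio(speedup, gain)
    print(f"{platform}: ~{ratio:.1f}x the FPGA card's power")
```

With these inputs it prints roughly 4.9x, 3.5x, and 9.2x, respectively.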


Perhaps the most significant result, however, is not the FPGA implementation’s better performance or even its vastly superior performance/W but its ease of use. This paper demonstrates how you can compile OpenCL code using SDAccel to implement high-performance, low-power systolic arrays on FPGAs, something that was previously possible only by writing RTL code. It’s that sort of result that will put FPGA acceleration into more data centers more quickly than anything else.


Here’s a thumbnail image of the SC15 poster, which summarizes the information from the paper:



[Figure: thumbnail of the Smith-Waterman SDAccel SC15 poster]



If this real-world example has piqued your curiosity about algorithmic FPGA acceleration or SDAccel, you might want to read:


About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.