Adam Taylor’s MicroZed Chronicles, Part 112: Finite Impulse Response Filter PL implementation using SDSoC

by Xilinx Employee 12-14-2015 09:32 AM - edited 01-06-2016 10:31 AM

 

By Adam Taylor

 

With the FIR filter all up and running in software as desired on the PS (Processor System) side of our Zynq SoC, we will now proceed to accelerate the function using the PL (Programmable Logic) side of the device.

 

To get the best performance we need to ensure that:

 

  • We use DMA data movers between the PS and the PL – To achieve this we often need contiguous memory allocations, which we can obtain using the sds_alloc() function declared in sds_lib.h. We can also use pragmas to define the interface type, provided any prerequisites are met. If we do not use pragmas, the SDS compiler will select the most appropriate data mover.

 

 Image1.jpg

 

 

  • We pipeline / unroll loops as appropriate – Pipelining allows the operations within a loop to execute concurrently, so that a new iteration can start before the previous one has finished. We define pipelining using a pragma with a parameter called the initiation interval (II), which sets the target number of clock cycles between the starts of successive loop iterations. Loop unrolling creates multiple copies of the loop body. Whether to pipeline (and with what initiation interval) and whether to unroll a loop depend on the data dependencies within the loop.

 

  • We have correctly partitioned (segmented) any memory arrays within the implementation – Selecting the correct partitioning maximizes the available memory bandwidth, which increases the performance of our accelerated function. As with most SDSoC optimizations, we do this using a pragma.

 

  • We select the best clock rates for the data mover network and the accelerated function itself – These clock rates were designated in our platform definition in Vivado.
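
The first point above can be sketched in C as follows. This is only an illustration, not the code used for the filter: the function halve_samples(), the buffer length N_SAMPLES and the BUF_ALLOC / BUF_FREE wrappers are hypothetical names, and the sketch assumes that the SDSoC compilers define __SDSCC__ so that a plain malloc() fallback can be used when building on a host. The SDS data pragmas are the kind used to describe access pattern, memory attribute and data mover; their exact keywords (for example the data mover name) vary between SDSoC releases, so check them against the documentation for your version.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef __SDSCC__                     /* assumption: defined by sdscc / sds++ */
#include "sds_lib.h"                 /* declares sds_alloc() and sds_free() */
#define BUF_ALLOC(n) sds_alloc(n)    /* physically contiguous allocation */
#define BUF_FREE(p)  sds_free(p)
#else
/* Host-only fallback so the sketch builds without SDSoC.
 * malloc() makes no promise of physical contiguity. */
#define BUF_ALLOC(n) malloc(n)
#define BUF_FREE(p)  free(p)
#endif

#define N_SAMPLES 256                /* hypothetical buffer length */

/* Hypothetical function marked for hardware. The pragmas tell the SDS
 * compiler that both arrays are streamed in order, live in contiguous
 * memory, and should use a simple DMA. Keyword spellings are an assumption
 * and differ between releases. Other compilers ignore these pragmas. */
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
#pragma SDS data mem_attribute(in:PHYSICAL_CONTIGUOUS, out:PHYSICAL_CONTIGUOUS)
#pragma SDS data data_mover(in:AXIDMA_SIMPLE, out:AXIDMA_SIMPLE)
void halve_samples(int16_t in[N_SAMPLES], int16_t out[N_SAMPLES])
{
    for (int i = 0; i < N_SAMPLES; i++) {
        out[i] = (int16_t)(in[i] / 2);
    }
}

/* Caller side: allocate both buffers, run the function, check the result.
 * Returns 0 on success, -1 on allocation failure, otherwise the number of
 * mismatching samples. */
int run_halve_demo(void)
{
    int16_t *in  = (int16_t *)BUF_ALLOC(N_SAMPLES * sizeof(int16_t));
    int16_t *out = (int16_t *)BUF_ALLOC(N_SAMPLES * sizeof(int16_t));
    int mismatches = 0;

    if (in == NULL || out == NULL) {
        if (in != NULL)  BUF_FREE(in);
        if (out != NULL) BUF_FREE(out);
        return -1;
    }

    for (int i = 0; i < N_SAMPLES; i++) {
        in[i] = (int16_t)(2 * i);
    }

    halve_samples(in, out);

    for (int i = 0; i < N_SAMPLES; i++) {
        if (out[i] != (int16_t)i) {
            mismatches++;
        }
    }

    BUF_FREE(in);
    BUF_FREE(out);
    return mismatches;
}
```

Leaving the three pragmas out is also valid: as noted above, the SDS compiler then chooses a data mover itself.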

 

 

Before implementing any optimizations within the filter, I wanted to determine its baseline performance. Just building the accelerated function with no optimization resulted in a 36.7% performance increase. That’s not bad. However, we can do better.

The next step was implementing the optimizations. To minimise the build time, I used the SDR Estimate build to chart the improvements as I fine-tuned my pragmas. Following the four points above, I ensured that the memory allocated for the samples being transferred into the accelerated function was contiguous. The FIR filter is implemented as two loops: an inner loop that applies the filter and an outer loop that cycles through the sample buffer. There is an obvious data dependency between these loops, but we can still pipeline them to reduce the initiation interval. I partitioned the samples and coefficients completely to achieve maximum memory bandwidth.
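
The two-loop structure described above might look something like the sketch below. It is not the filter from this series: the function name fir_filter(), the sizes N_TAPS and N_SAMPLES, and the choice to partition a local delay line and a local copy of the coefficients are all assumptions made for illustration. The pragmas are standard Vivado HLS directives; other compilers ignore them, so the function also runs as ordinary C.

```c
#include <stdint.h>

#define N_TAPS    8      /* hypothetical number of coefficients */
#define N_SAMPLES 64     /* hypothetical sample buffer length */

void fir_filter(const int16_t samples[N_SAMPLES],
                const int16_t coeffs[N_TAPS],
                int32_t out[N_SAMPLES])
{
    int16_t taps[N_TAPS];          /* local copy of the coefficients */
    int16_t delay[N_TAPS] = {0};   /* delay line, newest sample at index 0 */

    /* Complete partitioning turns each array into individual registers,
     * so every element can be read in the same clock cycle. */
#pragma HLS ARRAY_PARTITION variable=taps complete dim=1
#pragma HLS ARRAY_PARTITION variable=delay complete dim=1

    for (int k = 0; k < N_TAPS; k++) {
        taps[k] = coeffs[k];
    }

    /* Outer loop: cycles through the sample buffer. */
    for (int n = 0; n < N_SAMPLES; n++) {
        /* Target one new sample per clock. Pipelining this loop also
         * unrolls the inner loop nested inside it. */
#pragma HLS PIPELINE II=1
        int32_t acc = 0;

        /* Inner loop: shifts the delay line and applies the filter. */
        for (int k = N_TAPS - 1; k > 0; k--) {
            delay[k] = delay[k - 1];
            acc += (int32_t)delay[k] * taps[k];
        }
        delay[0] = samples[n];
        acc += (int32_t)delay[0] * taps[0];

        out[n] = acc;
    }
}
```

A quick sanity check for a function like this is to feed it a unit impulse: the output should then reproduce the coefficients in order, followed by zeros.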

 

The final step was to define the clocks for the data mover and the accelerated function. Putting all of this together resulted in a significant improvement. The total execution time was 54,696 clock cycles, an 89.78% improvement. This should come as no surprise, as FPGA fabric is especially good at implementing FIR filters using hard macros like the DSP48E DSP slice.
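
To put the two figures above in context, the small helper below works out what they imply. It assumes that "improvement" means a percentage reduction in cycle count relative to the unaccelerated function; the post does not state the baseline cycle count, so the result is an inference from the quoted numbers rather than a measured value. The function names are hypothetical.

```c
/* Cycle count before acceleration implied by an accelerated cycle count and
 * a percentage reduction: baseline = accelerated / (1 - reduction / 100). */
double implied_baseline_cycles(double accelerated_cycles, double reduction_percent)
{
    return accelerated_cycles / (1.0 - reduction_percent / 100.0);
}

/* Speed-up factor equivalent to a percentage reduction in cycle count. */
double implied_speedup(double reduction_percent)
{
    return 1.0 / (1.0 - reduction_percent / 100.0);
}
```

Under that assumption, implied_baseline_cycles(54696.0, 89.78) gives roughly 535,000 clock cycles, and implied_speedup(89.78) gives a little under 10x.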

 

When I ran the accelerated function on the ZedBoard, I again captured the filter input and output for a signal within the passband and the same for a signal within the stopband. You can see the results below:

 

 

Image2.jpg

 

 

Image3.jpg

 

Due to the holiday period, this is the last MicroZed Chronicles blog of 2015. They will resume in 2016, so please check back after your New Year celebrations. Until then, have a Merry Christmas and a Happy New Year.

 

 

If you want E Book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

You also can find links to all the previous MicroZed Chronicles blogs on my own Web site, here.

 

Comments
by Newbie arobson73
on ‎03-21-2016 04:32 PM

Would there be any improvement in performance if the FIR filter were coded in Verilog / VHDL? Is the trend now to avoid using HDL and instead use High Level Synthesis?

by Observer taylo_ap
on ‎03-22-2016 01:49 PM

arobson73,

 

As with everything, you could probably fine-tune it and make it perform faster if you did it by hand, just as you can get assembler to be more optimised than C, say. The question becomes: do we need to do that?

 

As engineers we are required to deliver our systems to customers on quality, time and cost (I think in that order too). HLS gives us the ability to quickly and easily implement the filter, and in this case it met my requirements (obviously, as I set the requirements). In my view it therefore becomes the obvious choice; it is the right tool for the job at hand.

 

I think there is a trend towards increasing levels of abstraction, and it has been ongoing ever since programmable logic was first invented: we have been through logic equations, schematic entry, AHDL, HDLs and now HLS. When you think of the performance the devices are capable of now compared to where they started, it makes sense that the level of abstraction used to programme these devices increases, to enable us to meet the quality, schedule and cost our customers require. I recently had to struggle with a large modern FPGA that an older engineer had designed at the schematic level, and getting it working was quite a challenge, believe me.

 

I would not think HDL is going to go away quickly, but I do think that as engineers we need to keep up with modern design techniques. HLS is definitely here to stay and will be playing an increasing part in FPGA and SoC developments, although of course HDL will be around for a good while yet.

 

Best Regards and thanks for reading 

 

Adam 

About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.