Observer ain0102
Registered: 02-06-2018

What is the difference between unroll and pipeline?

Hi.

In my code:

void topFunc(hls::stream<UINT8> &in, hls::stream<UINT8> &out)
{
   ...

   for (int i = 0; i < WIDTH; i++) {
//#pragma HLS UNROLL
//#pragma HLS PIPELINE
      out << 255;
      ...
   }
}

When I use PIPELINE, the latency increases by WIDTH.

What is the difference between UNROLL and PIPELINE?
When should I use PIPELINE?

Accepted Solutions
Scholar u4223374
Registered: 04-26-2015

Re: What is the difference between unroll and pipeline?

Pipelining a loop with II=N (initiation interval) means that HLS should try to start a new iteration of the loop every N clock cycles, so once the pipeline is full, one iteration finishes every N cycles. Normally you'd set II=1, which means one loop iteration finishes every clock cycle. The resource count often does not increase by much, and may even decrease: the same hardware is being used, but it is busy more of the time. Instead of a RAM holding the source data being read every fifth cycle (with the other four cycles spent on processing), it is read every cycle while processing continues on the previous four elements. As a result, pipelining is a very good approach for many processing tasks. However, it has a limitation: you can't pipeline to less than one clock cycle per iteration, so the absolute minimum time for a loop with M iterations is M cycles (normally it's a little more, because filling and draining the pipeline takes time).
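
To make those cycle counts concrete, here is a rough C++ model (not HLS output) of loop latency with and without pipelining. The function names and the parameters `depth` and `ii` are hypothetical, and the model deliberately ignores the loop entry/exit and control overhead that a real HLS schedule adds.

```cpp
#include <cassert>
#include <cstdint>

// Simplified cycle-count model for illustration only; real HLS reports also
// include loop entry/exit and control overhead.
//   trip_count: number of loop iterations (M in the text)
//   depth:      cycles one iteration needs from its first read to its last write
//   ii:         initiation interval, cycles between the starts of two iterations

// Without pipelining, each iteration must finish before the next one starts.
std::uint64_t sequential_cycles(std::uint64_t trip_count, std::uint64_t depth) {
    return trip_count * depth;
}

// With pipelining, a new iteration starts every `ii` cycles. The last one
// starts at (trip_count - 1) * ii and then needs `depth` cycles to drain.
std::uint64_t pipelined_cycles(std::uint64_t trip_count, std::uint64_t depth,
                               std::uint64_t ii) {
    if (trip_count == 0) return 0;
    return (trip_count - 1) * ii + depth;
}
```

With M = 100 iterations and a 5-cycle iteration, the model gives 500 cycles unpipelined versus 104 cycles at II=1: slightly more than M, which is the fill/drain cost mentioned above. Setting `ii` equal to `depth` gives back the unpipelined figure.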

Unrolling a loop with factor=N means that HLS should create N copies of the processing hardware and run them in parallel. The advantage is that many iterations can finish in every single clock cycle; I've had a block that did 54 iterations of a loop in one cycle. The disadvantage is that it takes a lot of hardware. If you're going to process 30 elements from an array at once, you need to be able to read 30 elements at once. Since Xilinx block RAMs have at most two ports, you're going to need at least 15 block RAMs in parallel. If you don't have that, HLS will construct a huge state machine so that your processing hardware gets sequential access to the RAM, which essentially drops you back to the performance of the rolled loop while using 30 times as much hardware. As a result, unrolling only makes sense when you have (or can make) data structures that support it. The other key requirement is that the performance gain justifies the extra hardware (N copies of the processing logic).
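
A minimal sketch of the 30-element case, assuming a hypothetical kernel `scale_all` that multiplies each element by a constant. `#pragma HLS UNROLL` and `#pragma HLS ARRAY_PARTITION` are real Vivado/Vitis HLS directives; a regular C++ compiler ignores them (possibly with a warning), so the code also runs as plain C++. A cyclic partition with factor 15 puts elements i and i+15 into the same bank, so with two ports per bank all 30 reads can happen in one cycle. Whether each bank becomes a block RAM, LUTRAM or plain registers is the tool's decision, and for an array this small it may well choose registers.

```cpp
#include <cassert>
#include <cstdint>

// Number of memories with `ports_per_bank` ports (two for Xilinx block RAM,
// as stated above) needed to serve `reads_per_cycle` reads in one cycle.
constexpr int banks_needed(int reads_per_cycle, int ports_per_bank = 2) {
    return (reads_per_cycle + ports_per_bank - 1) / ports_per_bank;  // ceiling division
}
static_assert(banks_needed(30) == 15, "30 parallel reads need 15 dual-port banks");

constexpr int N = 30;  // hypothetical: number of elements processed in parallel

// Hypothetical kernel. Under HLS, UNROLL creates N multipliers, and the
// partitions split `in` and `out` into 15 two-element banks each, so all
// 30 reads and 30 writes can be scheduled together.
void scale_all(const std::int32_t in[N], std::int32_t out[N], std::int32_t k) {
#pragma HLS ARRAY_PARTITION variable=in cyclic factor=15 dim=1
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=15 dim=1
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL
        out[i] = in[i] * k;
    }
}
```

If the `ARRAY_PARTITION` lines are removed, the unrolled copies are limited by the ports of a single memory, which is the serialized case described above.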

An example of where unrolling really helps: suppose you've got a 72-bit RAM that stores nine 8-bit values per element, and you want to sum those values. Doing this with a normal or pipelined loop will result in HLS reading each element nine times, using a mux to select the relevant 8 bits each time, and feeding them sequentially through one adder. On the other hand, unrolling the loop completely will result in HLS reading each element once and using eight adders (nine values need eight two-input additions) to do the additions simultaneously. This eliminates the mux (each adder reads from a fixed position on the 72-bit RAM output bus), simplifies the state machine, and gives roughly nine times the performance, with the only tradeoff being that it uses more adders. Since adders are very plentiful on Xilinx FPGAs, this is probably a worthwhile tradeoff.
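
The 72-bit example might look roughly like this in HLS-style C++. This is a sketch under assumptions: real HLS code would normally hold the word in an `ap_uint<72>`, but to keep the example compilable with any C++ compiler the word is modeled by a hypothetical `Word72` struct (a 64-bit low part plus an 8-bit high part). Because the inner loop is fully unrolled, each lane index is a constant in its copy, so every extraction becomes fixed wiring rather than a mux. The outer loop is pipelined with II=1 to request one word per cycle.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one 72-bit RAM word (ap_uint<72> in real HLS code).
struct Word72 {
    std::uint64_t lo;  // lanes 0..7, lane 0 in the least significant byte
    std::uint8_t hi;   // lane 8
};

// Extract 8-bit lane `i` (0..8). With a constant `i` this is just wiring.
inline std::uint8_t lane(const Word72 &w, int i) {
    return i < 8 ? static_cast<std::uint8_t>(w.lo >> (8 * i)) : w.hi;
}

// Sum the nine lanes of one word (maximum 9 * 255 = 2295).
// Fully unrolled, this is eight two-input additions; the tool may
// rebalance the chain into an adder tree.
unsigned sum_lanes(const Word72 &w) {
    unsigned sum = 0;
    for (int i = 0; i < 9; i++) {
#pragma HLS UNROLL
        sum += lane(w, i);
    }
    return sum;
}

// Sum all lanes of `n` words, one word read per loop iteration.
unsigned sum_all(const Word72 mem[], int n) {
    unsigned total = 0;
    for (int e = 0; e < n; e++) {
#pragma HLS PIPELINE II=1
        total += sum_lanes(mem[e]);
    }
    return total;
}
```

Each call to `sum_lanes` reads its word once; compare this with a rolled loop, which would revisit the same word nine times through a mux.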

Explorer
Registered: 05-23-2011

Re: What is the difference between unroll and pipeline?

Hi

In my words:
Unroll means that an operation which has to be done n times is instead done in (semi-)parallel, to get a shorter latency and higher throughput.

Pipeline means that a task executes as a pipeline, allowing the next execution of the task to begin before the current execution is complete.
The overlapping work could, for example, be other operations on the same data.

Kind regards

Thomas
