Observer
939 Views
Registered: ‎01-02-2019

Prevent a loop in a function from being unrolled


Hi,

On the Xilinx website I found the following description of automatic loop unrolling. Is there any way to prevent a loop that accesses a fully partitioned array from being unrolled? The unrolling consumes a lot of resources in my design.

Thanks in advance.

 

 

TIP: When the use of pragmas like DATA_PACK, ARRAY_PARTITION, or ARRAY_RESHAPE, let more data be accessed in a single clock cycle, Vivado HLS automatically unrolls any loops consuming this data, if doing so improves the throughput. The loop can be fully or partially unrolled to create enough hardware to consume the additional data in a single clock cycle. This feature is controlled using the config_unroll command. See config_unroll in the Vivado Design Suite User Guide: High-Level Synthesis (UG902) for more information.

Accepted Solutions
Xilinx Employee
902 Views
Registered: ‎09-05-2018

Hey @y.lee0320 ,

HLS only automatically unrolls loops as stated in your tip if you have added config_unroll to your configuration settings. You may have to remove this option from your configuration settings.

The other possibility is that you have a PIPELINE directive above this loop in your code. That would force the lower level loops to unroll as well.
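
To illustrate the second case, here is a minimal sketch. The function name `scale_rows`, the sizes `ROWS` and `COLS`, and the arithmetic are hypothetical and not taken from the original design. Because PIPELINE sits in the outer loop body, above the inner loop, Vivado HLS unrolls the inner loop completely, which here would mean `COLS` copies of the multiplier.

```cpp
// Hypothetical sizes, for illustration only.
const int ROWS = 4;
const int COLS = 8;

// Sketch: PIPELINE is placed in the outer loop, above the inner loop.
void scale_rows(int in[ROWS][COLS], int out[ROWS][COLS], int gain) {
#pragma HLS ARRAY_PARTITION variable=in complete dim=2
#pragma HLS ARRAY_PARTITION variable=out complete dim=2
    for (int r = 0; r < ROWS; ++r) {
#pragma HLS PIPELINE II=1
        for (int c = 0; c < COLS; ++c) {
            // This loop is nested below the PIPELINE directive, so the
            // tool unrolls it: one multiplier per column.
            out[r][c] = in[r][c] * gain;
        }
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the function can still be checked in a plain C simulation.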

Nicholas Moellers

Xilinx Worldwide Technical Support


10 Replies
Xilinx Employee
926 Views
Registered: ‎06-04-2018

Hi @y.lee0320 ,

There is an option for partial unrolling with factor=x:

#pragma HLS UNROLL factor=1

Can you try factor=1 in your case to see whether the loop stays rolled? (A factor of 1 means no unrolling is done.)
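
For reference, a minimal sketch of where the directive goes. The function `accumulate` and the size `N` are hypothetical. The pragma is placed inside the body of the loop it applies to. It only requests that this loop stay rolled, and it may not take effect if an enclosing loop or function is pipelined.

```cpp
const int N = 16;  // hypothetical array size

// Sketch: request that the loop over a fully partitioned array stays rolled.
int accumulate(const int data[N]) {
#pragma HLS ARRAY_PARTITION variable=data complete dim=1
    int sum = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL factor=1
        sum += data[i];
    }
    return sum;
}
```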

Regards,
Vishnu

Advisor
880 Views
Registered: ‎04-26-2015

@y.lee0320  Anything with a fully partitioned array tends to occupy a lot of resources in HLS. If the loop is unrolled then you end up with lots of copies of the loop hardware. If the loop isn't unrolled then HLS has to build a massive multiplexer to get access to the data. A better approach might be to check whether you can avoid partitioning the array, or turn it into a shift register.
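
As a rough sketch of the shift-register idea, assuming the loop only needs a sliding window of recent samples: the function `fir_step`, the window length `TAPS`, and the coefficients are all hypothetical. Every element moves by one fixed position per call, a pattern that a synthesis tool may be able to implement as a shift register instead of a partitioned array with a wide multiplexer.

```cpp
const int TAPS = 4;  // hypothetical window length

// Called once per input sample: shifts the window by one position and
// returns the weighted sum of the last TAPS samples.
int fir_step(int sample, const int coeff[TAPS]) {
    static int window[TAPS] = {0, 0, 0, 0};

    // Shift from the oldest end so no value is overwritten before it moves.
    for (int i = TAPS - 1; i > 0; --i) {
        window[i] = window[i - 1];
    }
    window[0] = sample;

    // Multiply-accumulate over the window.
    int acc = 0;
    for (int i = 0; i < TAPS; ++i) {
        acc += window[i] * coeff[i];
    }
    return acc;
}
```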

Observer
846 Views
Registered: ‎01-02-2019

Yes, I tried the unroll factor directive, but it had no effect. I think the main reason is that I have a PIPELINE II=1 directive above this loop. The only way I found to avoid the unrolling is to relax the II to above 2.

Observer
259 Views
Registered: ‎07-31-2019

UG902 mentions the config_unroll command, but it gives no example of how to insert this command into the code, the SDx project settings, a script, etc. Could someone show an example, please?

Thank you.

Observer
180 Views
Registered: ‎07-31-2019

Hi All,

After moving the code into a "native" vivado_hls project, I set config_unroll with small thresholds such as 8 or 4. However, the command was ignored and the result is still the same: a fully unrolled loop.

Is there a solution to this issue or is there an active SR? There should be a way to fully disable the automatic unrolling and control it with partial unroll pragmas (e.g. based on the memory replication factors multiplied by 2 in dual-port RAM implementation). I would appreciate your input.

avv

Xilinx Employee
154 Views
Registered: ‎09-04-2017

@avv I would start by removing all the pragmas (PIPELINE, ARRAY_PARTITION, etc.) and looking at the resulting implementation.

If you still see it unrolled, then please share the project.

Thanks,

Nithin

Observer
128 Views
Registered: ‎07-31-2019

Hi nithink,

Thank you for your reply. Indeed, removing all pragmas and memory partitioning will eliminate automatic unrolling, but it will also eliminate the performance gains made by applying PIPELINE and ARRAY_PARTITION to the rest of the code. I have two other functions that use these pragmas for high performance. The third one is just a single-level loop with math functions that, when fully unrolled, consumes far too many DSPs and too much logic to fit into a U200.

In code more complex than a single loop nest or a single function, the user should be able to turn features such as automatic loop unrolling on and off per function, per loop nest, or, better yet, per loop. If you believe there is a solution to what I am describing, I will try to put together a test project without proprietary code to share.

avv

Xilinx Employee
126 Views
Registered: ‎09-04-2017

@avv What you can do is apply the pragmas selectively and see which one is causing the unroll.

For example, applying PIPELINE to a function that contains loops automatically unrolls those loops. There is no way to stop this because, to meet the pipelining requirement, the inner loops have to be executed in parallel.

Once you identify which pragma is causing this, you can look for an alternative way to achieve the same thing.
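
One possible alternative, sketched under assumptions (the function `sum_of_squares` and the sizes `BLOCKS` and `LEN` are hypothetical): place PIPELINE inside the loop that should stay rolled, rather than on the enclosing function or outer loop. The directive only unrolls loops nested below it, so a loop that contains the pragma in its own body is pipelined but not unrolled. The cost is lower throughput, since one inner iteration starts per cycle instead of one outer iteration.

```cpp
// Hypothetical sizes, for illustration only.
const int BLOCKS = 4;
const int LEN = 8;

// Sketch: PIPELINE is placed in the innermost loop, so no loop sits
// below the directive and nothing is unrolled by it.
int sum_of_squares(int x[BLOCKS][LEN]) {
    int total = 0;
    for (int b = 0; b < BLOCKS; ++b) {
        for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
            // A single multiplier is reused on every iteration.
            total += x[b][i] * x[b][i];
        }
    }
    return total;
}
```

Whether this fits depends on the design: if the surrounding loop must itself run at II=1, this placement will not deliver that throughput.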

Hope this helps.

Thanks,

Nithin

Observer
116 Views
Registered: ‎07-31-2019

Hi nithink,

Thank you for your quick reply. As I mentioned, all three functions are in the main iteration loop, and unless I pipeline the main loop I will not get high performance for the first two functions. But pipelining it also leads to a full unroll of the third function's loop and huge resource utilization.

I think I will have to restructure our code into separate kernels with pipes among them to avoid this. Let me know if you can think of a better alternative.

Thank you.

avv

 
