Explorer
2,146 Views
Registered: 03-22-2017

ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

I am trying to flatten a relatively big array (1536) into registers:

 

ap_fixed<18,8> data[1536];
#pragma HLS ARRAY_PARTITION variable=data complete dim=0

Vivado HLS pre-synthesis terminates with the following error:

 

ERROR: [XFORM 203-103] Array 'data.V' (...): partitioned elements number (1536) has exeeded the threshold (1024), which may cause long run-time.
ERROR: [HLS 200-70] Pre-synthesis failed.

Is there a solution for this kind of limit/threshold?

 

 


Accepted Solutions
Scholar u4223374
2,131 Views
Registered: 04-26-2015

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

@scampbell It'd be nice to have the option to disable a bunch of these HLS "sanity check" errors. Sometimes an operation that looks like a really bad idea is deliberate because I'm experimenting with designs or trying to fill an FPGA.

 

@gdg In my experience, pushing parallelism too far in HLS (1536 elements at once is definitely too far) just results in a huge and slow block. I would say that HLS really excels at 1 - 30 parallel tasks. Once you get above about 30 the design becomes so large that your clock speed drops dramatically (and/or resource consumption rises dramatically) and it makes more sense to just run a simpler block at a higher clock speed to compensate.

 

 

9 Replies
Moderator
2,088 Views
Registered: 10-04-2011

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

Hello @gdg,

 

The issue here is that in 2016.x there was no limit on partitioning. However, complete partitioning caused the registers and control logic to grow very large, to the point that designs were failing timing in the device because of routing. For that reason, a limit of 1024 elements was introduced in 2017.x.

 

In your example, 1536 elements * 18 bits = 27,648 registers. That is a very large number; in a mid-range Zynq device, for example, it is about 10% of the available registers. More importantly, partitioning completely implies that you will be operating on all 1536 fixed-point values simultaneously. Is that truly the case?
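
To make that arithmetic explicit, here is a minimal C++ sketch. The helper name `flattened_register_bits` is hypothetical, and the only inputs are the two numbers quoted in this thread (1536 elements, 18 bits each). It counts one flip-flop per stored bit and ignores the multiplexers and control logic that come on top.

```cpp
#include <cstdint>

// One flip-flop per stored bit when an array is completely partitioned.
// Hypothetical helper for illustration; not part of any Xilinx API.
constexpr std::uint64_t flattened_register_bits(std::uint64_t elements,
                                                std::uint64_t bits_per_element) {
    return elements * bits_per_element;
}

// Values from the original post: ap_fixed<18,8> data[1536].
static_assert(flattened_register_bits(1536, 18) == 27648,
              "1536 elements x 18 bits should be 27,648 register bits");
```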

 

We do not have a method right now to turn this off, but are evaluating that for a future release. 

 

OK, hope this helps,
Scott

 

 

 

Explorer
2,078 Views
Registered: 03-22-2017

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

@scampbell, the number of registers is prohibitive indeed (a lot of multiplexers as well, I assume), but I was trying a first, rough, and greedy synthesis.

 

I may try a little design-space exploration (modifying the code) to work around the issue.

 

Having that limit removed (or as a parameter) would help though.

 

Thank you!

Moderator
2,072 Views
Registered: 10-04-2011

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

Hi @gdg,

 

What about partitioning the array so that block RAM can still be used? I think the key here will be how many fixed-point "words" you can operate on simultaneously.
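
As a rough sketch of what partial partitioning could look like: the kernel name `sum_patch`, the factor of 16, and the `int32_t` stand-in for `ap_fixed<18,8>` are all assumptions made so the example compiles without the Xilinx headers. A cyclic partition places element i in bank i % 16, so 16 consecutive elements sit in 16 different banks and can be read in the same cycle, while each bank stays a small 96-word memory.

```cpp
#include <cstdint>

// Stand-in for ap_fixed<18,8> so the sketch builds with a plain C++ compiler (assumption).
typedef std::int32_t data_t;

const int N = 1536;     // array size from the original post
const int FACTOR = 16;  // hypothetical number of words processed per cycle
static_assert(N % FACTOR == 0, "N must be a multiple of FACTOR");

// Hypothetical kernel: sums the array, FACTOR elements at a time.
data_t sum_patch(const data_t data[N]) {
    // Split data into 16 banks (keep this literal in sync with FACTOR).
    // Element i lands in bank i % 16, so data[i..i+15] can be read together.
#pragma HLS ARRAY_PARTITION variable=data cyclic factor=16 dim=1
    data_t acc = 0;
    for (int i = 0; i < N; i += FACTOR) {
#pragma HLS PIPELINE II=1
        for (int j = 0; j < FACTOR; ++j) {
#pragma HLS UNROLL
            acc += data[i + j];
        }
    }
    return acc;
}
```

A standard compiler ignores the `#pragma HLS` lines, so the same function can be checked in C simulation before synthesis. Whether the banks end up in block RAM or LUTRAM is decided by the tool unless you bind them explicitly.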

 

Thank you,
Scott

Explorer
2,069 Views
Registered: 03-22-2017

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

@scampbell, completely removing the BRAMs from the design is one of the goals of my experiments. BRAMs (apparently) have higher access latency than registers. What do you think?

Moderator
2,060 Views
Registered: 10-04-2011

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

Hello @gdg,

 

I am not sure I agree with that. HLS usually configures the BRAM so that the address is presented in one cycle and the data is available in the next. A register is similar, in that new data written in is available on the output in the next cycle. So I don't think the latency of an individual "word" is any different.

 

However, the main limitation of BRAM is the number of ports, which limits how many words can be read or written simultaneously. With dual-port memory we can expand that to two words at the same time per BRAM, but it would still take many cycles to access them all, which increases the latency of the entire design.
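
A back-of-the-envelope way to see the port limit, as a minimal sketch: the helper `access_cycles` is hypothetical and assumes accesses are spread evenly across banks, which is the best case. It only counts how many cycles are needed to touch every word once.

```cpp
// Cycles needed to touch every word once when the array is split into
// `banks` memories that each offer `ports_per_bank` accesses per cycle.
// Hypothetical helper; assumes accesses are spread evenly across banks.
constexpr int access_cycles(int words, int banks, int ports_per_bank) {
    // Ceiling division: a partially filled last cycle still costs a cycle.
    return (words + banks * ports_per_bank - 1) / (banks * ports_per_bank);
}

// One dual-port memory holding all 1536 words: two words per cycle.
static_assert(access_cycles(1536, 1, 2) == 768, "single dual-port memory");
// 16 dual-port banks: 32 words per cycle.
static_assert(access_cycles(1536, 16, 2) == 48, "16 dual-port banks");
// Complete partitioning, modeled here as 1536 one-word banks.
static_assert(access_cycles(1536, 1536, 1) == 1, "fully partitioned");
```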

 

You are also right to mention multiplexing with complete partitioning if you still want addressable access to the memory. I do not think this is the best use of partitioning, though: you really want to use it to create parallel operations on those words, not just to change where they are stored.

 

You could place these elements into LUTRAM using a resource directive. However, a big advantage of BRAM may be that it is paired with the DSP48s. Since many designs perform some math operations on the data, this makes placement and routing much more convenient than placing and routing registers and muxes all over the FPGA fabric.
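
For the LUTRAM option, a minimal sketch of the resource directive might look like the following. The kernel `scale_patch`, the local buffer, and the `int32_t` stand-in for `ap_fixed<18,8>` are assumptions for illustration. `RAM_1P_LUTRAM` is a core name from the Vivado HLS library; the available cores depend on the tool version, so check the user guide for the release you are using.

```cpp
#include <cstdint>

// Stand-in for ap_fixed<18,8> so the sketch builds with a plain C++ compiler (assumption).
typedef std::int32_t data_t;

const int N = 1536;  // array size from the original post

// Hypothetical kernel: buffers the patch locally, then scales it by 3.
void scale_patch(const data_t in[N], data_t out[N]) {
    data_t buf[N];
    // Ask Vivado HLS to implement buf as single-port LUT-based RAM
    // instead of letting it choose block RAM.
#pragma HLS RESOURCE variable=buf core=RAM_1P_LUTRAM

    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        buf[i] = in[i];
    }
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = buf[i] * 3;
    }
}
```

This only changes where the words are stored. The buffer is still accessed one word per cycle, so it does not add parallelism by itself.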

 

OK, I hope this helps?

Scott

Explorer
2,046 Views
Registered: 03-22-2017

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

@scampbell,

 

In the context of latency-constrained applications, I want to investigate the pros and cons of BRAMs and registers a little more. You are most likely 100% correct; I agree with everything else you said and will keep it in mind.

 

If I get any significant BRAM vs. register results, I will post them.

 

Thank you!

Xilinx Employee
2,040 Views
Registered: 05-06-2008

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

Hello @gdg,

 

I also recommend reviewing array partitioning and array reshaping to get around this limitation.
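
A minimal sketch of the reshaping route, under stated assumptions: the kernel `dot_patch`, the factor of 8, and the integer stand-ins for `ap_fixed<18,8>` are hypothetical. With `ARRAY_RESHAPE` and a cyclic factor of 8, the 1536 x 18-bit array becomes a single 192 x 144-bit memory, so one read returns 8 consecutive elements without creating 8 separate banks.

```cpp
#include <cstdint>

// Stand-ins for ap_fixed types so the sketch builds with a plain C++ compiler (assumption).
typedef std::int32_t data_t;
typedef std::int64_t acc_t;

const int N = 1536;    // array size from the original post
const int FACTOR = 8;  // hypothetical number of elements packed per memory word
static_assert(N % FACTOR == 0, "N must be a multiple of FACTOR");

// Hypothetical kernel: dot product of two patches, FACTOR elements per cycle.
acc_t dot_patch(const data_t a[N], const data_t b[N]) {
    // Pack 8 consecutive elements into one wide word (keep the literal in sync with FACTOR).
#pragma HLS ARRAY_RESHAPE variable=a cyclic factor=8 dim=1
#pragma HLS ARRAY_RESHAPE variable=b cyclic factor=8 dim=1
    acc_t acc = 0;
    for (int i = 0; i < N; i += FACTOR) {
#pragma HLS PIPELINE II=1
        for (int j = 0; j < FACTOR; ++j) {
#pragma HLS UNROLL
            acc += static_cast<acc_t>(a[i + j]) * b[i + j];
        }
    }
    return acc;
}
```

Compared with `ARRAY_PARTITION`, reshaping keeps the element count per memory below the 1024 threshold discussed in this thread while still feeding several multipliers per cycle.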

 

Why do we need an array so large?

 

Thanks,
Chris

Explorer
2,034 Views
Registered: 03-22-2017

Re: ARRAY_PARTITION COMPLETE has exceeded the threshold (1024)

@chrisz, the code is part of a convolution algorithm. Feature patches are stored in arrays and passed down to a pipeline of kernels (matrix multiplication, sum-reduce, etc).

 

This was a day-zero attempt to synthesize the code using only registers.
