daniel.cogan
Explorer
376 Views
Registered: ‎07-30-2013

It won't parallelize

I'm having trouble getting the tool to create three hardware instances of a function. It appears to share a single instance of the function no matter what I do.

I'd like to replicate the function to reduce latency. The three calls are independent of each other.

The code is something like:

int in10[SAMPLES], in20[SAMPLES];
int in11[SAMPLES], in21[SAMPLES];
int in12[SAMPLES], in22[SAMPLES];

{
    rms1 = root_mean_square<1, SAMPLES>(in10, in20);
    rms2 = root_mean_square<2, SAMPLES>(in11, in21);
    rms3 = root_mean_square<3, SAMPLES>(in12, in22);
}

 

I've tried using an unused templated int to differentiate the 3 of them.  I've tried pragma HLS latency (and set it to the latency of just one of them).  I've tried inlining and not inlining.  I've tried pragma HLS allocation with limit set to 3.  I've tried pragma HLS dataflow.

Looking at the notes and warnings in the console and report, the tool does seem to run the analysis three times, once per call, and reports some II violations for each. But the latency summary still suggests that all three calls run serially: if I comment out two of them, the overall latency drops by ~28 µs. Any ideas? What am I missing?

[Attached image: Capture.JPG]

2 Replies
blank
Visitor
288 Views
Registered: ‎10-30-2020

Maybe you should look at the utilization report. If it shows three instances of this function, then the calls are parallelized. As for the II violations, the arrays are probably synthesized as RAMs. A RAM has at most two ports, so when a loop reads the same array more than twice per iteration, the extra reads are serialized.
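To illustrate the port limit, here is a minimal sketch and not the original poster's code: `sum_of_squares`, the unroll factor of 4, and `SAMPLES = 64` are all assumptions made for the example. The inner loop reads four elements per pipelined iteration, which a two-port RAM could not supply in one cycle. Cyclic partitioning by the same factor spreads those elements across four separate memories, which may remove that kind of II violation. A standard C++ compiler ignores the `#pragma HLS` lines, so the function can still be checked in software.

```cpp
constexpr int SAMPLES = 64;  // assumed value; the thread does not state it

// Hypothetical helper, used here as a top-level function for illustration.
template <int N>
long long sum_of_squares(const int a[N]) {
    static_assert(N % 4 == 0, "N must be a multiple of the unroll factor");
// Split the array into 4 memories so that a[i], a[i+1], a[i+2], a[i+3]
// each come from a different RAM.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4 dim=1
    long long acc = 0;
    for (int i = 0; i < N; i += 4) {
#pragma HLS PIPELINE II=1
        for (int k = 0; k < 4; ++k) {
#pragma HLS UNROLL
            // Four reads of the same array in one pipelined iteration.
            acc += static_cast<long long>(a[i + k]) * a[i + k];
        }
    }
    return acc;
}
```

Whether partitioning is worth the extra memory resources depends on how many reads per cycle the real function needs, which only the schedule viewer or the synthesis log can confirm.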

u4223374
Advisor
273 Views
Registered: ‎04-26-2015

@daniel.cogan 

Do they all rely on some other function or a global variable? Or are they all using floating-point, which may cause HLS to implement them with a single floating-point core?
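As a rough sketch of what "no shared state" could look like, assuming a function body that the thread does not show: the version below keeps all state local (no globals, no `static` variables), accumulates in integer arithmetic, and uses floating point only for the final divide and square root. The body of `root_mean_square`, the wrapper `rms_top`, the output-pointer interface, and `SAMPLES = 64` are all hypothetical. `INLINE off` is used to keep each call as its own module in the report. Whether the tool then schedules the three calls concurrently still has to be checked in the schedule and utilization reports.

```cpp
#include <cmath>

constexpr int SAMPLES = 64;  // assumed value; the thread does not state it

// Hypothetical body: RMS over all 2*N samples of the two input arrays.
// ID only differentiates the template instantiations, as in the original post.
template <int ID, int N>
void root_mean_square(const int a[N], const int b[N], float *out) {
#pragma HLS INLINE off
    long long acc = 0;  // local accumulator, so nothing is shared between calls
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        acc += static_cast<long long>(a[i]) * a[i]
             + static_cast<long long>(b[i]) * b[i];
    }
    // Floating point appears only here, once per call.
    *out = std::sqrt(static_cast<float>(acc) / (2.0f * N));
}

// Hypothetical top level: each call has its own inputs and its own output.
void rms_top(const int in10[SAMPLES], const int in20[SAMPLES],
             const int in11[SAMPLES], const int in21[SAMPLES],
             const int in12[SAMPLES], const int in22[SAMPLES],
             float *rms1, float *rms2, float *rms3) {
    root_mean_square<1, SAMPLES>(in10, in20, rms1);
    root_mean_square<2, SAMPLES>(in11, in21, rms2);
    root_mean_square<3, SAMPLES>(in12, in22, rms3);
}
```

If the real function updates a global or `static` variable, or if the three calls read the same array, that dependency alone can be enough for the scheduler to run them one after another.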
