delphicgp
Visitor
6,579 Views
Registered: ‎04-30-2014

Why can I only unroll 2 times?


Hi all,

 

I'm trying to optimize my HLS solution. The first step I'm trying is to unroll the loop. For example, I have a function like the following:

 

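void matpro(const float v[15], float d[15]) {
    char i, i111, i112;           /* i111 and i112 are unused in this excerpt */
    float b_StateVectors[15];     /* values not shown in the post */

    /* element-wise product: 15 independent multiplications */
    for (i = 0; i < 15; i++) {
        d[i] = v[i] * b_StateVectors[i];
    }
}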

 

After I unrolled the for loop, HLS showed me this result.

Capture.PNG

 


 

I think the loop is not fully unrolled (the result looks like an unroll by 2 plus pipelining of the loop). I have tried a lot of ways, but it still wasn't fully unrolled. Can anyone tell me what's wrong here? I don't care about resource usage; I just want the maximum throughput (15 multipliers working at the same time).

 

Thank you in advance,

Gongpei

1 Solution

Accepted Solutions
gszakacs
Instructor
10,223 Views
Registered: ‎08-14-2007

It's possible that your arrays are being stored in some sort of memory rather than in independent registers. If so, that memory structure would have a maximum of 2 read ports, which limits the loop unroll factor to 2. Here's a thread that discusses a similar issue:

 

http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Optimizing-for-loop/m-p/457640

 

-- Gabor
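
A rough worked example of the port limit described above, assuming each array is mapped to a dual-port memory (an assumption; the post does not show how the tool actually bound the arrays). With at most 2 accesses per cycle to one array, touching all 15 elements needs at least

```latex
\left\lceil \frac{15}{2} \right\rceil = 8 \ \text{cycles}
```

so no more than 2 multiplications can start in the same cycle, which is consistent with the "unroll by 2" result reported in the question.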

4 Replies
ywu
Xilinx Employee
6,565 Views
Registered: ‎11-28-2007

As gszakacs said, what you are seeing is limited by the number of memory ports on the input/output arrays. To get completely parallel logic, you will also need to completely partition the two arrays using the ARRAY_PARTITION directive.

 

Cheers,
Jim
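
A minimal sketch of what the suggestion above might look like when written as in-source pragmas, assuming Vivado HLS pragma syntax. The third argument w and the constant N are hypothetical: the original function declares b_StateVectors locally without showing its values, so it is passed in here to keep the example self-contained. A regular C compiler ignores the pragmas, so the function can still be checked in software.

```c
#define N 15

/* Element-wise product d[i] = v[i] * w[i].
 * w stands in for b_StateVectors from the question (assumption:
 * it is passed as an argument instead of being a local array). */
void matpro(const float v[N], const float w[N], float d[N]) {
/* Split every array into individual registers so that all 15
 * elements can be accessed in the same cycle. */
#pragma HLS ARRAY_PARTITION variable=v complete dim=1
#pragma HLS ARRAY_PARTITION variable=w complete dim=1
#pragma HLS ARRAY_PARTITION variable=d complete dim=1
    int i;
    for (i = 0; i < N; i++) {
/* Request full unrolling of the loop body. */
#pragma HLS UNROLL
        d[i] = v[i] * w[i];
    }
}
```

The same directives can also be applied from the directives panel or a Tcl script instead of the source. Whether the tool then schedules all 15 multiplications in parallel still depends on the target device and the rest of the design.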
delphicgp
Visitor
6,550 Views
Registered: ‎04-30-2014
Thank you! You are right.

After I set all my arrays to be completely partitioned, I got the expected unroll result.
nehagaur
Visitor
1,491 Views
Registered: ‎01-16-2018

I am targeting the xc7a35tcpg236 Artix device, but even after partitioning the arrays I still do not get complete unrolling.

 
