Explorer
Registered: 08-26-2014

How to increase parallelization when executing this line of code?

Hello,

When I implement this code, the subtractions are done sequentially, one after the other, and no additional hardware is used even if I add 50 more PWMx lines:

current = current + Ts_L * (Vtot -
			(PWM1==false?0:Vcap1) -
			(PWM2==false?0:Vcap2) -
			(PWM3==false?0:Vcap3) -
			(PWM4==false?0:Vcap4) -
			(PWM5==false?0:Vcap5) -
			R*current);

Does anyone know how I can parallelize this code?

Thanks,

Cerilet


Accepted Solutions
Voyager
Registered: 06-24-2013

Re: How to increase parallelization when executing this line of code?

Hey Cerilet,

You can create a balanced tree where nodes at the same depth can execute at the same time ...

temp1 = (PWM1==false?0:Vcap1) + (PWM2==false?0:Vcap2);
temp2 = (PWM3==false?0:Vcap3) + (PWM4==false?0:Vcap4);
temp3 = (PWM5==false?0:Vcap5) + R*current;
temp4 = Vtot - temp3;
temp5 = temp1 + temp2;
current += Ts_L * (temp4 - temp5);

... this reduces the chain of dependent additions and subtractions from n - 1 stages to about log2(n): here, 3 levels instead of 6 sequential subtractions.

Hope this helps,

Herbert

-------------- Yes, I do this for fun!
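
To put numbers on the log(n) claim, here is a minimal sketch in plain C++ (no HLS-specific constructs) that counts the dependent adder stages for a chain versus a balanced tree. The function names `chain_depth` and `tree_depth` are made up for this illustration, and the counts are stages of two-input adders, not clock cycles; the actual latency depends on the operator latency and the clock the tool is given.

```cpp
// Dependent adder stages when n operands are combined one after another.
unsigned chain_depth(unsigned n) {
    return n > 1 ? n - 1 : 0;
}

// Dependent adder stages for a balanced tree: ceil(log2(n)).
// Each pass pairs up the remaining partial results, so the width halves
// (rounding up when an operand is left without a partner).
unsigned tree_depth(unsigned n) {
    unsigned depth = 0;
    for (unsigned width = n; width > 1; width = (width + 1) / 2) {
        ++depth;
    }
    return depth;
}
```

The expression in the question has 7 operands (Vtot, five PWM terms, and R*current), so `chain_depth(7)` is 6 and `tree_depth(7)` is 3. That matches the three levels in the code above: temp1 to temp3, then temp4 and temp5, then the final subtraction.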

Explorer
Registered: 08-26-2014

Re: How to increase parallelization when executing this line of code?

Yes @hpoetzl, you are right.

I finally implemented an adder tree so that the number of operands can be changed easily. Here is the code in case anyone wants to use it.

const int num_cells = 8;
const int num_ranks = 3; // log2(num_cells); not defined in the original post
bool PWM[8];
double Vcap[8];

double adder_tree[3][4]; // [num_ranks][num_cells/2]
#pragma HLS RESOURCE variable=adder_tree core=RAM_S2P_LUTRAM

// Bottom rank: select and add the cell voltages pairwise (fully unrolled inner loop)
inner_loop: for(int i=0;i<num_cells/2;i++)
{
#pragma HLS UNROLL
    adder_tree[num_ranks-1][i] = ((PWM[i*2])==false?0:(Vcap[i*2])) + ((PWM[i*2+1])==false?0:(Vcap[i*2+1]));
}

// Build adder tree
// Entries in the bottom rank. The original post started from num_ranks,
// which happens to give the same rank sizes for num_cells = 8.
unsigned rank_size = num_cells/2;
first_loop: for(int adder_tree_rank=num_ranks-2;adder_tree_rank>=0;adder_tree_rank--)
{
    rank_size=(rank_size+1)/2; // rank size
    // Fixed loop size so it can be unrolled
    second_loop: for(int jj=0;jj<((num_cells/2+1)/2);jj++)
    {
#pragma HLS UNROLL
        if(jj<rank_size)
            adder_tree[adder_tree_rank][jj] = adder_tree[adder_tree_rank+1][jj*2] + adder_tree[adder_tree_rank+1][jj*2+1];
    }
}
// The total ends up in adder_tree[0][0]

 

Best regards,

Cerilet
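
For checking the adder-tree idea in a C++ testbench before synthesis, the same reduction can be sketched as a self-contained function. This is a variant, not the code above: `NUM_CELLS`, `sum_active_cells`, and the in-place `level` buffer are names chosen for this sketch, the cell count is assumed to be a power of two, and the HLS pragmas are omitted so that it compiles with any C++ compiler. Whether a particular tool version turns the unrolled loops into a parallel tree should be confirmed in the synthesis report.

```cpp
// Hypothetical cell count for this sketch; must be a power of two here.
const int NUM_CELLS = 8;

// Sum Vcap[i] over all cells whose PWM[i] is set, using pairwise reduction.
double sum_active_cells(const bool PWM[NUM_CELLS], const double Vcap[NUM_CELLS]) {
    double level[NUM_CELLS];

    // Select stage: every selection is independent of the others.
    for (int i = 0; i < NUM_CELLS; i++) {
        level[i] = PWM[i] ? Vcap[i] : 0.0;
    }

    // Reduce stage: each pass halves the number of live partial sums.
    // Within a pass, the additions do not depend on each other.
    for (int width = NUM_CELLS / 2; width >= 1; width /= 2) {
        for (int j = 0; j < width; j++) {
            // Reads indices 2j and 2j+1, writes index j, so no value is
            // overwritten before it has been consumed.
            level[j] = level[2 * j] + level[2 * j + 1];
        }
    }
    return level[0];
}
```

The result would then take the place of the five PWM terms, for example `current += Ts_L * (Vtot - sum_active_cells(PWM, Vcap) - R*current);`, using the variable names from the question.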

 

Scholar u4223374
Registered: 04-26-2015

Re: How to increase parallelization when executing this line of code?

You could try something like this:

// int is used here for illustration; use the actual type of Vcap (e.g. double) to avoid truncation
int PWM1_val = (PWM1 ? Vcap1 : 0);
int PWM2_val = (PWM2 ? Vcap2 : 0);
int PWM3_val = (PWM3 ? Vcap3 : 0);
int PWM4_val = (PWM4 ? Vcap4 : 0);
int PWM5_val = (PWM5 ? Vcap5 : 0);

current += Ts_L * (Vtot - PWM1_val - PWM2_val - PWM3_val - PWM4_val - PWM5_val - R*current);

Sometimes HLS just gets confused about the required order of operations; this way it should be able to see that the initial calculations are all completely independent, and then the final line can be done in one cycle (assuming appropriate clock speeds).
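
One likely reason the tool keeps the chain in source order, assuming the operands are floating-point (as the `double Vcap[8]` declaration earlier in the thread suggests): floating-point addition is not associative, so regrouping can change the result, and compilers generally do not regroup it on their own. Here is a small sketch of the effect in plain C++, with made-up function names and deliberately extreme values:

```cpp
// Evaluate a + b + c strictly left to right, as the source order implies.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluate the same three terms with b and c grouped first.
double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e30, b = -1e30 and c = 1.0, the first grouping gives 1.0 and the second gives 0.0, because the 1.0 is lost when it is added to -1e30. For values in a realistic voltage range the two groupings would normally differ only in the last bits, if at all, but the tool cannot know that, so the regrouping has to be written out explicitly in the source.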
