Explorer
1,561 Views
Registered: ‎08-26-2014

How to increase parallelization when executing this line of code?


Hello,

 

When I implement this code, the subtractions are executed sequentially, one after the other, and no additional hardware is used even if I add 50 more PWMx lines:

 

current = current + Ts_L * (Vtot -
			(PWM1==false?0:Vcap1) -
			(PWM2==false?0:Vcap2) -
			(PWM3==false?0:Vcap3) -
			(PWM4==false?0:Vcap4) -
			(PWM5==false?0:Vcap5) -
			R*current);

Does anyone know how I can parallelize this code?

 

Thanks,

 

Cerilet


Accepted Solutions
Voyager
2,123 Views
Registered: ‎06-24-2013

Hey Cerilet,

 

You can create a balanced tree where nodes with identical depth can execute at the same time ...

temp1 = (PWM1==false?0:Vcap1) + (PWM2==false?0:Vcap2);
temp2 = (PWM3==false?0:Vcap3) + (PWM4==false?0:Vcap4);
temp3 = (PWM5==false?0:Vcap5) + R*current;
temp4 = Vtot - temp3;
temp5 = temp1 + temp2;
current += Ts_L * (temp4 - temp5);

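... this will reduce the number of sequential steps required for the subtractions from n to about log2(n), because each level of the tree halves the number of remaining operands.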
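To make the difference in dependency depth concrete, here is a minimal plain C++ sketch (not HLS-specific, and not Herbert's code). The names `sum_chain`, `sum_tree` and `tree_depth` are made up for illustration. The chain version needs every addition to wait for the previous one, while the tree version adds neighbouring pairs level by level, so all additions within one level are independent of each other.

```cpp
#include <cstddef>

// Sequential chain: each addition depends on the previous result,
// so N operands form a dependency chain of roughly N steps.
template <std::size_t N>
double sum_chain(const double (&v)[N]) {
    double acc = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        acc += v[i];
    return acc;
}

// Balanced tree: pairs are added level by level. All additions inside
// one level are independent, so they could run at the same time.
template <std::size_t N>
double sum_tree(const double (&v)[N]) {
    double level[N];
    for (std::size_t i = 0; i < N; ++i)
        level[i] = v[i];
    for (std::size_t width = N; width > 1; width = (width + 1) / 2) {
        for (std::size_t i = 0; i < width / 2; ++i)
            level[i] = level[2 * i] + level[2 * i + 1];
        if (width % 2)                      // odd operand is carried up unchanged
            level[width / 2] = level[width - 1];
    }
    return level[0];
}

// Number of tree levels needed for n operands, i.e. ceil(log2(n)).
inline unsigned tree_depth(unsigned n) {
    unsigned depth = 0;
    for (unsigned width = n; width > 1; width = (width + 1) / 2)
        ++depth;
    return depth;
}
```

For the seven operands in the original expression (Vtot, five selected Vcap values and R*current), `tree_depth(7)` gives 3 levels, which matches the three stages of temp1..temp5 above. In an HLS flow the loops would presumably also need to be unrolled for the levels to run in parallel.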

 

Hope this helps,

Herbert

-------------- Yes, I do this for fun!


3 Replies
Explorer
1,499 Views
Registered: ‎08-26-2014

Yes @hpoetzl, you are right.

 

I finally implemented an adder tree so that the number of operands can be changed easily. Here is the code, in case anyone wants to use it.

 

 

const int num_cells = 8;   // number of cells (a power of two in this example)
const int num_ranks = 3;   // log2(num_cells), matches the first dimension of adder_tree
bool PWM[8];
double Vcap[8];

double adder_tree[3][4];
#pragma HLS RESOURCE variable=adder_tree core=RAM_S2P_LUTRAM

// First rank: select each cell voltage and add neighbouring pairs (fully unrolled)
inner_loop: for(int i=0;i<num_cells/2;i++) {
#pragma HLS UNROLL
	adder_tree[num_ranks-1][i] = ((PWM[i*2])==false?0:(Vcap[i*2])) + ((PWM[i*2+1])==false?0:(Vcap[i*2+1]));
}

// Build adder tree
unsigned rank_size = num_cells/2;   // number of entries in the first rank
first_loop: for(int adder_tree_rank=num_ranks-2;adder_tree_rank>=0;adder_tree_rank--) {
	rank_size=(rank_size+1)/2; // rank size
	// Fixed loop size so it can be unrolled
	second_loop: for(int jj=0;jj<((num_cells/2+1)/2);jj++) {
#pragma HLS UNROLL
		if(jj<rank_size)
			adder_tree[adder_tree_rank][jj] = adder_tree[adder_tree_rank+1][jj*2] + adder_tree[adder_tree_rank+1][jj*2+1];
	}
}
// adder_tree[0][0] now holds the sum of the selected Vcap values
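As a rough software model of how the tree result could feed back into the original current update, the sketch below wraps the same rank layout in a function and applies the formula from the first post. This is an assumption about how the pieces fit together, not code from the thread: `selected_sum`, `update_current`, `NUM_CELLS` and `NUM_RANKS` are hypothetical names, and the HLS pragmas are left out so it compiles as ordinary C++.

```cpp
const int NUM_CELLS = 8;   // assumed power of two, as in the post above
const int NUM_RANKS = 3;   // log2(NUM_CELLS)

// Sum of Vcap[i] over all cells whose PWM[i] is set, computed rank by rank.
double selected_sum(const bool PWM[NUM_CELLS], const double Vcap[NUM_CELLS]) {
    double adder_tree[NUM_RANKS][NUM_CELLS / 2];

    // First rank: select and add neighbouring pairs.
    for (int i = 0; i < NUM_CELLS / 2; i++)
        adder_tree[NUM_RANKS - 1][i] =
            (PWM[2 * i] ? Vcap[2 * i] : 0.0) + (PWM[2 * i + 1] ? Vcap[2 * i + 1] : 0.0);

    // Remaining ranks: each one has half as many entries as the one below.
    int rank_size = NUM_CELLS / 2;
    for (int rank = NUM_RANKS - 2; rank >= 0; rank--) {
        rank_size /= 2;
        for (int j = 0; j < rank_size; j++)
            adder_tree[rank][j] = adder_tree[rank + 1][2 * j] + adder_tree[rank + 1][2 * j + 1];
    }
    return adder_tree[0][0];
}

// One step of the update from the original question:
// current + Ts_L * (Vtot - sum of selected Vcap - R*current)
double update_current(double current, double Ts_L, double Vtot, double R,
                      const bool PWM[NUM_CELLS], const double Vcap[NUM_CELLS]) {
    return current + Ts_L * (Vtot - selected_sum(PWM, Vcap) - R * current);
}
```

A model like this can serve as a reference in a C testbench. Because the additions are grouped differently from the original left-to-right expression, the floating-point result may differ in the last bits for general inputs.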

 

Best regards,

 

Cerilet

 

Advisor
1,491 Views
Registered: ‎04-26-2015

You could try something like this:

 

// double rather than int, since Vcap is declared as double earlier in the thread
double PWM1_val = (PWM1 ? Vcap1 : 0);
double PWM2_val = (PWM2 ? Vcap2 : 0);
double PWM3_val = (PWM3 ? Vcap3 : 0);
double PWM4_val = (PWM4 ? Vcap4 : 0);
double PWM5_val = (PWM5 ? Vcap5 : 0);

current += Ts_L * (Vtot - PWM1_val - PWM2_val - PWM3_val - PWM4_val - PWM5_val - R*current);

Sometimes HLS just gets confused about the required order of operations; written this way, it should be able to see that the initial selections are all completely independent, and the final line can then be done in one cycle (assuming an appropriate clock speed).
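One plausible reason the tool keeps the written order, assuming the operands are floating point (Vcap is declared as double elsewhere in this thread), is that floating-point addition is not associative, so regrouping can change the result. Compilers therefore usually do not regroup such expressions on their own unless told that it is acceptable. The small sketch below only illustrates the arithmetic; the function names are hypothetical.

```cpp
// Evaluates (a + b) + c, the order written in a left-to-right expression.
double left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a + (b + c), one possible regrouping of the same sum.
double regrouped(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first form yields 1.0 while the second yields 0.0, because 1.0 is lost when added to -1e20. If that is what is happening here, writing the tree explicitly, as in the accepted solution, tells the tool which grouping is intended.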
