Visitor po092000
Registered: 07-24-2016

unrolling loop vs manual unrolling loop

Hi, I have a question about unrolling.

The first version is just the pipelined code, and the second version is the manually unrolled code.

Comparing the two, I expected the manual unroll to make better use of the DSP cores and to give good performance.

But the result is the reverse of what I expected.

The second code is this:

 

 for (row = 0; row < 8; row++)
 {
  for (col = 0; col < 8; col++)
  {
   for (feature = 0; feature<25; feature++)
   {
    #pragma HLS PIPELINE II=1

    __temp_79=0;
    // depth takes the values 0 and 10; each pass handles 10 depth slices
    for(depth = 0 ; depth < 20; depth=depth+10)  
    {

     //temp += _src[depth][(row+row_f)*12+(col+col_f)] * _convolution_filer[feature][depth][row_f*5+col_f];
     //temp += _src[depth][(row+row_f)*12+(col+col_f)] * convolution_filer[feature * 25 * 20 + depth * 25 + row_f * 5 + col_f];

     temp_0 = _src[depth][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + depth * 25 + 0 * 5 + 0] + _src[depth][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + depth * 25 + 0 * 5 + 1];
     temp_1 = _src[depth][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + depth * 25 + 0 * 5 + 2] + _src[depth][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + depth * 25 + 0 * 5 + 3];
     temp_2 = _src[depth][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + depth * 25 + 0 * 5 + 4] + temp_0;
     temp_3 = temp_2 + temp_1;

     temp_4 = _src[depth][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + depth * 25 + 1 * 5 + 0] + _src[depth][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + depth * 25 + 1 * 5 + 1];
     temp_5 = _src[depth][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + depth * 25 + 1 * 5 + 2] + _src[depth][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + depth * 25 + 1 * 5 + 3];
     temp_6 = _src[depth][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + depth * 25 + 1 * 5 + 4] + temp_3 + temp_4;
     temp_7 = temp_5 + temp_6;

     temp_8 = _src[depth][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + depth * 25 + 2 * 5 + 0] + _src[depth][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + depth * 25 + 2 * 5 + 1];
     temp_9 = _src[depth][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + depth * 25 + 2 * 5 + 2] + _src[depth][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + depth * 25 + 2 * 5 + 3];
     temp_10 = _src[depth][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + depth * 25 + 2 * 5 + 4] + temp_7 + temp_8;
     temp_11 = temp_9 + temp_10;

     temp_12 = _src[depth][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + depth * 25 + 3 * 5 + 0] + _src[depth][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + depth * 25 + 3 * 5 + 1];
     temp_13 = _src[depth][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + depth * 25 + 3 * 5 + 2] + _src[depth][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + depth * 25 + 3 * 5 + 3];
     temp_14 = _src[depth][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + depth * 25 + 3 * 5 + 4] + temp_11 + temp_12;
     temp_15 = temp_13 + temp_14;

     temp_16 = _src[depth][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + depth * 25 + 4 * 5 + 0] + _src[depth][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + depth * 25 + 4 * 5 + 1];
     temp_17 = _src[depth][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + depth * 25 + 4 * 5 + 2] + _src[depth][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + depth * 25 + 4 * 5 + 3];
     temp_18 = _src[depth][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + depth * 25 + 4 * 5 + 4] + temp_16 + temp_15;
     temp_19 = temp_17 + temp_18;

     // Phase 2
     temp_20 = _src[depth+1][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 0 * 5 + 0] + _src[depth+1][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 0 * 5 + 1];
     temp_21 = _src[depth+1][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 0 * 5 + 2] + _src[depth+1][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 0 * 5 + 3];
     temp_22 = _src[depth+1][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 0 * 5 + 4] + temp_19 + temp_20;
     temp_23 = temp_22 + temp_21;

     temp_24 = _src[depth+1][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 1 * 5 + 0] + _src[depth+1][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 1 * 5 + 1];
     temp_25 = _src[depth+1][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 1 * 5 + 2] + _src[depth+1][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 1 * 5 + 3];
     temp_26 = _src[depth+1][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth +1)* 25 + 1 * 5 + 4] + temp_23 + temp_24;
     temp_27 = temp_25 + temp_26;

     temp_28 = _src[depth+1][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 2 * 5 + 0] + _src[depth+1][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 2 * 5 + 1];
     temp_29 = _src[depth+1][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth +1)* 25 + 2 * 5 + 2] + _src[depth+1][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 2 * 5 + 3];
     temp_30 = _src[depth+1][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 2 * 5 + 4] + temp_27 + temp_28;
     temp_31 = temp_29 + temp_30;

     temp_32 = _src[depth+1][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 3 * 5 + 0] + _src[depth+1][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 3 * 5 + 1];
     temp_33 = _src[depth+1][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 3 * 5 + 2] + _src[depth+1][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 3 * 5 + 3];
     temp_34 = _src[depth+1][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 3 * 5 + 4] + temp_31 + temp_32;
     temp_35 = temp_33 + temp_34;

     temp_36 = _src[depth+1][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 4 * 5 + 0] + _src[depth+1][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 4 * 5 + 1];
     temp_37 = _src[depth+1][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 4 * 5 + 2] + _src[depth+1][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+1)* 25 + 4 * 5 + 3];
     temp_38 = _src[depth+1][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+1) * 25 + 4 * 5 + 4] + temp_36 + temp_35;
     temp_39 = temp_37 + temp_38;

     // Phase 3
     temp_40 = _src[depth+2][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 0 * 5 + 0] + _src[depth+2][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 0 * 5 + 1];
     temp_41 = _src[depth+2][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 0 * 5 + 2] + _src[depth+2][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 0 * 5 + 3];
     temp_42 = _src[depth+2][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 0 * 5 + 4] + temp_39 + temp_40;
     temp_43 = temp_42 + temp_41;

     temp_44 = _src[depth+2][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 1 * 5 + 0] + _src[depth+2][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 1 * 5 + 1];
     temp_45 = _src[depth+2][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 1 * 5 + 2] + _src[depth+2][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 1 * 5 + 3];
     temp_46 = _src[depth+2][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+2)* 25 + 1 * 5 + 4] + temp_43 + temp_44;
     temp_47 = temp_45 + temp_46;

     temp_48 = _src[depth+2][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 2 * 5 + 0] + _src[depth+2][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 2 * 5 + 1];
     temp_49 = _src[depth+2][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+2)* 25 + 2 * 5 + 2]  + _src[depth+2][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 2 * 5 + 3];
     temp_50 = _src[depth+2][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 2 * 5 + 4] + temp_47 + temp_48;
     temp_51 = temp_49 + temp_50;

     temp_52 = _src[depth+2][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 3 * 5 + 0] + _src[depth+2][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 3 * 5 + 1];
     temp_53 = _src[depth+2][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 3 * 5 + 2] + _src[depth+2][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 3 * 5 + 3];
     temp_54 = _src[depth+2][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 3 * 5 + 4] + temp_51 + temp_52;
     temp_55 = temp_53 + temp_54;

     temp_56 = _src[depth+2][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 4 * 5 + 0] + _src[depth+2][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 4 * 5 + 1];
     temp_57 = _src[depth+2][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 4 * 5 + 2] + _src[depth+2][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 4 * 5 + 3];
     temp_58 = _src[depth+2][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+2) * 25 + 4 * 5 + 4] + temp_56 + temp_55;
     temp_59 = temp_57 + temp_58;

     // Phase 4
     temp_60 = _src[depth+3][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 0 * 5 + 0] + _src[depth+3][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 0 * 5 + 1];
     temp_61 = _src[depth+3][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 0 * 5 + 2] + _src[depth+3][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 0 * 5 + 3];
     temp_62 = _src[depth+3][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 0 * 5 + 4] + temp_59 + temp_60;
     temp_63 = temp_62 + temp_61;

     temp_64 = _src[depth+3][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 1 * 5 + 0] + _src[depth+3][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 1 * 5 + 1];
     temp_65 = _src[depth+3][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 1 * 5 + 2] + _src[depth+3][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 1 * 5 + 3];
     temp_66 = _src[depth+3][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+3)* 25 + 1 * 5 + 4] + temp_63 + temp_64;
     temp_67 = temp_65 + temp_66;

     temp_68 = _src[depth+3][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 2 * 5 + 0] + _src[depth+3][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 2 * 5 + 1];
     temp_69 = _src[depth+3][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+3)* 25 + 2 * 5 + 2]  + _src[depth+3][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 2 * 5 + 3];
     temp_70 = _src[depth+3][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 2 * 5 + 4] + temp_67 + temp_68;
     temp_71 = temp_69 + temp_70;

     temp_72 = _src[depth+3][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 3 * 5 + 0] + _src[depth+3][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 3 * 5 + 1];
     temp_73 = _src[depth+3][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 3 * 5 + 2] + _src[depth+3][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 3 * 5 + 3];
     temp_74 = _src[depth+3][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 3 * 5 + 4] + temp_71 + temp_72;
     temp_75 = temp_73 + temp_74;

     temp_76 = _src[depth+3][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 4 * 5 + 0] + _src[depth+3][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 4 * 5 + 1];
     temp_77 = _src[depth+3][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 4 * 5 + 2] + _src[depth+3][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 4 * 5 + 3];
     temp_78 = _src[depth+3][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+3) * 25 + 4 * 5 + 4] + temp_76 + temp_75;
     temp_79 = temp_77 + temp_78;

     // Phase by 4
     // depth +4
     _temp_0 = _src[depth+4][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 0 * 5 + 0] + _src[depth+4][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 0 * 5 + 1];
     _temp_1 = _src[depth+4][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 0 * 5 + 2] + _src[depth+4][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 0 * 5 + 3];
     _temp_2 = _src[depth+4][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 0 * 5 + 4] + temp_79 +_temp_0 ;
     _temp_3 = _temp_2 + _temp_1;

     _temp_4 = _src[depth+4][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 1 * 5 + 0] + _src[depth+4][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 1 * 5 + 1];
     _temp_5 = _src[depth+4][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 1 * 5 + 2] + _src[depth+4][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 1 * 5 + 3];
     _temp_6 = _src[depth+4][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 1 * 5 + 4] + _temp_3 + _temp_4;
     _temp_7 = _temp_5 + _temp_6;

     _temp_8 = _src[depth+4][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 2 * 5 + 0] + _src[depth+4][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 2 * 5 + 1];
     _temp_9 = _src[depth+4][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 2 * 5 + 2] + _src[depth+4][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 2 * 5 + 3];
     _temp_10 = _src[depth+4][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 2 * 5 + 4] + _temp_7 + _temp_8;
     _temp_11 = _temp_9 + _temp_10;

     _temp_12 = _src[depth+4][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 3 * 5 + 0] + _src[depth+4][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 3 * 5 + 1];
     _temp_13 = _src[depth+4][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 3 * 5 + 2] + _src[depth+4][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 3 * 5 + 3];
     _temp_14 = _src[depth+4][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 3 * 5 + 4] + _temp_11 + _temp_12;
     _temp_15 = _temp_13 + _temp_14;

     _temp_16 = _src[depth+4][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 4 * 5 + 0] + _src[depth+4][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 4 * 5 + 1];
     _temp_17 = _src[depth+4][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 4 * 5 + 2] + _src[depth+4][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 4 * 5 + 3];
     _temp_18 = _src[depth+4][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+4) * 25 + 4 * 5 + 4] + _temp_16 + _temp_15;
     _temp_19 = _temp_17 + _temp_18;

     // Phase 2
     _temp_20 = _src[depth+5][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 0 * 5 + 0] + _src[depth+5][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 0 * 5 + 1];
     _temp_21 = _src[depth+5][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 0 * 5 + 2] + _src[depth+5][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 0 * 5 + 3];
     _temp_22 = _src[depth+5][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 0 * 5 + 4] + _temp_19 + _temp_20;
     _temp_23 = _temp_22 + _temp_21;

     _temp_24 = _src[depth+5][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 1 * 5 + 0] + _src[depth+5][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 1 * 5 + 1];
     _temp_25 = _src[depth+5][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 1 * 5 + 2] + _src[depth+5][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 1 * 5 + 3];
     _temp_26 = _src[depth+5][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+5)* 25 + 1 * 5 + 4] + _temp_23 + _temp_24;
     _temp_27 = _temp_25 + _temp_26;

     _temp_28 = _src[depth+5][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 2 * 5 + 0] + _src[depth+5][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 2 * 5 + 1];
     _temp_29 = _src[depth+5][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+5)* 25 + 2 * 5 + 2] + _src[depth+5][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 2 * 5 + 3];
     _temp_30 = _src[depth+5][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 2 * 5 + 4] + _temp_27 + _temp_28;
     _temp_31 = _temp_29 + _temp_30;

     _temp_32 = _src[depth+5][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 3 * 5 + 0] + _src[depth+5][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 3 * 5 + 1];
     _temp_33 = _src[depth+5][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 3 * 5 + 2] + _src[depth+5][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 3 * 5 + 3];
     _temp_34 = _src[depth+5][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 3 * 5 + 4] + _temp_31 + _temp_32;
     _temp_35 = _temp_33 + _temp_34;

     _temp_36 = _src[depth+5][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 4 * 5 + 0] + _src[depth+5][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 4 * 5 + 1];
     _temp_37 = _src[depth+5][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 4 * 5 + 2] + _src[depth+5][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+5)* 25 + 4 * 5 + 3];
     _temp_38 = _src[depth+5][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+5) * 25 + 4 * 5 + 4] + _temp_36 + _temp_35;
     _temp_39 = _temp_37 + _temp_38;


     // phase
     _temp_40 = _src[depth+6][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 0 * 5 + 0] + _src[depth+6][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 0 * 5 + 1];
     _temp_41 = _src[depth+6][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 0 * 5 + 2] + _src[depth+6][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 0 * 5 + 3];
     _temp_42 = _src[depth+6][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 0 * 5 + 4] + _temp_39 + _temp_40;
     _temp_43 = _temp_42 + _temp_41;

     _temp_44 = _src[depth+6][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 1 * 5 + 0] + _src[depth+6][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 1 * 5 + 1];
     _temp_45 = _src[depth+6][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 1 * 5 + 2] + _src[depth+6][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 1 * 5 + 3];
     _temp_46 = _src[depth+6][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+6)* 25 + 1 * 5 + 4] + _temp_43 + _temp_44;
     _temp_47 = _temp_45 + _temp_46;

     _temp_48 = _src[depth+6][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 2 * 5 + 0] + _src[depth+6][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 2 * 5 + 1];
     _temp_49 = _src[depth+6][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+6)* 25 + 2 * 5 + 2]  + _src[depth+6][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 2 * 5 + 3];
     _temp_50 = _src[depth+6][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 2 * 5 + 4] + _temp_47 + _temp_48;
     _temp_51 = _temp_49 + _temp_50;

     _temp_52 = _src[depth+6][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 3 * 5 + 0] + _src[depth+6][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 3 * 5 + 1];
     _temp_53 = _src[depth+6][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 3 * 5 + 2] + _src[depth+6][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 3 * 5 + 3];
     _temp_54 = _src[depth+6][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 3 * 5 + 4] + _temp_51 + _temp_52;
     _temp_55 = _temp_53 + _temp_54;

     _temp_56 = _src[depth+6][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 4 * 5 + 0] + _src[depth+6][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 4 * 5 + 1];
     _temp_57 = _src[depth+6][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 4 * 5 + 2] + _src[depth+6][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 4 * 5 + 3];
     _temp_58 = _src[depth+6][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+6) * 25 + 4 * 5 + 4] + _temp_56 + _temp_55;
     _temp_59 = _temp_57 + _temp_58;

     // phase
     _temp_60 = _src[depth+7][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 0 * 5 + 0] + _src[depth+7][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 0 * 5 + 1];
     _temp_61 = _src[depth+7][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 0 * 5 + 2] + _src[depth+7][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 0 * 5 + 3];
     _temp_62 = _src[depth+7][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 0 * 5 + 4] + _temp_59 + _temp_60;
     _temp_63 = _temp_62 + _temp_61;

     _temp_64 = _src[depth+7][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 1 * 5 + 0] + _src[depth+7][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 1 * 5 + 1];
     _temp_65 = _src[depth+7][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 1 * 5 + 2] + _src[depth+7][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 1 * 5 + 3];
     _temp_66 = _src[depth+7][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+7)* 25 + 1 * 5 + 4] + _temp_63 + _temp_64;
     _temp_67 = _temp_65 + _temp_66;

     _temp_68 = _src[depth+7][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 2 * 5 + 0] + _src[depth+7][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 2 * 5 + 1];
     _temp_69 = _src[depth+7][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+7)* 25 + 2 * 5 + 2]  + _src[depth+7][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 2 * 5 + 3];
     _temp_70 = _src[depth+7][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 2 * 5 + 4] + _temp_67 + _temp_68;
     _temp_71 = _temp_69 + _temp_70;

     _temp_72 = _src[depth+7][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 3 * 5 + 0] + _src[depth+7][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 3 * 5 + 1];
     _temp_73 = _src[depth+7][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 3 * 5 + 2] + _src[depth+7][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 3 * 5 + 3];
     _temp_74 = _src[depth+7][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 3 * 5 + 4] + _temp_71 + _temp_72;
     _temp_75 = _temp_73 + _temp_74;

     _temp_76 = _src[depth+7][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 4 * 5 + 0] + _src[depth+7][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 4 * 5 + 1];
     _temp_77 = _src[depth+7][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 4 * 5 + 2] + _src[depth+7][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 4 * 5 + 3];
     _temp_78 = _src[depth+7][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+7) * 25 + 4 * 5 + 4] + _temp_76 + _temp_75;
     _temp_79 = _temp_77 + _temp_78;

     // phase 80~100
     __temp_40 = _src[depth+8][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 0 * 5 + 0] + _src[depth+8][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 0 * 5 + 1];
     __temp_41 = _src[depth+8][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 0 * 5 + 2] + _src[depth+8][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 0 * 5 + 3];
     __temp_42 = _src[depth+8][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 0 * 5 + 4] + _temp_79 + __temp_40;
     __temp_43 = __temp_42 + __temp_41;

     __temp_44 = _src[depth+8][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 1 * 5 + 0] + _src[depth+8][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 1 * 5 + 1];
     __temp_45 = _src[depth+8][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 1 * 5 + 2] + _src[depth+8][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 1 * 5 + 3];
     __temp_46 = _src[depth+8][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+8)* 25 + 1 * 5 + 4] + __temp_43 + __temp_44;
     __temp_47 = __temp_45 + __temp_46;

     __temp_48 = _src[depth+8][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 2 * 5 + 0] + _src[depth+8][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 2 * 5 + 1];
     __temp_49 = _src[depth+8][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+8)* 25 + 2 * 5 + 2]  + _src[depth+8][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 2 * 5 + 3];
     __temp_50 = _src[depth+8][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 2 * 5 + 4] + __temp_47 + __temp_48;
     __temp_51 = __temp_49 + __temp_50;

     __temp_52 = _src[depth+8][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 3 * 5 + 0] + _src[depth+8][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 3 * 5 + 1];
     __temp_53 = _src[depth+8][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 3 * 5 + 2] + _src[depth+8][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 3 * 5 + 3];
     __temp_54 = _src[depth+8][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 3 * 5 + 4] + __temp_51 + __temp_52;
     __temp_55 = __temp_53 + __temp_54;

     __temp_56 = _src[depth+8][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 4 * 5 + 0] + _src[depth+8][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 4 * 5 + 1];
     __temp_57 = _src[depth+8][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 4 * 5 + 2] + _src[depth+8][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 4 * 5 + 3];
     __temp_58 = _src[depth+8][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+8) * 25 + 4 * 5 + 4] + __temp_56 + __temp_55;
     __temp_59 = __temp_57 + __temp_58;

     // phase
     __temp_60 = _src[depth+9][(row+0)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 0 * 5 + 0] + _src[depth+9][(row+0)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 0 * 5 + 1];
     __temp_61 = _src[depth+9][(row+0)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 0 * 5 + 2] + _src[depth+9][(row+0)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 0 * 5 + 3];
     __temp_62 = _src[depth+9][(row+0)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 0 * 5 + 4] + __temp_59 + __temp_60;
     __temp_63 = __temp_62 + __temp_61;

     __temp_64 = _src[depth+9][(row+1)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 1 * 5 + 0] + _src[depth+9][(row+1)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 1 * 5 + 1];
     __temp_65 = _src[depth+9][(row+1)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 1 * 5 + 2] + _src[depth+9][(row+1)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 1 * 5 + 3];
     __temp_66 = _src[depth+9][(row+1)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+9)* 25 + 1 * 5 + 4] + __temp_63 + __temp_64;
     __temp_67 = __temp_65 + __temp_66;

     __temp_68 = _src[depth+9][(row+2)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 2 * 5 + 0] + _src[depth+9][(row+2)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 2 * 5 + 1];
     __temp_69 = _src[depth+9][(row+2)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+9)* 25 + 2 * 5 + 2]  + _src[depth+9][(row+2)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 2 * 5 + 3];
     __temp_70 = _src[depth+9][(row+2)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 2 * 5 + 4] + __temp_67 + __temp_68;
     __temp_71 = __temp_69 + __temp_70;

     __temp_72 = _src[depth+9][(row+3)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 3 * 5 + 0] + _src[depth+9][(row+3)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 3 * 5 + 1];
     __temp_73 = _src[depth+9][(row+3)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 3 * 5 + 2] + _src[depth+9][(row+3)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 3 * 5 + 3];
     __temp_74 = _src[depth+9][(row+3)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 3 * 5 + 4] + __temp_71 + __temp_72;
     __temp_75 = __temp_73 + __temp_74;

     __temp_76 = _src[depth+9][(row+4)*12+(col+0)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 4 * 5 + 0] + _src[depth+9][(row+4)*12+(col+1)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 4 * 5 + 1];
     __temp_77 = _src[depth+9][(row+4)*12+(col+2)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 4 * 5 + 2] + _src[depth+9][(row+4)*12+(col+3)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 4 * 5 + 3];
     __temp_78 = _src[depth+9][(row+4)*12+(col+4)] * convolution_filer[feature * 25 * 20 + (depth+9) * 25 + 4 * 5 + 4] + __temp_76 + __temp_75;
     __temp_79 += __temp_77 + __temp_78;

    }

    dst[feature * 64 + row * 8 + col] = __temp_79;

   }
  }
 }

 

Sorry that the code is very long. Anyway, the more I unroll the loop, the lower the DSP utilization I get, yet the performance is good.

The manually unrolled version is about 2x faster and its utilization is almost the same. I don't know why this happens.

My only guess is that manually unrolled code under a pipeline makes the FSM states clearer. Is that true?

Please tell me why this result happens.

[Attachment: normal.png]
Observer wsun
Registered: 05-26-2016

Re: unrolling loop vs manual unrolling loop

In Vivado HLS, when you apply the PIPELINE pragma to a loop, all loops nested inside that loop are completely unrolled. In your case, the "depth" loop, the "row_f" loop and the "col_f" loop are all completely unrolled. Also keep in mind that unrolling and/or pipelining does not automatically give you performance. There is one resource limitation you have to consider: the memory ports. A memory can serve only a limited number of reads per cycle, so the unrolled reads of one array cannot all be issued in the same cycle.
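
To make the memory port point concrete, here is a rough sketch rather than a definitive implementation. It assumes `int` data (the question does not show the element types) and assumes the rolled loop nest matches the commented-out line in the posted code. The `ARRAY_PARTITION` pragma is one possible way to add ports and is not taken from the original design, and `min_ii` is a hypothetical helper that only estimates the port-limited lower bound on the initiation interval.

```cpp
#include <cassert>

// Dimensions read off the posted loop nest.
constexpr int ROWS = 8, COLS = 8, FEATURES = 25, DEPTH = 20, K = 5, SRC_W = 12;

// Hypothetical helper: lower bound on II from memory ports alone,
// i.e. ceil(reads per pipelined iteration / reads available per cycle).
constexpr int min_ii(int reads_per_iter, int banks, int ports_per_bank) {
    return (reads_per_iter + banks * ports_per_bank - 1) / (banks * ports_per_bank);
}

// Rolled form of the convolution. A standard C++ compiler ignores the HLS pragmas.
void conv_rolled(const int src[DEPTH][SRC_W * SRC_W],
                 const int filter[FEATURES * DEPTH * K * K],
                 int dst[FEATURES * ROWS * COLS]) {
    // Assumption: one memory per depth slice, so the 20 slices can be read in parallel.
#pragma HLS ARRAY_PARTITION variable=src complete dim=1
    for (int row = 0; row < ROWS; row++) {
        for (int col = 0; col < COLS; col++) {
            for (int feature = 0; feature < FEATURES; feature++) {
#pragma HLS PIPELINE II=1
                // Everything below is unrolled by the tool: 20 * 5 * 5 = 500 multiplies.
                int acc = 0;
                for (int depth = 0; depth < DEPTH; depth++)
                    for (int row_f = 0; row_f < K; row_f++)
                        for (int col_f = 0; col_f < K; col_f++)
                            acc += src[depth][(row + row_f) * SRC_W + (col + col_f)] *
                                   filter[feature * K * K * DEPTH + depth * K * K + row_f * K + col_f];
                dst[feature * ROWS * COLS + row * COLS + col] = acc;
            }
        }
    }
}

// Software check: with all-ones input and filter value (feature + 1),
// every output is the sum of 500 identical products.
void self_check() {
    static int src[DEPTH][SRC_W * SRC_W];
    static int filter[FEATURES * DEPTH * K * K];
    static int dst[FEATURES * ROWS * COLS];
    for (auto &plane : src)
        for (int &v : plane) v = 1;
    for (int f = 0; f < FEATURES; f++)
        for (int i = 0; i < DEPTH * K * K; i++)
            filter[f * DEPTH * K * K + i] = f + 1;
    conv_rolled(src, filter, dst);
    for (int f = 0; f < FEATURES; f++)
        for (int p = 0; p < ROWS * COLS; p++)
            assert(dst[f * ROWS * COLS + p] == 500 * (f + 1));
}
```

Under these assumptions, one unpartitioned dual-port memory gives `min_ii(500, 1, 2)`, which is 250, and partitioning by depth into 20 dual-port memories gives `min_ii(500, 20, 2)`, which is 13. So II=1 would still need more partitioning or a different access pattern, and the synthesis report remains the authority on the II actually achieved.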
