Registered: 10-06-2017

pipelining changes the loop latency and loop II

We are trying to synthesize a function, func() (code given below for reference), which contains a 16-iteration loop in each of its BUY and SELL branches; we have also set a loop trip count of 16. The loops are pipelined, with II=1 given in the pragma directives.
If we synthesize this without the function-level "#pragma HLS pipeline II=1" directive (see the code below), the report gives a function latency of 22 clocks and a loop II of 1 (loop latency 17), with target II = 1 and achieved II = 1. However, if we include the function-level "#pragma HLS pipeline II=1", the function latency rises to 37, the loop latency rises to 32, and the report says the loop is not pipelined (the target and achieved II fields are simply blank).

We do not understand this behaviour.

Can someone explain why pipelining the function changes the loop latency and the loop II?
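A reduced test case can make this kind of behaviour easier to reproduce. The sketch below is hypothetical (the function names, the array search and the 16-entry input are not from the original code); it keeps the two ingredients that matter here: a loop that exits early through a flag, like ret in func(), and the same pipeline pragmas, placed once on the loop only and once on the function as well. A standard C compiler ignores the HLS pragmas, so both variants compute the same result in C simulation; only the synthesized schedule can differ.

```c
#include <assert.h>

#define N 16

/* Hypothetical reduced test case: return the index of the first entry that
   is not greater than key (-1 if none), leaving the loop through a flag the
   way func() does with ret. */

/* Variant A: pipeline pragma on the loop only. */
int find_loop_pipelined(const int v[N], int key)
{
    int found = -1;
    int done = 0;
    for (int i = 0; (i < N) && (done == 0); i++) {
#pragma HLS pipeline II=1
#pragma HLS loop_tripcount min=16 max=16
        if (v[i] <= key) {
            found = i;
            done = 1;
        }
    }
    return found;
}

/* Variant B: the same body plus a function-level pipeline pragma,
   mirroring the configuration that produced the slower report. */
int find_func_pipelined(const int v[N], int key)
{
#pragma HLS pipeline II=1
    int found = -1;
    int done = 0;
    for (int i = 0; (i < N) && (done == 0); i++) {
#pragma HLS pipeline II=1
#pragma HLS loop_tripcount min=16 max=16
        if (v[i] <= key) {
            found = i;
            done = 1;
        }
    }
    return found;
}
```

Synthesizing each variant separately and comparing the loop section of the two reports isolates the effect of the function-level directive from the rest of func().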

 

/////////////////////////////////////

////////////////func /////////////////

#define LOOP_COUNT 16
void func(descriptor_nrm nrm_desc[NRM_DESC_SZ], int20 new_nrm_desc_indx_in,
          int1 buy_flag, buy_struct buy_ds[BUY_DS_SZ], sell_struct sell_ds[SELL_DS_SZ],
          int17 id, int32 n_price, descriptor_dmy dmy_ds[DMY_DS_SZ],
          int2 *insert_loc_code, int14 *dmy_out)
{
// Function-level pipeline: the directive whose presence changes the results
#pragma HLS pipeline II=1
#pragma HLS inline off
#pragma HLS LATENCY min=1 max=16
#pragma HLS interface bram port=nrm_desc
//#pragma HLS interface bram port=buy_ds
//#pragma HLS interface bram port=sell_ds
#pragma HLS interface bram port=dmy_ds

    int32 curr_price;
    int16 curr_dmy;
    printf("\n Inside func ...");

    int ret = 0;
    buy_struct b_tmp;
    sell_struct s_tmp;
    descriptor_dmy d_tmp;

    // Read the head of the dummy-descriptor list for the selected side
    if (buy_flag == BUY)
    {
        b_tmp = buy_ds[id];
        d_tmp = dmy_ds[b_tmp.dummy_head_index];
        curr_price = d_tmp.price;
        curr_dmy = b_tmp.dummy_head_index;
    }
    else if (buy_flag == SELL)
    {
        s_tmp = sell_ds[id];
        d_tmp = dmy_ds[s_tmp.dummy_head_index];
        curr_price = d_tmp.price;
        curr_dmy = s_tmp.dummy_head_index;
    }

    if (buy_flag == BUY)
    {
        {
            // Walk the list until an insert position is found (ret == 1)
            for (int i = 0; (i < LOOP_COUNT) && (ret == 0); i++) {
#pragma HLS pipeline II=1
#pragma HLS loop_tripcount min=16 max=16

                if (d_tmp.price > n_price) // greater than
                {
                    // Insert to the right only at the end of the list
                    if (d_tmp.Desc_nxt_vld == 0)
                    {
                        *dmy_out = curr_dmy;
                        *insert_loc_code = RIGHT_INSERT;
                        ret = 1;
                    }
                }
                else if (d_tmp.price == n_price) // equal to
                {
                    *dmy_out = curr_dmy;
                    *insert_loc_code = CURRENT_INSERT;
                    ret = 1;
                }
                else if (d_tmp.price < n_price) // less than
                {
                    *dmy_out = curr_dmy;
                    *insert_loc_code = LEFT_INSERT;
                    ret = 1;
                }
                // Advance: the next read address comes from the descriptor just read
                curr_dmy = d_tmp.Desc_nxt;
                d_tmp = dmy_ds[curr_dmy];

            } // end for

            if (ret == 1) return;
        }
    } // end of if (buy_flag == BUY)
    else if (buy_flag == SELL)
    {
        {
            // Same walk with the price comparisons mirrored
            for (int i = 0; (i < LOOP_COUNT) && (ret == 0); i++)
            {
#pragma HLS pipeline II=1
#pragma HLS loop_tripcount min=16 max=16
                if (d_tmp.price < n_price) // less than
                {
                    // Insert to the right only at the end of the list
                    if (d_tmp.Desc_nxt_vld == 0)
                    {
                        *dmy_out = curr_dmy;
                        *insert_loc_code = RIGHT_INSERT;
                        ret = 1;
                    }
                }
                else if (d_tmp.price == n_price) // equal to
                {
                    *dmy_out = curr_dmy;
                    *insert_loc_code = CURRENT_INSERT;
                    ret = 1;
                }
                else if (d_tmp.price > n_price) // greater than
                {
                    *dmy_out = curr_dmy;
                    *insert_loc_code = LEFT_INSERT;
                    ret = 1;
                }

                // Advance: the next read address comes from the descriptor just read
                curr_dmy = d_tmp.Desc_nxt;
                d_tmp = dmy_ds[curr_dmy];

            } // end for loop

            if (ret == 1) return;
        }
    } // end if SELL

} // end of function

 

////////////////////////////////////
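For C simulation, the search can be modeled in plain C. The sketch below is a simplified, hypothetical model of the BUY branch only: the dmy_desc struct, the plain int widths and the values of the *_INSERT codes are assumptions, since the real typedefs are not shown in the post. It keeps the structure that matters for scheduling: the early exit through ret, and the fact that each read address (curr) comes from the descriptor read in the previous iteration, so the reads form a dependent chain.

```c
#include <assert.h>

#define LOOP_COUNT 16

/* Hypothetical stand-in for descriptor_dmy (field names from the post,
   types assumed). */
typedef struct {
    int price;
    int Desc_nxt;     /* index of the next descriptor in dmy_ds */
    int Desc_nxt_vld; /* 0 when this is the last descriptor */
} dmy_desc;

/* Assumed values; the real constants are not shown in the thread. */
enum { LEFT_INSERT = 0, CURRENT_INSERT = 1, RIGHT_INSERT = 2 };

/* Model of the BUY-side walk: returns 1 and sets *loc and *code once an
   insert position is found, or 0 after LOOP_COUNT steps without one. */
int buy_search(const dmy_desc ds[], int head, int n_price, int *loc, int *code)
{
    int ret = 0;
    int curr = head;
    dmy_desc d = ds[curr];
    for (int i = 0; (i < LOOP_COUNT) && (ret == 0); i++) {
        if (d.price > n_price) {
            if (d.Desc_nxt_vld == 0) { /* end of list: insert to the right */
                *loc = curr;
                *code = RIGHT_INSERT;
                ret = 1;
            }
        } else if (d.price == n_price) {
            *loc = curr;
            *code = CURRENT_INSERT;
            ret = 1;
        } else {
            *loc = curr;
            *code = LEFT_INSERT;
            ret = 1;
        }
        /* The next read address depends on the value just read. */
        curr = d.Desc_nxt;
        d = ds[curr];
    }
    return ret;
}
```

Like the original, the model performs one extra read after a match, so every Desc_nxt must hold a valid index even when Desc_nxt_vld is 0.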

Scholar u4223374
Registered: 04-26-2015

Re: pipelining changes the loop latency and loop II

If you ask HLS to pipeline something (a loop, a function, etc.), the first thing it does is flatten everything within that region: all functions get inlined, all loops get unrolled, and so on. Obviously this strips off any pipeline pragmas applied to those loops or functions, because the loops and functions they were attached to no longer exist. This works very well for single loops, and it can work well for carefully organized nested loops, but when applied to a complex function it generally just makes a mess.
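To illustrate the flattening with a hypothetical four-step list walk (not code from the thread): once the loop is fully unrolled there is no loop left for its pipeline pragma to apply to. Unrolling also cannot overlap these particular reads, because each address comes from the previous read.

```c
#include <assert.h>

/* Hypothetical 4-step list walk written as a loop with its own pragma. */
int walk4_loop(const int next[], int start)
{
    int cur = start;
    for (int i = 0; i < 4; i++) {
#pragma HLS pipeline II=1
        cur = next[cur];
    }
    return cur;
}

/* The same walk after full unrolling: straight-line code with no loop, so a
   loop-level pipeline pragma has nothing to attach to. Each read still needs
   the result of the previous one, so the four reads run as a serial chain. */
int walk4_unrolled(const int next[], int start)
{
    int cur = start;
    cur = next[cur];
    cur = next[cur];
    cur = next[cur];
    cur = next[cur];
    return cur;
}
```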

 

I expect that in this case HLS tried to pipeline the whole function (as you requested), stripped away all the pipeline pragmas on the loops, and then realised that it can't actually pipeline the function. (I don't know exactly why; it's a fairly complex function to pipeline, or your pragma preventing inlining might have caused it.) As a result it fell back to its default processing, but by then the loops had lost their pipeline directives, so everything got much slower.
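The reported numbers are consistent with this. As a rough check, assume each loop iteration takes 2 cycles from start to finish (for example, one BRAM read plus the compare; this depth is an inference, not something the reports in the thread show). The usual latency formulas then reproduce both loop latencies, and the remaining function overhead is the same 5 cycles in both runs:

```c
#include <assert.h>

/* Back-of-the-envelope latency model for a loop of n iterations, where
   depth is the number of cycles one iteration takes from start to finish. */

/* Pipelined: a new iteration starts every ii cycles, so the last one
   starts at (n - 1) * ii and finishes depth cycles later. */
int pipelined_latency(int n, int ii, int depth)
{
    return (n - 1) * ii + depth;
}

/* Not pipelined: iterations run back to back with no overlap
   (ignores any extra loop entry/exit cycles the tool may add). */
int sequential_latency(int n, int depth)
{
    return n * depth;
}

/* Cycles spent in func() outside the loop, inferred from the reported
   numbers: 22 - 17 = 5 and 37 - 32 = 5. */
#define FUNC_OVERHEAD 5
```

With n = 16, II = 1 and depth = 2, the pipelined estimate is 15 + 2 = 17 cycles and the sequential estimate is 16 * 2 = 32 cycles, matching the two loop latencies; adding the 5-cycle overhead gives the function latencies of 22 and 37.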

 

 
