sneutrino
Visitor
810 Views
Registered: ‎08-04-2014

Pipelining array access with variable offsets

Dear all,

While trying to understand another issue (pipelining of nested loops), we've tried to simplify and examine the pipelining of array accesses with a variable offset within a single loop. This is analogous to what is happening in the unrolled middle loop in the link above. We're trying to get the loop to pipeline with an II of 1.

The simplified code contains 8 array writes to unique locations, and the output array is completely partitioned. We've tried to indicate false dependencies for the relevant variables.

void singleloop( const unsigned (&in1)[256],
                 unsigned (&out)[256] ) {

#pragma HLS ARRAY_PARTITION variable=in1 complete dim=0
#pragma HLS ARRAY_PARTITION variable=out complete dim=0
#pragma HLS DEPENDENCE variable=out inter false
#pragma HLS DEPENDENCE variable=out intra false

 loop1 : for( unsigned int i=0; i<3; i++ ) {

#pragma HLS PIPELINE //rewind // II=1 succeeds

    unsigned val = in1[i];

    //const unsigned int offset = 0; // succeeds with II=1
    //const unsigned int offset = 1; // succeeds with II=1

    //const unsigned int offset = i; // fails with II=1, II moved to 3
    //const unsigned int offset = i*2; // fails with II=1, II moved to 2
    //const unsigned int offset = i*4; // fails with II=1, II moved to 2

    const unsigned int offset = i*8; // succeeds with II=1

#pragma HLS DEPENDENCE variable=offset inter false
#pragma HLS DEPENDENCE variable=offset intra false

    out[0+offset] = val*val+0;
    out[1+offset] = val*val+1;
    out[2+offset] = val*val+2;
    out[3+offset] = val*val+3;
    out[4+offset] = val*val+4;
    out[5+offset] = val*val+5;
    out[6+offset] = val*val+6;
    out[7+offset] = val*val+7;

#pragma HLS LATENCY min=4

  }
}

When 'offset' is either a constant or a multiple of i with a multiplier of 8 or more, pipelining succeeds with II=1. If the multiplier is less than 8, II is increased. We've tried different numbers of writes and multipliers of i, but it seems II=1 can only be achieved if the multiplier of i is >= the number of writes contained in the loop. Basically, it seems that HLS wants to ensure that accesses in subsequent iterations of the loop will not partially overlap with each other. A complete overlap (i=0) seems to be fine though and II=1 can be achieved.
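To make the overlap pattern concrete, here is a minimal sketch in plain C++ (not HLS code, and not a model of the tool's internals). It only counts how many out[] indices two iterations have in common for a given multiplier of i. The names shared_writes, classify and Overlap are hypothetical helpers introduced for illustration.

```cpp
// Hypothetical helper: number of out[] indices written by both iteration i
// and iteration i+distance, assuming each iteration writes `writes`
// consecutive elements starting at index multiplier*i.
unsigned shared_writes(unsigned multiplier, unsigned writes, unsigned distance) {
    // Gap between the first index written by the two iterations.
    const unsigned shift = multiplier * distance;
    return shift >= writes ? 0u : writes - shift;
}

// How the write sets of two consecutive iterations relate to each other.
enum class Overlap { None, Partial, Complete };

Overlap classify(unsigned multiplier, unsigned writes) {
    const unsigned shared = shared_writes(multiplier, writes, 1);
    if (shared == 0) return Overlap::None;           // disjoint index ranges
    if (shared == writes) return Overlap::Complete;  // identical index ranges
    return Overlap::Partial;                         // some, but not all, indices shared
}
```

With 8 writes, a multiplier of 0 (constant offset) classifies as Complete, multipliers 1, 2 and 4 as Partial, and 8 as None. That lines up with the results reported in the code comments: II=1 was reached only in the Complete and None cases.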

Could someone shed some light on this behavior?  Or, even better, suggest a workaround to allow overlapping writes on consecutive iterations with II=1?

Thanks, Kristian

2 Replies
sneutrino
Visitor
801 Views
Registered: ‎08-04-2014

"A complete overlap (i=0) seems to be fine ...", sorry, I meant to say offset=0 here, not i=0.
sneutrino
Visitor
738 Views
Registered: ‎08-04-2014

I'm starting to suspect that this behavior may just reflect the limits of the HLS scheduling algorithm. This suspicion stems from a reading of doi:10.1109/TCAD.2017.2783363 (and references) and doi:10.1145/3020078.3021754 (and references). This statement from the latter:

Conventional HLS pipelining typically leverages modulo
scheduling, a compile-time optimization which creates a static
schedule for a single loop iteration that can be repeated at a fixed
initiation interval (II).

and the example given in Listing 2 of the former would seem to be relevant. The loop in my example has a similar dependence on the induction variable as in Listing 2, which I suppose leads to the sort of "nonuniform" or "irregular" access patterns the papers describe.

Perhaps this is what's going on with Vivado HLS? I'd very much appreciate any insight!
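For readers unfamiliar with modulo scheduling, the sketch below shows the textbook recurrence bound on II: a loop-carried dependence cycle with a total delay of `delay` cycles that spans `distance` iterations forces II >= ceil(delay / distance). This is the general scheduling-theory bound, not a description of what Vivado HLS computes internally, and it is not claimed to reproduce the II values reported in this thread. The Dependence struct and recurrence_min_ii are hypothetical names, and any numbers fed to it are illustrative.

```cpp
#include <vector>

// One loop-carried dependence cycle (hypothetical type for illustration):
// the cycle takes `delay` clock cycles in total and spans `distance`
// loop iterations.
struct Dependence {
    unsigned delay;
    unsigned distance;
};

// Smallest II permitted by the recurrences alone: the maximum over all
// cycles of ceil(delay / distance), and never less than 1.
unsigned recurrence_min_ii(const std::vector<Dependence>& deps) {
    unsigned ii = 1;
    for (const Dependence& d : deps) {
        if (d.distance == 0) continue;  // not loop-carried, so no bound on II
        const unsigned bound = (d.delay + d.distance - 1) / d.distance;  // integer ceil
        if (bound > ii) ii = bound;
    }
    return ii;
}
```

A scheduler that cannot prove two accesses independent has to assume such a dependence exists, which is one way a conservative analysis of induction-variable-dependent indices could raise II above 1.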

Thanks, Kristian
