rudolfngan
Visitor
Registered: ‎05-29-2017

How to regionally disable automatic loop merging optimization of HLS compiler?

Hello,

 

I have created a backend HLS module that offers a table lookup service to frontend HLS modules. The frontend and backend can be configured to communicate over a 256-, 512-, or 1024-bit AXIS stream connection.

The byte lookup table is rather large (65536 bytes) and is implemented as a ROM_nP_LUTRAM type. This enables concurrent lookup of multiple elements in a single clock cycle. The core lookup loop looks like this:

 

#define SPREAD   256

#define SPREAD_BYTE   (SPREAD / 8)

for(i = 0; i < SPREAD_BYTE; i++) {
#pragma HLS UNROLL
      dstx8b[i] = partTable[srcx8b[i]];
}

 

The HLS synthesizer builds as many ROM read circuits as the loop trip count (e.g. 32 in this code). When SPREAD is set to 1024, synthesis builds 128 array partitions of the ROM table, and Vivado then fails to complete the implementation.

I would like to keep the inter-module stream at 1024 bits but process the lookup in two or four batches. So I tried this code:

 

#define SPREAD   1024

#define SPREAD_BYTE   (SPREAD / 8)

for(i = 0; i < SPREAD_BYTE/2; i++) {
#pragma HLS UNROLL
      dstx8b[i] = partTable[srcx8b[i]];
}

for(i = SPREAD_BYTE/2; i < SPREAD_BYTE; i++) {
#pragma HLS UNROLL
      dstx8b[i] = partTable[srcx8b[i]];
}

 

Surprisingly, the HLS synthesizer merges the two loops into one and still builds 128 ROM partitions. I tried adding different kinds of code between the two loops and changing how the loops are written, but HLS still optimizes them into a single loop of 128 trips.

 

Is it possible to disable the loop merge optimization, so that the HLS outputs two loops processing 64 bytes each?
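
To make the intent concrete, here is a minimal sketch of the batching written as a rolled outer loop around a fully unrolled inner loop. The function name `lookup_batched`, the `BATCHES` and `BATCH_BYTE` macros, and the 256-entry table slice indexed by one byte are assumptions for illustration only. Whether the HLS tool keeps the outer loop rolled and builds only 64 ROM read circuits is not confirmed here.

```cpp
#include <cstdint>

#define SPREAD       1024
#define SPREAD_BYTE  (SPREAD / 8)              // 128 bytes per stream word
#define BATCHES      2                         // assumed: number of sequential batches
#define BATCH_BYTE   (SPREAD_BYTE / BATCHES)   // 64 lookups per batch

// Hypothetical wrapper around the lookup; partTable is assumed to be a
// 256-entry slice of the full table, indexed by a single source byte.
void lookup_batched(const uint8_t partTable[256],
                    const uint8_t srcx8b[SPREAD_BYTE],
                    uint8_t dstx8b[SPREAD_BYTE]) {
    // Outer loop carries no UNROLL pragma: the intent is one batch at a time.
    for (int b = 0; b < BATCHES; b++) {
        // Inner loop is fully unrolled: BATCH_BYTE concurrent lookups per batch.
        for (int i = 0; i < BATCH_BYTE; i++) {
#pragma HLS UNROLL
            const int idx = b * BATCH_BYTE + i;
            dstx8b[idx] = partTable[srcx8b[idx]];
        }
    }
}
```

A regular C++ compiler ignores the HLS pragma, so the function can be checked in software against the single-loop version before synthesis.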

 

Thanks very much.

 

Rudolf.

 
