Visitor kingshuk1000
Visitor
3,527 Views
Registered: 12-27-2016

improve loop initiation interval

Hi,

I am trying to achieve an initiation interval (II) of 1 for the following loop, but I cannot get the II below 11. How should I fix this?

 

 

#define numFU 100
#define osize 10
#define wsize 32

void datapath(float* data, float* wpacket, int* wsel, int* psel, int* outID, int* out) {

    int reg_psel[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_psel dim=1

    int reg_wsel[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_wsel dim=1

    float reg_data[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_data dim=1

    float reg_wpacket[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_wpacket dim=1

    float ram_po[numFU][osize];
    #pragma HLS dependence variable=ram_po intra false
    #pragma HLS ARRAY_PARTITION variable=ram_po dim=1
    #pragma HLS RESOURCE variable=ram_po core=RAM_T2P_BRAM

    float reg_outID[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_outID dim=1

    float reg_out[numFU+1];
    #pragma HLS ARRAY_PARTITION variable=reg_out dim=1

    reg_out[0] = 0;

    float ram_weights[numFU][wsize];
    #pragma HLS ARRAY_PARTITION variable=ram_weights dim=1

    float temp[numFU];
    #pragma HLS ARRAY_PARTITION variable=temp dim=1

    int prev_addr[numFU];
    #pragma HLS ARRAY_PARTITION variable=prev_addr dim=1

    inf_loop: while (1) {
        #pragma HLS PIPELINE II=1
        reg_data[0] = *data;
        reg_wpacket[0] = *wpacket;
        reg_psel[0] = *psel;
        reg_wsel[0] = *wsel;
        int i = 0;
        //update partial outputs
        lbl_update: for (int i = numFU-1; i >= 0; i--) {
            #pragma HLS UNROLL
            #pragma HLS dependence variable=ram_po intra false
            int po_addr = reg_psel[i];
            int w_addr = reg_wsel[i];
            reg_out[i+1] = reg_outID[i] == i ? ram_po[i][po_addr] : reg_out[i];
            float v = ram_weights[i][w_addr] * reg_data[i];
            if (prev_addr[i] != po_addr) {
                ram_po[i][prev_addr[i]] = reg_outID[i] == i ? 0 : temp[i];
            }
            temp[i] = (prev_addr[i] == po_addr ? temp[i] : ram_po[i][po_addr]) + v;
            prev_addr[i] = po_addr;
        }

        *out = reg_out[numFU];
    }

}

 

Visitor kingshuk1000
Visitor
3,506 Views
Registered: 12-27-2016

Re: improve loop initiation interval

I updated the code a little to pinpoint the issue. 

#define numFU 100
#define osize 10
#define wsize 32

void datapath(float* data, float* wpacket, int* wsel, int* psel, int* outID, int* out) {

    int reg_psel[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_psel dim=1

    int reg_wsel[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_wsel dim=1

    float reg_data[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_data dim=1

    float reg_wpacket[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_wpacket dim=1

    float ram_po[numFU][osize];
    #pragma HLS dependence variable=ram_po intra false
    #pragma HLS ARRAY_PARTITION variable=ram_po dim=1
    #pragma HLS RESOURCE variable=ram_po core=RAM_T2P_BRAM

    float reg_outID[numFU];
    #pragma HLS ARRAY_PARTITION variable=reg_outID dim=1

    float reg_out[numFU+1];
    #pragma HLS ARRAY_PARTITION variable=reg_out dim=1

    reg_out[0] = 0;

    float ram_weights[numFU][wsize];
    #pragma HLS ARRAY_PARTITION variable=ram_weights dim=1

    float temp[numFU];
    #pragma HLS ARRAY_PARTITION variable=temp dim=1

    int prev_addr[numFU];
    #pragma HLS ARRAY_PARTITION variable=prev_addr dim=1

    inf_loop: while (1) {
        #pragma HLS PIPELINE II=1

        reg_data[0] = *data;
        reg_wpacket[0] = *wpacket;
        reg_psel[0] = *psel;
        reg_wsel[0] = *wsel;
        int i = 0;
        //update partial outputs
        lbl_update: for (int i = numFU-1; i >= 0; i--) {
            #pragma HLS UNROLL
            #pragma HLS dependence variable=ram_po intra false
            //#pragma HLS dependence variable=temp intra false

            int po_addr = reg_psel[i];
            int w_addr = reg_wsel[i];
            reg_out[i+1] = reg_outID[i] == i ? ram_po[i][po_addr] : reg_out[i];
            if (prev_addr[i] != po_addr) {
                ram_po[i][prev_addr[i]] = reg_outID[i] == i ? 0 : temp[i];
                temp[i] = ram_po[i][po_addr];
            }
            else {
                float v = ram_weights[i][w_addr] * reg_data[i];
                temp[i] = temp[i] + v;
            }
            prev_addr[i] = po_addr;
        }

        *out = reg_out[numFU];
    }

}

The synthesis tool complains that it is "unable to enforce a carried dependency constraint" between the store operation on temp[0] at

temp[i]=ram_po[i][po_addr];

and the load operation on temp[0] at

temp[i]=temp[i]+v;

These two lines are in the if and else blocks respectively, so the two writes never occur in the same iteration.
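
To make the if/else point concrete, here is a minimal sketch in plain C (no HLS pragmas) of the update for a single functional unit, written once with the two branch-local stores from the post and once with a single load and a single store of temp per iteration. The names fu_state, step_branch, step_select and is_out are hypothetical and not part of the original code; is_out stands in for the reg_outID[i]==i test. The sketch only shows that the two forms compute the same values. Whether the single-store form changes what the scheduler reports is an assumption that would need to be checked in the tool.

```c
#include <assert.h>

#define OSIZE 10  /* mirrors osize in the post */

/* Hypothetical state of one functional unit (one index i of the unrolled loop). */
typedef struct {
    float ram_po[OSIZE];
    float temp;
    int   prev_addr;
} fu_state;

/* Update as written in the post: temp is stored in two different branches. */
void step_branch(fu_state *s, int po_addr, int is_out, float v) {
    assert(po_addr >= 0 && po_addr < OSIZE);
    if (s->prev_addr != po_addr) {
        s->ram_po[s->prev_addr] = is_out ? 0.0f : s->temp;
        s->temp = s->ram_po[po_addr];
    } else {
        s->temp = s->temp + v;
    }
    s->prev_addr = po_addr;
}

/* Same update with exactly one load and one store of temp per iteration. */
void step_select(fu_state *s, int po_addr, int is_out, float v) {
    assert(po_addr >= 0 && po_addr < OSIZE);
    float old_temp = s->temp;                      /* single load of temp */
    int   changed  = (s->prev_addr != po_addr);
    if (changed) {
        s->ram_po[s->prev_addr] = is_out ? 0.0f : old_temp;
    }
    float loaded = s->ram_po[po_addr];             /* read after the store above */
    s->temp = changed ? loaded : old_temp + v;     /* single store of temp */
    s->prev_addr = po_addr;
}
```

Both functions can be driven with the same sequence of (po_addr, is_out, v) inputs on two zero-initialised fu_state values and compared field by field after every step.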

 

My question: since I have partitioned temp, temp[0] should be just a register. In that case, shouldn't it be possible to read and write temp[0] in the same clock cycle?
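
One way to reason about the same-cycle question: a register can be read and written in the same clock cycle, but the value being written also has to be ready in that cycle. The statement temp[i]=temp[i]+v feeds each iteration's result into the next one, so the latency of the operation chain between the load and the store bounds the II from below. The standard bound for one loop-carried dependence is II >= ceil(latency / distance). The sketch below just evaluates that bound; rec_min_ii is a hypothetical helper, and the latency values used with it are illustrative assumptions, since the post does not state the adder latency of the actual design.

```c
#include <assert.h>

/* Lower bound on II imposed by a single loop-carried dependence:
 * an operation chain taking `latency` cycles whose result is consumed
 * `distance` iterations later requires II >= ceil(latency / distance). */
int rec_min_ii(int latency, int distance) {
    assert(latency >= 1 && distance >= 1);
    return (latency + distance - 1) / distance;  /* integer ceiling */
}
```

For example, rec_min_ii(1, 1) is 1, so a single-cycle update of a register allows II=1, while rec_min_ii(4, 1) is 4: a four-cycle accumulate that feeds the very next iteration cannot start a new iteration every cycle, whatever storage holds temp.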

 
