akboken
Adventurer
1,990 Views
Registered: ‎10-19-2015

Reading and Writing BRAM on the same address with better timing ?

Hi All,

 

I am trying to reach a clock frequency of around 240 MHz on an Artix series 200 device with the following code. I can easily meet timing with a 200 MHz target, but I cannot meet timing at 240 MHz. I know which part of the code causes the timing failure, so I am looking for guidance on how to improve the timing.

 

In the following code, if I read and write the same address in the same iteration (see the lines marked Read and Write), timing fails. If I remove the write, I meet the 240 MHz timing. Is there a way to force HLS to achieve higher timing in this case?

 

 

for(int rows=0;rows<ROWS;rows++) {
    for(int i=0;i<16;i++) {
        #pragma HLS PIPELINE II=1

        for(int j=0;j<64;j++) {
            #pragma HLS PIPELINE II=1
            unsigned int ind = 4*(j+1);
            temp.range(ind-1, ind-4) = ARRAY[rows][i*64+j];   // Read

            // Reset the buffer. Adding this line causes the problem: timing degrades to 5 ns (200 MHz).
            ARRAY[rows][i*64+j] = 0;   // Write
        }
        OUT[rows][i] = temp;
    }
}

 

1 Reply
hpoetzl
Voyager
1,960 Views
Registered: ‎06-24-2013

Hello @akboken

 

I'm probably missing something, but the following code meets the timing just fine at 240MHz and should go up to 250MHz (untested):

#include <ap_int.h>

#define ROWS 64

void foo(ap_uint<4> ARRAY[ROWS][16*64], ap_uint<64*4> OUT[ROWS][16])
{
    #pragma HLS INTERFACE bram port=ARRAY
    #pragma HLS INTERFACE bram port=OUT
    #pragma HLS INTERFACE ap_ctrl_none port=return

    for(int rows=0; rows<ROWS; rows++) {
        for(int i=0; i<16; i++) {
            #pragma HLS PIPELINE II=1
            ap_uint<64*4> temp;

            for(int j=0; j<64; j++) {
                #pragma HLS PIPELINE II=1
                unsigned int ind = 4*(j+1);
                temp.range(ind-1, ind-4) = ARRAY[rows][i*64+j];   // Read
                ARRAY[rows][i*64+j] = 0;   // Write
            }
            OUT[rows][i] = temp;
        }
    }
}
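To check the functional behavior of the loop nest without the HLS headers, a plain C++ reference model along the following lines might help. This is only a sketch under assumptions: the `Wide` type is a hypothetical stand-in for `ap_uint<64*4>` built from four 64-bit limbs, and the names `read_and_clear`, `ArrayModel`, and `OutModel` are made up for illustration. It says nothing about timing; it only mirrors the read, pack, and clear sequence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software-only model; sizes mirror the HLS code above.
constexpr std::size_t ROWS = 64;
constexpr std::size_t WORDS = 16;    // range of the loop index i
constexpr std::size_t NIBBLES = 64;  // range of the loop index j

// Hypothetical stand-in for ap_uint<256>: four 64-bit limbs, limb 0 least significant.
using Wide = std::array<std::uint64_t, 4>;
using ArrayModel = std::vector<std::array<std::uint8_t, WORDS * NIBBLES>>;
using OutModel = std::vector<std::array<Wide, WORDS>>;

// Pack 64 four-bit values into one 256-bit word and clear each source
// entry right after it has been read.
void read_and_clear(ArrayModel& array, OutModel& out) {
    assert(array.size() == ROWS && out.size() == ROWS);
    for (std::size_t rows = 0; rows < ROWS; rows++) {
        for (std::size_t i = 0; i < WORDS; i++) {
            Wide temp{};
            for (std::size_t j = 0; j < NIBBLES; j++) {
                std::uint8_t& cell = array[rows][i * NIBBLES + j];
                std::uint64_t value = cell & 0xF;             // Read
                // Same placement as temp.range(4*j+3, 4*j) in the HLS code.
                temp[j / 16] |= value << (4 * (j % 16));
                cell = 0;                                     // Write
            }
            out[rows][i] = temp;
        }
    }
}
```

Comparing the output of such a model against C simulation of the HLS function is one way to confirm that a restructured loop still packs and clears the same data.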

Here are the synthesis results:

#=== Post-Synthesis Resource usage ===
SLICE:            0
LUT:            449
FF:             718
DSP:              0
BRAM:             0
SRL:              0
#=== Final timing ===
CP required:    4.167
CP achieved post-synthesis:    4.060
Timing met
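For reference, the critical path (CP) figures above can be converted to clock frequencies with a small helper like the one below. It assumes the CP values are in nanoseconds, and the function name `period_ns_to_mhz` is made up for illustration. A required period of 4.167 ns corresponds to the 240 MHz target, while the achieved 4.060 ns works out to roughly 246 MHz, so the report above leaves a little margin over 240 MHz but does not by itself demonstrate 250 MHz.

```cpp
#include <cassert>

// Convert a clock period in nanoseconds to a frequency in MHz.
// Assumes the period is positive.
double period_ns_to_mhz(double period_ns) {
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}
```

These are post-synthesis estimates, so the numbers after place and route may differ.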

Best,

Herbert

-------------- Yes, I do this for fun!