derickshi
Adventurer
Registered: ‎11-25-2015

Error when applying DATAFLOW directive

Hello community,

 

I'm trying to implement a module that performs a series of matrix-vector multiplications with Vivado HLS (2017.4). The result of each matrix-vector multiplication is the input to the next one. I plan to apply DATAFLOW to the top-level function so that the execution of the sub-functions overlaps. Here is a simple code example:

 

void mat_vec_mult(int mat[16][16],int vec_in[16],int vec_out[16]){
  for(int i=0;i<16;i++){
    vec_out[i] = 0;
  }

  for(int i=0;i<16;i++){
    for(int j=0;j<16;j++){
        vec_out[i] += mat[i][j]*vec_in[j];
    }
  }
}

void mat_vec(int input_data[16],
             int mat_1[16][16],
             int mat_2[16][16],
             int mat_3[16][16],
             int output_data[16]
){
#pragma HLS DATAFLOW

  static int vec_out_1[16];
  static int vec_out_2[16];

  mat_vec_mult(mat_1,input_data,vec_out_1);
  mat_vec_mult(mat_2,vec_out_1,vec_out_2);
  mat_vec_mult(mat_3,vec_out_2,output_data);
}

 

I get the following error during synthesis:

ERROR: [XFORM 203-711] Internal global variable 'vec_out_1' failed dataflow checking: it can only be read in one process function.
mat_vec_mult:solution1 Jun 20, 2018 7:16:57 PM

 

I understand that the cause of the error is that 'vec_out_1' is read by the first 'mat_vec_mult' (through the += accumulation) and is also read by the second 'mat_vec_mult'. The error can be avoided by modifying the function like this:

 

void mat_vec_mult(int mat[16][16],int vec_in[16],int vec_out[16]){
  int temp[16];
  for(int i=0;i<16;i++){
    temp[i] = 0;
  }

  for(int i=0;i<16;i++){
    for(int j=0;j<16;j++){
        temp[i] += mat[i][j]*vec_in[j];
    }
  }

  for(int i=0;i<16;i++){
    vec_out[i] = temp[i];
  }
}

However, doing this doubles the memory needed to store the vectors, which becomes very significant when the design is large. According to UG902, DATAFLOW uses ping-pong buffers by default to store the intermediate results. My thinking is that if ping-pong buffers are used, letting the first 'mat_vec_mult' read 'vec_out_1' shouldn't actually hurt, because the second 'mat_vec_mult' is reading the other half of the ping-pong buffer at the same time.
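To make the ping-pong idea concrete, here is a minimal software model of it. This is only a sketch of the concept, not how Vivado HLS implements its channels, and the names PingPong, produce, consume_sum and demo are hypothetical. It shows a producer that accumulates in place in its own bank while the consumer keeps seeing the previously completed bank.

```cpp
#include <cassert>

const int N = 16;

// Hypothetical model of a ping-pong (double) buffer: two banks, one owned
// by the producer and the other by the consumer at any given time.
struct PingPong {
  int bank[2][N] = {};
  int wr = 0;  // index of the bank the producer currently owns
  int* producer_bank() { return bank[wr]; }
  int* consumer_bank() { return bank[1 - wr]; }
  void swap() { wr = 1 - wr; }  // hand the finished bank to the consumer
};

// Producer that accumulates in place: it reads and writes only its own bank.
void produce(PingPong& pp, int mat[N][N], int vec_in[N]) {
  int* out = pp.producer_bank();
  for (int i = 0; i < N; i++) out[i] = 0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      out[i] += mat[i][j] * vec_in[j];
}

// Consumer: sums the entries of the bank handed over by the last swap.
int consume_sum(PingPong& pp) {
  int* in = pp.consumer_bank();
  int sum = 0;
  for (int i = 0; i < N; i++) sum += in[i];
  return sum;
}

// Small demonstration: the consumer's view changes only on swap().
void demo() {
  int ones[N][N], ramp[N], unit[N];
  for (int i = 0; i < N; i++) {
    ramp[i] = i;
    unit[i] = 1;
    for (int j = 0; j < N; j++) ones[i][j] = 1;
  }
  PingPong pp;
  produce(pp, ones, ramp);
  int first = consume_sum(pp);   // consumer bank is still the empty one
  pp.swap();
  int second = consume_sum(pp);  // now sees the result of the first produce
  produce(pp, ones, unit);       // in-place accumulation in the other bank
  int third = consume_sum(pp);   // unchanged by the producer's activity
  assert(first == 0);
  assert(second == third);
}
```

In this model the producer's reads never touch the bank the consumer is reading. Whether the tool's dataflow checker accepts such a pattern is a separate matter, as the error above shows.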

 

So my question is: is there any way to avoid inferring twice the memory while still applying DATAFLOW to a function like this?

 

Thanks in advance,

 

Derick

scampbell
Moderator
Registered: ‎10-04-2011

Hello @derickshi,

 

Thank you for providing the example design with the workaround. That made it easy to work with the design. As you mention, the source of the DATAFLOW DRC error is the line:

 

vec_out[i] += mat[i][j]*vec_in[j];

 

where vec_out[i] is read, used in the running sum calculation, and then written back. That read inside the producing function is what causes the DRC error.

 

I can see how the temp array variable in your workaround would add additional resources for large depths.

 

I tried another code workaround and then compared the results to see the resource impact. This workaround still uses a temp variable, but only a scalar rather than an array.

 

 

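void mat_vec_mult(int mat[16][16],int vec_in[16],int vec_out[16]){
  int temp;  // scalar accumulator for one output element

  for(int i=0;i<16;i++){
    temp = 0;

    for(int j=0;j<16;j++){
        temp += mat[i][j]*vec_in[j];
    }
    vec_out[i] = temp;  // vec_out is only written here, never read
  }
}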
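Before comparing resources, it can be worth confirming in C simulation that the scalar version computes the same result as the original accumulate-in-place code. The following is a minimal sketch under stated assumptions: mat_vec_mult_ref, check_against_reference and the test data are hypothetical names and values added for illustration, and the HLS pragma is simply ignored by an ordinary C++ compiler.

```cpp
#include <cassert>

const int N = 16;

// Original style: accumulates directly in vec_out (reads and writes it).
void mat_vec_mult_ref(int mat[N][N], int vec_in[N], int vec_out[N]) {
  for (int i = 0; i < N; i++) vec_out[i] = 0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      vec_out[i] += mat[i][j] * vec_in[j];
}

// Scalar-accumulator style: vec_out is written once per row, never read.
void mat_vec_mult(int mat[N][N], int vec_in[N], int vec_out[N]) {
  for (int i = 0; i < N; i++) {
    int acc = 0;
    for (int j = 0; j < N; j++) acc += mat[i][j] * vec_in[j];
    vec_out[i] = acc;
  }
}

// Top level as in the original post, using the scalar-accumulator stage.
void mat_vec(int input_data[N], int mat_1[N][N], int mat_2[N][N],
             int mat_3[N][N], int output_data[N]) {
#pragma HLS DATAFLOW
  static int vec_out_1[N];
  static int vec_out_2[N];
  mat_vec_mult(mat_1, input_data, vec_out_1);
  mat_vec_mult(mat_2, vec_out_1, vec_out_2);
  mat_vec_mult(mat_3, vec_out_2, output_data);
}

// Hypothetical testbench helper: chains the reference stage three times
// and asserts that mat_vec produces the same output.
void check_against_reference() {
  int m1[N][N], m2[N][N], m3[N][N], in[N];
  for (int i = 0; i < N; i++) {
    in[i] = i % 5;
    for (int j = 0; j < N; j++) {
      m1[i][j] = (i + j) % 3 - 1;  // small values in {-1, 0, 1}
      m2[i][j] = (i * j) % 3 - 1;
      m3[i][j] = (2 * i + j) % 3 - 1;
    }
  }
  int t1[N], t2[N], expected[N], actual[N];
  mat_vec_mult_ref(m1, in, t1);
  mat_vec_mult_ref(m2, t1, t2);
  mat_vec_mult_ref(m3, t2, expected);
  mat_vec(in, m1, m2, m3, actual);
  for (int i = 0; i < N; i++) assert(actual[i] == expected[i]);
}
```

Calling check_against_reference() from a testbench main gives a quick functional check; it says nothing about what the synthesis tool does with either version.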

 

 

And for resources I get:

 

 

            baseline2   workaround  temp1
BRAM_18K    0           0           0
DSP48E      3           9           9
FF          272         794         632
LUT         502         1139        797

 

These are larger than the baseline, but note that DATAFLOW was applied to both your workaround code and the temp1 version I tried, whereas the baseline could not use DATAFLOW because of the DRC error. DATAFLOW adds buffers between the stages, as you said, and this accounts for the majority of the increase in resources. For comparison, running the workaround and temp1 without DATAFLOW results in:

 

            baseline2   workaround  temp1
BRAM_18K    0           0           0
DSP48E      3           3           3
FF          272         353         299
LUT         502         536         422

So without DATAFLOW, the resource difference is not that large.
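The split between the cost of DATAFLOW and the cost of the code change can be read off the two tables with simple subtraction. The sketch below only redoes that arithmetic on the numbers reported above; the Usage struct and the function names are hypothetical.

```cpp
#include <cassert>

// Resource counts copied from the two tables above (BRAM_18K is 0 everywhere).
struct Usage { int dsp; int ff; int lut; };

const Usage baseline         = {3, 272, 502};
const Usage workaround_df    = {9, 794, 1139};  // with DATAFLOW
const Usage temp1_df         = {9, 632, 797};   // with DATAFLOW
const Usage workaround_plain = {3, 353, 536};   // without DATAFLOW
const Usage temp1_plain      = {3, 299, 422};   // without DATAFLOW

// Element-wise difference a - b.
Usage delta(const Usage& a, const Usage& b) {
  Usage d = {a.dsp - b.dsp, a.ff - b.ff, a.lut - b.lut};
  return d;
}

// Checks that, for both variants, the FF and LUT increase attributable to
// DATAFLOW exceeds the increase attributable to the code change alone.
void check_dataflow_dominates() {
  Usage wa_df   = delta(workaround_df, workaround_plain);  // {6, 441, 603}
  Usage wa_code = delta(workaround_plain, baseline);       // {0, 81, 34}
  Usage t1_df   = delta(temp1_df, temp1_plain);            // {6, 333, 375}
  Usage t1_code = delta(temp1_plain, baseline);            // {0, 27, -80}
  assert(wa_df.ff > wa_code.ff && wa_df.lut > wa_code.lut);
  assert(t1_df.ff > t1_code.ff && t1_df.lut > t1_code.lut);
}
```

With these numbers, the scalar temp1 version under DATAFLOW uses 162 fewer FFs and 342 fewer LUTs than the array workaround under DATAFLOW.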

 

OK, I hope this helps,

Scott