Adventurer | Registered: 12-12-2018

In SDSoC programming, one thing has always puzzled me: when should data be buffered inside the hardware (hw) function?

Looking at the example projects, I see that the hardware-accelerated function is sometimes written like this:

// .h file
#pragma SDS data copy(in1[0:dim*dim], in2[0:dim*dim], out[0:dim*dim])
#pragma SDS data access_pattern(in1:RANDOM, in2:RANDOM, out:RANDOM)
void mmult_accel(
    const int in1[MAX_SIZE * MAX_SIZE], // Read-Only Matrix 1
    const int in2[MAX_SIZE * MAX_SIZE], // Read-Only Matrix 2
    int out[MAX_SIZE * MAX_SIZE],       // Output Result
    int dim                             // Size of one dimension of the matrices
);

// in the hw.cpp file
void mmult_accel(
    const int in1[MAX_SIZE * MAX_SIZE], // Read-Only Matrix 1
    const int in2[MAX_SIZE * MAX_SIZE], // Read-Only Matrix 2
    int out[MAX_SIZE * MAX_SIZE],       // Output Result
    int dim                             // Size of one dimension of the matrices
) {
    // Performs matrix multiply over matrices A and B and stores the result
    // in C. All the matrices are square matrices of the form (size x size)
mmult1:
    for (int i = 0; i < dim; i++) {
        #pragma HLS LOOP_TRIPCOUNT min=c_min max=c_max
    mmult2:
        for (int j = 0; j < dim; j++) {
            #pragma HLS LOOP_TRIPCOUNT min=c_min max=c_max
            int result = 0;
        mmult3:
            for (int k = 0; k < dim; k++) {
                #pragma HLS LOOP_TRIPCOUNT min=c_min max=c_max
                #pragma HLS PIPELINE
                result += in1[i * dim + k] * in2[k * dim + j];
            }
            out[i * dim + j] = result;
        }
    }
}

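In this case, the arrays passed in through SDS are used directly in the computation, without first being buffered in local memory. In this other example, however: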

// .h file
// Define Test Matrix Size
#define TEST_MATRIX_DIM 64
// Define max matrix dimension supported by accelerator
#define MAX_MATRIX_DIM 128
// TRIPCOUNT identifiers
const unsigned int c_dim_min = 1;
const unsigned int c_dim_max = TEST_MATRIX_DIM;
// Zero copy interface enabled
#pragma SDS data zero_copy(a[0:dim*dim], b[0:dim*dim], c[0:dim*dim])
void mmult_accel(int *a, int *b, int *c, int dim);
// hw.cpp file
void mmult_accel(int *a, int *b, int *c, int dim)
{
    // 2D Array is used to store input and output matrices
    int bufa[MAX_MATRIX_DIM][MAX_MATRIX_DIM];
    int bufb[MAX_MATRIX_DIM][MAX_MATRIX_DIM];
    int bufc[MAX_MATRIX_DIM][MAX_MATRIX_DIM];
    int matrix_size = dim * dim;

    // Burst Read data from DDR memory and write into 2D local buffer for a & b.
    int x = 0, y = 0;
read_data:
    for (int i = 0; i < matrix_size; i++) {
        #pragma HLS PIPELINE
        #pragma HLS LOOP_TRIPCOUNT min=c_dim_min*c_dim_min max=c_dim_max*c_dim_max
        bufa[x][y] = a[i];
        bufb[x][y] = b[i];
        if (y == dim - 1) {
            x++;
            y = 0;
        } else {
            y++;
        }
    }

    // Calculate matrix multiplication using local data buffers
    // and write result into local buffer for c
matrix_mult:
    for (int row = 0; row < dim; row++) {
        #pragma HLS LOOP_TRIPCOUNT min=c_dim_min max=c_dim_max
        for (int col = 0; col < dim; col++) {
            #pragma HLS LOOP_TRIPCOUNT min=c_dim_min max=c_dim_max
            int result = 0;
            for (int k = 0; k < dim; k++) {
                #pragma HLS LOOP_TRIPCOUNT min=c_dim_min max=c_dim_max
                #pragma HLS pipeline
                result += bufa[row][k] * bufb[k][col];
            }
            bufc[row][col] = result;
        }
    }

    // Burst Write result to DDR memory from local buffer
    int m = 0, n = 0;
write_data:
    for (int i = 0; i < matrix_size; i++) {
        #pragma HLS LOOP_TRIPCOUNT min=c_dim_min*c_dim_min max=c_dim_max*c_dim_max
        #pragma HLS PIPELINE
        int tmpData_c = bufc[m][n];
        c[i] = tmpData_c;
        if (n == dim - 1) {
            m++;
            n = 0;
        } else {
            n++;
        }
    }
}

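Here, the input arrays first have to be copied into local buffers.

Presumably this is not done merely to illustrate the matrix computation? There must be a more general reason.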
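Counting reads of the two input arrays in the two `mmult_accel` versions gives one concrete reason for the buffering. Writing d for `dim`: in the first version, the statement in `mmult3` reads one element of `in1` and one of `in2` on every iteration, so each input element is fetched d times. In the second version, `read_data` fetches each element of `a` and `b` exactly once, and the d³ multiply-accumulate reads in `matrix_mult` go to `bufa`/`bufb` instead. Assuming that, with `zero_copy`, every pointer access is an external (DDR) access, this is the difference between the two counts below. It is a sketch of read counts only, not of cycle counts.

```latex
% d = dim; reads of the two input matrices from outside the accelerator
N_{\text{unbuffered}} = \underbrace{d \cdot d}_{(i,\,j)} \cdot \underbrace{d}_{k} \cdot 2 = 2d^{3},
\qquad
N_{\text{buffered}} = \underbrace{d^{2}}_{\texttt{read\_data}} \cdot 2 = 2d^{2}
% For d = TEST_MATRIX_DIM = 64: 2 \cdot 64^{3} = 524{,}288 versus 2 \cdot 64^{2} = 8{,}192, a factor of d = 64
```

In other words, the buffer pays off because matrix multiplication reuses every input element d times, which is a property of the computation, not of the array size.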

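Here is my problem: I need to process a large 1-D array (about 1,000,000 elements). Can I still compute directly on the array passed in through SDS, as in the first approach, without buffering it?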
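For a long 1-D array, one common way to keep the external accesses sequential (which HLS can typically turn into burst transfers, as the comments in the second `mmult_accel` suggest) without an on-chip buffer the size of the whole array is to work chunk by chunk: copy one chunk into a local array, do the out-of-order work on the local copy, and write it back in order. The sketch below is a hypothetical illustration, not SDSoC API: `process_in_chunks` and the chunk size `CHUNK` (4096 floats) are assumed names and values, and reversing each chunk is only a placeholder for any operation whose reads within a chunk are not in order. The HLS pragmas are ignored by an ordinary C++ compiler (possibly with a warning).

```cpp
#include <cassert>

// Hypothetical chunk size, assumed small enough to fit in on-chip memory.
const int CHUNK = 4096;

// Sketch: reverse the elements inside each CHUNK-sized block of a long array.
// Assumes n is a multiple of CHUNK. Reads inside a block are out of order,
// so each block is first copied into a local buffer with sequential reads.
void process_in_chunks(float *data, int n)
{
    assert(n % CHUNK == 0);
    float buf[CHUNK];  // local (on-chip) copy of one block

    for (int base = 0; base < n; base += CHUNK) {
        // 1) sequential read of one block from external memory
        for (int j = 0; j < CHUNK; j++) {
#pragma HLS PIPELINE
            buf[j] = data[base + j];
        }
        // 2) out-of-order reads hit only the local buffer,
        //    while the write-back to external memory stays sequential
        for (int j = 0; j < CHUNK; j++) {
#pragma HLS PIPELINE
            data[base + j] = buf[CHUNK - 1 - j];
        }
    }
}
```

The chunk size trades on-chip memory for the number of read/write passes; the total array size only changes how many times the outer loop runs.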

My current hardware-accelerated function is:

#pragma SDS data zero_copy(l_real[0:2 * R * 2 * R * rowNum * colNum]) // ~1,000,000 elements
void fftshift_adaptive(float *l_real, int rowNum, int colNum, int R) // rowNum = 10, colNum = 17
{
    for (int iter = 0; iter < rowNum * colNum; iter++)
    {
        #pragma HLS pipeline
        int start = iter * R * 2 * R * 2; // R = 32
        for (int iterr = 0; iterr < (R * R * 2); iterr++)
        {
            int ref_row = iterr / (2 * R);
            int ref_col = iterr % (2 * R);
            int dst_col = (ref_col < R) ? ref_col + R : ref_col - R;
            int dst_row = ref_row + R;
            float temp;
            int exchenge1 = start + ref_row * R * 2 + ref_col;
            int exchenge2 = start + dst_row * R * 2 + dst_col;
            temp = l_real[exchenge1];
            l_real[exchenge1] = l_real[exchenge2];
            l_real[exchenge2] = temp;
        }
    }
}

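I split the large input array into blocks and process each block, but even with this blocking I never copied the input array into a local buffer.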
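One way to apply the buffered pattern to this case is to copy each 2R × 2R block into a local array and then write it back in order with the quadrants swapped, so the external array is only read and written sequentially and is never read-modified-written in place. This is a sketch under assumptions, not a confirmed fix for the image difference: `fftshift_blocks_buffered` and `MAX_R` are hypothetical names, R is assumed to be at most 32, and the block layout (contiguous row-major 2R × 2R tiles starting at `iter * 4R²`) is taken from `fftshift_adaptive`. Mapping each output element (r, c) to input ((r + R) mod 2R, (c + R) mod 2R) produces the same result as the original diagonal quadrant swap.

```cpp
#include <cassert>

// Hypothetical upper bound on R; one block holds (2*MAX_R) x (2*MAX_R) floats.
const int MAX_R = 32;

// Sketch: block-wise 2-D fftshift using a local buffer per block.
// Each block is a (2R x 2R) row-major tile stored contiguously, as in
// fftshift_adaptive; quadrants are swapped diagonally (TL<->BR, TR<->BL).
void fftshift_blocks_buffered(float *l_real, int rowNum, int colNum, int R)
{
    assert(R > 0 && R <= MAX_R);
    const int N = 2 * R;               // side length of one block
    float buf[2 * MAX_R][2 * MAX_R];   // local copy of one block

    for (int blk = 0; blk < rowNum * colNum; blk++) {
        const int start = blk * N * N;
        // 1) sequential read of the whole block into the local buffer
        for (int i = 0; i < N * N; i++) {
#pragma HLS PIPELINE
            buf[i / N][i % N] = l_real[start + i];
        }
        // 2) sequential write-back: output (r, c) takes the buffered value
        //    at ((r + R) mod N, (c + R) mod N), i.e. the quadrant swap
        for (int i = 0; i < N * N; i++) {
#pragma HLS PIPELINE
            const int r = i / N;
            const int c = i % N;
            l_real[start + i] = buf[(r + R) % N][(c + R) % N];
        }
    }
}
```

Running this and the original function on the same small input in plain C++ and comparing the results is a quick way to check that the index mapping matches before trying it in hardware.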

The program runs without errors, but the image it produces is very different from the one produced by the VC++ version.

[Images: (1) correct result from the VC++ version; (2) result from the SDSoC hardware version]

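The first image is the correct one; the second is the one my hardware version produced. I suspect something goes wrong when the computation runs on the large array, but I have been changing the code for two weeks and still cannot find the problem.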
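When the hardware output differs from the VC++ result, comparing the two arrays element by element and mapping the first mismatch back to a block can narrow down where the error starts (for example, whether every block is wrong or only blocks past some index). A minimal sketch, assuming both outputs are available as float arrays of the same length in the block layout of `fftshift_adaptive`; `first_mismatch`, `locate`, `BlockPos`, and the tolerance `tol` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: index of the first element where the hardware result
// differs from the software reference by more than tol, or -1 if none does.
// Written as !(diff <= tol) so that NaN values are also reported.
long first_mismatch(const float *hw, const float *ref, long n, float tol)
{
    for (long i = 0; i < n; i++) {
        if (!(std::fabs(hw[i] - ref[i]) <= tol)) {
            return i;
        }
    }
    return -1;
}

// Position of a flat index inside the (2R x 2R) block layout.
struct BlockPos {
    long block;  // which block (iter in fftshift_adaptive)
    int row;     // row inside the block
    int col;     // column inside the block
};

BlockPos locate(long idx, int R)
{
    assert(R > 0 && idx >= 0);
    const long n = 2L * R;  // block side length
    BlockPos p;
    p.block = idx / (n * n);
    p.row = static_cast<int>((idx % (n * n)) / n);
    p.col = static_cast<int>(idx % n);
    return p;
}
```

If the first mismatch always falls in the same quadrant or at the same row inside a block, that points at the index arithmetic; if it appears only after a certain block, that points more toward how the large array is transferred or accessed.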

Xilinx Employee | Registered: 04-15-2011

Re: In SDSoC programming, one thing has always puzzled me: when should data be buffered inside the hardware (hw) function?

@mu_yu 

As I understand it, whether to use a buffer, and when, is decided by the characteristics of your data flow and your data structures.

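For example, in the matrix-multiplication example in your post, a whole block of data has to be buffered before the corresponding processing can start; it does not process each number as soon as it arrives.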
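To make the "depends on the data flow" point concrete: the buffer size follows from how far apart the input elements needed for one output are, not from the total array size. In the sketch below (a hypothetical 3-point moving average named `moving_average3`, not code from this thread), each output needs the current input and the two before it, so a 3-element shift register is enough even for a very long input that is read once, in order. A purely element-wise operation would need no window at all, whereas matrix multiplication reuses whole rows and columns, which is why the example buffers entire matrices. The HLS pragmas are ignored by an ordinary C++ compiler.

```cpp
#include <cassert>

// Hypothetical 3-point moving average over a long array read once, in order.
// out[i] = (in[i-2] + in[i-1] + in[i]) / 3 for i >= 2; out[0], out[1] copy in.
void moving_average3(const float *in, float *out, int n)
{
    assert(in != nullptr && out != nullptr);
    float win[3] = {0.0f, 0.0f, 0.0f};  // shift register: the only local storage
#pragma HLS ARRAY_PARTITION variable=win complete
    for (int i = 0; i < n; i++) {
#pragma HLS PIPELINE
        // shift the window and insert the newest sample
        win[0] = win[1];
        win[1] = win[2];
        win[2] = in[i];
        out[i] = (i >= 2) ? (win[0] + win[1] + win[2]) / 3.0f : in[i];
    }
}
```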

Adventurer | Registered: 12-12-2018

Re: In SDSoC programming, one thing has always puzzled me: when should data be buffered inside the hardware (hw) function?

So you mean that if my computation can be done one number at a time, I don't need a buffer at all, no matter how large my array is?
