Adventurer
345 views
Registered: 12-12-2018

Not enough RAM: how should I handle "Design needs 22278 RAMB18 which is more than device capacity of 624"?

Before I used local memory, the code looked like this:
#include <math.h>

void fpgaCrossPower(float *l_real, float *l_imag,float *r_real, float *r_imag,float *s_real,float *s_imag,int rows,int cols)
{
	for(int iter = 0;iter < rows * cols;iter++){

		#pragma HLS PIPELINE
		float r,s_real_temp,s_imag_temp;
		// s = l * conj(r)
		s_real_temp = l_real[iter] * r_real[iter] + l_imag[iter] * r_imag[iter];
		s_imag_temp = l_imag[iter] * r_real[iter] - l_real[iter] * r_imag[iter];
		// magnitude, plus a small constant so the division never hits zero
		r = sqrtf(s_real_temp * s_real_temp + s_imag_temp * s_imag_temp) +0.001;
		// write both normalized components to the outputs
		s_real[iter] = s_real_temp/r;
		s_imag[iter] = s_imag_temp/r;
	}
}
Header (.h) file:
#pragma SDS data zero_copy(l_real[0:rows * cols],l_imag[0:rows * cols],r_real[0:rows * cols],r_imag[0:rows * cols],s_real[0:rows * cols],s_imag[0:rows * cols])
void fpgaCrossPower(float *l_real, float *l_imag,float *r_real, float *r_imag,float *s_real,float *s_imag,int rows,int cols);



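It is not fast enough and I want to speed it up further. I noticed that the accelerated functions in many examples use local memory, so I changed my code to: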

void fpgaCrossPower(float *l_real, float *l_imag,float *r_real, float *r_imag,float *s_real,float *s_imag,int rows,int cols)
{
	float rreal_local[NUM];
	float rimag_local[NUM];
	float lreal_local[NUM];
	float limag_local[NUM];
	float sreal_local[NUM];
	float simag_local[NUM];

	readCurrPt: for(int i = 0; i < rows * cols; i++){
	#pragma HLS PIPELINE
		rreal_local[i] = r_real[i];
		rimag_local[i] = r_imag[i];
		lreal_local[i] = l_real[i];
		limag_local[i] = l_imag[i];
	}

	for(int iter = 0;iter < rows * cols;iter++){

		#pragma HLS PIPELINE
		float r;
		sreal_local[iter] = lreal_local[iter] * rreal_local[iter] + limag_local[iter] * rimag_local[iter];
		simag_local[iter] = limag_local[iter] * rreal_local[iter] - lreal_local[iter] * rimag_local[iter];
		r = sqrtf(sreal_local[iter] * sreal_local[iter] + simag_local[iter] * simag_local[iter]) +0.001;
		sreal_local[iter] /= r;
		simag_local[iter] /= r;
	}
	writeCurrPt: for(int i = 0; i < rows * cols; i++){
	#pragma HLS PIPELINE
		s_real[i] = sreal_local[i];
		s_imag[i] = simag_local[i];
	}

}
Header (.h) file, with one line added:
#define NUM (2048*1024)

My arrays are long because their length is set by the number of pixels in the image.

After this change, however, the build failed with:

Design needs 22278 RAMB18 which is more than device capacity of 624
ERROR: [VPL 17-69] Command failed: Vivado Synthesis failed
ERROR: [VPL 60-704] Integration error, One or more synthesis runs failed during dynamic region dcp generation
ERROR: [VPL 60-704] Integration error, run 'synth_1' couldn't start because one or more of the prerequisite runs failed
ERROR: [VPL 60-704] Integration error, run 'zcu104_fpgaCrossPower_1_0_synth_1' failed, please look at the run log file 'D:/project/xilinx18.3/cudatest3/Debug/_sds/p0/vivado/prj/prj.runs/zcu104_fpgaCrossPower_1_0_synth_1/runme.log' for more information
ERROR: [VPL 60-806] Failed to finish platform linker

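I assume this error means I requested too much local memory.

But if I cannot use a buffer, how else can I optimize and accelerate this function?

Are there any examples I can refer to?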

2 replies
Xilinx Employee
263 views
Registered: 06-19-2019

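Declaring the arrays this way consumes a large amount of static RAM. You could try merging the buffers, or allocate a smaller buffer and increase the number of iterations.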

[Attached image: Capture.PNG]

Xilinx Employee
243 views
Registered: 03-24-2010

The internal buffers are too large to fit in the device.
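
A rough back-of-the-envelope estimate shows the scale of the problem. The sketch below assumes that each RAMB18 stores 16 Kbit of data (with 32-bit words the parity bits go unused) and that every array is mapped to block RAM; both function names are hypothetical. For six arrays of 2048*1024 floats it gives 24576 blocks, the same order of magnitude as the 22278 reported by the tool (whose actual mapping evidently differs somewhat), against a device capacity of 624.

```c
/* Assumption: 16 Kbit of usable data per RAMB18 when storing 32-bit words. */
#define DATA_BITS_PER_RAMB18 (16LL * 1024)
#define BITS_PER_FLOAT 32LL

/* Rough number of RAMB18 blocks needed for `arrays` float arrays
   of `elems` elements each (rounded up). */
long long ramb18_estimate(long long arrays, long long elems)
{
	long long total_bits = arrays * elems * BITS_PER_FLOAT;
	return (total_bits + DATA_BITS_PER_RAMB18 - 1) / DATA_BITS_PER_RAMB18;
}

/* Largest per-array element count that fits if all `blocks` RAMB18
   were split evenly between `arrays` float arrays. */
long long max_elems_per_array(long long blocks, long long arrays)
{
	return (blocks * DATA_BITS_PER_RAMB18) / (arrays * BITS_PER_FLOAT);
}

/* ramb18_estimate(6, 2048LL * 1024) returns 24576 */
/* max_elems_per_array(624, 6)       returns 53248 */
```

Even if the whole device's block RAM were given to this one function, each of the six buffers could hold only about 53 thousand floats, far short of the roughly 2 million elements requested, so the buffers have to be much smaller than the image.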

A similar example that uses internal buffers is https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/host/overlap_c/src/vector_addition.cpp

Regards,
brucey