08-09-2016 05:55 AM
Hi, my name is Soo and I have a problem.
I synthesized the Convolution_Layer_1 function and got this result.
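void Convolution_Layer_1(float src[28*28], float convolution_filer[25*20], float dst[24*24*20])
{
    int col, row, col_f, row_f, feature_map;
    float temp;
    float _src[28][28];  /* local copy of src; declaration and copy loop missing from the post, shape assumed */
#pragma HLS ARRAY_PARTITION variable=_src complete dim=2
    for (feature_map = 0; feature_map < 20; feature_map++)  /* loop header missing from the post, assumed */
    {
#pragma HLS PIPELINE II=1
        for (row = 0; row < 24; row++)
        {
#pragma HLS PIPELINE II=1
            for (col = 0; col < 24; col++)
            {
                temp = 0;
                for (row_f = 0; row_f < 5; row_f++)
                    for (col_f = 0; col_f < 5; col_f++)
                        /* index kept as posted; a 5x5 window would normally read _src[row + row_f][col + col_f] */
                        temp += _src[row][col] * convolution_filer[feature_map * 25 + row_f * 5 + col_f];
                dst[feature_map * 24 * 24 + row * 24 + col] = temp;
            }
        }
    }
}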
The problem is that the total time of this function is much longer than the synthesis result, about 200x slower.
I don't know why I get this result.
I think data transfer is the problem. Is that true?
The transfer size is 3136 (src) + 2000 (convolution_filter) + 46080 (dst) = 51216 bytes,
and I use the axi-simple bus. I don't know the reason for this problem.
Is there any way to solve it?
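The transfer-size figure above can be checked with a few lines of C. This is only a sketch: the element counts come from the function signature, while the helper names (`transfer_bytes`, `total_transfer_bytes`) are made up for illustration, and it assumes a 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>

/* Element counts taken from the Convolution_Layer_1 signature. */
enum { SRC_ELEMS = 28 * 28, FILTER_ELEMS = 25 * 20, DST_ELEMS = 24 * 24 * 20 };

/* Bytes needed to move `elems` floats (hypothetical helper). */
static size_t transfer_bytes(size_t elems)
{
    assert(sizeof(float) == 4);  /* assumption: 32-bit float */
    return elems * sizeof(float);
}

/* Total bytes moved in and out for one call (hypothetical helper). */
static size_t total_transfer_bytes(void)
{
    return transfer_bytes(SRC_ELEMS)
         + transfer_bytes(FILTER_ELEMS)
         + transfer_bytes(DST_ELEMS);
}
```

Calling `total_transfer_bytes()` reproduces the 51216 bytes quoted above: 3136 for src, 2000 for the filter, and 46080 for dst.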
08-09-2016 09:03 AM
Do you have any SDSoC pragmas on the interface of the function (e.g. access_pattern(SEQUENTIAL))?
Can you post the data motion network report?
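For reference, SDSoC data pragmas go immediately before the function declaration. The fragment below is a sketch of what such an annotation might look like for this function, not a recommendation for this design. It assumes src is read once in order and dst is written once in order; convolution_filer is left unannotated because the posted loop re-reads it for every output pixel, which would not be a sequential access pattern unless it is first copied into a local buffer.

```c
/* Assumption: src and dst are each accessed exactly once, in index order. */
#pragma SDS data access_pattern(src:SEQUENTIAL, dst:SEQUENTIAL)
void Convolution_Layer_1(float src[28*28], float convolution_filer[25*20], float dst[24*24*20]);
```

A regular C compiler ignores the SDS pragma, so the declaration still compiles outside SDSoC.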
08-09-2016 02:38 PM
Another thing to point out is the units. The HLS report gives cycles at the clock rate the accelerator runs at (most built-in SDSoC platforms default to 100 MHz). The SDSoC performance estimation report gives processor cycles at the frequency the ARM processor runs at (533, 666, or 800 MHz depending on which board you're using; the zc702 is 666 MHz). So 12000 cycles at 100 MHz is about 80000 cycles at 666 MHz.
Additionally, the SDSoC estimate also includes the time it takes to set up the accelerator and transfer the data to and from it. So out of the 220000 cycles the estimate gave, 80000 (or 36%) is for the actual accelerator execution. Probably 5000-10000 cycles go to software doing AXI-Lite writes to initialize the accelerator and set up the DMAs for data transfer. The remaining roughly 120000 cycles are the estimated data transfer time. It's likely that this cost is overly pessimistic, and you may see a lower actual execution time.
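The unit conversion and the percentage above can be reproduced with a small sketch like this one. The helper names (`convert_cycles`, `percent_of`) are made up for illustration; the 100 MHz and 666 MHz clocks and the cycle counts are the ones quoted in this thread.

```c
#include <assert.h>

/* Express a cycle count measured on one clock as cycles of another clock. */
static double convert_cycles(double cycles, double from_hz, double to_hz)
{
    assert(from_hz > 0.0 && to_hz > 0.0);
    return cycles * (to_hz / from_hz);
}

/* Share of a total, as a percentage. */
static double percent_of(double part, double total)
{
    assert(total > 0.0);
    return 100.0 * part / total;
}
```

For example, `convert_cycles(12000, 100e6, 666e6)` gives about 79920 processor cycles, and `percent_of(80000, 220000)` gives about 36.4, which matches the rounded figures in the text.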
If you want to see exactly how long each of the events takes during actual execution, consider enabling the Trace feature. You can do this by clicking the "Enable Event Tracing" checkbox in the project overview. Just make sure you clean your project first, and remember that currently the incremental build does NOT work with trace (you'll need to do a manual clean, then build) anytime you make a change.
You can read more about the Trace feature in Chapter 13 of UG1027: http://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_2/ug1027-sdsoc-user-guide.pdf
And you can use the tutorials in Chapter 8 of UG1028 as a guide on what steps to follow: http://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_2/ug1028-intro-to-sdsoc.pdf