05-15-2019 03:31 AM
Hi Vivado community,
I'm trying to do a data transfer between an internal memory and a BRAM port with Vivado HLS 2017.3. The top function is like this:
void loopBack_hls_automatic_mcxx_wrapper(axiStream_t& inStream, axiStream_t& outStream, int *mcxx_a, int *mcxx_aa, int mcxx_a_external1[SIZE]) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=inStream
#pragma HLS INTERFACE axis port=outStream
#pragma HLS INTERFACE m_axi port=mcxx_data
#pragma HLS INTERFACE m_axi port=mcxx_a
#pragma HLS INTERFACE m_axi port=mcxx_aa
#pragma HLS INTERFACE bram port=mcxx_a_external1
    ...
    ...
    static int a[SIZE];
    ...
    ...
    READ_intToExt: for (i = 0; i < n; i++) {
        mcxx_a_external1[i] = a[i];
    }
This works correctly, but I wanted to improve the performance by using the following pragmas:
#pragma HLS array_partition variable=a cyclic factor=4
#pragma HLS array_partition variable=mcxx_a_external1 cyclic factor=4
So both memories are partitioned the same way, and 4 elements can be read/written each cycle. However, reading the BRAM port now gives wrong results and I do not know why. For example:
n=32
Values of aa (expected: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ...)
5 6 7 8 21 22 23 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It seems a strange coincidence that, with a partition factor of 4, the first 4 elements are correct, the next 3x4 elements are skipped, and then the following 4 elements appear.
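For reference, the cyclic layout described above can be modelled in plain C++. This is only a software sketch of where each element lands, assuming the usual interleaved mapping (element i goes to bank i % 4 at offset i / 4); the names kFactor, BankSlot, cyclic_slot and scatter_cyclic are made up for illustration and are not part of the design or of any HLS library.

```cpp
#include <array>
#include <cstddef>

// Hypothetical names for illustration only; not part of the original design.
constexpr std::size_t kFactor = 4;  // mirrors "cyclic factor=4"

struct BankSlot {
    std::size_t bank;    // which of the kFactor memories holds the element
    std::size_t offset;  // address inside that memory
};

// Cyclic partitioning interleaves elements across banks:
// 0 -> bank 0, 1 -> bank 1, 2 -> bank 2, 3 -> bank 3, 4 -> bank 0, ...
constexpr BankSlot cyclic_slot(std::size_t i) {
    return BankSlot{i % kFactor, i / kFactor};
}

// Software model: scatter a flat array into kFactor banks so the
// resulting layout can be inspected on the host.
template <std::size_t N>
std::array<std::array<int, N / kFactor>, kFactor>
scatter_cyclic(const std::array<int, N>& flat) {
    static_assert(N % kFactor == 0,
                  "sketch assumes N is a multiple of the factor");
    std::array<std::array<int, N / kFactor>, kFactor> banks{};
    for (std::size_t i = 0; i < N; ++i) {
        const BankSlot s = cyclic_slot(i);
        banks[s.bank][s.offset] = flat[i];
    }
    return banks;
}
```

With this mapping, four consecutive indices always fall in four different banks, which is what allows four accesses in the same cycle.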
Thank you,
Marc
05-16-2019 09:25 AM
Hi,
I have found the problem. The loop only had #pragma HLS PIPELINE II=1, while both memories were partitioned into 4 BRAMs. My guess is that the transfer from the internal memory to the BRAM port expected 4 elements per cycle, so I added #pragma HLS unroll factor=4 to perform 4 reads in each iteration, and now it works correctly. This is how the code looks now:
for (i = 0; i < n; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS unroll factor=4
    mcxx_a_external1[i] = a[i];
}
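To make the effect of the unroll pragma concrete, here is a hand-written sketch of roughly what a partial unroll by 4 does to the copy loop: the body is replicated four times and the index advances by 4. The function name copy_unrolled_by_4 is hypothetical, and the per-copy bounds checks are an assumption about how a trip count that is not a multiple of 4 would be handled; the actual hardware generated by the tool may differ.

```cpp
// Hypothetical helper for illustration; not the original top function.
// Manual equivalent of a copy loop partially unrolled by a factor of 4.
void copy_unrolled_by_4(int* dst, const int* src, int n) {
    for (int i = 0; i < n; i += 4) {
        // Four copies of the loop body. With a cyclic factor of 4,
        // indices i, i+1, i+2 and i+3 fall in four different banks.
        dst[i] = src[i];
        // Guards keep the sketch correct when n is not a multiple of 4.
        if (i + 1 < n) dst[i + 1] = src[i + 1];
        if (i + 2 < n) dst[i + 2] = src[i + 2];
        if (i + 3 < n) dst[i + 3] = src[i + 3];
    }
}
```

Running this on the host against the original one-element-per-iteration loop is a cheap way to confirm that both versions produce the same contents before re-synthesizing.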
05-15-2019 03:46 AM
Can you post the code you're using to read the BRAM? I suspect that it's just an indexing error there - a very easy mistake to make.
05-15-2019 03:56 AM
Hi,
I read the BRAM in the following way:
memcpy(mcxx_aa + __param/sizeof(int), (const int *)(mcxx_a_external1), n*sizeof(int));
where mcxx_aa is just an AXI port to the PS.