mateu94
Newbie
Registered: ‎03-11-2019

array_partition in BRAM port in Vivado HLS 2017.3

Hi Vivado community,

I'm trying to transfer data between an internal memory and a BRAM port with Vivado HLS 2017.3. The top function looks like this:

 

void loopBack_hls_automatic_mcxx_wrapper(axiStream_t& inStream, axiStream_t& outStream, int *mcxx_a, int *mcxx_aa, int mcxx_a_external1[SIZE]) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=inStream
#pragma HLS INTERFACE axis port=outStream
#pragma HLS INTERFACE m_axi port=mcxx_data
#pragma HLS INTERFACE m_axi port=mcxx_a
#pragma HLS INTERFACE m_axi port=mcxx_aa
#pragma HLS INTERFACE bram port=mcxx_a_external1
...
...
static int a[SIZE];
...
...
READ_intToExt:
for (i = 0; i < n; i++) 
{
     mcxx_a_external1[i] = a[i];
}

This works correctly, but I wanted to improve the performance by using the following pragmas:

 

 

#pragma HLS array_partition variable=a cyclic factor=4
#pragma HLS array_partition variable=mcxx_a_external1 cyclic factor=4

Both memories are therefore partitioned the same way, so 4 elements can be read and written in each cycle. However, I do not understand why reading the BRAM port now gives wrong results, for example:

 

 

n=32
Values of aa (expected: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ...)
5 6 7 8 21 22 23 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

 

It seems like a strange coincidence that, with a partition factor of 4, the first 4 elements are correct, the next 3x4 elements are skipped, and then the following 4 elements appear.
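
For reference, cyclic partitioning interleaves the elements across the smaller memories: with factor 4, element i goes to bank i % 4 at offset i / 4. A minimal sketch of that mapping follows; the names BankSlot and cyclic_slot are hypothetical helpers for illustration, not part of Vivado HLS, and the sketch does not by itself explain the output above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Vivado HLS API): describes where one element
// ends up after "array_partition cyclic factor=<factor>".
struct BankSlot {
    std::size_t bank;    // which of the `factor` smaller memories holds it
    std::size_t offset;  // address of the element inside that memory
};

// Cyclic partitioning interleaves elements: consecutive indices go to
// consecutive banks, wrapping around after `factor` elements.
BankSlot cyclic_slot(std::size_t index, std::size_t factor) {
    assert(factor > 0);
    return BankSlot{index % factor, index / factor};
}
```

With factor 4, indices 0..3 land at offset 0 of banks 0..3, and indices 16..19 land at offset 4 of banks 0..3.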

Thank you,

Marc


Accepted Solutions
mateu94
Newbie
Registered: ‎03-11-2019

Hi,

I have found the problem. The loop only had #pragma HLS PIPELINE II=1, while both memories were partitioned into 4 BRAMs. My guess is that, when copying from the internal memory to the BRAM port, the design expected 4 elements in each cycle, so I added #pragma HLS unroll factor=4 so that each iteration performs 4 reads, and now it works correctly. This is how the code looks now:

for (i = 0; i < n; i++) 
{
     #pragma HLS PIPELINE II=1
     #pragma HLS unroll factor=4
     mcxx_a_external1[i] = a[i];
}
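
As a rough illustration of what unroll factor=4 asks the tool to do, the loop body is replicated four times per iteration, with exit checks kept when n is not known to be a multiple of 4. The sketch below writes that out by hand in plain C++; copy_unrolled_by_4, src and dst are hypothetical names standing in for the loop, a and mcxx_a_external1, and the hardware HLS actually generates may differ.

```cpp
#include <cassert>

// Hand-written equivalent of a copy loop unrolled by 4 (illustration only).
// `src` stands in for the internal array `a`, `dst` for `mcxx_a_external1`.
void copy_unrolled_by_4(const int *src, int *dst, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i += 4) {
        // i is always a multiple of 4 here, so i, i+1, i+2, i+3 map to
        // four different banks of a cyclic factor=4 partition.
        dst[i] = src[i];
        if (i + 1 >= n) break;  // exit check: n may not be a multiple of 4
        dst[i + 1] = src[i + 1];
        if (i + 2 >= n) break;
        dst[i + 2] = src[i + 2];
        if (i + 3 >= n) break;
        dst[i + 3] = src[i + 3];
    }
}
```

Each group of four accesses touches every bank once, which is what lets the partitioned memories be used in parallel.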

3 Replies
u4223374
Advisor
Registered: ‎04-26-2015

Can you post the code you're using to read the BRAM? I suspect that it's just an indexing error there - a very easy mistake to make.

mateu94
Newbie
Registered: ‎03-11-2019

Hi,

I read the BRAM in the following way:

memcpy(mcxx_aa + __param/sizeof(int), (const int *)(mcxx_a_external1), n*sizeof(int));

where mcxx_aa is just an AXI port to the PS.
