## How to read concurrently multiple consecutive elements from a multidimensional array?

Hello,

let's say I have a multidimensional array, like this one:

data_t image_window0[3][3][512];

and I would like to read concurrently 16 consecutive elements.

Is this possible?

I tried to partition the array in dimension 0 by a factor of 16, but I got an error of illegal partition factor.

Is there a workaround for achieving this?

Cheers,

Panos

Without proper software tools the hardware is unusable no matter how good and well designed it is.

## Re: How to read concurrently multiple consecutive elements from a multidimensional array?

HLS array dimensions go [1][2][3]. If you want to partition only the last dimension, you need to use dim=3 (dim=0 means partition every dimension).

Edit: if you're only going to be reading at 16-element increments (e.g. reading 0:15 or 32:47, but never 4:19 or 7:22), then an array reshape might work better for you. It just depends on how clever HLS is at detecting the access pattern.
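As a minimal sketch of the `dim` numbering, assuming a cyclic partition type and a placeholder `data_t` of `int` (neither is specified at this point in the thread), the pragma below targets only the last dimension of the array from the question. The function name `sum16` and its `block` parameter are hypothetical, introduced only to give the pragma something to act on.

```cpp
typedef int data_t;  // placeholder element type (assumption)

// Hypothetical helper: sums 16 consecutive elements of the last dimension,
// starting at the aligned offset block * 16.
data_t sum16(data_t image_window0[3][3][512], int h, int w, int block) {
// dim=3 selects only the last ([512]) dimension;
// dim=0 would apply the same type and factor to all three dimensions.
#pragma HLS ARRAY_PARTITION variable=image_window0 dim=3 cyclic factor=16
    data_t acc = 0;
    for (int n = 0; n < 16; n++) {
#pragma HLS UNROLL
        acc += image_window0[h][w][block * 16 + n];
    }
    return acc;
}
```

A plain C++ compiler ignores the HLS pragmas, so the function can be checked in software before synthesis.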


## Re: How to read concurrently multiple consecutive elements from a multidimensional array?

@u4223374

Thank you for the explanation. I misunderstood the dim=0 option as considering the multidimensional array [3][3][512] as [3x3x512] and then partitioning it.

But you are correct, as the documentation confirms:

"`dim=<int>`: Specifies which dimension of a multi-dimensional array to partition. Specified as an integer from 0 to N, for an array with N dimensions:

• If a value of 0 is used, all dimensions of a multi-dimensional array are partitioned with the specified type and factor options."

Now, even if I reshape the last dimension of the array using the array reshape pragma, the HLS compiler cannot execute the code in parallel.

I'm trying to multiply two arrays and get II=1:

```
l_mul_chi: for (sindex_t n = 0; n != 16; n++) {
    mindex_t fchi_n = fchi + n;
    lindex_t we_pixel_n = we_pixel + n;

    data_t tmp1 = image_window[iw_h][iw_w][fchi_n];
    data_t tmp2 = _weights[we_pixel_n];

    mul_buffer[n] = tmp1 * tmp2;
}
```

Even if I use "array reshape block factor=16" on the third dimension of image_window and the first dimension of _weights, I cannot get II=1, even though I step through these two arrays 16 elements at a time.

Also, the array reshape pragma uses a huge amount of resources (FFs and LUTs), meaning that even if I got II=1, the design would not be implementable on my device.


## Re: How to read concurrently multiple consecutive elements from a multidimensional array?

@pmousoul

Ah, I think I have the solution: you need a cyclic partition, not a block partition. A block partition with factor 16 will take your [3][3][512] RAM and split it into 16 separate RAMs, each holding 32 consecutive elements:

[3][3][0:31]

[3][3][32:63]

[3][3][64:95]

and so on.

If it wants to read elements [0:15] it's going to have to do 16 reads from the first RAM, and none from the others - which is exactly what you don't want.

A cyclic partition will do:

[3][3][0,16,32,48,64...]

[3][3][1,17,33,49,65...]

[3][3][2,18,34,50,66...]

Reading from this, it'll be able to pull out one element from each RAM.

Try a cyclic partition and see how that goes.
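The difference between the two layouts can be checked with a small software model. This is only a sketch: it assumes a partition with factor F splits the 512-element dimension into F banks (blocks of 512/F consecutive elements for block, round-robin for cyclic), and the helper names `block_bank`, `cyclic_bank` and `banks_touched` are made up for illustration.

```cpp
#include <set>

constexpr int SIZE = 512;    // length of the partitioned dimension
constexpr int FACTOR = 16;   // number of banks created by the partition

// Block partitioning: each bank holds SIZE / FACTOR consecutive elements.
int block_bank(int index) { return index / (SIZE / FACTOR); }

// Cyclic partitioning: consecutive elements are dealt out round-robin.
int cyclic_bank(int index) { return index % FACTOR; }

// Counts how many distinct banks a read of `count` consecutive elements,
// starting at `start`, has to touch.
int banks_touched(int (*bank_of)(int), int start, int count) {
    std::set<int> banks;
    for (int n = 0; n < count; n++) {
        banks.insert(bank_of(start + n));
    }
    return static_cast<int>(banks.size());
}
```

In this model `banks_touched(block_bank, 0, 16)` is 1, so all 16 reads compete for one RAM, while `banks_touched(cyclic_bank, 0, 16)` is 16, one read per RAM.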


## Re: How to read concurrently multiple consecutive elements from a multidimensional array?

@u4223374

I just tried this:

set_directive_array_partition -type cyclic -factor 16 -dim 1 "hw_conv" _weights0
set_directive_array_partition -type cyclic -factor 16 -dim 3 "hw_conv" image_window0

INFO: [SCHED 204-61] Pipelining function 'mul'.
WARNING: [SCHED 204-69] Unable to schedule 'load' operation ('image_window_2_V_lo_2', ../../code/SqueezeNet1.1_C++/hconv_test_w16/hw_conv_helper.cpp:409) on array 'image_window_2_V' due to limited memory ports.
INFO: [SCHED 204-61] Pipelining result: Target II: 1, Final II: 8, Depth: 9.

The warning points at this line:

`mindex_t fchi_n = fchi + n;`

Maybe I am making the access pattern too complex for HLS to understand?


## Re: How to read concurrently multiple consecutive elements from a multidimensional array?

Yes, it appears that your access patterns confuse it. If you pass in fchi/16 and we_pixel/16 instead of fchi and we_pixel, and perform the multiply inside the block (so HLS knows that it's definitely a multiple of 16) then it works fine:

```
typedef int mindex_t;
typedef int sindex_t;
typedef int lindex_t;
typedef int data_t;

void top(int fchi, int we_pixel, int image_window[3][3][512], data_t _weights[512], data_t mul_buffer[16]) {

// Cyclic partitioning puts element i into bank i % 16, so 16 consecutive
// elements land in 16 different banks.
#pragma HLS ARRAY_PARTITION variable=_weights dim=1 cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=image_window dim=3 cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=mul_buffer dim=1 complete

    for (int iw_h = 0; iw_h < 3; iw_h++) {
        for (int iw_w = 0; iw_w < 3; iw_w++) {
#pragma HLS PIPELINE II=1

            // Integer division discards the remainder, so the base used below
            // (tmp * 16) is always a multiple of 16.
            mindex_t fchi_tmp = fchi / 16;
            lindex_t we_pixel_tmp = we_pixel / 16;

            l_mul_chi: for (sindex_t n = 0; n != 16; n++) {
                data_t tmp1 = image_window[iw_h][iw_w][fchi_tmp * 16 + n];
                data_t tmp2 = _weights[we_pixel_tmp * 16 + n];

                mul_buffer[n] = tmp1 * tmp2;
            }
        }
    }
}
```

I don't know why. HLS should be able to figure this out for arbitrary fchi/we_pixel, even if it does take a whole lot of hardware to do the multiplexing.
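One plausible reading, offered as an assumption and not as confirmed tool behaviour: with a cyclic factor of 16, iteration n reads bank (start + n) % 16. When the start is an arbitrary run-time value, the bank used by each iteration is not known at compile time; when the start is written as base * 16, the base term drops out and iteration n always reads bank n. The sketch below models this in plain C++ for non-negative indices; the helper names are hypothetical.

```cpp
constexpr int FACTOR = 16;  // cyclic partition factor used in the thread

// Bank read by iteration n when the start offset is arbitrary.
int bank_unaligned(int start, int n) { return (start + n) % FACTOR; }

// Bank read by iteration n when the start offset is base * FACTOR.
int bank_aligned(int base, int n) { return (base * FACTOR + n) % FACTOR; }

// Returns true if iteration n maps to the same bank for every first
// argument in [0, limit), i.e. the bank can be fixed at compile time.
bool bank_is_static(int (*bank_of)(int, int), int n, int limit) {
    for (int s = 1; s < limit; s++) {
        if (bank_of(s, n) != bank_of(0, n)) {
            return false;
        }
    }
    return true;
}
```

In this model `bank_is_static(bank_aligned, n, 32)` holds for every n, while `bank_is_static(bank_unaligned, n, 32)` does not. The reported Final II of 8 would also be consistent with 16 loads being scheduled on a single two-port memory, which is what a scheduler has to assume if it cannot prove the loads hit different banks.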
