Explorer
Explorer
1,329 Views
Registered: ‎05-21-2017

How to read concurrently multiple consecutive elements from a multidimensional array?


Hello,

 

let's say I have a multidimensional array, like this one:

data_t image_window0[3][3][512];

 

and I would like to read concurrently 16 consecutive elements.

 

Is this possible?

 

I tried partitioning the array with dim=0 and a factor of 16, but I got an "illegal partition factor" error.

 

Is there a workaround for achieving this?

 

 

Cheers,

Panos

Without proper software tools the hardware is unusable no matter how good and well designed it is.

6 Replies
Scholar u4223374
Scholar
1,291 Views
Registered: ‎04-26-2015


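HLS array dimensions are numbered [1][2][3]. If you want to partition only the last dimension, you need to use dim=3 (dim=0 means partition every dimension). That is probably why you got the error: a factor of 16 is not legal for the first two dimensions, which have only 3 elements each.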

 

Edit: if you're only going to be reading at 16-element increments (e.g. reading 0:15 or 32:47, but never 4:19 or 7:22), then an array reshape might work better for you. It just depends on how clever HLS is at detecting the access pattern.
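To illustrate the dimension numbering, here is a minimal sketch that is not taken from the original design. The function name sum16 and its parameters are hypothetical, data_t is assumed to be int, and the cyclic partition type is just one possible choice. Only the [512] dimension is partitioned:

```cpp
typedef int data_t;  // assumption: the thread does not show the real element type

// Hypothetical helper: sums one aligned group of 16 consecutive elements.
data_t sum16(data_t image_window0[3][3][512], int row, int col, int block) {
// dim=3 targets only the last ([512]) dimension; dims 1 and 2 stay intact.
#pragma HLS ARRAY_PARTITION variable=image_window0 cyclic factor=16 dim=3
    data_t acc = 0;
    for (int n = 0; n < 16; n++) {
#pragma HLS UNROLL
        acc += image_window0[row][col][block * 16 + n];
    }
    return acc;
}
```

A normal C++ compiler ignores the HLS pragmas, so the function can be checked in software before synthesis.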

Explorer
Explorer
1,242 Views
Registered: ‎05-21-2017


@u4223374

 

Thank you for the explanation. I had misunderstood the dim=0 option as flattening the multidimensional array [3][3][512] into a single [3x3x512] array and then partitioning that.

But you are correct, as the documentation confirms:

"dim=<int>: Specifies which dimension of a multi-dimensional array to partition. Specified as an integer from 0 to N, for an array with N dimensions:

  • If a value of 0 is used, all dimensions of a multi-dimensional array are partitioned with the specified type and factor options."

Now, even if I reshape the last dimension of the array using the array reshape pragma, the HLS compiler cannot execute the code in parallel.

I'm trying to multiply two arrays element by element and get II=1:

 

 

l_mul_chi: for ( sindex_t n = 0; n != 16; n++ )
{

	mindex_t fchi_n = fchi + n;
	lindex_t we_pixel_n = we_pixel + n;

	data_t tmp1 = image_window[ iw_h ][ iw_w ][ fchi_n ];
	data_t tmp2 = _weights[ we_pixel_n ];

	mul_buffer[n] = tmp1 * tmp2;

}

 

Even if I use "array reshape block factor=16" on the third dimension of image_window and the first dimension of _weights, I cannot get II=1, even though I step through these two arrays 16 elements at a time.

Also, the array reshape pragma uses a huge amount of resources (FFs and LUTs), meaning that even if I got II=1, the design would not be implementable in my device.

 

Scholar u4223374
Scholar
1,236 Views
Registered: ‎04-26-2015


@pmousoul

 

Ah, I think I have the solution: you need a cyclic partition, not a block partition. A block partition will take your [3][3][512] RAM and split that into separate RAMs with elements:

 

[3][3][0:31]

[3][3][32:63]

[3][3][64:95]

 

and so on.

 

If it wants to read elements [0:15] it's going to have to do 16 reads from the first RAM, and none from the others - which is exactly what you don't want.

 

A cyclic partition will do:

 

[3][3][0,16,32,48,64...]

[3][3][1,17,33,49,65...]

[3][3][2,18,34,50,66...]

 

Reading from this, it'll be able to pull out one element from each RAM.

 

Try a cyclic partition and see how that goes.
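The difference can be checked in plain software. This is a rough model rather than anything HLS generates: it assumes the factor is the number of RAMs created, so a block partition of 512 elements with factor 16 puts 512/16 = 32 consecutive elements in each RAM. All names below are hypothetical:

```cpp
#include <set>

const int SIZE = 512;   // length of the partitioned dimension
const int FACTOR = 16;  // number of RAMs (banks) created by the partition

// Block partition: each bank holds SIZE/FACTOR consecutive elements.
int block_bank(int index) { return index / (SIZE / FACTOR); }

// Cyclic partition: elements are dealt out to the banks round-robin.
int cyclic_bank(int index) { return index % FACTOR; }

// Counts how many distinct banks a read of 16 consecutive elements touches.
int banks_touched(int start, int (*bank_of)(int)) {
    std::set<int> banks;
    for (int n = 0; n < 16; n++) {
        banks.insert(bank_of(start + n));
    }
    return static_cast<int>(banks.size());
}
```

In this model, banks_touched(0, block_bank) is 1 (all 16 reads queue up on one RAM), while banks_touched(start, cyclic_bank) is 16 for any start, so every read can go to its own RAM.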

Explorer
Explorer
1,227 Views
Registered: ‎05-21-2017


@u4223374

 

I just tried this:

 

set_directive_array_partition -type cyclic -factor 16 -dim 1 "hw_conv" _weights0
set_directive_array_partition -type cyclic -factor 16 -dim 3 "hw_conv" image_window0

 

but HLS still complains:

 

INFO: [SCHED 204-61] Pipelining function 'mul'.
WARNING: [SCHED 204-69] Unable to schedule 'load' operation ('image_window_2_V_lo_2', ../../code/SqueezeNet1.1_C++/hconv_test_w16/hw_conv_helper.cpp:409) on array 'image_window_2_V' due to limited memory ports.
INFO: [SCHED 204-61] Pipelining result: Target II: 1, Final II: 8, Depth: 9.

 

pointing at this line:

mindex_t fchi_n = fchi + n;

 

Maybe my access pattern is too complex for HLS to understand?
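One possible reading of the "Final II: 8" in the log, as a back-of-the-envelope sketch: if the scheduler has to assume that all 16 loads may hit the same RAM, and that RAM has two read ports (an assumption, typical for dual-port block RAM), then the loads alone need 16 / 2 = 8 cycles. The helper name port_limited_ii is hypothetical:

```cpp
// Lower bound on the initiation interval imposed by memory ports alone:
// ceil(reads on the busiest bank / ports per bank).
int port_limited_ii(int reads_on_busiest_bank, int ports_per_bank) {
    return (reads_on_busiest_bank + ports_per_bank - 1) / ports_per_bank;
}
```

With 16 reads on one two-port RAM this gives 8. With one read per RAM, as a working cyclic partition would allow, it gives 1.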

Scholar u4223374
Scholar
1,762 Views
Registered: ‎04-26-2015


Yes, it appears that your access pattern confuses it. If you work with fchi/16 and we_pixel/16 instead of fchi and we_pixel, and multiply by 16 again inside the block (so HLS knows that the base index is definitely a multiple of 16), then it works fine:

 

typedef int mindex_t;
typedef int sindex_t;
typedef int lindex_t;
typedef int data_t;

void top(int fchi, int we_pixel, data_t image_window[3][3][512], data_t _weights[512], data_t mul_buffer[16]) {

// Cyclic partitioning puts elements n, n+16, n+32, ... in RAM n,
// so 16 consecutive elements always sit in 16 different RAMs.
#pragma HLS ARRAY_PARTITION variable=_weights dim=1 cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=image_window dim=3 cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=mul_buffer dim=1 complete

	for (int iw_h = 0; iw_h < 3; iw_h++) {
		for (int iw_w = 0; iw_w < 3; iw_w++) {
// Pipelining this loop fully unrolls the inner l_mul_chi loop.
#pragma HLS PIPELINE II=1

			// Divide here and multiply by 16 at the access, so the base
			// index is visibly a multiple of 16.
			mindex_t fchi_tmp = fchi / 16;
			lindex_t we_pixel_tmp = we_pixel / 16;

			l_mul_chi: for (sindex_t n = 0; n != 16; n++) {
				data_t tmp1 = image_window[iw_h][iw_w][fchi_tmp * 16 + n];
				data_t tmp2 = _weights[we_pixel_tmp * 16 + n];

				mul_buffer[n] = tmp1 * tmp2;
			}
		}
	}
}

I don't know why this is necessary. HLS should be able to figure this out for arbitrary fchi/we_pixel, even if it does take a whole lot of hardware to do the multiplexing.
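One plausible explanation, offered as a guess rather than a statement about HLS internals: with a base index that is a multiple of 16, unrolled iteration n always reads RAM n, so no runtime multiplexing is needed. With an arbitrary base, the RAM each iteration reads depends on the runtime value. A small software model (all names hypothetical) shows the difference:

```cpp
const int NUM_BANKS = 16;  // matches the cyclic partition factor

// RAM selected by an index under a cyclic partition with NUM_BANKS banks.
int bank_of(int index) { return index % NUM_BANKS; }

// True if, for this base index, unrolled iteration n always reads bank n.
bool lanes_map_to_fixed_banks(int base) {
    for (int n = 0; n < NUM_BANKS; n++) {
        if (bank_of(base + n) != n) {
            return false;
        }
    }
    return true;
}
```

For example, lanes_map_to_fixed_banks((37 / 16) * 16) is true, while lanes_map_to_fixed_banks(37) is false. Note that the divide-and-multiply rounds the base down, so it only reproduces the original behaviour when fchi and we_pixel are already multiples of 16.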


Explorer
Explorer
1,207 Views
Registered: ‎05-21-2017


@u4223374

 

I would never have thought that this kind of coding would make it work. With your solution I don't even have to change my code!

 

Thank you very much!
