Explorer
1,672 Views
Registered: ‎05-21-2017

Array reshape example project?

Hello,

I would like to know whether there is an example C++ project somewhere that shows how to use the array reshape pragma.

From this, I understand what the pragma does, but how do I use it in code? Do I have to modify my code to take advantage of it, or is HLS clever enough to take my existing code, work out the access pattern of my array, and accelerate it?

Without proper software tools the hardware is unusable no matter how good and well designed it is.

Accepted Solutions
Contributor
2,257 Views
Registered: ‎03-13-2017

Re: Array reshape example project?

[Edited: added references]

Here is an example based on something that should be familiar to you ;).
It is derived from the solution in this post by u4223374.

typedef int mindex_t;
typedef int sindex_t;
typedef int lindex_t;
typedef int data_t;

void top(int fchi, int we_pixel, int image_window[3][3][512], data_t _weights[512], data_t mul_buffer[16]) {

   #pragma HLS ARRAY_RESHAPE variable=mul_buffer cyclic factor=16 dim=1
   #pragma HLS ARRAY_RESHAPE variable=_weights cyclic factor=16 dim=1
   #pragma HLS ARRAY_RESHAPE variable=image_window cyclic factor=16 dim=3

//#pragma HLS ARRAY_PARTITION variable=_weights dim=1 cyclic factor=16
//#pragma HLS ARRAY_PARTITION variable=image_window dim=3 cyclic factor=16
//#pragma HLS ARRAY_PARTITION variable=mul_buffer dim=1 complete

	for (int iw_h = 0; iw_h < 3; iw_h++) {
		for (int iw_w = 0; iw_w < 3; iw_w++) {
            #pragma HLS PIPELINE II=1

			mindex_t fchi_tmp = fchi/16;
			lindex_t we_pixel_tmp = we_pixel/16;

			l_mul_chi: for ( sindex_t n = 0; n != 16; n++ )
			{

				data_t tmp1 = image_window[ iw_h ][ iw_w ][ fchi_tmp * 16 + n ];
				data_t tmp2 = _weights[ we_pixel_tmp * 16 + n ];
				mul_buffer[n] = tmp1 * tmp2;

			}
		}
	}
}

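Because the arrays are reshaped rather than partitioned, each one stays a single, wider memory with a single port instead of being split into 16 separate memories and ports, which saves a lot of BRAM and I/O on the interface.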
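As a rough software model of what the cyclic reshape with factor=16 does to `_weights` (a sketch only: `ReshapedWord`, `ReshapedArray`, `reshape_cyclic`, `kDepth`, and `kFactor` are hypothetical names, not HLS constructs, and the tool packs the 16 elements into one wide word rather than a struct), the 512-element array becomes 32 words of 16 lanes each. Element i sits in word i / 16, lane i % 16, which is why the 16 reads at `we_pixel_tmp * 16 + n` in the inner loop all fall into one word.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kDepth  = 512; // size of _weights in the example above
constexpr std::size_t kFactor = 16;  // reshape factor used in the pragma

// One wide word of the reshaped memory: kFactor original elements side by side.
using ReshapedWord  = std::array<int, kFactor>;
using ReshapedArray = std::array<ReshapedWord, kDepth / kFactor>;

// Cyclic reshape model: original element i lands in word i / kFactor,
// lane i % kFactor.
ReshapedArray reshape_cyclic(const std::array<int, kDepth>& flat) {
    ReshapedArray wide{};
    for (std::size_t i = 0; i < kDepth; ++i) {
        wide[i / kFactor][i % kFactor] = flat[i];
    }
    return wide;
}
```

With ARRAY_PARTITION the same 16 lanes would instead be 16 separate arrays, each with its own port.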

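The area can be further improved by small changes, such as replacing the divisions with right shifts:

typedef int mindex_t;
typedef int sindex_t;
typedef int lindex_t;
typedef int data_t;

void top(int fchi, int we_pixel, int image_window[3][3][512], data_t _weights[512], data_t mul_buffer[16]) {
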
   #pragma HLS ARRAY_RESHAPE variable=mul_buffer cyclic factor=16 dim=1
   #pragma HLS ARRAY_RESHAPE variable=_weights cyclic factor=16 dim=1
   #pragma HLS ARRAY_RESHAPE variable=image_window cyclic factor=16 dim=3

//#pragma HLS ARRAY_PARTITION variable=_weights dim=1 cyclic factor=16
//#pragma HLS ARRAY_PARTITION variable=image_window dim=3 cyclic factor=16
//#pragma HLS ARRAY_PARTITION variable=mul_buffer dim=1 complete

	for (int iw_h = 0; iw_h < 3; iw_h++) {
		for (int iw_w = 0; iw_w < 3; iw_w++) {
            #pragma HLS PIPELINE II=1

			mindex_t fchi_tmp = fchi>>4; //fchi/16;
			lindex_t we_pixel_tmp = we_pixel>>4; //we_pixel/16;

			l_mul_chi: for ( sindex_t n = 0; n != 16; n++ )
			{

				data_t tmp1 = image_window[ iw_h ][ iw_w ][ fchi_tmp * 16 + n ];
				data_t tmp2 = _weights[ we_pixel_tmp * 16 + n ];
				mul_buffer[n] = tmp1 * tmp2;

			}
		}
	}
}

(HLS should recognize when a division can be replaced by a shift, but it does not do so here.)

All solutions were tested with Vivado HLS 2017.2.

+ Timing (ns): 
    * Summary: 
    +--------+-------+----------+------------+
    |  Clock | Target| Estimated| Uncertainty|
    +--------+-------+----------+------------+
    |ap_clk  |  20.00|     13.78|        2.50|
    +--------+-------+----------+------------+

+ Latency (clock cycles): (SAME IN ALL CASES)
    * Summary: 
    +-----+-----+-----+-----+---------+
    |  Latency  |  Interval | Pipeline|
    | min | max | min | max |   Type  |
    +-----+-----+-----+-----+---------+
    |   11|   11|   12|   12|   none  |
    +-----+-----+-----+-----+---------+

+ Utilization (totals):
+---------------------------------------+---------+-------+-------+-------+
|             Configuration             | BRAM_18K| DSP48E|   FF  |  LUT  |
+---------------------------------------+---------+-------+-------+-------+
|Original with PARTITION pragma         |        0|     64|    506|   1067|
|With RESHAPE pragma                    |        0|     64|    356|   1019|
|With RESHAPE pragma and right shifts   |        0|     64|    102|    907|
+---------------------------------------+---------+-------+-------+-------+

 

3 Replies
Scholar u4223374
1,607 Views
Registered: ‎04-26-2015

Re: Array reshape example project?

@baltam Excellent catch on the division/shift! I didn't even check for that, and your change will save plenty of space.

 

I think it doesn't use a shift here because the values could potentially be negative, and the C standard behaviour for dividing a negative number (truncation toward zero) is not equivalent to an arithmetic right shift (which rounds toward negative infinity). If you do it "properly" (i.e. use an unsigned int of appropriate width), a shift will be inferred.
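A small sketch of that difference in plain C++ (the function names are hypothetical and only for illustration; this shows the language semantics, not what the HLS tool emits): for negative inputs the division and the shift give different answers, while for unsigned inputs they always agree.

```cpp
#include <cstdint>

// Signed division truncates toward zero, so -1 / 16 is 0.
int div16_signed(int x) { return x / 16; }

// An arithmetic right shift rounds toward negative infinity, so -1 >> 4 is -1.
// (Right-shifting a negative value is implementation-defined before C++20;
// mainstream compilers use an arithmetic shift.)
int shr4_signed(int x) { return x >> 4; }

// For unsigned operands the two forms are always equivalent.
std::uint32_t div16_unsigned(std::uint32_t x) { return x / 16u; }
std::uint32_t shr4_unsigned(std::uint32_t x) { return x >> 4; }
```

Because `fchi` and `we_pixel` are plain `int` in the example, the tool has to preserve the truncating behaviour unless it can prove the values are non-negative.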

Explorer
1,557 Views
Registered: ‎05-21-2017

Re: Array reshape example project?

@baltam

Thank you very much for your example. Yes, it is clear to me now how to use the RESHAPE pragma.

The problem was that HLS could not understand the array access pattern. @u4223374 solved this problem with his proposed coding style, which makes the array access pattern clearer to HLS.

Unfortunately, @baltam's solution does not work in my case because the arrays are not arguments of my top function but arguments of a sub-function. Attached are the synthesis results of two tests I did:

- In the partition-reshape test, solution "1" uses the PARTITION pragma and solution "2" uses the RESHAPE pragma. As you can see, it is just a resource allocation preference: solution "1" uses BRAMs and solution "2" uses FFs and LUTs.

- In the shift-division test, solution "1" uses the shift operator and solution "2" uses the division operator; there is not much of a difference.

EDIT: the results below were generated with Vivado HLS 2017.2.

Thank you both for your answers!

Cheers,

Panos

partition-reshape.png
shift-division.png