
## opencl kernel: "casting" uint512_dt type to array of 16 floats

Hello,

I would like to use the 512-bit data type on the U250 to perform floating-point operations.

I am modifying the wide_vadd tutorial code:

I am using the 512-bit type to take advantage of the 512-bit bus on the U250, so that 512 bits at a time are transferred efficiently to local memory.

What I really want, though, is to perform 16 floating-point operations on each 512-bit word once it is stored in local memory.

Is this possible? How can I access the 512 bits as floats?

1 Solution

Accepted Solutions

Hello  @boxerab ,

The general question you ask has several possible solutions. One is to read the 512-bit bus and store the values as floats in an array, which could be 1D or 2D. A 2D array is the easiest way to picture it: declare myfloatstorage[..][16] and apply a complete partition on dimension 2, which makes each of the 16 lanes independent.
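As a rough illustration of the 2D-array idea, here is a minimal sketch in plain C++ so that it can be compiled on a host without the Xilinx headers. `Word512` and `unpack_words` are hypothetical names, and `Word512` is only a stand-in for `ap_uint<512>`; in a real kernel the partition pragma shown in the comment would be placed next to the declaration of the local array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int LANES = 16;  // 512 bits / 32 bits per float

static_assert(sizeof(float) == sizeof(std::uint32_t), "float is assumed to be 32 bits");

// Hypothetical stand-in for one 512-bit bus word (ap_uint<512> in kernel code).
using Word512 = std::array<std::uint32_t, LANES>;

// Copies n_words bus words into myfloatstorage[n_words][16],
// reinterpreting each 32-bit slice as a float without changing its bits.
void unpack_words(const Word512 *in, float myfloatstorage[][LANES], std::size_t n_words) {
    // In HLS, the array declaration would typically carry:
    //   #pragma HLS array_partition variable=myfloatstorage complete dim=2
    for (std::size_t w = 0; w < n_words; ++w) {
        for (int lane = 0; lane < LANES; ++lane) {
            std::memcpy(&myfloatstorage[w][lane], &in[w][lane], sizeof(float));
        }
    }
}
```

Once the data is laid out this way, each column of the array can feed its own floating-point operator.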

You can convert raw bits to float with a union of unsigned int and float, *but not* with a union that contains an ap_uint, because ap_uint types are classes.

Some of this is (or was) shown in the Vivado HLS / Vitis HLS user guide:

```cpp
template <typename T> // ap_uint or unsigned int can be used
float uint_to_float(T x) {
    union {
        unsigned int i;
        float f;
    } conv;
    conv.i = (unsigned int)x;
    return conv.f;
}

unsigned int float_to_uint(float x) {
    union {
        unsigned int i;
        float f;
    } conv;
    conv.f = x;
    return conv.i;
}
```
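For checking results on the host side, the same bit-level conversion can also be written with `std::memcpy`, which standard C++ defines for this purpose. This is only a sketch: `bits_to_float` and `float_to_bits` are hypothetical names, and whether a given HLS tool version synthesizes `memcpy` as efficiently as the union is an assumption that would need to be verified.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float is assumed to be 32 bits");

// Reinterprets a 32-bit pattern as a float (no numeric conversion takes place).
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Returns the raw 32-bit pattern of a float.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, `bits_to_float(0x3F800000u)` yields `1.0f`, because `0x3F800000` is the IEEE 754 single-precision encoding of 1.0.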

Now, regarding the source code you point to: the buffers are ap_uint<512>, so let's leave that unchanged. Instead, the answer can be adapted by writing a vector add that operates directly on slices (ranges) of S bits taken from a wider bus of W bits.

Let's first look at the integer version, which takes W=512 bits and adds in slices of, say, S=32 bits.

For this, let's (re)use a templatized function:

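```cpp
#include <ap_int.h>

// NOTE: the function name and the two calls under "example usage" were lost
// from the original post; vadd_slices is a placeholder name.
template <int S, int W>
void vadd_slices(
    ap_uint<W> a,
    ap_uint<W> b,
    ap_uint<W> &res
) {
    // pipelining the function implicitly unrolls the loop
#pragma HLS pipeline II=1
    for (int i = 0; i < W/S; i++) {
        res.range( S*(i+1)-1, S*i ) = a.range( S*(i+1)-1, S*i ) + b.range( S*(i+1)-1, S*i );
    }
}

// example usage
ap_uint<512> a;
ap_uint<512> b;
ap_uint<512> res;

// adds 8 bits at a time, i.e. 512/8=64 adders of 8-bit data
vadd_slices<8>(a, b, res);
// adds 32 bits at a time, i.e. 512/32=16 adders of 32-bit data
vadd_slices<32>(a, b, res);
```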

In the examples above, the compiler can deduce W=512 from the function arguments, so only the slice size S needs to be provided as an explicit template argument.

Now you can see how the two can be merged, for example:
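The deduction of W can be illustrated in plain C++ without the Xilinx headers. In this small sketch, `Bus` and `lane_count` are hypothetical names; `Bus<W>` stands in for `ap_uint<W>` and only carries the width.

```cpp
// Hypothetical stand-in for ap_uint<W>: only the width matters here.
template <int W>
struct Bus {};

// S is given explicitly; W is deduced from the argument type.
template <int S, int W>
constexpr int lane_count(const Bus<W> &) {
    static_assert(W % S == 0, "slice size must divide the bus width");
    return W / S;
}

// Only S appears in the angle brackets; W=512 comes from Bus<512>.
static_assert(lane_count<8>(Bus<512>{}) == 64, "64 lanes of 8 bits");
static_assert(lane_count<32>(Bus<512>{}) == 16, "16 lanes of 32 bits");
```

This works because S is listed first in the template parameter list, so it can be supplied explicitly while the trailing parameter W is left for the compiler to deduce.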

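```cpp
// NOTE: the function name and the call under "example usage" were lost
// from the original post; vadd_float_slices is a placeholder name.
template <int W> // S is hardcoded in this version because float means S=32
void vadd_float_slices(
    ap_uint<W> a,
    ap_uint<W> b,
    ap_uint<W> &res
) {
    const int S = 32;
    // pipelining the function implicitly unrolls the loop
#pragma HLS pipeline II=1
    for (int i = 0; i < W/S; i++) {
        res.range( S*(i+1)-1, S*i ) = float_to_uint(
            uint_to_float(a.range( S*(i+1)-1, S*i ))
            + uint_to_float(b.range( S*(i+1)-1, S*i ))
        );
    }
}

// example usage
ap_uint<512> a;
ap_uint<512> b;
ap_uint<512> res;

// adds 32 bits at a time, i.e. 512/32=16 adders of 32-bit data
vadd_float_slices(a, b, res);
```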
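To check kernel output against a host reference, the lane-wise float add can be modeled in standard C++. The following is a sketch under stated assumptions: `Word512`, `as_float`, `as_bits` and `vadd_float_model` are hypothetical names, the 512-bit word is represented as 16 32-bit slices with slice 0 holding bits 31..0, and the floating-point adder in hardware is assumed to round the same way as the host.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of a 512-bit word: slice i holds bits 32*i+31 .. 32*i.
using Word512 = std::array<std::uint32_t, 16>;

// Bit-level reinterpretation helpers (no numeric conversion).
inline float as_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

inline std::uint32_t as_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Adds the 16 float lanes of a and b independently and repacks the raw bits.
Word512 vadd_float_model(const Word512 &a, const Word512 &b) {
    Word512 res{};
    for (std::size_t lane = 0; lane < res.size(); ++lane) {
        res[lane] = as_bits(as_float(a[lane]) + as_float(b[lane]));
    }
    return res;
}
```

Comparing the kernel's result word slice by slice against this model is one way to confirm that the lanes are packed in the expected order.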

I hope this helps and fully answers your question, perhaps after some small adaptations.

- Hervé

SIGNATURE:
* New Dedicated Vivado HLS forums* http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/bd-p/hls

* Give Kudos to a post which you think is helpful and reply oriented.