zxttao
Observer
Registered: ‎06-12-2008

How to make the opencl_unroll_hint attribute work in a float loop?

Hi, 

We tried to use opencl_unroll_hint to optimize the multiplication c = a * b in a loop. With int data (int c = int a * int b), SDAccel works as expected and uses 64 multipliers to implement the loop in the RTL. After changing the data type from int to float (float c = float a * float b), we expected SDAccel to use 64 float multipliers for the loop, but it used only 2 fmult units in the generated RTL design. It looks like the unroll attribute doesn't take effect in the float loop.

My question: is anything missing in my OpenCL code? How can the loop unroll optimization be applied to the float loop?

BTW, we tried the same code in both SDAccel 2017.4 and 2018.2; FPGA board: KCU1500.

The source code and log files are attached.

#define LENGTH 64
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1)))
krnl_vmult(
    __global float* a,
    __global float* b,
    __global float* c)
{
    local float bufa[LENGTH];
    local float bufb[LENGTH];
    local float bufc[LENGTH];

    for (int i = 0; i < LENGTH; i++) {
        bufa[i] = a[i];
        bufb[i] = b[i];
    }

    __attribute__((opencl_unroll_hint(64)))
    for (int i = 0; i < LENGTH; i++) {
        bufc[i] = bufa[i] * bufb[i];
    }

    for (int i = 0; i < LENGTH; i++) {
        c[i] = bufc[i];
    }
    return;
}

Accepted Solutions
heeran
Xilinx Employee
Registered: ‎07-18-2014

Hi @zxttao,

 

Did you also try partitioning the arrays that are used inside the unrolled loop?

You can refer to the example below, which explains unrolling combined with array partitioning:

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/cpu_to_fpga/04_partition_ocl

 

 

-Heera
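
Applied to the kernel from the question, the suggestion might look like the sketch below. It assumes the SDAccel `xcl_array_partition` attribute in the form used by the linked example; the thread itself does not show the final code, so this is not the confirmed fix. An unpartitioned local array is typically mapped to block RAM with at most two ports, which would be consistent with only two multipliers being generated, and complete partitioning is meant to remove that limit. The `KERNEL_ATTR`, `PARTITION_COMPLETE` and `UNROLL_FULLY` macros are hypothetical helpers introduced here so the same body also compiles as plain C for a functional check.

```c
#define LENGTH 64

/* Hypothetical helper macros (not from the thread). An OpenCL compiler
   defines __OPENCL_VERSION__ and gets the real attributes; a plain C
   compiler gets empty shims so the body can be checked on the host. */
#ifdef __OPENCL_VERSION__
#define KERNEL_ATTR        __attribute__((reqd_work_group_size(1, 1, 1)))
#define PARTITION_COMPLETE __attribute__((xcl_array_partition(complete, 1)))
#define UNROLL_FULLY       __attribute__((opencl_unroll_hint(LENGTH)))
#else
#define __kernel
#define __global
#define __local
#define KERNEL_ATTR
#define PARTITION_COMPLETE
#define UNROLL_FULLY
#endif

__kernel void KERNEL_ATTR
krnl_vmult(__global float* a,
           __global float* b,
           __global float* c)
{
    /* Assumption: complete partitioning along dimension 1 asks the tool to
       split each array into individual elements, so all 64 can be accessed
       in the same cycle. */
    __local float bufa[LENGTH] PARTITION_COMPLETE;
    __local float bufb[LENGTH] PARTITION_COMPLETE;
    __local float bufc[LENGTH] PARTITION_COMPLETE;

    /* Copy the inputs from global memory into the local buffers. */
    for (int i = 0; i < LENGTH; i++) {
        bufa[i] = a[i];
        bufb[i] = b[i];
    }

    /* Request full unrolling of the multiply loop. */
    UNROLL_FULLY
    for (int i = 0; i < LENGTH; i++) {
        bufc[i] = bufa[i] * bufb[i];
    }

    /* Write the results back to global memory. */
    for (int i = 0; i < LENGTH; i++) {
        c[i] = bufc[i];
    }
}
```

Whether 64 float multipliers are actually instantiated would still need to be checked in the generated report for the target device; the host shims only confirm that the kernel computes the right values.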
