Adventurer
3,344 Views
Registered: ‎12-16-2013

Parallel work-items?

Hi All,

As I understand from the UG1207 document, SDAccel adds a conceptual loop nest around the kernel code to traverse all the work-items in a work-group.

Pipelining is then the technique used to optimise the execution of these nested loops.

I am wondering whether there is any way to run work-items in parallel (not pipelined), for example by unrolling the conceptual loop nest.
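
For reference, the conceptual loop nest mentioned above can be pictured roughly as follows. This is only a sketch in plain C so that it compiles on a host machine; it is not what the tool actually generates. The names `work_item_body` and `run_work_group` and the `LOCAL_X/Y/Z` sizes are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical work-group dimensions, chosen only for illustration. */
#define LOCAL_X 4
#define LOCAL_Y 2
#define LOCAL_Z 1

/* Stand-in for the kernel body: the work done by ONE work-item.
 * In OpenCL C, lx/ly/lz would be obtained from get_local_id(0..2). */
static void work_item_body(size_t lx, size_t ly, size_t lz,
                           const int *in, int *out)
{
    size_t idx = (lz * LOCAL_Y + ly) * LOCAL_X + lx;
    out[idx] = in[idx] * 2;
}

/* Conceptual loop nest: visits every work-item of one work-group in turn.
 * Pipelining overlaps successive iterations of these loops; unrolling
 * would instead replicate the body so iterations run side by side. */
void run_work_group(const int *in, int *out)
{
    for (size_t lz = 0; lz < LOCAL_Z; ++lz)
        for (size_t ly = 0; ly < LOCAL_Y; ++ly)
            for (size_t lx = 0; lx < LOCAL_X; ++lx)
                work_item_body(lx, ly, lz, in, out);
}
```

Calling `run_work_group(in, out)` on two 8-element arrays processes all 4 x 2 x 1 work-items sequentially, which is the behaviour the question is asking how to parallelise.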

Regards,

Mohammad

Accepted Solutions
Xilinx Employee
5,911 Views
Registered: ‎06-07-2016

Re: Parallel work-items?

Hi Mohammad,

Currently, there is no xcl unroll attribute for work-items.

You can instead use the OpenCL unroll attribute, __attribute__((opencl_unroll_hint)), on a loop that wraps the body of your kernel function, as shown in this example:

https://www.xilinx.com/html_docs/xilinx2016_3/sdaccel_doc/index.html?q=/html_docs/xilinx2016_3/sdaccel_doc/topics/kernel-optimization/con-loop-unrolling.html

Or this complete example:

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/kernel_opt/lmem_2rw_ocl

Keep in mind that with an FPGA you are not mapping your application onto a fixed architecture. The conventional notion of processing elements does not exist on the FPGA until you create them.

In some cases, the three nested loops can be helpful. If you want unrolled work-items, you can use __attribute__ ((reqd_work_group_size(1, 1, 1))) or clEnqueueTask, then define your own index-space loops and unroll them as you see fit.

Many of the examples on GitHub show this style of implementation.
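
A minimal sketch of that single-work-item style is shown below. It is written in plain C so it can be compiled and checked on a host; the OpenCL-specific parts appear only in comments. The function name `vadd_single_work_item`, the size `N`, and the factor `UNROLL` are assumptions made for illustration and are not taken from the linked examples.

```c
#include <stddef.h>

#define N      16  /* hypothetical problem size */
#define UNROLL 4   /* hypothetical unroll factor; N is a multiple of it */

/* In OpenCL C this would be declared roughly as:
 *   __kernel __attribute__((reqd_work_group_size(1, 1, 1)))
 *   void vadd(__global const int *a, __global const int *b,
 *             __global int *c)
 * and launched once, e.g. with clEnqueueTask. */
void vadd_single_work_item(const int *a, const int *b, int *c)
{
    /* Explicit index-space loop that replaces the implicit work-items. */
    for (size_t base = 0; base < N; base += UNROLL) {
        /* Inner loop with a fixed trip count. In OpenCL C,
         * __attribute__((opencl_unroll_hint(UNROLL))) placed directly
         * before this loop asks the compiler to replicate the body,
         * so the UNROLL additions can be built as parallel hardware. */
        for (size_t j = 0; j < UNROLL; ++j)
            c[base + j] = a[base + j] + b[base + j];
    }
}
```

Whether the unrolled copies really run in parallel on the device also depends on how the arrays are stored and accessed, so treat the factor as something to tune rather than a guarantee.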

Best,

-Dutch
