Adventurer
934 Views
Registered: ‎02-28-2015

opencl kernel

Hi Guys,

I am currently doing some OpenCL development targeting Xilinx FPGAs, using SDAccel on an AWS F1 instance. I have a few questions about OpenCL compilation with SDAccel.

1. If I don't specify the "reqd_work_group_size" attribute on the __kernel, what does the SDAccel compiler do, and what is the work-group size?

2. To pipeline or unroll a loop (that is, to use xcl_pipeline_loop or opencl_unroll_hint), should the "reqd_work_group_size" attribute be set to {1,1,1}?

3. To pipeline work-items, should the "reqd_work_group_size" attribute be set to {x,y,z} (where x*y*z > 1)?

4. Let's assume I have a __kernel without the "reqd_work_group_size" attribute. If my host code calls clEnqueueNDRangeKernel with a local work size of {32, 8}, how is this going to execute?

5. If I set the "reqd_work_group_size" attribute to {x, y, z}, what should the local work size be in the clEnqueueNDRangeKernel call in my host code?

Thank you

Accepted Solutions
Xilinx Employee
872 Views
Registered: ‎01-12-2017

Re: opencl kernel

Hi @allien,

Based on your questions about NDRange, work-groups, and work-items, the main point to stress is that the OpenCL kernel execution model designed for GPUs is not directly applicable to FPGAs. Even if you enforce a local and global work-item/work-group grid for your OpenCL kernels, SDAccel will implement it by having the compiler add "for" loops during the OpenCL-to-HLS conversion phase.

The ideal way to extract every ounce of performance from your design is to use reqd_work_group_size(1,1,1). This keeps the loop structure explicit and under your control, and the resulting design will be scalable. To learn more about these details, please have a look at the examples at the link below.

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started

Thanks,

Kali
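
To make the "for-loops" remark above more concrete, here is a minimal sketch in plain C (not OpenCL C, and not actual SDAccel output) of how a 1-D NDRange launch can be viewed as nested loops, next to the single work-item style that reqd_work_group_size(1,1,1) leads to. The function names (vadd_work_item, vadd_ndrange_as_loops, vadd_single_work_item) and the vector-add body are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: what a single work-item computes. */
static void vadd_work_item(size_t gid, const int *a, const int *b, int *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Conceptual view of an NDRange launch with work-groups of size `local`:
 * the work-items of each group become iterations of an inner loop that
 * the programmer never wrote. This models the idea only; the loops a
 * real compiler generates may look different. */
void vadd_ndrange_as_loops(size_t global, size_t local,
                           const int *a, const int *b, int *out)
{
    /* OpenCL 1.x requires the global size to be a multiple of the local size. */
    assert(local > 0 && global % local == 0);

    for (size_t group = 0; group < global / local; ++group) {
        for (size_t lid = 0; lid < local; ++lid) {
            vadd_work_item(group * local + lid, a, b, out);
        }
    }
}

/* Single work-item style, corresponding to a work-group size of (1,1,1):
 * the loop is written explicitly, so it is the place where loop
 * attributes such as the xcl_pipeline_loop or opencl_unroll_hint
 * mentioned in the question would be attached in a real kernel. */
void vadd_single_work_item(size_t n, const int *a, const int *b, int *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result. The difference is who owns the loop: in the first form it is implied by the launch grid, in the second it is visible in the source and can be tuned directly.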

7 Replies
Moderator
921 Views
Registered: ‎11-04-2010

Re: opencl kernel

Hi @allien,

An FPGA has a different structure from a GPU, and in an FPGA the work_item size is not fixed. So for FPGAs, we recommend you always set NDRange to {1, 1, 1} to reduce overhead.

I don't think you need to investigate the other NDRange configurations for an FPGA design.

Adventurer
909 Views
Registered: ‎02-28-2015

Re: opencl kernel

Hi @hongh,

Thanks for your quick reply.

1. "the work_item size is not fixed": is that the work-item or the work-group?

2. "we recommend you always set NDRange to {1, 1, 1}": did you mean the local work size?

I want to understand how the compiler behaves in the different scenarios. I would highly appreciate it if you could answer the questions from my first post.

Thank You

Adventurer
851 Views
Registered: ‎02-28-2015

Re: opencl kernel

Hi @kalib

Thank you so much.

I think I now have a starting point for optimizing my code.

Adventurer
849 Views
Registered: ‎02-28-2015

Re: opencl kernel

Hi @kalib,

In the following thread, @brucey mentioned: "try to make global size/local size as small as possible. Ideally, it's to set global size to (1,1,1)". What does that mean?

https://forums.xilinx.com/t5/SDAccel/Opencl-Synthesis/m-p/885956/highlight/false#M2438

Thank you

Xilinx Employee
828 Views
Registered: ‎01-12-2017

Re: opencl kernel

Hi @allien,

It means that in the ideal scenario you use (1,1,1). SDAccel does support the basic OpenCL execution model, but that model is predominantly aligned with GPU architectures, so we strongly recommend that you always use reqd_work_group_size(1,1,1).

Please follow this recommendation. You can also refer to the example repository that I shared in my previous answer. If you would like to learn more about optimizing a real-world application, please look at the link below, which provides more insight.

https://github.com/Xilinx/applications

Thanks,

Kali
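
On the host-side part of the original question (what local work size to pass when reqd_work_group_size is set), the following is a small sketch in plain C of the size rules as I understand them from the OpenCL 1.x specification: the local size passed to clEnqueueNDRangeKernel has to match the kernel's reqd_work_group_size, and the global size has to be a multiple of the local size in each dimension. It does not call the OpenCL runtime. The names check_launch, total_work_items, and the launch_status values are hypothetical helpers for illustration, not part of any OpenCL or SDAccel API.

```c
#include <stddef.h>

/* Hypothetical status codes; a real runtime reports such mismatches
 * through its own error codes. */
enum launch_status {
    LAUNCH_OK = 0,
    LAUNCH_BAD_LOCAL_SIZE,   /* local size differs from reqd_work_group_size */
    LAUNCH_BAD_GLOBAL_SIZE   /* global size is not a multiple of local size */
};

/* Checks a 3-D launch configuration against the kernel's required
 * work-group size `reqd` (for example {1,1,1}). */
enum launch_status check_launch(const size_t global[3],
                                const size_t local[3],
                                const size_t reqd[3])
{
    for (int d = 0; d < 3; ++d) {
        if (local[d] != reqd[d]) {
            return LAUNCH_BAD_LOCAL_SIZE;
        }
        if (local[d] == 0 || global[d] % local[d] != 0) {
            return LAUNCH_BAD_GLOBAL_SIZE;
        }
    }
    return LAUNCH_OK;
}

/* Total number of work-items in the launch. With global = {1,1,1} this
 * is 1, so the kernel body runs once and all iteration happens in the
 * loops written inside the kernel. */
size_t total_work_items(const size_t global[3])
{
    return global[0] * global[1] * global[2];
}
```

Under these assumptions, a kernel marked reqd_work_group_size(1,1,1) pairs with a local size of {1,1,1}, and following the (1,1,1) advice for the global size as well gives a single work-item launch.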

Adventurer
809 Views
Registered: ‎02-28-2015

Re: opencl kernel

Hi @kalib,

Thank You so much
