Concurrency in SDAccel

Posts: 9
Registered: 03-24-2017

Good day.
I’m testing the SDAccel example debug_printf_ocl.
In the kernel source, I changed this line:


to this:


This makes the output easier to follow.

I set the NDRange global size to {16, 1, 1} and the local size to {1, 1, 1}. The active build configuration is Emulation-HW, and the number of compute units is 4.
The Application Timeline result is:
Question 1: Why doesn't a work item on a compute unit start immediately after the previous work item finishes?

Then I changed the local size to {2, 1, 1}.
Application Timeline screenshot:
Then I changed the local size to {4, 1, 1}.
Application Timeline screenshot:
In the host code I tried CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, but nothing changed.
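One way to see what changes between the three runs is the work-group count, which in OpenCL is the global size divided by the local size. The Python sketch below only does that arithmetic. The per-compute-unit figure assumes a hypothetical even split of work-groups across the 4 compute units; it is not a description of the actual SDAccel scheduling policy.

```python
# Work-group arithmetic for a 1-D NDRange (illustrative sketch only).
import math


def num_work_groups(global_size: int, local_size: int) -> int:
    """Number of work-groups: global size divided by local size."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size


def groups_per_cu(groups: int, compute_units: int) -> int:
    """Work-groups per compute unit, ASSUMING an even split (hypothetical)."""
    return math.ceil(groups / compute_units)


if __name__ == "__main__":
    GLOBAL_SIZE = 16   # global size {16, 1, 1} from the post
    COMPUTE_UNITS = 4  # Compute Units = 4 from the post
    for local in (1, 2, 4):
        g = num_work_groups(GLOBAL_SIZE, local)
        print(f"local={local}: {g} work-groups, ~{groups_per_cu(g, COMPUTE_UNITS)} per CU")
```

Run as a script, it reports 16, 8 and 4 work-groups for local sizes 1, 2 and 4, which is roughly 4, 2 and 1 work-groups per compute unit under the even-split assumption.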

Question 2: It looks as if the compute units are working sequentially.
Can you explain this behavior?

Thank you.


Xilinx Employee
Posts: 102
Registered: 07-18-2014

Re: Concurrency in SDAccel

Hi @givinariys,


With a small dataset, concurrency cannot be visualized correctly. Can you try a kernel that does a lot of computation on each call?

You can also refer to the example below:




The runtime needs to configure each compute unit before starting it, and this configuration is done sequentially. So the gap you see between two compute units is actually a configuration delay.


In your first case (global size {16, 1, 1}, local size {1, 1, 1}), the runtime configures the four compute units one after another, which is why you see this delay.

If you run a kernel that does a large amount of computation, you should see concurrency between the compute units.
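The configuration-delay explanation can be pictured with a toy timeline. This Python sketch is an assumption-laden model, not SDAccel internals: it supposes the runtime configures compute units strictly one after another, each taking a fixed `config_delay`, and that each compute unit then runs once for `compute_time`. The function names and the delay and compute values are hypothetical, chosen only for illustration, and are not measurements.

```python
# Toy model (assumption, not SDAccel internals): the runtime configures
# compute units one after another, each taking `config_delay`, and a
# compute unit starts running as soon as its own configuration is done.


def cu_intervals(num_cus: int, config_delay: float, compute_time: float):
    """Return (start, end) run intervals for each compute unit."""
    intervals = []
    for i in range(num_cus):
        start = (i + 1) * config_delay  # CU i waits for configurations 0..i
        intervals.append((start, start + compute_time))
    return intervals


def any_overlap(intervals) -> bool:
    """True if any two run intervals overlap in time.

    After sorting by start time, checking neighbours is enough: if
    intervals i < j overlap, then i and i + 1 overlap as well.
    """
    ordered = sorted(intervals)
    return any(ordered[k][1] > ordered[k + 1][0] for k in range(len(ordered) - 1))


if __name__ == "__main__":
    delay = 10.0  # arbitrary time units
    for compute in (2.0, 100.0):  # short kernel vs. long-running kernel
        iv = cu_intervals(4, delay, compute)
        print(f"compute={compute}: overlap={any_overlap(iv)}, intervals={iv}")
```

Under this model compute unit i starts at (i + 1) × config_delay, so consecutive runs overlap only when compute_time is larger than config_delay. With compute=2.0 the runs appear one after another with gaps; with compute=100.0 they overlap.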



Posts: 9
Registered: 03-24-2017

Re: Concurrency in SDAccel

Hi heeran,

Thank you for the reply.

After adding a large loop to the kernel, I can see concurrency.