Observer
Registered: 01-19-2018

Host Multiple Thread Kernel Enqueue


Hi,

The code fragment below is how we enqueue a task to a kernel that performs a simple calculation on a mapped buffer. If this code runs in a normal single-threaded function, it works fine. However, if we put the same code in a thread function and create multiple threads running it in parallel, we see data corruption.

```cpp
Buffer buf(ctx, CL_MEM_READ_WRITE, bufsize);
K.setArg(0, buf);
int * ary = (int*)Q.enqueueMapBuffer(buf, CL_TRUE, CL_MAP_WRITE, 0, bufsize);
// populate ary
Q.enqueueUnmapMemObject(buf, ary);
Q.enqueueTask(K);
ary = (int*)Q.enqueueMapBuffer(buf, CL_TRUE, CL_MAP_READ, 0, bufsize);
// consume ary
Q.enqueueUnmapMemObject(buf, ary);
```

Is this code fragment thread-safe? What is the proper way to enqueue tasks in parallel from a multithreaded program?

Thanks,

kwan

Accepted Solutions
Observer
Registered: 01-19-2018

Sharing the kernel object K among threads can potentially be an issue. It is best to use a mutex around Q.enqueueTask(K);
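A minimal sketch of the mutex approach, assuming the only shared state is the kernel object K. The OpenCL specification states that clSetKernelArg is only safe to call concurrently on different cl_kernel objects, and an enqueue uses whatever argument values are set on the kernel at the moment it is issued. For that reason the lock below covers both K.setArg(0, buf) and Q.enqueueTask(K), not only the enqueue. To keep it runnable without an FPGA, SharedKernel, RecordingQueue, worker and runWorkers are hypothetical stand-ins for cl::Kernel and cl::CommandQueue, and the thread index stands in for each thread's own buffer.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cl::Kernel shared by all threads. As with a real
// kernel object, setArg() mutates shared state and a launch uses whatever
// argument is set at the moment it is enqueued.
struct SharedKernel {
    int arg0 = -1;
    void setArg(int value) { arg0 = value; }
};

// Hypothetical stand-in for the command queue: records (thread index, argument
// seen by the launch) so we can check that no launch used another thread's buffer.
struct RecordingQueue {
    std::mutex m;
    std::vector<std::pair<int, int>> launches;
    void enqueueTask(int tid, const SharedKernel& k) {
        std::lock_guard<std::mutex> lock(m);
        launches.emplace_back(tid, k.arg0);
    }
};

// One mutex per shared kernel object. It must cover BOTH setArg and
// enqueueTask; locking only the enqueue leaves a window in which another
// thread can overwrite the argument.
std::mutex kernelMutex;

void worker(int tid, SharedKernel& K, RecordingQueue& Q) {
    // ... map, populate and unmap this thread's own buffer here ...
    {
        std::lock_guard<std::mutex> lock(kernelMutex);
        K.setArg(tid);          // stands in for K.setArg(0, buf)
        Q.enqueueTask(tid, K);  // stands in for Q.enqueueTask(K)
    }
    // ... map this thread's own buffer for reading, outside the lock ...
}

std::vector<std::pair<int, int>> runWorkers(int numThreads) {
    SharedKernel K;
    RecordingQueue Q;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, t, std::ref(K), std::ref(Q));
    for (auto& th : threads) th.join();
    return Q.launches;
}
```

Calling runWorkers(8) returns eight launches, each pairing a thread index with the argument that same thread set. Without the lock, another thread's setArg could land between a thread's own setArg and its enqueue, which matches the corruption described in the question.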

2 Replies
Xilinx Employee
Registered: 01-12-2017

Hi @kwan.huen ,

Assuming that you want to launch enqueueTask on the same kernel from multiple threads:

In your Makefile, define a macro that controls the number of threads/compute units (CUs) being launched.

The thread count should equal the compute unit count, so your kernel code and Makefile must be configured accordingly.

Please have a look at this example:

https://github.com/Xilinx/Applications/blob/master/data_compression/xil_lz4/Makefile

https://github.com/Xilinx/Applications/blob/master/data_compression/xil_lz4/src/xil_lz4_compress_kernel.cpp

Steps:

1. Create the command queue with out-of-order execution enabled.

https://github.com/Xilinx/Applications/blob/master/data_compression/xil_lz4/src/xil_lz4.cpp [Line:226]

2. Since the thread count equals the compute unit count, instantiate that many kernel objects and synchronize them using kernel/read/write events.

https://github.com/Xilinx/Applications/blob/master/data_compression/xil_lz4/src/xil_lz4.cpp [Line:243]

3. When launching the threads on the host, have each thread index into its own kernel object using its thread index.

 

I hope this helps.

Thanks

Kali
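The per-compute-unit approach from the steps above can be sketched as follows, assuming one kernel object per compute unit and one host thread per kernel object. NUM_CU, ComputeUnitKernel, cuWorker and runPerComputeUnit are hypothetical names (the linked Makefile uses its own macro); ComputeUnitKernel stands in for a cl::Kernel, run() stands in for enqueueTask plus waiting on its event, and joining the threads stands in for event-based synchronization. The out-of-order command queue from step 1 is not modeled. Because no two threads touch the same kernel object, no mutex is needed around setArg.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical compile-time compute unit count, e.g. passed from the Makefile
// as -DNUM_CU=4 so the host and the kernel build agree on the same number.
#ifndef NUM_CU
#define NUM_CU 4
#endif

// Hypothetical stand-in for one cl::Kernel object bound to one compute unit.
// Each thread owns exactly one of these, so setArg is never called
// concurrently on the same object.
struct ComputeUnitKernel {
    int arg0 = -1;
    int result = 0;
    void setArg(int value) { arg0 = value; }
    void run() { result = arg0 * 10; }  // stands in for the enqueued calculation
};

void cuWorker(int tid, std::vector<ComputeUnitKernel>& kernels) {
    ComputeUnitKernel& K = kernels[tid];  // step 3: index by thread index
    K.setArg(tid);                        // this thread's own buffer/argument
    K.run();                              // real code: enqueueTask, then wait on its event
}

std::vector<ComputeUnitKernel> runPerComputeUnit() {
    std::vector<ComputeUnitKernel> kernels(NUM_CU);  // step 2: one kernel per CU
    std::vector<std::thread> threads;
    for (int t = 0; t < NUM_CU; ++t)  // thread count == compute unit count
        threads.emplace_back(cuWorker, t, std::ref(kernels));
    for (auto& th : threads) th.join();
    return kernels;
}
```

With the default NUM_CU of 4, runPerComputeUnit() returns four kernels whose results are 0, 10, 20 and 30. Compared with the mutex approach, this trades extra kernel objects (and compute units on the device) for launches that never contend on a shared lock.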

 
