Registered: ‎09-09-2019

Can I pass the FPGA platform as a C++ object to a thread pool?

Hi all,

One idea I had recently: could I map the FPGA compute resources (for example, multiple CUs of one kernel) onto a thread pool, so that I can push compute tasks into the pool and have them assigned to whichever CU is available?

To do that, it looks like one thing I need to figure out is how to pass the FPGA platform (context, queue, commands, etc.) as a parameter to the thread pool. That looks like a very dirty job, but before trying it: can I simply wrap all of them in a C++ object and then pass a pointer to that object into the thread pool?

Searching around, it does not look like anyone has done this before (on GPUs someone did something similar with CUDA, but not for FPGAs), so I am asking here: is this a bad idea, or something worth doing? Any comments or suggestions, negative or positive, are welcome.

Thanks so much.
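
For illustration, the idea in the question could be sketched roughly as below. This is only a minimal sketch under loud assumptions: `FpgaPlatform`, `run_on_cu` and `run_workers` are hypothetical names, the OpenCL handles (context, command queue, kernel) are replaced by a simple counter per CU so the example runs without an FPGA, and plain `std::thread` workers stand in for a real thread pool. The point is only that one shared object can be handed to every worker through a `std::shared_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the OpenCL handles (cl_context, cl_command_queue,
// cl_kernel, ...). In real host code these members would be the actual handles.
struct FpgaPlatform {
    int num_cus = 0;                // number of compute units of the kernel
    std::mutex queue_mutex;         // guards the simulated shared state
    std::vector<int> tasks_per_cu;  // simulated record of work sent to each CU

    explicit FpgaPlatform(int cus) : num_cus(cus), tasks_per_cu(cus, 0) {}

    // Simulated "run one task on one CU"; a real version would call into OpenCL.
    void run_on_cu(int cu) {
        assert(cu >= 0 && cu < num_cus);
        std::lock_guard<std::mutex> lock(queue_mutex);
        ++tasks_per_cu[cu];
    }
};

// Every worker thread receives the same shared platform object and drives one CU.
// Returns the total number of tasks executed across all CUs.
int run_workers(const std::shared_ptr<FpgaPlatform>& platform, int tasks_each) {
    std::vector<std::thread> workers;
    for (int cu = 0; cu < platform->num_cus; ++cu) {
        // The lambda copies the shared_ptr, so the object outlives every worker.
        workers.emplace_back([platform, cu, tasks_each] {
            for (int t = 0; t < tasks_each; ++t) platform->run_on_cu(cu);
        });
    }
    for (auto& w : workers) w.join();

    int total = 0;
    for (int n : platform->tasks_per_cu) total += n;
    return total;
}
```

A caller would create the object once, for example with `std::make_shared<FpgaPlatform>(5)`, and pass it to `run_workers`. One point to check in a real port: the OpenCL specification does not guarantee that `clSetKernelArg` is safe to call concurrently on the same `cl_kernel` object, so each thread would probably want its own kernel object or a lock around argument setup.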

5 Replies
Registered: ‎03-01-2020

Kernel enqueue on the host is non-blocking. You can create a token pool, with the same number of tokens as you have CUs, and keep track of the CUs that become free using the OpenCL event returned from the kernel enqueue. No need to go through the trouble of creating threads and thread pools and passing the context and everything to all the threads, etc.

Registered: ‎09-09-2019

@HRZ 

Thanks so much. I came across this thread pool idea because of the following problem: I have multiple CUs, but the computing tasks take quite different amounts of time on each CU, so it is hard to manage all CU activity in a single thread. Although the kernel enqueue is non-blocking, I have to synchronise at the end of the loop until all CUs have completed their tasks. For example, with 5 CUs, CU1 to CU4 may finish within 10 ms but CU5 may take 100 ms. In a single thread I have to synchronise all CUs at the end of each round of enqueues, so all CUs only become available again after 100 ms, which means CU1 to CU4 sit idle for 90% of the time. That is why I am thinking about a thread pool.

You have given me another insight, though. Are you suggesting that I enqueue the kernel on each CU and track it with an OpenCL event callback function? Even then, I may not need a thread pool, but would I still need to create 5 threads to manage the 5 CUs? Any suggestions?

Registered: ‎03-01-2020

I don't think you need 5 threads to manage 5 CUs. You allocate 5 tokens, then you enqueue a task to each CU in a non-blocking fashion and take away one token each time. Then you enter a while loop with the number of remaining tasks as its exit condition, "poll" the event from each enqueue, and add one token back for every event that shows completion of its task. Then you check the number of available tokens and, if there are any, you enqueue another task and reduce the number of remaining tasks by one. Each token has to correspond to a specific CU, though, so that you can keep track of which CU is free. I think the code would look something like this, which should work with just one thread:

 

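allocate_token(tokens);              // one token per CU, all initially available

// Initial round: give every CU a task (non-blocking enqueue)
for i in tokens
  if (remaining_tasks > 0)
    i.event = enqueue(i.CU);
    remove_token(i);                 // the CU is now busy
    remaining_tasks--;

while (remaining_tasks > 0)
  // Poll: return the token of every CU whose task has completed.
  // The status can be queried with clGetEventInfo and
  // CL_EVENT_COMMAND_EXECUTION_STATUS; a finished task reports CL_COMPLETE.
  for i in tokens
    if (i.available == FALSE && status(i.event) == CL_COMPLETE)
      add_token(i);
  // Refill: hand a new task to every free CU
  for i in tokens
    if (i.available == TRUE && remaining_tasks > 0)
      i.event = enqueue(i.CU);
      remove_token(i);
      remaining_tasks--;
done

wait_for_all(tokens);                // tasks still in flight after the loop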
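
To make the polling loop above concrete, here is a rough, self-contained C++ sketch that runs without any FPGA. It is an illustration under assumptions, not the actual OpenCL host code: `Token` and `run_with_tokens` are hypothetical names, the OpenCL event is replaced by a simulated completion time, and a simulated clock that advances one tick per poll stands in for real time. Unlike the pseudocode, a token here is not bound to a CU with fixed speed; a task's duration travels with the task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CU token plus its OpenCL event: the task on
// this CU counts as complete once the simulated clock reaches finish_time.
struct Token {
    bool available = true;  // true: CU is free; false: a task is in flight
    long finish_time = 0;   // simulated completion time of the in-flight task
};

// Runs all tasks on num_cus simulated CUs from a single host thread and
// returns the simulated time at which the last task finishes.
// durations[k] is the run time of task k in arbitrary ticks.
long run_with_tokens(std::size_t num_cus, const std::vector<long>& durations) {
    assert(num_cus > 0);
    std::vector<Token> tokens(num_cus);
    std::size_t next_task = 0;  // index of the next task to enqueue
    std::size_t in_flight = 0;  // tasks enqueued but not yet completed
    long now = 0;               // simulated clock

    while (next_task < durations.size() || in_flight > 0) {
        // Poll: return the token of every CU whose task has completed.
        for (Token& t : tokens) {
            if (!t.available && t.finish_time <= now) {
                t.available = true;
                --in_flight;
            }
        }
        // Refill: non-blocking "enqueue" of a new task on every free CU.
        for (Token& t : tokens) {
            if (t.available && next_task < durations.size()) {
                t.finish_time = now + durations[next_task++];
                t.available = false;
                ++in_flight;
            }
        }
        if (in_flight > 0) ++now;  // advance the simulated clock by one tick
    }
    return now;
}
```

With two CUs and task durations `{10, 100, 10, 10}`, `run_with_tokens` returns 100: the first CU works through the three short tasks while the second is busy with the long one. Synchronising all CUs after every round of enqueues would take 110 ticks for the same work (100 for the first round, 10 for the second).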

 

 

Registered: ‎09-09-2019

@HRZ Really appreciate it. I do not fully understand the logic of your pseudocode yet; I will take some time to study it.

At the top level, your suggestion is to use a "token system" to manage the state of every CU (available <-> working), with each CU working in non-blocking mode. I have not used such a token mechanism before. Is there an example project I could use as a reference? It looks like some kind of struct with a CU ID and a working state would do the job?

 

Registered: ‎03-01-2020

I do not have any examples but yes, I think a simple struct will probably do the job.
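
As a rough idea of what such a struct might look like, here is a minimal sketch. All names (`CuState`, `CuToken`, `TokenPool`, `acquire`, `release`) are hypothetical and not taken from any Xilinx or OpenCL API; in real host code the token would probably also store the `cl_event` returned by the latest enqueue on that CU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class CuState { Available, Working };

// One token per compute unit: the CU ID plus its current state.
struct CuToken {
    std::size_t cu_id;
    CuState state = CuState::Available;
};

class TokenPool {
public:
    explicit TokenPool(std::size_t num_cus) {
        for (std::size_t id = 0; id < num_cus; ++id) tokens_.push_back(CuToken{id});
    }

    // Takes the token of the first free CU and writes its ID to cu_id.
    // Returns false if every CU is currently working.
    bool acquire(std::size_t& cu_id) {
        for (CuToken& t : tokens_) {
            if (t.state == CuState::Available) {
                t.state = CuState::Working;
                cu_id = t.cu_id;
                return true;
            }
        }
        return false;
    }

    // Gives the token back once the task on that CU has completed.
    void release(std::size_t cu_id) {
        assert(cu_id < tokens_.size() && tokens_[cu_id].state == CuState::Working);
        tokens_[cu_id].state = CuState::Available;
    }

    // Number of CUs that are currently free.
    std::size_t available() const {
        std::size_t n = 0;
        for (const CuToken& t : tokens_) n += (t.state == CuState::Available);
        return n;
    }

private:
    std::vector<CuToken> tokens_;
};
```

The host loop would call `acquire` before each enqueue and `release` when the event for that CU reports completion. Since only one thread touches the pool in this scheme, no locking is needed.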
