310 Views
Registered: ‎07-08-2019

DPU task scheduling or pin cores?

Hi,

I have a question about scheduling two tasks on the two DPU cores of a ZCU104 board.

For example, tasks A and B each take 60 ms for inference.

I found that if I first set the input node images for both tasks and then launch two threads that call dpuRunTask, the two tasks run on core0 and core1 without a problem.

However, if I move setting the input node images into the threads, the two tasks tend to run on only one core, which significantly delays the second task. This may be because the preprocessing (resize?) takes a different amount of time in each thread.

My question is whether it is possible to pin a task to a specific DPU core, or whether there is another trick I could try to avoid the scheduling latency.

Thanks

5 Replies
Xilinx Employee
238 Views
Registered: ‎03-27-2013

Re: DPU task scheduling or pin cores?

Hi jiansheng@baidu.com,

I agree that you should avoid combining preprocessing and running the DPU task in the same thread.

I suggest referring to our DNNDK examples, e.g. this face detection example: https://github.com/Xilinx/Edge-AI-Platform-Tutorials/blob/3.1/docs/DPU-Integration/reference-files/files/face_detection/face_detection.cc

It separates the whole flow into three kinds of threads:

1. Reader thread: reads images from the camera and puts them into the input queue.

2. Worker thread: each worker thread repeats the following three steps until there are no more images:
(1) get an image from the input queue;
(2) process it using the DenseBox model;
(3) put the processed image into the display queue.

3. Display thread: gets the output image from queueShow and displays it.

The threads are connected with queues.
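The reader / worker / display structure described above can be sketched roughly as follows. This is a minimal illustration using only the C++ standard library, not the code from face_detection.cc: `BlockingQueue`, `Frame`, `run_model_stub`, and `run_pipeline` are hypothetical names, and the stub merely stands in for setting the input, running the DPU task, and reading the output.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue; close() lets consumers drain it and then stop.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push(std::move(item));
        }
        cv_.notify_one();
    }
    // Blocks until an item is available; returns false once closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

struct Frame { int id; int result; };

// Hypothetical stand-in for "set input, run the DPU task, read output".
int run_model_stub(int frame_id) { return frame_id * 2; }

// Runs reader -> workers -> display and returns the frames that were "shown".
std::vector<Frame> run_pipeline(int num_frames, int num_workers) {
    BlockingQueue<Frame> input_queue, display_queue;
    std::vector<Frame> shown;

    std::thread reader([&] {
        for (int i = 0; i < num_frames; ++i) input_queue.push(Frame{i, 0});
        input_queue.close();  // no more images
    });

    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&] {
            Frame f;
            while (input_queue.pop(f)) {
                f.result = run_model_stub(f.id);
                display_queue.push(f);
            }
        });
    }

    std::thread display([&] {
        Frame f;
        while (display_queue.pop(f)) shown.push_back(f);
    });

    reader.join();
    for (auto& t : workers) t.join();
    display_queue.close();  // all workers are done
    display.join();
    return shown;
}
```

Calling `run_pipeline(8, 2)` from `main` would push eight frames through two worker threads. With several workers, frames may reach the display queue out of order, so a real application that needs ordering would have to sort or tag them.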

This is more efficient and has lower latency than mixing everything together in one thread.

Hope this can help.

Best Regards,
Jason
-----------------------------------------------------------------------------------------------
Please mark the Answer as "Accept as solution" if the information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
-----------------------------------------------------------------------------------------------
210 Views
Registered: ‎07-08-2019

Re: DPU task scheduling or pin cores?

Hi Jason,

Thanks for your reply. I just checked the face_detection code you referred to. One difficulty is that our application is not really throughput-oriented like your example; we want more deterministic behavior.

As the picture below shows, assuming a camera provides an image every 100 ms, we want the high-priority tasks (0 and 1) to run on the two cores ASAP (using the same image), followed by some lighter, lower-priority models.

And just to clarify, by preprocessing in my previous posts I simply mean resizing the image to fit the model's input size; the models' input sizes are different.

[Attached image: dpu_schedule.JPG]

Any further suggestions for our application?

Thanks,

Jian

Xilinx Employee
187 Views
Registered: ‎03-27-2013

Re: DPU task scheduling or pin cores?

Hi jiansheng@baidu.com,

Yes, I agree that you may need further modifications to suit your design.

I think you can profile the code to check which parts take the most CPU time. I would also suggest NOT putting this CPU-heavy code and the DPU task in the same thread, so that the CPU and the DPU can work in parallel.
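As a rough illustration of the profiling step, the sketch below times individual stages with `std::chrono`. It measures wall-clock time rather than CPU load, and `time_ms`, `preprocess_stub`, and `inference_stub` are hypothetical names; the stubs only sleep and would be replaced by the real resize and DPU calls.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Returns the wall-clock time, in milliseconds, spent inside fn().
template <typename Fn>
double time_ms(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for image resize and input setup.
void preprocess_stub() {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}

// Hypothetical stand-in for running one DPU task.
void inference_stub() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}
```

Logging `time_ms(preprocess_stub)` and `time_ms(inference_stub)` per frame gives a first indication of which stage dominates and is worth moving into its own thread.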

In my opinion, if core0 is busy with computation and you successfully submit a new DPU task, it should go to core1.

Best Regards,
Jason
150 Views
Registered: ‎07-08-2019

Re: DPU task scheduling or pin cores?

Hi Jason,

You're right. I added a barrier after the preprocessing steps, and now I can see the tasks scheduled on the two cores at the same time.
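The barrier approach can be sketched as below. This is an assumption about what such a setup might look like, not the poster's actual code: each thread preprocesses for a different amount of time, waits at a barrier, and only then reaches the point where the DPU run call would go. `Barrier` and `launch_together` are hypothetical names, the sleep stands in for the resize, and C++20's `std::barrier` could replace the hand-written class.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Simple one-shot barrier for C++11: releases everyone once `count` threads arrive.
class Barrier {
public:
    explicit Barrier(int count) : remaining_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mu_);
        if (--remaining_ == 0) {
            cv_.notify_all();
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    int remaining_;
};

using Clock = std::chrono::steady_clock;

// Starts one thread per entry; entry i is that thread's preprocessing time in ms.
// Returns the time at which each thread passed the barrier.
std::vector<Clock::time_point> launch_together(const std::vector<int>& preprocess_ms) {
    Barrier barrier(static_cast<int>(preprocess_ms.size()));
    std::vector<Clock::time_point> launch_times(preprocess_ms.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < preprocess_ms.size(); ++i) {
        threads.emplace_back([&, i] {
            // Hypothetical stand-in for resizing and setting the input image.
            std::this_thread::sleep_for(std::chrono::milliseconds(preprocess_ms[i]));
            barrier.arrive_and_wait();
            // The real DPU run call would go here.
            launch_times[i] = Clock::now();
        });
    }
    for (auto& t : threads) t.join();
    return launch_times;
}
```

With `launch_together({5, 40})`, the faster thread waits for the slower one, so both reach the launch point only after the longer preprocessing has finished. The cost is that the first task's start is delayed by the difference in preprocessing time.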

Thanks

Xilinx Employee
129 Views
Registered: ‎03-27-2013

Re: DPU task scheduling or pin cores?

Hi jiansheng@baidu.com,

Good to know, and thanks for sharing your test results. :-)

Best Regards,
Jason