Newbie
421 Views
Registered: ‎10-14-2020

Host Optimization - Overlapping Computation and data transfer from the Host (SDAccel Example: overlap_c)

Hi,

I am trying to run the Overlap Host example (https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/host/overlap_c) on the AWS (VU9P) platform using SDx version 2019.1. The example description claims that the host transfer and kernel execution overlap, but when I ran the example, the kernel execution and host data transfer happened sequentially. Is this the expected behavior?
I have attached two PNG files: the first shows the result I got after running the overlap_c code, and the second shows the expected result.


Screenshot at 2020-10-15 02-20-51.png
overlap.PNG
7 Replies
Visitor
305 Views
Registered: ‎02-23-2020

I also ran this example and see similar behavior. Any thoughts from Xilinx developers?

Xilinx Employee
273 Views
Registered: ‎06-04-2018

Hi @anujp10 ,

Thanks for reporting this issue with the overlap design. Yes, it is indeed executing sequentially. We are currently debugging the issue and will respond once it is resolved.

Thanks,

Vishnu

-------------------------------------------------------------------------
Don't forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------

Xilinx Employee
267 Views
Registered: ‎06-04-2018

Hi,

The issue has been resolved. Can you remove the wait on read_events at line 306 and run again? Remove the following line:

https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/host/overlap_c/src/host.cpp#L306

After that, you should see the host data transfer and the computation overlap. I am attaching a screenshot for reference.

@anujp10 It looks like you are using the older GitHub repository (SDAccel_Examples). Please use the latest Vitis Accel Examples GitHub repository, which has newer features and updates.

https://github.com/Xilinx/Vitis_Accel_Examples

Thanks,

Vishnu


 

overlap_fix.PNG
Newbie
241 Views
Registered: ‎10-14-2020

Thanks for your reply. The change you suggested executes two kernels concurrently and thus overlaps the data transfer and computation. Is it possible to execute a single kernel and still overlap its computation with the data transfer?

For instance, line 54 of the following host code (https://github.com/Xilinx/Vitis_Accel_Examples/blob/master/host/overlap/src/host.cpp) suggests that only one kernel is executing. Can you please confirm this?

Regards,
Anuj

Capture.PNG
Xilinx Employee
215 Views
Registered: ‎06-04-2018

Hi @anujp10 ,

If you look at line 57 of the host code (https://github.com/Xilinx/Vitis_Accel_Examples/blob/master/host/overlap/src/host.cpp), we use a different compute unit (compute1, compute2, ..) for each iteration.

Thanks,

Vishnu


Visitor
200 Views
Registered: ‎02-23-2020

Hi @bchebrol ,

 

So the question is: is there any way to use a single compute unit and overlap its computation with data transfers from the host?

 

Best Regards,

Arash

Adventurer
140 Views
Registered: ‎03-01-2020

@arash1902 You can do that using double-buffering. In this case, you have one compute kernel but two input buffers in the FPGA external memory. On the host side, you first fill buffer_1, then enqueue the kernel with buffer_1 as input. While the kernel is running, you fill buffer_2. You then wait for the kernel to finish, enqueue it again with buffer_2 as input, start filling buffer_1 again, and so on.
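To make the ordering concrete, here is a minimal host-only sketch of that ping-pong pattern in C++. It is not the Xilinx example code: `run_kernel`, `fill_buffer`, and `double_buffered_sum` are hypothetical names, a `std::async` task stands in for the single compute unit, and plain `std::vector` copies stand in for the host-to-device transfers. In a real OpenCL host program those steps would roughly correspond to `clEnqueueTask`, `clEnqueueMigrateMemObjects`, and `clWaitForEvents`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the single compute unit: sums one chunk.
// In an OpenCL host program this would be a kernel enqueue (e.g. clEnqueueTask).
long long run_kernel(const std::vector<int>& device_buf) {
    return std::accumulate(device_buf.begin(), device_buf.end(), 0LL);
}

// Hypothetical stand-in for a host-to-device transfer
// (e.g. clEnqueueMigrateMemObjects): copies one chunk into a buffer.
void fill_buffer(std::vector<int>& device_buf, const std::vector<int>& src,
                 std::size_t offset, std::size_t chunk) {
    assert(offset + chunk <= src.size());
    const auto first = src.begin() + static_cast<std::ptrdiff_t>(offset);
    device_buf.assign(first, first + static_cast<std::ptrdiff_t>(chunk));
}

// Processes src chunk by chunk with ONE kernel and TWO buffers.
long long double_buffered_sum(const std::vector<int>& src, std::size_t chunk) {
    assert(chunk > 0 && src.size() % chunk == 0);
    const std::size_t n_chunks = src.size() / chunk;
    if (n_chunks == 0) return 0;

    std::array<std::vector<int>, 2> buf;
    long long total = 0;

    fill_buffer(buf[0], src, 0, chunk);  // prime the first buffer
    for (std::size_t i = 0; i < n_chunks; ++i) {
        const std::size_t cur = i % 2;
        const std::size_t nxt = 1 - cur;

        // Start the kernel on the current buffer without blocking the host.
        std::future<long long> done = std::async(
            std::launch::async, [&buf, cur] { return run_kernel(buf[cur]); });

        // While the kernel runs, fill the other buffer with the next chunk.
        if (i + 1 < n_chunks) {
            fill_buffer(buf[nxt], src, (i + 1) * chunk, chunk);
        }

        // Wait for the kernel before its buffer is overwritten next iteration
        // (the role clWaitForEvents would play in OpenCL).
        total += done.get();
    }
    return total;
}
```

Calling `double_buffered_sum(data, chunk)` from `main` walks through the whole input. The only synchronization point is the wait at the end of each iteration, so the transfer of chunk i+1 overlaps with the computation on chunk i. On some toolchains `std::async` needs the program to be linked with thread support (for example `-pthread`).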

Another option is to stream data directly from the host to the FPGA over PCIe. You can check the following article for more info:

https://forums.xilinx.com/t5/Adaptable-Advantage-Blog/Improve-Your-Data-Center-Performance-With-The-New-QDMA-Shell/ba-p/990371
