Zhuofan
Visitor
177 Views
Registered: ‎07-13-2021

Question about the CU execution time


Hi,

I ran hardware execution of several Vitis Acceleration Examples. However, the CU execution time is much longer than I expected.

Below is the application timeline of the hardware execution of the matmul kernel (https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/cpp_kernels/array_partition). The kernel has a very simple structure: readA, readB, calculate matmul, writeC. So I think the CU execution should end after the final write (t1). However, it does not end until t2, and (t1-t0) just matches the overall kernel latency simulated by HLS. What is the CU doing between t1 and t2? Is there any overhead during CU execution?

application timeline.PNG

Other kernels, such as simple add and L2 fft, have similar timelines. Please help me understand the CU timeline. Thanks.

My hardware settings are as follows:

setting.PNG


Accepted Solutions
heeran
Xilinx Employee
78 Views
Registered: ‎07-18-2014

Hi @Zhuofan ,

We have discussed this issue with the tool team. The delay you’re seeing is expected on hardware and is due to how the system identifies when a CU ends. On the device, we snoop on the AXI-Lite interface, and we can only mark the end of a CU when we see a read of the done bit. In hardware, that read happens either through regular polling or in response to an interrupt. In either case, there will be a slight delay, which is what you are noticing in the waveform, between the actual completion of the internal HLS CU and the moment the rest of the system knows the CU is finished.

However, if you run hardware emulation, you will not notice this extra delay, because the tool has access to all of the internal signals and can see finer-grained information, so the delays from polling or responding to interrupts can be hidden in the emulation model.
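
To make the t1/t2 gap concrete, here is a minimal sketch of the polling case only. It is a toy model, not how the Xilinx runtime is actually implemented: the function name `observed_done_ns`, the fixed poll interval, and the assumption that polls happen at regular ticks counted from the CU start are all hypothetical and chosen just for illustration.

```cpp
#include <cstdint>

// Toy model (assumption, not the real runtime): the done bit is read every
// poll_interval_ns, counted from the CU start time start_ns. The CU can only
// be marked as finished at the first read that happens at or after the time
// the CU actually completed (actual_done_ns).
std::uint64_t observed_done_ns(std::uint64_t start_ns,
                               std::uint64_t actual_done_ns,
                               std::uint64_t poll_interval_ns) {
    // With no polling interval (or a degenerate input), the observed end
    // equals the actual end, which is roughly what emulation can show.
    if (poll_interval_ns == 0 || actual_done_ns <= start_ns) {
        return actual_done_ns;
    }
    const std::uint64_t elapsed = actual_done_ns - start_ns;
    // Round the elapsed time up to the next poll tick (ceiling division).
    const std::uint64_t polls =
        (elapsed + poll_interval_ns - 1) / poll_interval_ns;
    return start_ns + polls * poll_interval_ns;
}
```

With made-up numbers, `observed_done_ns(0, 1230000, 500000)` returns `1500000`: the CU really finishes at t1 = 1.23 ms, but the timeline would mark its end at t2 = 1.5 ms, when the next read of the done bit occurs. In this model the gap t2 - t1 is always smaller than one poll interval; the interrupt case would instead add the interrupt handling latency, which this sketch does not model.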

-Heera 
