Visitor wangjiasheng
Visitor
1,226 Views
Registered: ‎11-18-2017

Different run times in sdaccel_profile_summary.csv when using overlapping

I am using overlapping to optimize my host code, following the method in https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/host/overlap_ocl/src/host.cpp, and I am confused by the results reported in sdaccel_profile_summary.csv. Why is the average time under Kernel Execution different from the one under Compute Unit Utilization? What do the two results mean?

 

question.png

 

The actual run time of my application is similar to the time from Compute Unit Utilization, but I do not understand why the time from Kernel Execution is different.

 

When I disable overlapping and perform read, write, and compute fully sequentially, the two times are nearly the same. When I enable overlapping, the time from Kernel Execution is cut nearly in half and the time from Compute Unit Utilization decreases only a little, but the actual elapsed time of my application stays close to the Compute Unit Utilization figure.

 

Can anybody explain the meaning of the two times and why they are different? Thank you!


Accepted Solutions
Moderator
Moderator
1,149 Views
Registered: ‎11-04-2010

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

The two times do not measure the same thing.

#Kernel Execution time - The time during which kernel functions are scheduled and executed.
#Compute Unit Utilization time - The time spent by the compute units on the FPGA.

-------------------------------------------------------------------------
Don't forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------

9 Replies
Visitor wangjiasheng
Visitor
1,142 Views
Registered: ‎11-18-2017

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

@hongh Thanks for your reply! Besides kernel execution, what other actions can consume compute unit time? Since I am testing overlapping on the host side, does overlapping, or the out-of-order OpenCL command queue, affect the compute unit time? In my results, the kernel execution time decreases a lot compared with the non-overlapped run, but the compute unit time changes little, so it seems that the overall performance does not benefit much from overlapping.

 

Moderator
Moderator
1,131 Views
Registered: ‎11-04-2010

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

Hi, @wangjiasheng ,

I suggest you use the latest SDx release, 2018.2. The tool has evolved dramatically in recent years.

I don't think kernel execution will affect the compute unit time.

You can check the "Application Timeline" to understand the relationship between "kernel execution" and "Compute Unit".

Overlapping (or any other action on the host) will not affect the "Compute Unit" time much.

When transferring data takes a long time, overlapping will reduce the kernel execution time considerably.
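
To make the last point concrete, here is a rough back-of-the-envelope sketch in Python. It is not SDx output and not the profiler's definition of any metric. It assumes an idealized three-stage pipeline (write to device, compute, read back) where every chunk has the same stage durations, and all function names and numbers are hypothetical.

```python
def sequential_time(n_chunks, write_ms, compute_ms, read_ms):
    """Each chunk is written, computed, and read back before the next one starts."""
    return n_chunks * (write_ms + compute_ms + read_ms)


def overlapped_time(n_chunks, write_ms, compute_ms, read_ms):
    """Idealized pipeline: after the first chunk, the slowest stage sets the pace."""
    slowest = max(write_ms, compute_ms, read_ms)
    return write_ms + compute_ms + read_ms + (n_chunks - 1) * slowest


def time_saved_fraction(n_chunks, write_ms, compute_ms, read_ms):
    """Fraction of the sequential time that the idealized overlap removes."""
    seq = sequential_time(n_chunks, write_ms, compute_ms, read_ms)
    ovl = overlapped_time(n_chunks, write_ms, compute_ms, read_ms)
    return (seq - ovl) / seq


# Hypothetical workload: 8 chunks, 10 ms of compute per chunk.
short_transfers = time_saved_fraction(8, 1, 10, 1)  # cheap transfers
long_transfers = time_saved_fraction(8, 8, 10, 8)   # expensive transfers
print(f"short transfers: {short_transfers:.0%} of the sequential time saved")
print(f"long transfers:  {long_transfers:.0%} of the sequential time saved")
```

Under these assumptions, the saving is small when transfers are cheap relative to compute and large when they are comparable to it. Real hardware adds buffer limits and scheduling overhead that this estimate ignores.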

compute_unit.png
Visitor wangjiasheng
Visitor
1,121 Views
Registered: ‎11-18-2017

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

Hi, @hongh!

You say, "I don't think kernel execution will affect the time for the compute unit." Do you mean that the "compute unit" time does not include the kernel execution time?

I am using reqd_work_group_size(1, 1, 1).

I log the time at which my application starts and ends in my host code, and the measured run time of my application is similar to the "compute unit" time.

According to sdaccel_profile_summary.csv, overlapping does reduce the kernel execution time considerably, but the "compute unit" time seems to be what determines performance. The performance of my application is not affected by the decrease in kernel execution time.

Moderator
Moderator
1,096 Views
Registered: ‎11-04-2010

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

The time reported for a compute unit depends on the compute unit's performance.

The compute unit's performance depends on how well the kernel's logic is optimized, not on the host code strategy.

Visitor wangjiasheng
Visitor
1,073 Views
Registered: ‎11-18-2017

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

But why can the host code strategy affect the kernel execution time? Doesn't the kernel execution time also depend on how well the kernel's logic is optimized?

Moderator
Moderator
1,067 Views
Registered: ‎11-04-2010

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

The kernel execution time begins when the first kernel is launched and ends when the last kernel completes, including the time spent waiting for data transfers between host and device.

A good host code strategy such as overlapping can reduce the total kernel time by transferring data while the previous kernel is still executing.

Of course, the total kernel execution time also depends on how well the kernel's logic is optimized (CU performance).
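
The description above can be illustrated with a toy schedule in Python. This is only a sketch under stated assumptions, not how SDx computes its report. It assumes one compute unit, one write channel, one read channel, enough device buffers that writes never stall when overlapping, and identical chunks. The `schedule` function and all durations are hypothetical.

```python
def schedule(n_chunks, write_ms, compute_ms, read_ms, overlap):
    """Return (kernel_span, cu_busy, total) for a toy write -> compute -> read flow."""
    write_end = kernel_end = read_end = 0.0
    first_kernel_start = None
    for _ in range(n_chunks):
        # Without overlap, the next write waits for the previous read-back.
        write_start = write_end if overlap else read_end
        write_end = write_start + write_ms
        # A kernel needs its input on the device and the compute unit free.
        kernel_start = max(write_end, kernel_end)
        if first_kernel_start is None:
            first_kernel_start = kernel_start
        kernel_end = kernel_start + compute_ms
        # A read-back needs the kernel result and the read channel free.
        read_end = max(kernel_end, read_end) + read_ms
    kernel_span = kernel_end - first_kernel_start  # first launch to last completion
    cu_busy = n_chunks * compute_ms                # time the compute unit is working
    return kernel_span, cu_busy, read_end


# Hypothetical workload: 4 chunks, 4 ms write, 10 ms compute, 4 ms read.
for overlap in (False, True):
    span, busy, total = schedule(4, 4, 10, 4, overlap)
    print(f"overlap={overlap}: kernel span={span} ms, CU busy={busy} ms, total={total} ms")
```

In this model, the span from the first kernel launch to the last kernel completion shrinks when transfers are overlapped, while the compute unit's busy time stays the same because it depends only on the kernel itself. The span also never drops below the busy time, since a single compute unit runs the kernels one after another.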

 

Visitor wangjiasheng
Visitor
1,048 Views
Registered: ‎11-18-2017

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

Hi, @hongh !

Now I understand how overlapping can affect the kernel execution time. But why is the kernel execution time less than the compute unit time? I am still confused about the relationship between the two.

Moderator
Moderator
1,019 Views
Registered: ‎11-04-2010

Re: Different run times in sdaccel_profile_summary.csv when using overlapping

It is also weird to me that the average kernel execution time is less than the CU's average time. I suspect it is a bug in the older SDx version.

You can try it in SDx 2018.2.
