I have been using Vitis Analyzer and the Vivado simulator to check timing data, and I noticed some differences between them. Please see the two figures below.
1. The first one was captured in Vivado during a HW Emulation run. From the chart, a single kernel execution takes 41.8 us.
2. The second one comes from using Vitis Analyzer to analyze the generated "timeline_trace.csv". From the chart, a single kernel execution takes 53 us.
Q1: Where does the difference come from? 41.8 us vs. 53 us is a gap of about 20%, which is huge.
Q2: Is the timing data captured in HW Emulation mode precise enough to represent the real case (HW mode)? So far I cannot get data for HW mode.
Q3: Chart 2 also shows the host execution time, which is slightly longer than the kernel execution (about 53.8 us). I also measured the CPU time in the host code. Measured that way, one round (writebuffer + enqueueTask + readbuffer) typically takes on the order of milliseconds, which is far larger than the figures above. Which number should I trust?
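To make the comparison in Q1 and Q3 concrete, here is a minimal sketch of how a host-side wall-clock measurement and the relative difference between two readings could be computed. It is an illustration under stated assumptions, not the actual host code: `average_round_us` and `relative_diff_percent` are hypothetical helper names, and `round` is assumed to wrap one full writebuffer + enqueueTask + readbuffer sequence that blocks until the result is back on the host.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock time of one `round` in microseconds.
// `warmup` calls are executed first and not timed, so one-time setup costs
// (first-call initialization, cold caches) do not inflate the average.
double average_round_us(const std::function<void()>& round,
                        std::size_t warmup,
                        std::size_t iterations) {
    if (iterations == 0) {
        return 0.0;  // nothing to average
    }
    for (std::size_t i = 0; i < warmup; ++i) {
        round();
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        round();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(iterations);
}

// Hypothetical helper: how much `value` differs from `reference`, in percent
// of `reference`.
double relative_diff_percent(double reference, double value) {
    return (value - reference) / reference * 100.0;
}
```

With the two chart readings, `relative_diff_percent(41.8, 53.0)` is about 27%, while `relative_diff_percent(53.0, 41.8)` is about -21%, so the size of the gap depends on which reading is taken as the reference. Note that a wall-clock measurement like this includes host-side software overhead, and in HW Emulation it also includes the time the simulator itself needs to run, so it is not directly comparable with the simulated kernel time shown in the charts.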