03-04-2021 06:16 PM - edited 03-04-2021 09:39 PM
I have a project that instantiates multiple compute units (CUs) from a single C kernel (sha256 from the Vitis library).
When I run the test under HW emulation, the tasks are enqueued to multiple CUs as expected (see the chart below):
When running on HW, I get the chart below from vitis_analyzer. My questions are:
1. I cannot find the meaning of "row0", "row1", etc. Where is that defined?
2. Most of the Kernel Enqueues happen on row0 and only a few on row1. With enough tasks I would see row2 and row3 as well, but much less often. Is that the correct behavior?
3. The kernel execution efficiency does not look very high (there are large time gaps between kernel executions). Is that because most Kernel Enqueues land on the same row, as described in question 2? What can be done to improve this?
Thanks a lot for helping!
04-07-2021 09:31 PM
In Vitis analyzer, interpreting guidance data is a key part of the analysis. The guidance view places each entry in a separate row. Each row might contain the name of the guidance rule, the threshold value, the actual value, and a brief but specific description of the rule.
In the case of kernel execution, the number of rows depends on the number of overlapping kernel executions. Overlapping kernels should not be mistaken for actual parallel execution on the device, because the process might not be ready to execute right away.
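To illustrate why most enqueues can end up on row0, here is a minimal sketch of one way overlapping intervals could be packed into rows. It is not the Vitis analyzer implementation. It assumes each enqueue is drawn on the lowest-numbered row that is free when it starts, and the function name `assign_rows` and the sample times are hypothetical.

```python
from collections import Counter


def assign_rows(intervals):
    """Return a row index for each (start, end) interval, in start-time order.

    Assumption: an interval goes on the lowest-numbered row whose previous
    interval has already ended; a new row is opened only when all are busy.
    """
    row_end = []  # row_end[r] = end time of the last interval placed on row r
    rows = []
    for start, end in sorted(intervals):
        for r, busy_until in enumerate(row_end):
            if busy_until <= start:
                row_end[r] = end  # reuse the first free row
                rows.append(r)
                break
        else:
            row_end.append(end)  # every existing row is busy: open a new one
            rows.append(len(row_end) - 1)
    return rows


# Hypothetical enqueue intervals (arbitrary time units), already sorted by start.
enqueues = [(0, 10), (2, 6), (12, 20), (13, 15), (14, 18), (21, 25)]
rows = assign_rows(enqueues)
print(rows)           # [0, 1, 0, 1, 2, 0]
print(Counter(rows))  # row 0 is used most, higher rows only during overlaps
```

Under this assumption, a row index says how many enqueues were in flight at the same time, not which CU ran the kernel. Higher rows appear only when several enqueues overlap, so seeing most entries on row0 would be expected.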