10-10-2019 12:11 AM
I am using the DPU integration tutorial to run the VGG16 model on a ZCU102. I built the hardware and PetaLinux following the tutorial. I am using DPU version 2.0 with a 3-core design and the low DSP configuration.
The resulting profile graph is shown in figure 1.png.
But when I use the prebuilt image for the ZCU102 (xilinx-zcu102-prod-dpu1.4-2018.3-desktop-buster-2019-04-24.img.zip) to run the same model, the profile graph is different, as shown in figure 2.png.
My question is: why is there a difference in performance? I expected the same result in both cases.
10-18-2019 03:27 AM
Can anyone help with this? I think there is a problem with the driver code. In the first image (figure 1.png), the VGG16 task is scheduled only once, on core 3, but in the second image (figure 2.png) the task is scheduled equally across the 3 cores. Why is this happening?
For the first image, I built the hardware file and PetaLinux from scratch using the DPU integration tutorial.
For the second image, I used the prebuilt image for the ZCU102.
Any help on this?
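To put numbers on the difference between the two profile graphs, here is a minimal sketch. It assumes the core ID of each task run has been read off the profile by hand. The function name `core_share` and the sample lists are hypothetical: they only illustrate the two patterns described above, they are not measured data, and nothing here is part of DNNDK.

```python
from collections import Counter


def core_share(core_ids):
    """Return the fraction of task runs that landed on each core.

    core_ids: one entry per task run, giving the core it ran on
    (hypothetical input, transcribed manually from a profile graph).
    """
    counts = Counter(core_ids)
    total = sum(counts.values())
    return {core: n / total for core, n in sorted(counts.items())}


# Illustrative placeholders only, not real measurements.
self_built = [3]             # pattern in figure 1.png: one run, on core 3
prebuilt = [1, 2, 3] * 4     # pattern in figure 2.png: runs spread evenly

print("self-built image:", core_share(self_built))
print("prebuilt image:  ", core_share(prebuilt))
```

A share close to 1.0 on a single core matches the first case, while roughly equal shares across the three cores match the second.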
10-18-2019 09:23 AM
Do you see the same problem on the latest DPU and DNNDK release?
I have just tried ResNet50, first on the image generated from the TRD design (zcu102-dpu-trd-2019-1-190809.zip) and then on petalinux-user-image-zcu102-zynqmp-sd-20190802.img.gz: the behaviour is very similar.
10-18-2019 09:48 PM
The problem with trying the latest TRD is that we don't have the DPU driver code for it. The DPU integration tutorial states that it is for DPU version 2.0.
Where can I get the DPU driver code for the latest TRD?
11-03-2019 03:12 AM
I checked the behaviour with DPU version 3.0 as well. It is similar to what I observed with version 2.0.
11-13-2019 01:12 AM
Can anyone help with this?
I am still stuck at this performance problem.