04-21-2020 02:12 PM
Hello
I'm currently writing my thesis on running YOLOv3 on a ZedBoard. I have it working, and I'm now benchmarking it and measuring its performance. I need to build a roofline model to analyze the performance and efficiency of the solution. I got the default YOLOv3 application working by following this guide: https://github.com/Xilinx/Edge-AI-Platform-Tutorials/tree/3.1/docs/Darknet-Caffe-Conversion.
I also used the DPU profiler to get benchmark results while running the YOLOv3 application on the ZedBoard:
Total Nodes In Avg:
Workload: 140691.89 MOP
Memory: 267.38 MB
Run time: 1504.94 ms
Performance: 93.5 GOP/sec
Utilization: 90.2%
Memory bandwidth: 177.7 MB/sec
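As a quick sanity check, the reported numbers can be tested for internal consistency. This is only a sketch, and it assumes the workload figure is in MOP, the memory figure is in MB, and 1504.94 is the run time in milliseconds; the variable names are my own.

```python
# Profiler totals as pasted above (units are an assumption:
# workload in MOP, memory in MB, run time in ms).
workload_mop = 140691.89
mem_mb = 267.38
runtime_ms = 1504.94

runtime_s = runtime_ms / 1000.0

# Performance in GOP/sec = total operations / run time.
perf_gops = (workload_mop / 1000.0) / runtime_s

# Average memory bandwidth in MB/sec = bytes moved / run time.
bandwidth_mbs = mem_mb / runtime_s

print(f"performance: {perf_gops:.1f} GOP/sec")
print(f"bandwidth:   {bandwidth_mbs:.1f} MB/sec")
```

If those unit assumptions hold, both derived values land within rounding of the 93.5 and 177.7 that the profiler prints, so the performance and bandwidth columns appear to be simple ratios of the other columns.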
I need to verify the figures the profiler reports.
How do DNNC and the profiler calculate the total workload for a given network? I have the deploy.prototxt for the Caffe model of YOLOv3. With some guidance I could calculate the total number of operations by hand, but is there a better and more reliable way to do it?
Also, how does the profiler calculate performance and the percentage it reports (90.2%)? Does it know this DPU's theoretical maximum performance and derive the percentage from that? If not, how can I calculate the theoretical maximum performance of the DPU (running on an Avnet ZedBoard, DPU arch B1152, clock speed 90 MHz, 1 core) to compare results? Does performance scale linearly with clock frequency (the theoretical maximum of a B1152 DPU on a Z7020 at 200 MHz is 230 GOP/sec), so that at 90 MHz it is (90/200) * 230 GOP/sec = 103.5 GOP/sec, or is it more complicated?
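The linear-scaling guess can be written down as a small sketch. It assumes peak throughput is simply proportional to the DPU clock, and it takes the 230 GOP/sec at 200 MHz figure as the reference point; `scaled_peak_gops` is a hypothetical helper name, not part of any Xilinx tool.

```python
REF_FREQ_MHZ = 200.0   # reference clock for the quoted peak
REF_PEAK_GOPS = 230.0  # quoted B1152 peak at the reference clock

def scaled_peak_gops(freq_mhz: float) -> float:
    """Peak GOP/sec at freq_mhz, assuming purely linear scaling with clock."""
    return REF_PEAK_GOPS * freq_mhz / REF_FREQ_MHZ

for f in (50.0, 90.0, 200.0):
    print(f"{f:5.0f} MHz -> {scaled_peak_gops(f):.1f} GOP/sec")
```

Under that assumption the 90 MHz peak comes out at 103.5 GOP/sec and the 50 MHz peak at 57.5 GOP/sec.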
04-23-2020 07:24 AM - edited 04-23-2020 07:25 AM
Update:
I just ran the same system with the DPU clocked down to 50 MHz. The performance output was:
57.5 GOP/sec with 90.2% efficiency.
Scaling from the previous 90 MHz result, and assuming 230 GOP/sec is the maximum performance at 200 MHz, this makes sense and works out.
I also ran the same profiling with ResNet50, which gave:
76.6 GOP/sec with 73.9% efficiency. Assuming the maximum performance is 103.5 GOP/sec (which gives 74% when calculated), this also makes sense.
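The efficiency check I did by hand is just measured throughput divided by the assumed peak. A minimal sketch, assuming the 103.5 GOP/sec peak at 90 MHz is correct (`efficiency` is my own helper name):

```python
def efficiency(measured_gops: float, peak_gops: float) -> float:
    """Fraction of the assumed peak throughput that was actually achieved."""
    return measured_gops / peak_gops

PEAK_90MHZ_GOPS = 103.5  # assumed peak at 90 MHz (linear scaling from 200 MHz)

print(f"YOLOv3:   {efficiency(93.5, PEAK_90MHZ_GOPS):.1%}")
print(f"ResNet50: {efficiency(76.6, PEAK_90MHZ_GOPS):.1%}")
```

Both ratios come out within about 0.1 percentage point of what the profiler prints, which suggests the profiler divides by a similar peak value.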
So, assuming DNNC calculates the workload correctly, I can trust the efficiency values. Still:
How can I manually calculate the workload for these neural networks to double-check?
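For reference, this is the kind of hand calculation I have in mind, as a rough sketch. It assumes the workload is dominated by convolution layers, counts each multiply-accumulate as 2 operations, and ignores bias, activation, pooling and upsampling. I don't know whether DNNC counts operations the same way. The layer shapes below are illustrative values that would have to be read from deploy.prototxt; the 608x608 input size is an assumption.

```python
def conv_ops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Operations for one k x k convolution layer, counting 1 MAC as 2 ops."""
    macs = k * k * c_in * c_out * h_out * w_out
    return 2 * macs

# Illustrative layer list: (h_out, w_out, c_in, c_out, kernel).
# These are example shapes only, not the full network.
layers = [
    (608, 608, 3, 32, 3),    # 3x3 conv, stride 1, on an assumed 608x608 input
    (304, 304, 32, 64, 3),   # 3x3 conv, stride 2
    (304, 304, 64, 32, 1),   # 1x1 conv
]

total_ops = sum(conv_ops(*layer) for layer in layers)
print(f"first layer: {conv_ops(*layers[0]) / 1e6:.2f} MOP")
print(f"listed layers total: {total_ops / 1e6:.2f} MOP")
```

Summing this over every convolution layer in the prototxt should give a total that can be compared against the workload the profiler reports.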