Visitor
Registered: 03-24-2020

How does DNNC calculate total workload of convolutional neural network?

Hello

I'm currently writing a thesis about running Yolov3 on a Zedboard. I have it working, and now I'm benchmarking and measuring its performance. For the analysis I need to create a roofline model showing the performance and efficiency of the solution. I got the default Yolov3 application working by following this guide: https://github.com/Xilinx/Edge-AI-Platform-Tutorials/tree/3.1/docs/Darknet-Caffe-Conversion.
I also used the DPU profiler to get benchmarking results while running the Yolov3 application on the Zedboard:

Total Nodes In Avg:

workload: 140691.89 MOP
memory: 267.38 MB
run time: 1504.94 ms
performance: 93.5 GOP/sec
utilization (efficiency): 90.2%
memory bandwidth: 177.7 MB/s
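Here is a quick arithmetic check of the total row. It assumes the profiler reports workload in MOP, memory in MB and run time in ms, since those are the units that make the columns agree. One MOP per millisecond equals one GOP per second, so the performance and bandwidth figures follow directly from the other three values:

```python
# Values from the profiler's "Total Nodes In Avg" row.
# Assumed units: workload in MOP, memory in MB, run time in ms.
workload_mop = 140691.89
memory_mb = 267.38
runtime_ms = 1504.94

# MOP per millisecond is the same as GOP per second.
performance_gops = workload_mop / runtime_ms
# Convert ms to s to express memory traffic as MB/s.
bandwidth_mbps = memory_mb / (runtime_ms / 1000.0)

print(f"performance: {performance_gops:.1f} GOP/sec")  # 93.5
print(f"bandwidth:   {bandwidth_mbps:.1f} MB/s")        # 177.7
```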


I need to verify the figures the profiler reported.
How do DNNC and the profiler calculate the total workload of a given network? I have the deploy.prototxt for the Yolov3 caffemodel. With some guidance I could calculate the total number of operations by hand, but is there a better and more reliable way to do it?
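One possible hand calculation goes through every Convolution layer in deploy.prototxt. For each layer, read `num_output`, `kernel_size`, `stride`, `pad` and `group`, then count 2 × H_out × W_out × C_out × (C_in / group) × K × K operations, where each multiply-accumulate counts as one multiply plus one add. The Python sketch below works under that assumption.

- It is not clear whether DNNC also counts bias, batch-norm/scale, activation, upsample or shortcut (eltwise) layers, so this sketch ignores them.
- The `layers` list is hypothetical input typed in by hand. It covers only the first three convolutions of the standard darknet yolov3.cfg and assumes a 608x608 input.
- Summing over all convolutions and dividing by 1e6 gives a figure comparable with the profiler's workload in MOP.

For comparison, the YOLO project page lists YOLOv3-608 at 140.69 billion FLOPs, which is close to the 140691.89 MOP above. This suggests the converted model uses a 608x608 input, which is worth confirming in deploy.prototxt.

```python
def conv_output_size(size, kernel, stride, pad):
    """Caffe-style output size for one spatial dimension (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_ops(h, w, c_in, c_out, kernel, stride=1, pad=0, group=1):
    """Return (h_out, w_out, ops) for one convolution layer.

    Counts each multiply-accumulate as 2 operations; bias, scale and
    activation are ignored (an assumption, not a documented DNNC rule).
    """
    h_out = conv_output_size(h, kernel, stride, pad)
    w_out = conv_output_size(w, kernel, stride, pad)
    macs = h_out * w_out * c_out * (c_in // group) * kernel * kernel
    return h_out, w_out, 2 * macs


# Hypothetical input: first three convolutions of yolov3.cfg,
# assuming a 608x608x3 input. Each tuple is (c_out, kernel, stride, pad).
layers = [(32, 3, 1, 1), (64, 3, 2, 1), (32, 1, 1, 0)]

h, w, c = 608, 608, 3
total_ops = 0
for c_out, kernel, stride, pad in layers:
    h, w, ops = conv_ops(h, w, c, c_out, kernel, stride, pad)
    c = c_out  # this layer's output channels feed the next layer
    total_ops += ops
    print(f"{h}x{w}x{c}: {ops / 1e6:.2f} MOP")

print(f"total: {total_ops / 1e6:.2f} MOP")
```

For these three layers the sketch prints 638.78, 3406.82 and 378.54 MOP. Those values match the per-layer BFLOPs that darknet itself prints for the same layers at 608x608, which suggests the 2-ops-per-MAC convention is the right one to compare against.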
Also, how does the profiler calculate performance and the utilization percentage? Does it have its own information about this DPU's theoretical maximum performance and compute the percentage from that? If not, how can I calculate the theoretical maximum performance of the DPU (running on an Avnet Zedboard, DPU arch B1152, clock speed 90 MHz, 1 core) to compare the results? Is this performance linear in the clock frequency? The theoretical maximum of a B1152 DPU on a Z7020 at 200 MHz is 230 GOP/sec, so at 90 MHz it would be (90/200) * 230 GOP/sec = 103.5 GOP/sec. Or is it more complicated?
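This sketch assumes the number in the architecture name is the peak number of operations per clock cycle. That is an assumption here, but it is consistent with 1152 × 200 MHz ≈ 230 GOP/sec. Under that assumption the theoretical peak is linear in the clock, and the scale factor for 90 MHz is 90/200. Exact arithmetic gives 103.68 GOP/sec rather than 103.5, because 230 is itself rounded from 230.4. The result is only the compute ceiling of a roofline model; memory bandwidth is not part of it.

```python
# Assumption: the "1152" in B1152 is the peak number of operations
# a single DPU core performs per clock cycle.
OPS_PER_CYCLE = 1152


def peak_gops(clock_mhz, ops_per_cycle=OPS_PER_CYCLE, cores=1):
    """Theoretical compute ceiling in GOP/sec.

    ops/cycle * MHz gives millions of operations per second;
    dividing by 1000 converts that to GOP/sec.
    """
    return ops_per_cycle * clock_mhz * cores / 1000.0


for mhz in (200, 90, 50):
    print(f"{mhz} MHz: {peak_gops(mhz):.2f} GOP/sec")
# 200 MHz: 230.40 GOP/sec
# 90 MHz: 103.68 GOP/sec
# 50 MHz: 57.60 GOP/sec
```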


Update:

I just ran the same system with the DPU clocked down to 50 MHz. The performance output was 57.5 GOP/sec with 90.2% efficiency.

Scaling from the previous 90 MHz result, and assuming 230 GOP/sec is the maximum performance at 200 MHz, this makes sense and is consistent.

I also ran the same profiling with Resnet50, which gave:
76.6 GOP/sec with 73.9% efficiency. Assuming the maximum performance is 103.5 GOP/sec, the calculated efficiency would be 74%, so this also makes sense.
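The efficiency figures can be checked as utilization = measured performance / peak performance. Below, the peak is taken as 1152 ops/cycle × 90 MHz = 103.68 GOP/sec, which assumes B1152 means 1152 operations per cycle; this is not a documented profiler formula. With 103.68, both reported percentages are reproduced to one decimal (90.2% and 73.9%). With 103.5 the results are 90.3% and 74.0% instead. That hints, but does not prove, that the profiler divides by ops-per-cycle × clock.

```python
# Assumed peak: 1152 ops/cycle at 90 MHz, in GOP/sec (= 103.68).
PEAK_GOPS = 1152 * 90 / 1000.0


def utilization_pct(measured_gops, peak_gops):
    """Measured throughput as a percentage of the theoretical peak."""
    return 100.0 * measured_gops / peak_gops


# Performance reported by the profiler at 90 MHz (GOP/sec).
measured = {"Yolov3": 93.5, "Resnet50": 76.6}

for name, gops in measured.items():
    print(f"{name}: {utilization_pct(gops, PEAK_GOPS):.1f}% "
          f"(vs {utilization_pct(gops, 103.5):.1f}% with 103.5 GOP/sec)")
```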

So, assuming DNNC calculates the workload correctly, I can trust the efficiency values, but I would still like to check the workload itself.

How can I manually calculate the workload of these neural networks to double-check it?
