04-27-2021 07:05 AM - edited 04-27-2021 07:07 AM
I want to build more than 3 DPUs for the ZCU102. With the 512 DPU size, up to 8 DPUs should fit on the ZCU102, so I built a Vitis project with eight 512 DPUs. However, when I check on the board using explorer, I see only 3 DPUs. Can anyone tell me what changes I need to make to implement 8 DPUs of size 512?
04-28-2021 04:27 PM
I do not recommend using more than 3 DPUs on the ZCU102, regardless of the DPU size. You are going to run into memory bandwidth utilization issues if they all run simultaneously, and you will probably not see a performance increase vs. using 3 DPUs.
Is there a reason you wanted to use more than 3? You can target multiple models to the same DPU.
05-06-2021 12:54 PM
But this may not be true, because we have seen that different CNNs have different bandwidth requirements. Also, smaller DPU sizes require less bandwidth than larger DPU sizes. Based on this, if we increase the number of DPUs for a smaller DPU size such as 512, the larger number of DPUs should still yield some performance improvement.
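The bandwidth argument on both sides can be illustrated with a minimal saturation sketch. This is not the INFER model and not output from any Xilinx tool. It assumes identical DPUs sharing one memory system, and it assumes that once total demand exceeds the available bandwidth, every DPU slows down by the same factor. The function name and all numbers below are hypothetical, chosen only to show the shape of the trade-off.

```python
def aggregate_throughput(num_dpus, per_dpu_fps, per_dpu_bw, available_bw):
    """Estimate total frames/s for num_dpus identical DPUs sharing memory.

    per_dpu_fps  : frames/s of one DPU running alone (hypothetical input)
    per_dpu_bw   : bandwidth one DPU needs at full speed, e.g. in GB/s
    available_bw : bandwidth the memory system can sustain, same unit
    """
    if num_dpus <= 0:
        return 0.0
    demanded = num_dpus * per_dpu_bw
    # Assumption: if demand fits, all DPUs run at full speed; otherwise
    # all DPUs are slowed equally so that total traffic matches the limit.
    slowdown = min(1.0, available_bw / demanded)
    return num_dpus * per_dpu_fps * slowdown


if __name__ == "__main__":
    AVAILABLE_BW = 10.0  # hypothetical sustained bandwidth
    # Hypothetical (fps, bandwidth) pairs for a large and a small DPU size.
    configs = {"large DPU": (100.0, 4.0), "small DPU": (20.0, 1.0)}
    for name, (fps, bw) in configs.items():
        for n in (3, 8):
            total = aggregate_throughput(n, fps, bw, AVAILABLE_BW)
            print(f"{name}, {n} cores: {total:.1f} frames/s")
```

With these made-up inputs, the large DPU is already bandwidth-limited at 3 cores, so adding cores does not help, while the small DPU stays under the limit at 8 cores and keeps scaling. Which regime a real design falls into depends on measured per-network bandwidth, which is what the profiling data is meant to establish.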
I am attaching a screenshot as well. The graph shows the bandwidth requirement for different CNNs across different DPU sizes. This data is taken from the DPU profiling tool.
I am also referring to our paper, where we show the effect of interference on the runtime. We show that for smaller DPU sizes, increasing the number of DPUs gives further performance improvement. With our estimation tool, we can predict the runtime for an increased number of DPUs.
Shikha Goel, Rajesh Kedia, Rijurekha Sen and M. Balakrishnan. "INFER: INterFerence-aware Estimation of Runtime for Concurrent CNN Execution on DPUs". In International Conference on Field Programmable Technology (FPT), 2020.
05-16-2021 12:25 PM
Can anyone help with this? Has anyone managed to implement more than 3 DPUs, and could you walk me through the workflow? I need this urgently for a detailed analysis in my work.