04-18-2019 06:09 PM - edited 04-18-2019 06:12 PM
I encountered an internal error when running tests using DNNDK and the convolutional neural network DPU. It only appears after issuing a fairly large number of tasks to the DPU, but it consistently happens at the same point of execution. Does this mean that I am using the API wrong, that my hardware setup has problems, or is this a legitimate bug in the framework? If this is a bug, what is a good workaround for it?
I'm using a ZCU100 Revision C (a.k.a. Ultra96) board, but I'm not using the evaluation image provided by DeePhi. Instead, I built a PetaLinux boot image and included the DPU IP from the TRD provided by Xilinx. Dexplorer reports the following:
    $ dexplorer -v
    DNNDK version 2.08 beta
    Copyright @ 2016-2018 DeePhi Inc. All Rights Reserved.

    DExplorer version 1.5
    Build Label: Dec 11 2018 21:15:32

    DSight version 1.4
    Build Label: Dec 11 2018 21:15:33

    N2Cube Core library version 2.2
    Build Label: Dec 11 2018 21:15:54

    DPU Driver version 2.1.0
    Build Label: Apr 17 2019 22:00:24

    $ dexplorer -w
    [DPU IP Spec]
    IP Timestamp   : 2019-04-18 16:30:00
    DPU Core Count : 1
    IRQ Base 0     : 121
    IRQ Base 1     : 136

    [DPU Core List]
    DPU Core       : #0
    DPU Enabled    : Yes
    DPU Arch       : B2304F
    DPU Target     : v1.3.0
    DPU Freqency   : 333 MHz
    DPU IRQ        : 138
    DPU Features   : Leaky ReLU
A snippet of the code I ran (it fails on iteration 509):
    for (int i = 0; i < 1200; i++) {
        cout << i << endl;
        DPUTask *task = dpuCreateTask(kernel, 0);
        Mat img = imread("demo_img.jpg", CV_LOAD_IMAGE_COLOR);
        Mat resizedImg = resizeKeepAspectRatio(
            img, Size(YOLO_SIZE, YOLO_SIZE), Scalar(127, 127, 127));
        runYolov3Tiny(task, resizedImg, cout);
        dpuDestroyTask(task);
    }
    exit(0);
The error message is:
    DeePhi DPU Runtime system internal error.
    Please contact dnndk-support@deephi.tech with the following info:
    Debug info - Cond:"fd", File:/root/DNNDK_All_V021_Package_ZCU104/n2cube/src/dpu.cpp, Function:dpuRuntimeMode, Line:344
04-19-2019 10:13 AM
I think you may be running into a mismatch between the DNNC version and the DPU version. All the Ultra96 examples are based off version 1.3.7 of the DPU and use version 1.3.7 of the DNNC compiler. The TRD you are using uses version 1.3.0.
The DNNDK Ultra96 examples are not going to work with the TRD DPU.
If you are compiling your own neural network using DNNC, make sure you are using version 1.3.0. If you ran ./install.sh Ultra96 in the host_x86 folder, it enabled version 1.3.7.
To enable version 1.3.0 of DNNC, in the host_x86 folder run: ./install.sh ZCU102
04-19-2019 10:44 AM
04-21-2019 04:45 PM
I've run my code on the official boot image, and still get this internal error. This means it's probably not related to my hardware setup; either I'm using the DPU wrong and causing some kind of undefined behavior (not sure what's wrong though, especially since it worked for 500+ iterations), or there's a bug in DeePhi's drivers / libraries.
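One way to narrow this down is to check whether something leaks on every iteration. The Cond:"fd" in the error message may point to a file descriptor check that failed, but that reading is only a guess based on the name. The sketch below is a plain Linux diagnostic and not part of DNNDK; countOpenFds is a hypothetical helper that counts the entries in /proc/self/fd.

```cpp
#include <cassert>
#include <cstring>
#include <dirent.h>

// Hypothetical diagnostic helper (not a DNNDK function).
// Counts the file descriptors currently open in this process by listing
// /proc/self/fd (Linux only). Returns -1 if the directory cannot be opened.
int countOpenFds() {
    DIR *dir = opendir("/proc/self/fd");
    if (dir == nullptr) {
        return -1;
    }

    int count = 0;
    while (struct dirent *entry = readdir(dir)) {
        // Skip the "." and ".." directory entries.
        if (std::strcmp(entry->d_name, ".") == 0 ||
            std::strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        ++count;
    }
    closedir(dir);

    // The listing always includes the descriptor opendir() itself was using.
    assert(count > 0);
    return count - 1;
}
```

Printing countOpenFds() next to i inside the loop would show whether the number grows with each dpuCreateTask / dpuDestroyTask pair. If it does, and the failure appears close to the per-process limit reported by ulimit -n, a descriptor leak is a likely cause, and creating the task once before the loop and destroying it once afterwards might serve as a workaround. If the count stays flat, the cause lies elsewhere.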