715 Views
Registered: ‎03-23-2019

DeePhi Runtime System Internal Error

I encountered an internal error when running tests using DNNDK and the convolutional neural network DPU. It only appears after issuing a fairly large number of tasks to the DPU, but it consistently happens at the same point of execution. Does this mean that I am using the API wrong, that my hardware setup has problems, or is this a legitimate bug in the framework? If this is a bug, what is a good workaround for it?

I'm using a ZCU100 Revision C (a.k.a. Ultra96) board, but I'm not using the evaluation image provided by DeePhi. Instead, I built a PetaLinux boot image and included the DPU IP from the TRD provided by Xilinx. Dexplorer reports the following:

$ dexplorer -v 
DNNDK version  2.08 beta
Copyright @ 2016-2018 DeePhi Inc. All Rights Reserved.                          
                                                                                
DExplorer version 1.5                                                           
Build Label: Dec 11 2018 21:15:32                                               
                                                                                
DSight version 1.4                                                              
Build Label: Dec 11 2018 21:15:33                                               
                                                                                
N2Cube Core library version 2.2                                                 
Build Label: Dec 11 2018 21:15:54                                               
                                                                                
DPU Driver version 2.1.0                                                        
Build Label: Apr 17 2019 22:00:24

$ dexplorer -w
[DPU IP Spec]                                                             
IP  Timestamp   : 2019-04-18 16:30:00                                     
DPU Core Count  : 1                                                       
IRQ Base 0      : 121                                                     
IRQ Base 1      : 136                                                     
                                                                          
[DPU Core List]                                                           
DPU Core        : #0                                                      
DPU Enabled     : Yes                                                     
DPU Arch        : B2304F                                                  
DPU Target      : v1.3.0                                                  
DPU Freqency    : 333 MHz                                                 
DPU IRQ         : 138                                                     
DPU Features    : Leaky ReLU

A snippet of the code I ran (it fails on iteration 509):

for (int i = 0; i < 1200; i++) {
    cout << i << endl;
    DPUTask *task = dpuCreateTask(kernel, 0);
    Mat img = imread("demo_img.jpg", CV_LOAD_IMAGE_COLOR);
    Mat resizedImg = resizeKeepAspectRatio(
        img, Size(YOLO_SIZE, YOLO_SIZE), Scalar(127, 127, 127));
    runYolov3Tiny(task, resizedImg, cout);
    dpuDestroyTask(task);
}
exit(0);

The error message is:

DeePhi DPU Runtime system internal error.
Please contact dnndk-support@deephi.tech with the following info:
        Debug info - Cond:"fd", File:/root/DNNDK_All_V021_Package_ZCU104/n2cube/src/dpu.cpp, Function:dpuRuntimeMode, Line:344

 

 

3 Replies
jheaton
Xilinx Employee
675 Views
Registered: ‎03-21-2008

I think you may be running into a mismatch between the DNNC version and the DPU version. All the Ultra96 examples are based off version 1.3.7 of the DPU and use version 1.3.7 of the DNNC compiler. The TRD you are using uses version 1.3.0.

The DNNDK Ultra96 examples are not going to work with the TRD DPU.

If you are compiling your own neural network using DNNC, make sure you are using 1.3.0. If you ran ./install.sh Ultra96 in the host_x86 folder, it enabled version 1.3.7.

To enable version 1.3.0 of DNNC, in the host_x86 folder run:  ./install.sh ZCU102

 

 

669 Views
Registered: ‎03-23-2019

The runtime actually reports an explicit error at initialization when there is a mismatch between 1.3.0 and 1.3.7. I've used dpu-dnnc-1.3.0 to recompile my model from the (previously generated) decent output, so my ELF should be compatible with 1.3.0.
636 Views
Registered: ‎03-23-2019

I've run my code on the official boot image, and still get this internal error. This means it's probably not related to my hardware setup; either I'm using the DPU wrong and causing some kind of undefined behavior (not sure what's wrong though, especially since it worked for 500+ iterations), or there's a bug in DeePhi's drivers / libraries.
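
One way to narrow this down: the failed condition in the message is named "fd", which may (this is only a guess, since the runtime source is not available) mean that opening a file descriptor failed. If every create/destroy cycle leaked a descriptor or two, a typical default soft limit of 1024 open files would be exhausted after a few hundred iterations, which would be consistent with a failure that always occurs at the same iteration. The sketch below is a Linux-only check for that hypothesis. The helper names countOpenFds and logFdCount are made up for illustration and are not part of DNNDK.

```cpp
#include <dirent.h>
#include <cstdio>

// Hypothetical helper (not part of DNNDK): count the file descriptors
// currently open in this process by listing /proc/self/fd (Linux only).
// Returns -1 if the directory cannot be opened.
int countOpenFds() {
    DIR *dir = opendir("/proc/self/fd");
    if (dir == nullptr) {
        return -1;
    }
    int count = 0;
    while (struct dirent *entry = readdir(dir)) {
        // Skip the "." and ".." entries.
        if (entry->d_name[0] != '.') {
            ++count;
        }
    }
    closedir(dir);
    // The directory stream held one descriptor of its own while we were
    // listing, so it was counted too; subtract it.
    return count - 1;
}

// Hypothetical helper: print the descriptor count next to the iteration
// number so that a steady increase is easy to spot in the log.
void logFdCount(int iteration) {
    std::printf("iteration %d: %d open fds\n", iteration, countOpenFds());
}
```

Calling logFdCount(i) at the top of the loop, before dpuCreateTask, would show whether the count grows with every iteration. If it does, creating the task once before the loop and destroying it once after the loop may work as a workaround. If the count stays flat, descriptor exhaustion is probably not the cause.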
