iif979631337
Observer
1,354 Views
Registered: ‎07-16-2018

vai_c_caffe fatal error in Vitis-AI-Tutorials

I am working through the Caffe segmentation tutorial in Vitis-AI-Tutorials (UG1394).

The Caffe model supplied with the tutorial quantizes and compiles without any problems. However, with a caffemodel trained on my own dataset, I get the log below.

I don't know the cause, so I would appreciate it if someone could explain it.

###########################################################

# Last part of vai_q_caffe Log ("workspace/Segment/VAI/FPN/quantize/quantize.txt")

###########################################################

I0802 16:41:20.219183  4855 vai_q.cpp:360] Start Deploy
W0802 16:41:20.283365  4855 convert_proto.cpp:1355] [DEPLOY WARNING] Layer inception_4c/3x3_reduce's output blob is all zero, this may cause error for DNNC compiler. Please check the float model.
W0802 16:41:20.283382  4855 convert_proto.cpp:1355] [DEPLOY WARNING] Layer inception_4e/3x3_reduce's output blob is all zero, this may cause error for DNNC compiler. Please check the float model.
W0802 16:41:20.283385  4855 convert_proto.cpp:1355] [DEPLOY WARNING] Layer inception_5a/3x3_reduce's output blob is all zero, this may cause error for DNNC compiler. Please check the float model.
I0802 16:41:20.287456  4855 vai_q.cpp:368] Deploy Done!
--------------------------------------------------
Output Quantized Train&Test Model:   "quantize/quantize_train_test.prototxt"
Output Quantized Train&Test Weights: "quantize/quantize_train_test.caffemodel"
Output Deploy Weights: "quantize/deploy.caffemodel"
Output Deploy Model:   "quantize/deploy.prototxt"
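
The [DEPLOY WARNING] lines above ask for the float model to be checked for layers whose output blob is all zero. A minimal sketch of such a check follows. It assumes the layer outputs have already been dumped into a dict that maps a layer name to a flat sequence of floats (for example from pycaffe's `net.blobs[name].data` after a forward pass; that extraction step is an assumption about the setup and is not part of the tutorial scripts). The function name and the sample values are hypothetical.

```python
def find_all_zero_blobs(blobs, tol=0.0):
    """Return the names of blobs whose values are all zero (within tol).

    `blobs` maps a layer name to a flat iterable of floats. Empty blobs
    are skipped rather than reported as all zero.
    """
    zero_layers = []
    for name, values in blobs.items():
        values = list(values)
        if values and all(abs(v) <= tol for v in values):
            zero_layers.append(name)
    return zero_layers


# Hypothetical activations, flattened to plain lists for illustration only.
example_blobs = {
    "inception_4c/3x3_reduce": [0.0, 0.0, 0.0],
    "inception_4d/3x3_reduce": [0.0, 0.12, 0.5],
}

print(find_all_zero_blobs(example_blobs))  # prints ['inception_4c/3x3_reduce']
```

Running the same check on several calibration images would show whether a layer is dead for every input or only for some of them.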

 

###########################################################

# vai_c_caffe Log ("workspace/Segment/VAI/FPN/compile/compile.txt")

###########################################################

[VAI_C][Warning] layer [score] (type: Softmax) is not supported in DPU, deploy it in CPU instead.
[VAI_C-BACKEND][FATAL][/home/xbuild/conda-bld/dnnc_1592904456005/work/submodules/asicv2com/include/Dpu/DpuOp.imp:4472][VALUE_UNMATCH][The value is not supposed!] 9: 1-10-15 Field is too long!
*** Check failure stack trace: ***
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************

Vitis-AI version : 1.2
segmentation model : FPN

My modifications for Caffe training in "/workspace/Segment/workspace/model/FPN":

  solver.prototxt : original

  train_val.prototxt : original (only the image data source path was modified)

My modifications for quantization (vai_q_caffe) in "/workspace/Segment/VAI/FPN":

   float.prototxt : original (only the calibration file path was modified)

   float.caffemodel : my model, trained on my own dataset.

Inference tests with my trained caffemodel were fine.

The dataset used for training has only two labels (background, person).

10 Replies
jasonwu
Moderator
1,261 Views
Registered: ‎03-27-2013

Hi @iif979631337 ,

 

Have you tried to evaluate the quantized model on GPU first?

If that still does not work, it is more likely an issue caused by quantization or even by the design input.

 

Best Regards,
Jason
-----------------------------------------------------------------------------------------------
Please mark the Answer as "Accept as solution" if the information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
-----------------------------------------------------------------------------------------------
iif979631337
Observer
1,217 Views
Registered: ‎07-16-2018

I'm sorry for the late reply.

Inference with the quantized model is much less accurate than with the unquantized model. In some images nothing is detected at all.

The prototxt file used as input to the quantization is the one from the tutorial (only the calibration path is modified), and the caffemodel file is the one I trained myself.
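
To put a number on the accuracy drop described above, one option is to compare per-class intersection over union (IoU) for the float and the quantized predictions against the same ground truth. The following is only a sketch: it assumes the label maps are flattened to lists of class ids with 0 = background and 1 = person, and the function name and sample data are hypothetical.

```python
def class_iou(pred, truth, num_classes=2):
    """Return the IoU of each class for two equally long lists of class ids."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both maps say class c, and where at least one does.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        # A class absent from both maps has no defined IoU.
        ious.append(inter / union if union else float("nan"))
    return ious


# Hypothetical 4-pixel label maps (0 = background, 1 = person).
truth = [0, 0, 1, 1]
float_pred = [0, 0, 1, 1]
quant_pred = [0, 0, 0, 1]

print("float model IoU:    ", class_iou(float_pred, truth))
print("quantized model IoU:", class_iou(quant_pred, truth))
```

A large gap in the person class between the two models would point at quantization rather than at the float training.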

iif979631337
Observer
1,166 Views
Registered: ‎07-16-2018

Hi, @jasonwu

I would like to resolve this soon. Is it possible to send you the quantization input files for analysis?
If so, could you tell me how to send them?

jasonwu
Moderator
1,132 Views
Registered: ‎03-27-2013

Hi @iif979631337,

 

What is your DPU Architecture? e.g. B4096/B1052

Best Regards,
Jason

iif979631337
Observer
1,101 Views
Registered: ‎07-16-2018

Hi, Jason.

The DPU architecture is probably B4096.
I use the tutorial's script (quantize_and_compile.sh).

The vai_c_caffe option is shown below (it is not modified).
 -arch=/opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU102/arch.json

I have only changed the parts related to training data and classes from caffe segmentation tutorial.

jasonwu
Moderator
993 Views
Registered: ‎03-27-2013

Hi @iif979631337 ,

 

Thanks for your update. I am afraid I may have missed the notification email about your new post.

I searched for the error message, and it looks more like an issue associated with the kernel size limitation on the DPU.

https://forums.xilinx.com/t5/AI-and-Vitis-AI/VALUE-UNMATCH-The-value-is-not-supposed-0-2-24-31-Field-is-too/td-p/1143090

Since you already use the largest arch, I would suggest checking whether there is any violation of the kernel size limitation listed in "Table 1. Deep Neural Network Features and Parameters Supported by the DPU" of

https://www.xilinx.com/html_docs/vitis_ai/1_2/dau1565107081597.html

There are two more constraints:

input channel <= 256 * channel_parallel

input channel * kernel_w * kernel_h <= 2048 * channel_parallel

You can find the channel_parallel value for arch B4096 here:

https://www.xilinx.com/html_docs/vitis_ai/1_2/zgu1565107078839.html

[Attached image: parallelism.PNG]
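
The two constraints above can be checked per convolution layer with a few lines of Python. This is a minimal sketch, not a Xilinx tool: the `ConvLayer` class, the function name, and the layer shapes are hypothetical, and `CHANNEL_PARALLEL = 16` is an assumed value for B4096 that should be confirmed against the parallelism table linked above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Shape of one convolution layer (hypothetical helper type)."""
    name: str
    in_channels: int
    kernel_w: int
    kernel_h: int


def check_dpu_constraints(layer: ConvLayer, channel_parallel: int) -> List[str]:
    """Return a message for each violated constraint (empty list if none)."""
    violations = []
    # Constraint 1: input channel <= 256 * channel_parallel
    max_channels = 256 * channel_parallel
    if layer.in_channels > max_channels:
        violations.append(
            f"{layer.name}: input channel {layer.in_channels} > {max_channels}"
        )
    # Constraint 2: input channel * kernel_w * kernel_h <= 2048 * channel_parallel
    max_volume = 2048 * channel_parallel
    volume = layer.in_channels * layer.kernel_w * layer.kernel_h
    if volume > max_volume:
        violations.append(
            f"{layer.name}: input channel * kernel_w * kernel_h = {volume} > {max_volume}"
        )
    return violations


CHANNEL_PARALLEL = 16  # assumed for B4096; verify in the linked table

# Hypothetical layer shapes, for illustration only.
layers = [
    ConvLayer("conv_a", in_channels=512, kernel_w=3, kernel_h=3),
    ConvLayer("conv_b", in_channels=4096, kernel_w=3, kernel_h=3),
]

for layer in layers:
    for message in check_dpu_constraints(layer, CHANNEL_PARALLEL):
        print(message)
```

The real layer shapes would come from the convolution layers in float.prototxt and their bottom blob channel counts.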

Best Regards,
Jason

iif979631337
Observer
962 Views
Registered: ‎07-16-2018

Hi, @jasonwu

Thanks for your reply.

I'm using the tutorial network unchanged, so it's unlikely that I am violating the constraints.
Only the training data has been changed from the tutorial environment.
It seems that the compilation error occurs depending on the training data.

Is there anything else I can check besides the DPU constraints?
If there is nothing further to check, I can send you the files. Could you look at them?

jasonwu
Moderator
950 Views
Registered: ‎03-27-2013

Hi @iif979631337 ,

 

Thanks for your update. That sounds strange to me.

If you are using the same model, the only difference I can think of here is the model weights.

Sure, I would need the detailed flow from training to deployment and the necessary files (including your dataset and all the scripts) to reproduce this issue.

And since you have provided more information, I would like to know:

1. Have you tried the dataset mentioned in UG1394? Did you hit the same issue when using that dataset?

2. If not, did you make any modifications to the files provided by the GitHub repo to fit your own dataset?

Best Regards,
Jason

iif979631337
Observer
904 Views
Registered: ‎07-16-2018

Hi, @jasonwu .

Thank you for your reply.
Yes, I find it strange too.
1. Using the Cityscapes dataset as described in UG1394, training, quantization, and compilation complete successfully without any errors.
2. In the Cityscapes environment of UG1394, only the file paths were changed to point to my dataset.
How do I share the files?
Datasets cannot be posted to this forum.

jasonwu
Moderator
900 Views
Registered: ‎03-27-2013

Hi @iif979631337 ,

 

Would you please send me your email address?

I will try to send you a request so that you can transfer large files through Xilinx EZMOVE.

Please feel free to send it by private message if necessary.

Best Regards,
Jason