rbriegel (Contributor)

[V3.0_190624, TF, 16.04LTS] Decent_q quantization does not happen, although no errors are thrown


Hello everyone,

I'm running into a problem with the quantization of my custom but fairly vanilla CNN, built in TF/Keras (see 'CNN' below for details). The generation of deploy_model.pb and quantize_eval_model.pb completes without errors, but when I run inference on quantize_eval_model.pb with some frames, the output tensors are all identical, no matter what input I give. The original model, in contrast, produces a valid output tensor. I printed the weights of both the quantized and the original model and they are identical, with no sign of any quantization having happened (see 'Weights & Output Tensors').

CNN

The NN I'm trying to implement with the DNNDK is the feature-extracting part of a bigger NN. It takes 224x224 single-channel images as input and produces a tensor that will be fed into an LSTM, together with output tensors from other timesteps (the implementation of the LSTM is not settled yet; for now I'm trying to get the CNN part of the NN working on the DPU). The bigger NN was implemented and trained in Keras, so to get only the CNN part of that network, I made a new NN in Keras with only the layers I wanted to implement (see the summary and the sketch below), copied the weights for those layers over, and saved the whole thing into an .h5 file. In the next step I convert the .h5 into a .pb file. This .pb file is what I refer to as the "original model" and what I feed into decent. As mentioned above, inference on that .pb seems to work fine, in that I get what appear to be meaningful outputs for the different images I run inference on (see 'Weights & Output Tensors').

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (3, 218, 218, 8)          400       
_________________________________________________________________
batch_normalization_1 (Batch (3, 218, 218, 8)          32        
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (3, 109, 109, 8)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (3, 105, 105, 16)         3216      
_________________________________________________________________
batch_normalization_2 (Batch (3, 105, 105, 16)         64        
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (3, 52, 52, 16)           0         
_________________________________________________________________
conv2d_3 (Conv2D)            (3, 50, 50, 32)           4640      
_________________________________________________________________
batch_normalization_3 (Batch (3, 50, 50, 32)           128       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (3, 25, 25, 32)           0         
_________________________________________________________________
conv2d_4 (Conv2D)            (3, 23, 23, 64)           18496     
_________________________________________________________________
batch_normalization_4 (Batch (3, 23, 23, 64)           256       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (3, 11, 11, 64)           0         
=================================================================
Total params: 27,232
Trainable params: 0
Non-trainable params: 27,232
_________________________________________________________________
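For illustration, a minimal sketch of this extraction and freezing step (kernel sizes 7/5/3/3 are inferred from the parameter counts above; full_model stands in for the trained CNN-LSTM, and the actual code may differ in details):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPooling2D

# Rebuild only the feature-extracting layers; kernel sizes are inferred
# from the parameter counts in the summary above.
cnn = Sequential([
    Conv2D(8, (7, 7), activation='relu', input_shape=(224, 224, 1)),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Conv2D(16, (5, 5), activation='relu'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
])

# Copy the trained weights over, layer by layer, from the full model
# (assumes the feature-extracting layers come first in full_model).
for src, dst in zip(full_model.layers, cnn.layers):
    dst.set_weights(src.get_weights())

cnn.save('CNN_only_new.h5')

# Freeze the graph into the .pb that gets fed to decent_q (TF 1.x API).
sess = tf.keras.backend.get_session()
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), ['max_pooling2d_4/MaxPool'])
tf.train.write_graph(frozen, '.', 'CNN_only_new.pb', as_text=False)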

 

System

The DNNDK (Build 190624) toolchain runs in a VM with a fresh Ubuntu 16.04 LTS install and no GPU present. decent runs under Anaconda, with the Python 3.6 CPU .whl and the additional packages installed in the virtual environment, so everything is set up according to UG1327 v1.5, as far as I can tell. I've got the ZCU104, on which I plan to deploy the NN.

 

Decent

decent_q quantization command:

 

$ decent_q quantize \
    --input_frozen_graph CNN_only_new.pb \
    --input_nodes conv2d_1_input --input_shapes 3,224,224,1 \
    --output_nodes max_pooling2d_4/MaxPool \
    --input_fn input_fn.calib_input \
    --calib_iter 300

input_fn.calib_input:
I've created a custom input function so that the 900 calibration images are fed into decent_q the same way as they were during training of the original CNN-LSTM:

import cv2
import numpy as np

calib_image_dir = "./Frames/merged/"
calib_image_list = "./Frames/filenames.txt"
calib_batch_size = 3

def calib_input(iter):
    """Return one calibration batch, preprocessed like the training data."""
    images = []
    lines = open(calib_image_list).readlines()
    for index in range(0, calib_batch_size):
        curline = lines[iter * calib_batch_size + index]
        calib_image_name = curline.strip()
        image = cv2.imread(calib_image_dir + calib_image_name)
        # Note: cv2.imread returns BGR, so COLOR_RGB2GRAY swaps the channel
        # weights; kept as-is here to match the training preprocessing.
        img_gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        # Otsu thresholding after Gaussian blur, as in training
        blur = cv2.GaussianBlur(img_gray, (5, 5), 0)
        _, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        img_out = cv2.resize(th3, (224, 224))
        # Add the channel axis expected by the single-channel input node
        img_out = img_out[:, :, np.newaxis]
        images.append(img_out)
    return {"conv2d_1_input": images}

decent_q console output:

INFO: Start Float Graph Check
Optimizing fused batch norm node name: "batch_normalization_1/FusedBatchNorm_1"
op: "FusedBatchNorm"
input: "conv2d_1/Relu"
input: "batch_normalization_1/gamma"
input: "batch_normalization_1/beta"
input: "batch_normalization_1/moving_mean"
input: "batch_normalization_1/moving_variance"
device: "/job:localhost/replica:0/task:0/device:CPU:0"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "data_format"
  value {
    s: "NHWC"
  }
}
attr {
  key: "epsilon"
  value {
    f: 0.001
  }
}
attr {
  key: "is_training"
  value {
    b: false
  }
}

(Analogous "Optimizing fused batch norm node" blocks are printed for batch_normalization_2, batch_normalization_3, and batch_normalization_4, each consuming the corresponding conv2d_*/Relu output.)

INFO: Done Float Graph Check
INFO: Start Calibration for 300 iterations:
100% (300 of 300) |######################| Elapsed Time: 0:00:33 Time:  0:00:33
INFO: Done Calibration
INFO: Start Generate Deploy Model
INFO: End Generate Deploy Model
********************* Quantization Summary *********************
INFO: Output:
  quantize_eval_model: ./quantize_results/quantize_eval_model.pb
  deploy_model: ./quantize_results/deploy_model.pb

Weights & Output Tensors

Attached to this post you will find the weights of each layer for the original and the quantized model, as well as the output tensor for a calibration image on which I ran inference with both networks. Note that no matter what image I feed into the quantized model, the output tensor stays the same as in the inference_output_quantized_model.txt file, whereas the output tensor of the original model differs from image to image, as it should.
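For reference, a minimal sketch of the kind of comparison I ran (TF 1.x graph loading; preprocess() is a hypothetical stand-in for the Otsu/resize pipeline from the input_fn above, and the frame names are placeholders):

import tensorflow as tf
import numpy as np

def load_graph(pb_path):
    # Load a frozen GraphDef into its own tf.Graph
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    return graph

# Batch of 3 preprocessed frames, shape (3, 224, 224, 1); preprocess()
# stands in for the calibration preprocessing shown earlier.
batch = np.stack([preprocess(n) for n in ['frame0.png', 'frame1.png', 'frame2.png']])

for pb in ['CNN_only_new.pb', './quantize_results/quantize_eval_model.pb']:
    with tf.Session(graph=load_graph(pb)) as sess:
        out = sess.run('max_pooling2d_4/MaxPool:0',
                       feed_dict={'conv2d_1_input:0': batch})
        print(pb, out.mean(), out.std())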

 

Conclusion?

As for what is causing these issues, I have no clue. I hope somebody here can point me in the right direction. Thanks for reading this far!


Best Regards

Robert

rbriegel (Contributor):

I'm still getting the same bad output as described before.

Meanwhile I tried:

  • Using only tensorflow.keras instead of keras
  • Using tensorflow.keras.Model instead of Sequential
  • Switching from a fixed batch size (3) to None (?)
  • Adding an InputLayer
  • Saving the weights from the original model in NumPy arrays and loading them from there (so that I only have one graph in my session)
  • Eliminating the intermediate step of saving to an .h5, instead freezing and saving directly to a .pb

Attached, you can find the TensorBoard visualizations of my model.pb and the quantize_eval_model.pb.
decent_q seems to place an "aquant" operation after each operation in the model. Is this how it's supposed to be?

I will happily provide more info and the whole codebase (via pm), if there is somebody out there who can help me!

Cheers

model.pb.png
quantize_eval_model.pb.png

rbriegel (Contributor):

I just figured out that when increasing the weight and activation bits, from 14 bits upward I'm getting meaningful output, and the generated deploy_model.pb can be read by dnnc. Of course, I know the DPU only supports 8-bit models at the moment, so this is not really a success. It might help track down the issue, though!
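To be concrete, this is the invocation I mean; if I recall the options correctly, decent_q exposes the bit widths via --weight_bit and --activation_bit (both default to 8):

$ decent_q quantize \
    --input_frozen_graph CNN_only_new.pb \
    --input_nodes conv2d_1_input --input_shapes 3,224,224,1 \
    --output_nodes max_pooling2d_4/MaxPool \
    --input_fn input_fn.calib_input \
    --calib_iter 300 \
    --weight_bit 14 --activation_bit 14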


rbriegel (Contributor):

In my latest attempt to solve this problem, I switched to a machine with a GPU present and installed the toolchain according to UG1327. Results have not changed.

Cheers

Unbenannt.PNG

quentonh (Xilinx Employee):

@rbriegel  First off, I am sorry that we haven't been able to provide a good answer to the questions that you have asked (thus far).  I double-checked your layer stack this evening and do not see any issue with the layer types used and ordering of the layers.

Can you elaborate further on your comment "from 14 bits upward I'm getting meaningful output and the generated deploy_model.pb can be read by dnnc"? Should we understand this to mean:

[1] That the generated deploy_model.pb could not previously be read by dnnc?  If so, what messages were generated?

[2] That you are scaling up the weights and activations by some factor, and that is resulting in meaningful output from the last layer of the network while evaluating the network post quantization?

Have you had opportunity to compare the results in hardware, or only through evaluation of the quantized graph?

--Quenton

 


rbriegel (Contributor):

Hi Quenton,

Thanks for getting back to me!

"[1] That the generated deploy_model.pb could not previously be read by dnnc?  If so, what messages were generated?

Yes, the generated deploy_model.pb cannot be read by dnnc. dnnc throws the

Assertion `shift_cut >= 0' failed.
Aborted (core dumped)

error, which, as I read elsewhere on this forum, is an indicator that the quantization did not happen correctly. That makes sense.

Because of this, I was not able to go further down the toolchain. My understanding is that quantizing with the weight and activation bit parameters set to e.g. 14 or higher, which works and also throws no dnnc error, will lead to problems further down the implementation because the DPU does not actually support this, right?


quentonh (Xilinx Employee):

@rbriegel Sorry for the delayed response. Yes, you are correct. Though DECENT will allow you to select different quantizations, DNNC and the hardware today support INT8 only.

I would like to suggest a test.  Specifically, what I would like to suggest is that you try quantizing the model to INT8, but use the DECENT_Q ignore_nodes argument to ignore all of the BN layers during quantization.  This will quantize the rest of the model, but leave the BN layers as FP32.  This will not be deployable in hardware, but may help us to ascertain if the quantization issue stems from the quantization of the BN layers.

--Quenton


rbriegel (Contributor):

Hi Quenton,

I tried your suggestion to ignore all batch normalization nodes when quantizing, via the following command:

$ decent_q quantize \
    --input_frozen_graph frozen.pb \
    --input_nodes Input --input_shapes ?,224,224,1 \
    --output_nodes max_pooling2d_3/MaxPool \
    --input_fn input_fn.calib_input \
    --calib_iter 1000 --method 1 \
    --ignore_nodes "batch_normalization/gamma,\
batch_normalization/beta,\
batch_normalization/moving_mean,\
batch_normalization/moving_variance,\
batch_normalization/ReadVariableOp,\
batch_normalization/ReadVariableOp_1,\
batch_normalization/FusedBatchNorm/ReadVariableOp,\
batch_normalization/FusedBatchNorm/ReadVariableOp_1,\
batch_normalization/FusedBatchNorm,\
batch_normalization_1/gamma,\
batch_normalization_1/beta,\
batch_normalization_1/moving_mean,\
batch_normalization_1/moving_variance,\
batch_normalization_1/ReadVariableOp,\
batch_normalization_1/ReadVariableOp_1,\
batch_normalization_1/FusedBatchNorm/ReadVariableOp,\
batch_normalization_1/FusedBatchNorm/ReadVariableOp_1,\
batch_normalization_1/FusedBatchNorm,\
batch_normalization_2/gamma,\
batch_normalization_2/beta,\
batch_normalization_2/moving_mean,\
batch_normalization_2/moving_variance,\
batch_normalization_2/ReadVariableOp,\
batch_normalization_2/ReadVariableOp_1,\
batch_normalization_2/FusedBatchNorm/ReadVariableOp,\
batch_normalization_2/FusedBatchNorm/ReadVariableOp_1,\
batch_normalization_2/FusedBatchNorm,\
batch_normalization_3/gamma,\
batch_normalization_3/beta,\
batch_normalization_3/moving_mean,\
batch_normalization_3/moving_variance,\
batch_normalization_3/ReadVariableOp,\
batch_normalization_3/ReadVariableOp_1,\
batch_normalization_3/FusedBatchNorm/ReadVariableOp,\
batch_normalization_3/FusedBatchNorm/ReadVariableOp_1,\
batch_normalization_3/FusedBatchNorm"

but I still get the error when running dnnc on the resulting deploy_model.pb:

tmp/DNNC_V010_Package/dnnc/submodules/asicv2com/src/SlNode/SlNodeDptConv.cpp:83:
void SlNodeDptConv::generate_dptconvinit_op(const YAggregationType&, const YAggregationType&, uint32_t, uint32_t):
Assertion `shift_cut >= 0' failed.

I also tried skipping the float graph check by passing "--skip_check 1" to decent_q, because during the float graph check the BN nodes seem to get "optimized" in some way, according to the output (this is printed for all four BN layers):

Optimizing fused batch norm node name: "batch_normalization_3/FusedBatchNorm"
op: "FusedBatchNorm"
input: "conv2d_3/Relu"
input: "batch_normalization_3/gamma"
input: "batch_normalization_3/beta"
input: "batch_normalization_3/moving_mean"
input: "batch_normalization_3/moving_variance"
device: "/job:localhost/replica:0/task:0/device:GPU:0"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "data_format"
  value {
    s: "NHWC"
  }
}
attr {
  key: "epsilon"
  value {
    f: 0.001
  }
}
attr {
  key: "is_training"
  value {
    b: false
  }
}

But skipping the check also produced the same results.

Cheers


rbriegel (Contributor), accepted solution:

For anyone wondering, the solution to this problem was to rearrange the layers of the model. CONV2D->RELU->BN was causing the problem and had to be changed to CONV2D->BN->RELU.
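In Keras terms, the fix looks like this (a minimal sketch; layer names and kernel sizes as in the summary in the original post, only the first block shown):

from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential([
    # Before (broken after quantization): Conv2D(..., activation='relu') -> BN
    # After (quantizes correctly): linear Conv2D -> BN -> explicit ReLU
    Conv2D(8, (7, 7), input_shape=(224, 224, 1)),  # no activation here
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D((2, 2)),
    # ... the remaining blocks follow the same Conv2D -> BN -> ReLU pattern
])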

Cheers!
