02-15-2019 11:59 PM
I am quantizing a ResNet model to 8-bit int using the DECENT tool.
I have a few questions regarding the quantization.
1. Can we do 8-bit fixed-point quantization instead of 8-bit int with the DECENT tool?
2. I also tried quantizing the ResNet model to 16-bit int. The output is generated as deploy.prototxt and deploy.caffemodel. The problem is that when I read the weights in deploy.caffemodel, they are still 32-bit float. What does this mean? I expected the weights to be 16-bit int. Also, the input caffemodel (float.caffemodel) and the output caffemodel (deploy.caffemodel) are exactly the same size, i.e. 102.1 MB.
I want to use this quantized caffemodel (8-bit or 16-bit) as input for another network, but the model generated by DECENT is still 32-bit float. What is the problem here? Is it a tool issue?
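As a side note, here is the size difference one would expect if the weights really were stored as int8 on disk: a 4x reduction. A quick numpy sketch (the layer shape and the power-of-two scale are made up for illustration, this is not DECENT's format):

```python
import numpy as np

# Hypothetical weight tensor roughly the shape of a large ResNet conv layer.
w_float = np.random.randn(512, 512, 3, 3).astype(np.float32)

# True int8 storage would cut the byte size by 4x versus float32.
# The scale factor of 64 here is an arbitrary illustrative choice.
w_int8 = np.clip(np.round(w_float * 64), -128, 127).astype(np.int8)

print(w_float.nbytes)  # 9437184 bytes (float32)
print(w_int8.nbytes)   # 2359296 bytes (int8, 4x smaller)
```

So the fact that deploy.caffemodel stays at 102.1 MB is a strong hint that the weights are still written out as 32-bit floats.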
02-28-2019 10:17 PM
Actually, the user guide says that it supports quantization to any bit width, and that it can be set in DECENT using data_bit and weights_bit.
03-01-2019 09:25 AM
03-18-2019 05:26 AM
What about the sizes of float.caffemodel and deploy.caffemodel? As far as I understand, deploy.caffemodel should be smaller, no? If all the 32-bit weights have been reduced to 8 bits, the file size should shrink. For me it remains the same.
03-18-2019 06:14 AM
Inside DECENT, 32-bit floating-point values are used to represent the quantized values; that's why the output files of DECENT are still float and are the same size as the float models.
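In other words, the tool does quantize-then-dequantize ("fake quantization"): each weight is rounded onto an 8-bit grid but written back as float32, so the dtype and file size are unchanged even though only 256 distinct values remain. A minimal sketch of that idea (the symmetric max-based scale below is an assumption for illustration, not DECENT's actual scale-selection algorithm):

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Round w onto a signed num_bits grid, but return float32.

    The scale choice (map max |w| to the top of the grid) is
    illustrative only -- DECENT's real algorithm may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)  # dequantized: still float32

w = np.random.randn(1000).astype(np.float32)
wq = fake_quantize(w)

print(wq.dtype)                    # float32 -- same dtype, same storage size
print(len(np.unique(wq)) <= 256)   # True -- at most 256 distinct levels
```

This is why reading deploy.caffemodel shows float32 weights of identical file size: the quantization lives in the values, not in the storage format.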
06-05-2019 10:22 AM
I'm using DNNDK v3.0, which was released after this thread. I tried running DECENT with 12-bit widths. The tool didn't complain, but my network didn't behave properly at all.
Are different bit widths supposed to be supported in the current DECENT? If not, any idea when this support will be available?
06-06-2019 05:42 AM
I tried quantization using both methods (non-overflow and min-diffs). They behave differently: one does better on some images and the other on others. But neither gets me close to the performance of my Caffe model.
It looks like adding bits (especially if I can specify how many bits come after the binary point) would be the most likely way to improve performance. Is there any way to do this with DNNDK v3.0?
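For context on why the split matters: with a fixed total width, the number of fractional bits trades dynamic range against resolution, and for data confined to a small range, more fractional bits directly cut the rounding error. A sketch of signed fixed-point quantization (my own illustrative implementation, not DNNDK's):

```python
import numpy as np

def fixed_point_quantize(x, total_bits, frac_bits):
    """Quantize x to signed fixed-point with frac_bits after the binary point.

    Illustrative only -- not DNNDK's implementation.
    """
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), -qmax - 1, qmax)
    return q / scale

x = np.random.uniform(-1, 1, 10000)

# For data in [-1, 1), more fractional bits -> finer steps -> lower error.
for frac in (4, 7, 11):
    err = np.mean(np.abs(x - fixed_point_quantize(x, 12, frac)))
    print(frac, err)
```

So if the tool only exposes a total bit width and picks the binary-point position itself, a poor automatic choice could explain a large accuracy gap.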
06-07-2019 08:55 AM - edited 06-07-2019 09:02 AM
So I have quantized a custom network with 12 bits of precision and then compiled it with dnnc. I have also compiled my hybrid executable application. Nowhere is there a warning that the parameters of my model are 8 bits.
As I read above, does the deployed model actually use 8-bit precision?
If arbitrary bit widths are not supported for deployment, at least state that somewhere in the user guide. My network is useless at 8 bits, and I have lost a week learning the DNNDK framework.