sitting
Explorer
Registered: 05-04-2014

Low softmax performance on custom ResNet50


Hi,

Here is our latest model summary.

[Attachment: softmax.PNG — model summary]

It produces a warning during compilation, and the log is below; it seems the DPU cannot support such a large softmax.

[UNILOG][INFO] The compiler log will be dumped at "/tmp/vitis-ai-user/log/xcompiler-20210311-063205-1619"
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: function
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Graph name: model, with op num: 415
[UNILOG][INFO] Begin to compile...
[UNILOG][WARNING] xir::Op{name = quant_softmax(TransferMatMulToConv2d), type = conv2d-fix} has been assigned to CPU: [Weights(2, 8, 8, 1568) is too large to be loaded into parameter buffer. "kernel_h  kernel_w  input_channel" is supposed to be less equal than 32768].
[UNILOG][INFO] Total device subgraph number 3, DPU subgraph number 1
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/workspace/Tensorflow2_resnet50/file/compiled_model/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/workspace/Tensorflow2_resnet50/file/compiled_model/resnet50_compiled_aid_new_quan.xmodel"
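
The log shows 3 device subgraphs with only 1 on the DPU. As a quick check of which ops fell back to the CPU, here is a minimal sketch using the xir Python bindings that ship with Vitis AI (the .xmodel path is taken from the log above):

import xir

# List which subgraphs of the compiled model run on the DPU vs. the CPU.
graph = xir.Graph.deserialize(
    "/workspace/Tensorflow2_resnet50/file/compiled_model/resnet50_compiled_aid_new_quan.xmodel")
for sg in graph.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "unknown"
    print(sg.get_name(), "->", device)
# Expect one DPU subgraph, with quant_softmax in a CPU subgraph.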

However, performance is very low when the CPU computes the softmax. Is there any way to improve it?

Is it possible to increase the number of softmax cores in the DPU IP, for example to 2 or 3?

Thanks

Sitting


3 Replies
gguasti
Moderator
Registered: 11-29-2007

Hi,

the DPU can be created with a softmax accelerator included.
Please refer to the DPU TRD: https://github.com/Xilinx/Vitis-AI/blob/master/dsa/DPU-TRD/prj/Vitis/README.md


Softmax
The TRD supports the softmax function and includes the softmax RTL kernel.

Only use the DPU:
make KERNEL=DPU

Use the DPU and Softmax:
make KERNEL=DPU_SM

But I think only one softmax core is possible.

sitting
Explorer
Registered: 05-04-2014

Hi @gguasti 

I have already enabled the softmax core in the DPU, but the compile warning still says that quant_softmax(TransferMatMulToConv2d) is too large to be loaded into the parameter buffer.

What does this warning mean, and how can I fix it?

Thanks

Sitting

idiotic_genius (Accepted Solution)
Adventurer
Registered: 07-20-2017

It seems the Vitis AI compiler is converting the Flatten+Softmax combination into a Conv2D operation. The Conv2D kernel size is directly equal to the input feature map size of the Flatten layer, and that size cannot exceed the CONV kernel-size limit: kernel_w * kernel_h * input_channel / channel_parallel <= bank_depth, where channel_parallel is 16 and bank_depth is 2048 for the DPU architecture DPUCZDX8G_ISA0_B4096_MAX_BG2.

You will have to decrease the feature map size (kernel_w * kernel_h * input_channel) to at most 32768 (= 16 * 2048) to make it work.
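
To see why this model trips the limit, here is a minimal sketch of the arithmetic (plain Python; the 8x8x1568 shape is taken from the Weights(2, 8, 8, 1568) warning in the compile log):

CHANNEL_PARALLEL = 16   # for DPUCZDX8G_ISA0_B4096_MAX_BG2
BANK_DEPTH = 2048
LIMIT = CHANNEL_PARALLEL * BANK_DEPTH  # 32768

def fits_parameter_buffer(kernel_h, kernel_w, input_channel):
    # Flattened feature map size that the converted Conv2D kernel must hold.
    size = kernel_h * kernel_w * input_channel
    return size, size <= LIMIT

print(fits_parameter_buffer(8, 8, 1568))  # (100352, False) -> falls back to CPU
print(fits_parameter_buffer(1, 1, 1568))  # (1568, True)   -> fits the DPU

So, assuming the classifier does not need the spatial layout, one way to get under the limit would be to add a global average pooling stage (e.g. Keras GlobalAveragePooling2D) before the Flatten, shrinking 8*8*1568 = 100352 down to 1568.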

 

Regards, abhidan@logictronix.com
Please mark the answer as "Accept as solution" if the information provided solves your query, and give kudos if you found it helpful.
