asobeih (Explorer, registered 02-13-2016)

[URGENT] DPU Produces Same Output for Different Images on The Same Model

Hi,

I have successfully quantized and compiled my model using the Vitis AI stack to run it on the DPU implemented on the ZCU104. The model is split into 2 DPU kernels. Accordingly, vai_c_tensorflow generates 2 files, each representing part of the model:

  • dpu_nn_0.elf
  • dpu_nn_2.elf

Kernels 1 and 3 are to run on the CPU. See the "kernel.info" file for more information about the generated kernels.

After that, I merged the 2 .elf files into a single .so file using the following command:

aarch64-linux-gnu-gcc --sysroot=/home/user/petalinux_sdk/sysroots/aarch64-xilinx-linux -fPIC -shared dpu_nn_*.elf -o libdpumodelnn.so
To run my model on the DPU, I modified and used a Python API that was part of the examples:
'''
Copyright 2019 Xilinx Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

import graph_input_fn
from dnndk import n2cube
import numpy as np
from numpy import float32
import os
import time

"""
Created on Tue Oct  6 17:32:32 2020

@author: Ahmed Anwar
"""

#-----------------------
def GAP(feature_map):
  
  """

  Input: 4-D numpy array of shape (batch_size, num_rows, num_columns, num_channels)

  Output: 2-D numpy array of shape (batch_size, num_channels), holding the spatial average of every channel for each batch element
  

  assert len(feature_map.shape) == 4, "Input to GAP layer must be 4 dimensional numpy array of shape (batch_size, num_rows, num_columns, num_channels)"

  batch_size = feature_map.shape[0]
  num_channels = feature_map.shape[-1]
  out_array = np.zeros((batch_size, num_channels))
  for i in range(batch_size):
    for j in range(num_channels):
      # Average each channel over the spatial dimensions
      out_array[i, j] = np.mean(feature_map[i, :, :, j])
  return out_array
  
#--------------------------------------------------------------------------------------------------------------------

"""DPU Kernel Name for miniResNet"""
KERNEL_CONV_1="nn_0"
CONV_INPUT_NODE_1="block1_conv1_1_convolution"
CONV_OUTPUT_NODE_1="block_2_add_concat"


KERNEL_CONV_2="nn_2"

CONV_INPUT_NODE_2="predictions_1_MatMul"
CONV_OUTPUT_NODE_2="predictions_1_MatMul"

def get_script_directory():
    path = os.getcwd()
    return path

SCRIPT_DIR = get_script_directory()
calib_image_dir  = SCRIPT_DIR + "/../dataset/"
calib_image_list = calib_image_dir +  "words.txt"

def TopK(dataInput, filePath):
    """
    Get top k results according to its probability
    """
    cnt = [i for i in range(10)]
    pair = zip(dataInput, cnt)
    pair = sorted(pair, reverse=True)
    softmax_new, cnt_new = zip(*pair)
    #print(softmax_new,'\n',cnt_new)
    fp = open(filePath, "r")
    data1 = fp.readlines()
    fp.close()
    for i in range(2):
        flag = 0
        for line in data1:
            if flag == cnt_new[i]:
                print("Top[%d] %f %s" % (i, softmax_new[i], line.strip("\n")))
            flag = flag + 1

def main():
    print ("Starting....\n")
    """ Attach to DPU driver and prepare for running """
    n2cube.dpuOpen()
    print ("DPU has been opened....\n")
	
    """ Create DPU Kernels for CONV NODE"""
    kernel_1 = n2cube.dpuLoadKernel(KERNEL_CONV_1)
    print ("Kernel 1 has been created....\n")
    #kernel_2 = n2cube.dpuLoadKernel(KERNEL_CONV_2)
    """ Create DPU Tasks for CONV NODE"""
    task_1 = n2cube.dpuCreateTask(kernel_1, 0)
    print ("Task 1 has been created....\n")
    #task_2 = n2cube.dpuCreateTask(kernel_2, 0)
	
    listimage = os.listdir(calib_image_dir)

    for i in range(len(listimage)):
        path = os.path.join(calib_image_dir, listimage[i])
        print ("Loading image....\n")
        if os.path.splitext(path)[1] != ".jpg":
            continue
        print("Loading %s" %listimage[i])

        """ Load image and Set image into CONV Task """
        
        imageRun=graph_input_fn.calib_input(path)
        print ("Image has been loaded....\n")
        imageRun=imageRun.reshape((imageRun.shape[0]*imageRun.shape[1]*imageRun.shape[2]))
        input_len=len(imageRun)
        n2cube.dpuSetInputTensorInHWCFP32(task_1,CONV_INPUT_NODE_1,imageRun,input_len)
        n2cube.dpuRunTask(task_1)
        size = n2cube.dpuGetOutputTensorSize(task_1, CONV_OUTPUT_NODE_1)
        print ("size is \t:" + str(size) + "\n\n")
        output_tensor = n2cube.dpuGetOutputTensorInHWCFP32(task_1, CONV_OUTPUT_NODE_1 , size)
        output_tensor_reshaped = output_tensor.reshape(-1,256,256,32)
        print (output_tensor_reshaped)
        """ Get output scale of CONV  """
        scale = n2cube.dpuGetOutputTensorScale(task_1, CONV_OUTPUT_NODE_1)
        print ("scale is \t:" + str(scale) + "\n\n")
        """ Get output tensor address of CONV """
        conf = n2cube.dpuGetOutputTensorAddress(task_1, CONV_OUTPUT_NODE_1)
        """ Get output channel of CONV  """
        channel = n2cube.dpuGetOutputTensorChannel(task_1, CONV_OUTPUT_NODE_1)
        print ("channel is \t:" + str(channel) + "\n\n")
        """ Get output size of CONV  """
        size = n2cube.dpuGetOutputTensorSize(task_1, CONV_OUTPUT_NODE_1)
        print ("size is \t:" + str(size) + "\n\n")
        #output_GAP = GAP(output_tensor_reshaped)
    """ Destroy DPU Tasks & free resources """
    n2cube.dpuDestroyTask(task_1)
    """ Destroy DPU Kernels & free resources """
    rtn = n2cube.dpuDestroyKernel(kernel_1)
    """ Detach from DPU driver & free resources """
    n2cube.dpuClose()
if __name__ == "__main__":
    main()
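As a side note on the GAP helper in the script above: the two nested loops can be replaced by a single vectorized numpy call, which is faster and less error-prone. A minimal sketch, equivalent to the loop version:

```python
import numpy as np

def gap_vectorized(feature_map):
    """Global average pooling: mean over the spatial axes of an NHWC array.

    Input:  (batch_size, num_rows, num_columns, num_channels)
    Output: (batch_size, num_channels)
    """
    assert feature_map.ndim == 4, "expected a 4-D array in NHWC layout"
    return feature_map.mean(axis=(1, 2))

# Quick check on random data shaped like the DPU output in the script.
x = np.random.rand(2, 256, 256, 32)
out = gap_vectorized(x)
print(out.shape)  # (2, 32)
```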

When I print the outputs (size, conf, scale, or output_tensor), their values remain the same regardless of the input. I even tried feeding a random numpy array, but the same values are always returned.
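One quick way to rule out a data-loading problem is to fingerprint each flattened input array right before it is handed to the DPU: if the fingerprints differ between images while the DPU outputs do not, the issue is on the DPU side (for example, the input tensor is never actually written, or the wrong input node name is used). A minimal, self-contained sketch of such a helper (`array_digest` is a name I made up for illustration):

```python
import hashlib
import numpy as np

def array_digest(arr):
    """Short fingerprint of an array's contents, for debugging only."""
    data = np.ascontiguousarray(arr, dtype=np.float32).tobytes()
    return hashlib.md5(data).hexdigest()[:12]

# Two different inputs must yield different digests; identical contents match.
a = np.zeros((4, 4), dtype=np.float32)
b = np.ones((4, 4), dtype=np.float32)
print(array_digest(a) == array_digest(b))        # False
print(array_digest(a) == array_digest(a.copy())) # True
```

Printing the digest inside the image loop, just before dpuSetInputTensorInHWCFP32, would confirm whether each image really produces a distinct input buffer.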

What did I do wrong? I am happy to share any other information.

Thanks.

5 Replies
asobeih (Explorer)

Hi @jasonwu,
Any help with that?
asobeih (Explorer)

Hello,
I still have not received any response. Can anyone please help?

Thanks.
jbeckwi (Xilinx Employee, registered 08-30-2011)

Which examples did you refer to? The CIFAR10 app may be a good place to start:

https://github.com/Xilinx/Vitis-AI-Tutorials/blob/CIFAR10-Classification-with-TensorFlow/files/target/cifar10_app.py

I'm curious why you use the calib input function to pre-process the images. That function is intended for use during quantization, and I'm not sure it's necessary (or helpful) in your inference code.
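For inference, you typically only need to reproduce the normalization the model was trained with, rather than the quantizer's calibration wrapper. A minimal stand-alone sketch of such a pre-processing step; the `mean` and `scale` defaults here are placeholders, not values from the poster's pipeline:

```python
import numpy as np

def preprocess(image, mean=0.0, scale=1.0 / 255.0):
    """Normalize an HWC uint8 image to float32 and flatten it.

    `mean` and `scale` are placeholders; substitute the values used by
    your training pipeline. The flat layout matches what the script
    passes to dpuSetInputTensorInHWCFP32.
    """
    x = (image.astype(np.float32) - mean) * scale
    return x.reshape(-1)

# Dummy 256x256 RGB image, like the inputs in the script above.
img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
flat = preprocess(img)
print(flat.shape, flat.dtype)  # (196608,) float32
```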

asobeih (Explorer)

Hi @jbeckwi ,

Thanks for your prompt response.

Well, I referred to the mini-resnet example in the Vitis AI DNNDK samples on the ZCU104 image (see the attached file).

As for the calib input function, it was part of the mini-resnet example. I modified its implementation inside graph_input_fn.py to pre-process the images as required before running my model on the DPU.

I have some follow-up questions that I urgently need your help with:

  1. In the "cifar10_app.py" code, there is an argument I have to provide to the API: the meta.json file. I made a post earlier about this issue: when I compile my model, the meta.json file does not get generated. Below is the command I used to compile my model after quantizing it:
     vai_c_tensorflow --frozen_pb freeze/qoutput/deploy_model.pb --arch /opt/vitis_ai/compiler/arch/dpuv2/ZCU104/ZCU104.json --output_dir zcu104_itip/out_new/ --net_name nn -e "{'save_kernel' : 'kernel.txt', 'dump' : 'all'}"
    I am following the tutorial Accelerating Medical Applications with Xilinx Vitis AI, which clearly states that the meta.json file is generated when compiling the deployment model. I know about the .json files that exist in "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/arch/dpuv2/ZCU104"; yet, to my understanding, another .json file should be generated.

  2. Inside the "cifar10_app.py" code, how do you take the output of certain layers? In my case, my model is split into 4 kernels:
    DPU,
    CPU for Global Average Pool,
    DPU,
    CPU to perform softmax.

    So, I will have to modify "cifar10_app.py" to take the output of the first DPU kernel, pass that output to a Global Average Pool function that runs on the CPU, feed the result back into the second DPU kernel to perform the dense operations, and finally perform softmax on the CPU.
    Given that, how should I modify the "cifar10_app.py" code to handle this? In the mini_resnet example, I handled the 2-kernel case by creating 2 DPU kernels and 2 DPU tasks, one pair for each DPU kernel, as demonstrated in the code included in my original post. So, how can I do this efficiently with "cifar10_app.py"?

  3. If the scenario proposed in question #2 above is not correct, what should I do with this case?
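For what it's worth, the two CPU stages in the 4-kernel split described above are straightforward in numpy. In the sketch below, `run_dpu_conv` and `run_dpu_dense` are hypothetical placeholders standing in for the two n2cube DPU tasks; only the data flow between the stages is shown:

```python
import numpy as np

def gap(feature_map):
    """CPU stage 2: global average pool over the spatial axes (NHWC -> NC)."""
    return feature_map.mean(axis=(1, 2))

def softmax(logits):
    """CPU stage 4: numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def run_pipeline(image, run_dpu_conv, run_dpu_dense):
    """Chain the 4 kernels; the run_dpu_* callables are placeholders."""
    features = run_dpu_conv(image)   # DPU kernel 0: convolutional body
    pooled = gap(features)           # CPU: global average pool
    logits = run_dpu_dense(pooled)   # DPU kernel 2: dense layer
    return softmax(logits)           # CPU: softmax

# Smoke test with dummy stand-ins for the two DPU tasks.
probs = run_pipeline(
    np.zeros((1, 256, 256, 3), dtype=np.float32),
    run_dpu_conv=lambda x: np.random.rand(1, 8, 8, 32).astype(np.float32),
    run_dpu_dense=lambda x: np.random.rand(1, 10).astype(np.float32),
)
print(probs.shape)  # (1, 10); each row sums to ~1
```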

More questions to follow. Looking forward to your prompt response, and thanks in advance for your help. 

asobeih (Explorer)

Hi @jbeckwi ,

I am still looking forward to your prompt response.

Thanks.
