Observer
193 views
Registered: 05-20-2020

Vitis_Accel_Examples: how do I add a new cpp file, and how do I fix ERROR: kernel 'vadd' not found

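In a new application (based on the code from https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/cpp_kernels/array_partition) I added a new vadd.cpp. When I build and run it, it fails because the corresponding kernel function cannot be found. Why?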

The error output is as follows:

../src/host.cpp:180 Error calling matmul_kernel = cl::Kernel(program, "vadd", &err), error code is: -46
XRT build version: 2.5.309
Build hash: 9a03790c11f066a5597b133db737cf4683ad84c8
Build date: 2020-02-24 02:54:37
Git branch: 2019.2_PU2
PID: 125762
UID: 0
[Sat May 23 11:23:53 2020]
HOST: comput5
EXE: /home/wzy/workspace/hmm-test/Emulation-SW/hmm-test
[XRT] ERROR: kernel 'vadd' not found
[XRT] WARNING: Profiling may contain incomplete information. Please ensure all OpenCL objects are released by your host code (e.g., clReleaseProgram()).

The host.cpp code is as follows:

/**********
Copyright (c) 2019, Xilinx, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********/

//OpenCL utility layer include
#include "xcl2.hpp"
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <random>
#include <string>
#include <vector>

using std::default_random_engine;
using std::generate;
using std::uniform_int_distribution;
using std::vector;

void matmul(int *C, int *A, int *B, int M) {
    for (int k = 0; k < M; k++) {
        for (int j = 0; j < M; j++) {
            for (int i = 0; i < M; i++) {
                C[k * M + j] += A[k * M + i] * B[i * M + j];
            }
        }
    }
}

// Reference element-wise addition used to compute the expected (gold) result.
void matadd(int *C, int *A, int *B, int column, int row) {
    for (int k = 0; k < row; k++) {
        for (int j = 0; j < column; j++) {
            C[k * column + j] = A[k * column + j] + B[k * column + j];
        }
    }
}

int gen_random() {
    static default_random_engine e;
    static uniform_int_distribution<int> dist(0, 10);

    return dist(e);
}

// Print the top-left corner (at most 10x10) of a rows x columns matrix.
void print(int *data, int columns, int rows) {
    const int max_r = std::min(rows, 10);
    const int max_c = std::min(columns, 10);
    for (int r = 0; r < max_r; r++) {
        for (int c = 0; c < max_c; c++) {
            printf("%4d ", data[r * columns + c]);
        }
        printf("\u2026\n");
    }
    for (int c = 0; c < max_c; c++) {
        printf("   %s ", "\u2026");
    }
    printf("\u22f1\n\n");
}

void verify(vector<int, aligned_allocator<int>> &gold,
            vector<int, aligned_allocator<int>> &output) {
    for (int i = 0; i < (int)output.size(); i++) {
        if (output[i] != gold[i]) {
            printf("Mismatch %d: gold: %d device: %d\n", i, gold[i], output[i]);
            print(output.data(), 16, 16);
            exit(EXIT_FAILURE);
        }
    }
}

// This example illustrates how to use array partitioning attributes in HLS
// kernels for FPGA devices using matmul.
int main(int argc, char **argv) {
    if (argc != 2) {
        std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
        return EXIT_FAILURE;
    }
    std::string binaryFile = argv[1];
    static const int columns = 16;
    static const int rows = 16;
    cl_int err;
    cl::CommandQueue q;
    cl::Context context;
    cl::Kernel matmul_kernel, matmul_partition_kernel;
    cl::Program program;
    vector<int, aligned_allocator<int>> A(columns * rows);
    vector<int, aligned_allocator<int>> B(columns * rows);
    vector<int, aligned_allocator<int>> gold(columns * rows, 0);
    vector<int, aligned_allocator<int>> C(columns * rows, 0);
    generate(begin(A), end(A), gen_random);
    generate(begin(B), end(B), gen_random);

    printf("A:\n");
    print(A.data(), columns, rows);
    printf("B:\n");
    print(B.data(), columns, rows);
//    matmul(gold.data(), A.data(), B.data(), columns);
    matadd(gold.data(), A.data(), B.data(), columns, rows);

    printf("Gold:\n");
    print(gold.data(), columns, rows);
    auto devices = xcl::get_xil_devices();

    // read_binary_file() is a utility API which will load the binaryFile
    // and will return the pointer to file buffer.
    auto fileBuf = xcl::read_binary_file(binaryFile);
    cl::Program::Binaries bins{{fileBuf.data(), fileBuf.size()}};
    int valid_device = 0;
    for (unsigned int i = 0; i < devices.size(); i++) {
        auto device = devices[i];
        // Creating Context and Command Queue for selected Device
        OCL_CHECK(err, context = cl::Context(device, NULL, NULL, NULL, &err));
        OCL_CHECK(err,
                  q = cl::CommandQueue(
                      context, device, CL_QUEUE_PROFILING_ENABLE, &err));

        std::cout << "Trying to program device[" << i
                  << "]: " << device.getInfo<CL_DEVICE_NAME>() << std::endl;
        program = cl::Program(context, {device}, bins, NULL, &err);
        if (err != CL_SUCCESS) {
            std::cout << "Failed to program device[" << i
                      << "] with xclbin file!\n";
        } else {
            std::cout << "Device[" << i << "]: program successful!\n";
            valid_device++;
            break; // we break because we found a valid device
        }
    }
    if (valid_device == 0) {
        std::cout << "Failed to program any device found, exit!\n";
        exit(EXIT_FAILURE);
    }

    // compute the size of array in bytes
    size_t array_size_bytes = columns * rows * sizeof(int);
    OCL_CHECK(err,
              cl::Buffer buffer_a(context,
                                  CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                                  array_size_bytes,
                                  A.data(),
                                  &err));
    OCL_CHECK(err,
              cl::Buffer buffer_b(context,
                                  CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                                  array_size_bytes,
                                  B.data(),
                                  &err));
    OCL_CHECK(err,
              cl::Buffer buffer_c(context,
                                  CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY,
                                  array_size_bytes,
                                  C.data(),
                                  &err));

    printf("|-------------------------+-------------------------|\n"
           "| Kernel                  |    Wall-Clock Time (ns) |\n"
           "|-------------------------+-------------------------|\n");

    OCL_CHECK(err, matmul_kernel = cl::Kernel(program, "vadd", &err));
    OCL_CHECK(err, err = matmul_kernel.setArg(0, buffer_a));
    OCL_CHECK(err, err = matmul_kernel.setArg(1, buffer_b));
    OCL_CHECK(err, err = matmul_kernel.setArg(2, buffer_c));
    // vadd takes the total number of elements, not the matrix width.
    OCL_CHECK(err, err = matmul_kernel.setArg(3, columns * rows));

    OCL_CHECK(err,
              err = q.enqueueMigrateMemObjects({buffer_a, buffer_b},
                                               0 /* 0 means from host*/));

    cl::Event event;
    uint64_t nstimestart, nstimeend;

    OCL_CHECK(err, err = q.enqueueTask(matmul_kernel, NULL, &event));
    OCL_CHECK(err,
              err = q.enqueueMigrateMemObjects({buffer_c},
                                               CL_MIGRATE_MEM_OBJECT_HOST));
    q.finish();

    OCL_CHECK(err,
              err = event.getProfilingInfo<uint64_t>(CL_PROFILING_COMMAND_START,
                                                     &nstimestart));
    OCL_CHECK(err,
              err = event.getProfilingInfo<uint64_t>(CL_PROFILING_COMMAND_END,
                                                     &nstimeend));
    auto matmul_time = nstimeend - nstimestart;

    verify(gold, C);
    printf("| %-23s | %23lu |\n", "matmul: ", matmul_time);

//    OCL_CHECK(err,
//              matmul_partition_kernel =
//                  cl::Kernel(program, "matmul_partition", &err));
//
//    OCL_CHECK(err, err = matmul_partition_kernel.setArg(0, buffer_a));
//    OCL_CHECK(err, err = matmul_partition_kernel.setArg(1, buffer_b));
//    OCL_CHECK(err, err = matmul_partition_kernel.setArg(2, buffer_c));
//    OCL_CHECK(err, err = matmul_partition_kernel.setArg(3, columns));
//
//    OCL_CHECK(err, err = q.enqueueTask(matmul_partition_kernel, NULL, &event));
//    OCL_CHECK(err,
//              err = q.enqueueMigrateMemObjects({buffer_c},
//                                               CL_MIGRATE_MEM_OBJECT_HOST));
//    q.finish();
//
//    OCL_CHECK(err,
//              err = event.getProfilingInfo<uint64_t>(CL_PROFILING_COMMAND_START,
//                                                     &nstimestart));
//    OCL_CHECK(err,
//              err = event.getProfilingInfo<uint64_t>(CL_PROFILING_COMMAND_END,
//                                                     &nstimeend));
//    auto matmul_partition_time = nstimeend - nstimestart;
//
//    verify(gold, C);
//
//    printf("| %-23s | %23lu |\n", "matmul: partition", matmul_partition_time);

    printf("|-------------------------+-------------------------|\n");
    printf("Note: Wall Clock Time is meaningful for real hardware execution "
           "only, not for emulation.\n");
    printf("Please refer to profile summary for kernel execution time for "
           "hardware emulation.\n");
    printf("TEST PASSED\n\n");

    return EXIT_SUCCESS;
}

The vadd.cpp code is as follows:

/**********
Copyright (c) 2019, Xilinx, Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********/

/*******************************************************************************
Description:
    HLS pragmas can be used to optimize the design : improve throughput, reduce latency and
    device resource utilization of the resulting RTL code
    This is vector addition example to demonstrate how HLS optimizations are used in kernel.
*******************************************************************************/

#define BUFFER_SIZE 128
#define DATA_SIZE 4096

//TRIPCOUNT identifier
const unsigned int c_len = DATA_SIZE / BUFFER_SIZE;
const unsigned int c_size = BUFFER_SIZE;

/*
    Vector Addition Kernel Implementation
    Arguments:
        in1   (input)     --> Input Vector1
        in2   (input)     --> Input Vector2
        out_r   (output)    --> Output Vector
        size  (input)     --> Size of Vector in Integer
*/

extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out_r,     // Output Result
          int size                 // Size in integer
) {
// The kernel exposes one s_axilite interface, which the host application uses to configure the kernel.
// The "control" bundle is that s_axilite interface; every argument (in1, in2, out_r and size) is associated with it,
// and it must also be associated with "return".
// Every argument that accesses global memory must be associated with an m_axi (AXI master) interface. Here in1, in2 and out_r
// all use bundle gmem, so a single AXI master interface named "gmem" is created and all three access global memory through it.
// Multiple master interfaces can be created when needed, for example when several arguments must access
// global memory simultaneously; each argument can then be connected to its own interface.
#pragma HLS INTERFACE m_axi port = in1 offset = slave bundle = gmem
#pragma HLS INTERFACE m_axi port = in2 offset = slave bundle = gmem
#pragma HLS INTERFACE m_axi port = out_r offset = slave bundle = gmem
#pragma HLS INTERFACE s_axilite port = in1 bundle = control
#pragma HLS INTERFACE s_axilite port = in2 bundle = control
#pragma HLS INTERFACE s_axilite port = out_r bundle = control
#pragma HLS INTERFACE s_axilite port = size bundle = control
#pragma HLS INTERFACE s_axilite port = return bundle = control

    unsigned int v1_buffer[BUFFER_SIZE];   // Local memory to store vector1
    unsigned int v2_buffer[BUFFER_SIZE];   // Local memory to store vector2
    unsigned int vout_buffer[BUFFER_SIZE]; // Local Memory to store result

    //Per iteration of this loop perform BUFFER_SIZE vector addition
    for (int i = 0; i < size; i += BUFFER_SIZE) {
       #pragma HLS LOOP_TRIPCOUNT min=c_len max=c_len
        int chunk_size = BUFFER_SIZE;
        //boundary checks
        if ((i + BUFFER_SIZE) > size)
            chunk_size = size - i;

        // Transferring data in bursts hides memory access latency and improves bandwidth utilization and memory controller efficiency.
        // Burst transfers are inferred from successive requests for data at consecutive addresses.
        // The local buffers v1_buffer and v2_buffer each hold the data from a single burst; the entire input vector is read in multiple bursts.
        // The choice of BUFFER_SIZE depends on the application and the on-chip memory available on the target FPGA.
        // Burst read of the v1 and v2 vectors from global memory

    read1:
        for (int j = 0; j < chunk_size; j++) {
           #pragma HLS LOOP_TRIPCOUNT min=c_size max=c_size
           #pragma HLS PIPELINE II=1
            v1_buffer[j] = in1[i + j];
        }

    read2:
        for (int j = 0; j < chunk_size; j++) {
           #pragma HLS LOOP_TRIPCOUNT min=c_size max=c_size
           #pragma HLS PIPELINE II=1
            v2_buffer[j] = in2[i + j];
        }

        // PIPELINE pragma reduces the initiation interval for loop by allowing the
        // concurrent executions of operations
    vadd:
        for (int j = 0; j < chunk_size; j++) {
           #pragma HLS LOOP_TRIPCOUNT min=c_size max=c_size
           #pragma HLS PIPELINE II=1
            //perform vector addition
            vout_buffer[j] = v1_buffer[j] + v2_buffer[j];
        }

    //burst write the result
    write:
        for (int j = 0; j < chunk_size; j++) {
           #pragma HLS LOOP_TRIPCOUNT min=c_size max=c_size
           #pragma HLS PIPELINE II=1
            out_r[i + j] = vout_buffer[j];
        }
    }
}
}
3 replies
Xilinx Employee
155 views
Registered: 03-24-2010

Re: Vitis_Accel_Examples: how do I add a new cpp file, and how do I fix ERROR: kernel 'vadd' not found

It looks like the vadd kernel was not linked into the device binary (the xclbin).

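You need to compile vadd.cpp with the v++ -c command to produce an .xo file, then pass that file to the v++ -l link command used by the original example to generate the xclbin.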
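
The two steps above might look like the following sketch. Error code -46 in the log is OpenCL's CL_INVALID_KERNEL_NAME, which is consistent with the loaded xclbin not containing a kernel named vadd. The platform name, source path, and output names are placeholders (assumptions, not values from this thread); only the sw_emu target is taken from the Emulation-SW path in the log. The script prints the commands unless RUN=1 is set, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the compile-then-link flow. All values below are placeholders
# (assumptions); replace them with your own platform and paths.
TARGET="sw_emu"            # matches the Emulation-SW run in the log
PLATFORM="my_platform"     # hypothetical: name or .xpfm path of your platform
KERNEL="vadd"              # must equal the name given to cl::Kernel(...)
SRC="src/vadd.cpp"         # hypothetical path to the kernel source
XCLBIN="vadd.xclbin"       # hypothetical output name; pass this file to the host

# Step 1: compile the kernel source into an .xo file.
compile_cmd() {
    echo "v++ -c -t $TARGET --platform $PLATFORM -k $KERNEL -o ${KERNEL}.xo $SRC"
}

# Step 2: link the .xo into the xclbin that the host program loads.
link_cmd() {
    echo "v++ -l -t $TARGET --platform $PLATFORM -o $XCLBIN ${KERNEL}.xo"
}

if [ "${RUN:-0}" = "1" ]; then
    # Real build; needs v++ on PATH and paths without spaces.
    eval "$(compile_cmd)" && eval "$(link_cmd)"
else
    # Dry run: only print the two commands for review.
    compile_cmd
    link_cmd
fi
```

The .xo file can be written to any directory as long as the same path is given to the link step, and .xo files for other kernels from the example can be listed after it on the link command.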

Regards,
brucey
Observer
86 views
Registered: 05-20-2020

Re: Vitis_Accel_Examples: how do I add a new cpp file, and how do I fix ERROR: kernel 'vadd' not found

In which directory should v++ be run? I cannot find the command. And where should the generated .xo file be placed?

 

Xilinx Employee
70 views
Registered: 07-17-2008

Re: Vitis_Accel_Examples: how do I add a new cpp file, and how do I fix ERROR: kernel 'vadd' not found

v++ is a Vitis command. Once the Vitis environment variables are set up, you can run it directly from any directory.
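
Setting up the environment usually means sourcing the setup scripts shipped with Vitis and XRT in the current shell. A minimal sketch follows; both paths are assumptions about a typical installation (the 2019.2 version is guessed from the XRT branch in the log) and should be adjusted to the actual install location.

```shell
# Assumed install locations (hypothetical); adjust to your installation.
# The vendor scripts are written for bash, so run this from a bash shell.
VITIS_SETTINGS="/opt/Xilinx/Vitis/2019.2/settings64.sh"
XRT_SETUP="/opt/xilinx/xrt/setup.sh"

# Source a setup script only if it exists, so the sketch is safe to run anywhere.
source_if_present() {
    if [ -f "$1" ]; then
        . "$1"
    else
        echo "not found: $1"
    fi
}

source_if_present "$VITIS_SETTINGS"
source_if_present "$XRT_SETUP"

# Report whether v++ is now reachable through PATH.
if command -v v++ >/dev/null 2>&1; then
    echo "v++ found at: $(command -v v++)"
else
    echo "v++ is not on PATH"
fi
```

If the last line reports that v++ is not on PATH, the settings script was probably not found or not sourced in this shell.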

The Makefiles in the Vitis examples also use v++ to compile and link the kernels.
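
One way to see how an example's Makefile does this is to search it for v++ invocations and then add vadd to the same compile and link rules. This is only a sketch: the variable name VPP is an assumption about how the Makefile refers to v++, and the Makefile is assumed to be in the current directory.

```shell
# List Makefile lines that mention v++ directly or through a VPP variable
# (VPP is an assumed variable name, not confirmed by this thread).
find_vpp_lines() {
    grep -n -F -e 'v++' -e 'VPP' "$1"
}

# Only search when a Makefile is actually present in the current directory.
if [ -f Makefile ]; then
    find_vpp_lines Makefile
fi
```

The matching lines show where the .xo and xclbin targets are built, which is where a new kernel source such as vadd.cpp would need to be added.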
