Visitor mchirila
Visitor
1,029 Views
Registered: ‎05-03-2018

Multiple kernels in one .cl file Error

Hi,

I am trying to implement my own project derived from the SDAccel_Examples/getting_started/cpu_to_fpga/04_partition_ocl code. After implementing 3 kernels in 3 separate projects, I am attempting to merge them into the "mmult.cl" file and run it from a modified version of the "host.cpp" program.

 

The "mmult.cl" file contains 3 custom kernels: rKleene_mmultA, rKleene_mmultB, mmult.

 

I have modified the "mmult_fpga" function in the host program accordingly, adding the extra command queues and buffers and setting the kernel arguments as shown in the examples.

 

However, when I run the host code after compilation I get the following error:

 

XCL_EMULATION_MODE=sw_emu ./host
Found Platform
Platform Name: Xilinx
XCLBIN File Name: mmult
INFO: Importing xclbin/mmult.sw_emu.xilinx_kcu1500_dynamic_5_0.xclbin

Loading: 'xclbin/mmult.sw_emu.xilinx_kcu1500_dynamic_5_0.xclbin'
ERROR: kernel 'rKleene_mmultA' not found
ERROR: kernel 'rKleene_mmultB' not found
WARNING: Profiling may contain incomplete information. Please ensure all OpenCL objects are released by your host code (e.g., clReleaseProgram()).
Segmentation fault (core dumped)

 

All the compiling and running was done using the same makefile provided in the aforementioned example.

 

Strangely enough, my "mmult" kernel initially had a different name, but the compiler then gave me this error:

"fatal error: error in backend: Top function not found: there is no function named 'mmult'"

 

Is there something else I need to modify in addition to the fpga setup function inside the host program?

Or should I steer away from including multiple kernels into one file altogether? And if so why does having a kernel name different from the ".cl" filename represent a problem?

 

Mind you, I believe I have declared the kernels properly in the host program:

    cl::Kernel kernel1(program,"rKleene_mmultA");
    cl::Kernel kernel2(program,"rKleene_mmultB");
    cl::Kernel kernel3(program,"mmult");

 

Thanks.

0 Kudos
1 Solution

Accepted Solutions
Xilinx Employee
Xilinx Employee
969 Views
Registered: ‎01-12-2017

Re: Multiple kernels in one .cl file Error

Hi @mchirila,

 

Your Makefile must be updated to compile all three kernels that you are using.

For reference, please have a look at the documentation below (look for the -k option).

 

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_4/ug1023-sdaccel-user-guide.pdf (Pages: 41-43)

 

Thanks

Kali
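
As a rough sketch of what that change might look like (an illustration based on the Makefile from the 04_partition_ocl example, not something taken from the user guide): the example's Makefile compiles with "-k mmult" and links with "--nk mmult:1", so only a kernel named mmult ends up in the xclbin. That would also explain the "Top function not found" error seen when the kernel had a different name, since xocc is asked for a top function called mmult no matter what the .cl file is called. The per-kernel .xo file names and the KERNEL_OBJS variable below are assumptions made for illustration; the flags are the ones the example's Makefile already uses.

```make
# Sketch only: one .xo per kernel, all built from the same src/mmult.cl.
# KERNEL_OBJS and the per-kernel .xo names are illustrative assumptions.
KERNEL_OBJS := $(XCLBIN)/rKleene_mmultA.$(TARGET).$(DSA).xo \
               $(XCLBIN)/rKleene_mmultB.$(TARGET).$(DSA).xo \
               $(XCLBIN)/mmult.$(TARGET).$(DSA).xo

# Compile step: -k names the kernel function to build from the .cl file.
$(XCLBIN)/rKleene_mmultA.$(TARGET).$(DSA).xo: ./src/mmult.cl
	mkdir -p $(XCLBIN)
	$(XOCC) $(CLFLAGS) -c -k rKleene_mmultA -I'$(<D)' -o'$@' '$<'

$(XCLBIN)/rKleene_mmultB.$(TARGET).$(DSA).xo: ./src/mmult.cl
	mkdir -p $(XCLBIN)
	$(XOCC) $(CLFLAGS) -c -k rKleene_mmultB -I'$(<D)' -o'$@' '$<'

$(XCLBIN)/mmult.$(TARGET).$(DSA).xo: ./src/mmult.cl
	mkdir -p $(XCLBIN)
	$(XOCC) $(CLFLAGS) -c -k mmult -I'$(<D)' -o'$@' '$<'

# Link step: one --nk entry per kernel, one compute unit each.
$(XCLBIN)/mmult.$(TARGET).$(DSA).xclbin: $(KERNEL_OBJS)
	$(XOCC) $(CLFLAGS) -l $(LDCLFLAGS) --nk rKleene_mmultA:1 --nk rKleene_mmultB:1 --nk mmult:1 -o'$@' $(+)
```

After a change along these lines, it is probably safest to run "make cleanall" and then "make all TARGET=sw_emu" so that the old xclbin is not reused.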

3 Replies
Xilinx Employee
Xilinx Employee
996 Views
Registered: ‎01-12-2017

Re: Multiple kernels in one .cl file Error

Hi @mchirila

 

Can you please share your Makefile and your OpenCL setup code from the host program?

 

Thanks

Kali

0 Kudos
Visitor mchirila
Visitor
977 Views
Registered: ‎05-03-2018

Re: Multiple kernels in one .cl file Error

Makefile (unmodified from the cpu_to_fpga 04 example):

#+-------------------------------------------------------------------------------
# The following parameters are assigned with default values. These parameters can
# be overridden through the make command line
#+-------------------------------------------------------------------------------

# Run Target:
#   hw  - Compile for hardware
#   sw_emu/hw_emu - Compile for software/hardware emulation
# FPGA Board Platform (Default ~ ku115)

include utils.mk
REPORT := no
PROFILE := no
DEBUG := no

TARGETS := hw
TARGET := $(TARGETS)
DEVICES := xilinx_kcu1500_dynamic_5_0
DEVICE := $(DEVICES)
XCLBIN := ./xclbin
DSA := $(call device2sandsa, $(DEVICE))

CXX := $(XILINX_SDX)/bin/xcpp
XOCC := $(XILINX_SDX)/bin/xocc

# Points to Utility Directory
COMMON_REPO = ../../../
ABS_COMMON_REPO = $(shell readlink -f $(COMMON_REPO))

CXXFLAGS := $(opencl_CXXFLAGS) -Wall -O0 -g -std=c++14
LDFLAGS := $(opencl_LDFLAGS)

HOST_SRCS = src/host.cpp

# Host compiler global settings
CXXFLAGS = -I $(XILINX_SDX)/runtime/include/1_2/ -I/$(XILINX_SDX)/Vivado_HLS/include/ -O0 -g -Wall -fmessage-length=0 -std=c++14
LDFLAGS = -lOpenCL -lpthread -lrt -lstdc++ -L$(XILINX_SDX)/runtime/lib/x86_64

# Kernel compiler global settings
CLFLAGS = -t $(TARGET) --platform $(DEVICE) --save-temps 
CLFLAGS += --xp "param:compiler.preserveHlsOutput=1" --xp "param:compiler.generateExtraRunData=true"


#'estimate' for estimate report generation
#'system' for system report generation
ifneq ($(REPORT), no)
CLFLAGS += --report estimate
CLLDFLAGS += --report system
endif

#Generates profile summary report
ifeq ($(PROFILE), yes)
CLFLAGS += --profile_kernel data:all:all:all
endif

#Generates debug summary report
ifeq ($(DEBUG), yes)
CLFLAGS += --dk protocol:all:all:all
endif

EXECUTABLE = host

BINARY_CONTAINERS += $(XCLBIN)/mmult.$(TARGET).$(DSA).xclbin
BINARY_CONTAINER_1_OBJS += $(XCLBIN)/mmult.$(TARGET).$(DSA).xo
ALL_KERNEL_OBJS += $(XCLBIN)/mmult.$(TARGET).$(DSA).xo

#Include Libraries
include $(ABS_COMMON_REPO)/libs/opencl/opencl.mk
include $(ABS_COMMON_REPO)/libs/xcl2/xcl2.mk
CXXFLAGS += $(xcl2_CXXFLAGS)
LDFLAGS += $(xcl2_LDFLAGS)
HOST_SRCS += $(xcl2_SRCS)

CP = cp -rf

.PHONY: all clean cleanall docs
all: $(EXECUTABLE) $(BINARY_CONTAINERS)

.PHONY: exe
exe: $(EXECUTABLE)

# Building kernel
$(XCLBIN)/mmult.$(TARGET).$(DSA).xo: ./src/mmult.cl
	mkdir -p $(XCLBIN)
	$(XOCC) $(CLFLAGS) -c -k mmult -I'$(<D)' -o'$@' '$<'

$(XCLBIN)/mmult.$(TARGET).$(DSA).xclbin: $(BINARY_CONTAINER_1_OBJS)
	$(XOCC) $(CLFLAGS) -l $(LDCLFLAGS) --nk mmult:1 -o'$@' $(+)

# Building Host
$(EXECUTABLE): $(HOST_SRCS)
	mkdir -p $(XCLBIN)
	$(CXX) $(CXXFLAGS) $(HOST_SRCS) -o '$@' $(LDFLAGS)

check: all
ifeq ($(TARGET),$(filter $(TARGET),sw_emu hw_emu))
	emconfigutil --platform $(DEVICE) --od .
	XCL_EMULATION_MODE=$(TARGET) ./$(EXECUTABLE)
	sdx_analyze profile -i sdaccel_profile_summary.csv -f html
endif

# Cleaning stuff
RM = rm -f
RMDIR = rm -rf
clean:
	-$(RMDIR) $(EXECUTABLE) $(XCLBIN)/{*sw_emu*,*hw_emu*} 
	-$(RMDIR) sdaccel_* TempConfig system_estimate.xtxt *.rpt
	-$(RMDIR) src/*.ll _xocc_* .Xil emconfig.json dltmp* xmltmp* *.log *.jou *.wcfg *.wdb

cleanall: clean
	-$(RMDIR) $(XCLBIN)
	-$(RMDIR) ./_x

ECHO:= @echo

.PHONY: help

help::
	$(ECHO) "Makefile Usage:"
	$(ECHO) "  make all TARGET=<sw_emu/hw_emu/hw> DEVICE=<FPGA platform>"
	$(ECHO) "      Command to generate the design for specified Target and Device."
	$(ECHO) ""
	$(ECHO) "  make clean "
	$(ECHO) "      Command to remove the generated non-hardware files."
	$(ECHO) ""
	$(ECHO) "  make cleanall"
	$(ECHO) "      Command to remove all the generated files."
	$(ECHO) ""
	$(ECHO) "  make check TARGET=<sw_emu/hw_emu/hw> DEVICE=<FPGA platform>"
	$(ECHO) "      Command to run application in emulation."
	$(ECHO) ""

docs: README.md

README.md: description.json
	$(ABS_COMMON_REPO)/utility/readme_gen/readme_gen.py description.json

OCL Setup function (modified from "mmult_fpga" from the same example):

//Functionality to setup OpenCL context and trigger the Kernel
uint64_t RKleene_fpga (
    std::vector<int,aligned_allocator<int>>& source_in1,   //Input Matrix 1
    std::vector<int,aligned_allocator<int>>& source_fpga_results,    //Output Matrix
    int dim                                         //One dimension of matrix
)
{
    int size = dim, halfsize = size/2;    
    size_t matrix_size_bytes = sizeof(int) * size * size;

    cl::Event event;
    uint64_t kernel_duration = 0;

    //The get_xil_devices will return vector of Xilinx Devices 
    std::vector<cl::Device> devices = xcl::get_xil_devices();
    cl::Device device = devices[0];

    //Creating Context and Command Queue for selected Device
    cl::Context context(device);
    cl::CommandQueue q1(context, device, CL_QUEUE_PROFILING_ENABLE);
    cl::CommandQueue q2(context, device, CL_QUEUE_PROFILING_ENABLE);
    std::string device_name = device.getInfo<CL_DEVICE_NAME>(); 

    //find_binary_file() locates the OpenCL binary (xclbin) built by the xocc
    //compiler, and import_binary_file() loads it and returns it as
    //cl::Program::Binaries. The binary can contain many kernels that can be
    //executed on the device.
    std::string binaryFile = xcl::find_binary_file(device_name,"mmult");
    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);

    //This call will extract a kernel out of the program we loaded in the
    //previous line. A kernel is an OpenCL function that is executed on the
    //FPGA. This function is defined in the src/mmult.cl file.
    cl::Kernel kernel1(program,"rKleene_mmultA");
    cl::Kernel kernel2(program,"rKleene_mmultB");
    cl::Kernel kernel3(program,"mmult");

    //check if block is small enough for FW, else do R-Kleene.
    if(source_in1.size() <= BSIZE * BSIZE){
	/*********************************************
			    FW
	*********************************************/
	//These commands will allocate memory on the FPGA. The cl::Buffer
	//objects can be used to reference the memory locations on the device.
	//The cl::Buffer object cannot be referenced directly and must be passed
	//to other OpenCL functions.
	cl::Buffer buffer_in1(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,source_in1.data());    
	cl::Buffer buffer_output(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,source_fpga_results.data());

    	//This command will load the source_in1 vector from the host application
    	//into the buffer_in1 cl::Buffer object. The data will be transferred
    	//from system memory over PCIe to the FPGA on-board DDR memory.
    	q1.enqueueMigrateMemObjects({buffer_in1},0/* 0 means from host*/);

    	//Set the kernel arguments
    	int narg = 0;
    	kernel3.setArg(narg++, buffer_in1);
    	kernel3.setArg(narg++, buffer_output);
    	kernel3.setArg(narg++, size);

    	//Launch the kernel
    	q1.enqueueTask(kernel3, NULL, &event);

    	//The result of the previous kernel execution will need to be retrieved in
    	//order to view the results. This call will write the data from the
    	//buffer_output cl_mem object to the source_fpga_results vector
    	q1.enqueueMigrateMemObjects({buffer_output},CL_MIGRATE_MEM_OBJECT_HOST);
    	q1.finish();

	kernel_duration += get_duration_ns(event);
    }else{
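	int Aind = 0, Bind = 0, Cind = 0, Dind = 0;
	//Size the block vectors up front: indexing an empty vector is undefined.
	//size*size elements are allocated to match matrix_size_bytes used below.
	const size_t n = (size_t)size * size;
	std::vector<int,aligned_allocator<int>> A(n), B(n), C(n), D(n),
         Atmp(n), Btmp(n), Ctmp(n), Dtmp(n);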
	//Initialize A, B, C & D
    	for(int i = 0; i < dim; i++) {
            for(int j = 0; j < dim; j++) {
		if(i < dim/2 && j < dim/2){
		    A[Aind] = source_in1[i * dim + j];
		    Aind++;
		}
		if(i < dim/2 && j >= dim/2){
		    B[Bind] = source_in1[i * dim + j];
		    Bind++;
		}
		if(i >= dim/2 && j < dim/2){
		    C[Cind] = source_in1[i * dim + j];
		    Cind++;
		}
		if(i >= dim/2 && j >= dim/2){
		    D[Dind] = source_in1[i * dim + j];
		    Dind++;
		}
	    }
    	}
	//Perform R-Kleene computations
	//kernel_duration += RKleene_fpga(A, Atmp, dim/2);

	/*********************************************
			r-kleene_mmultA
	*********************************************/
	//These commands will allocate memory on the FPGA. The cl::Buffer
	//objects can be used to reference the memory locations on the device.
	//The cl::Buffer object cannot be referenced directly and must be passed
	//to other OpenCL functions.
	cl::Buffer buffer_inA(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,Atmp.data());    
	cl::Buffer buffer_inB(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,B.data()); 
	cl::Buffer buffer_inC(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,C.data()); 
	cl::Buffer buffer_inD(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,D.data()); 
	cl::Buffer buffer_outB(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,Btmp.data());
	cl::Buffer buffer_outC(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,Ctmp.data());
	cl::Buffer buffer_outD(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,Dtmp.data());

    	//This command will load the Atmp, B, C and D vectors from the host
    	//application into the buffer_inA..buffer_inD cl::Buffer objects. The data
    	//will be transferred from system memory over PCIe to the FPGA on-board
    	//DDR memory.
    	q2.enqueueMigrateMemObjects({buffer_inA, buffer_inB, buffer_inC, buffer_inD},0/* 0 means from host*/);

    	//Set the kernel arguments
    	int narg = 0;
    	kernel1.setArg(narg++, buffer_inA);
    	kernel1.setArg(narg++, buffer_inB);
    	kernel1.setArg(narg++, buffer_inC);
    	kernel1.setArg(narg++, buffer_inD);
    	kernel1.setArg(narg++, buffer_outB);
    	kernel1.setArg(narg++, buffer_outC);
    	kernel1.setArg(narg++, buffer_outD);
    	kernel1.setArg(narg++, halfsize);

    	//Launch the kernel
    	q2.enqueueTask(kernel1, NULL, &event);

    	//The result of the previous kernel execution will need to be retrieved in
    	//order to view the results. This call will write the data from the
    	//buffer_outB, buffer_outC and buffer_outD objects to the Btmp, Ctmp and Dtmp vectors
    	q2.enqueueMigrateMemObjects({buffer_outB, buffer_outC, buffer_outD},CL_MIGRATE_MEM_OBJECT_HOST);
    	q2.finish();//?

	kernel_duration += get_duration_ns(event);

	//kernel_duration += RKleene_fpga(Dtmp, D, dim/2);

	/*********************************************
			r-kleene_mmultB
	*********************************************/
	//These commands will allocate memory on the FPGA. The cl::Buffer
	//objects can be used to reference the memory locations on the device.
	//The cl::Buffer object cannot be referenced directly and must be passed
	//to other OpenCL functions.
	cl::Buffer buffer2_inA(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,Atmp.data());    
	cl::Buffer buffer2_inB(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,Btmp.data()); 
	cl::Buffer buffer2_inC(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,Ctmp.data()); 
	cl::Buffer buffer2_inD(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            matrix_size_bytes,D.data()); 
	cl::Buffer buffer2_outB(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,B.data());
	cl::Buffer buffer2_outC(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,C.data());
	cl::Buffer buffer2_outA(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, 
            matrix_size_bytes,A.data());

    	//This command will load the Atmp, Btmp, Ctmp and D vectors from the host
    	//application into the buffer2_inA..buffer2_inD cl::Buffer objects. The data
    	//will be transferred from system memory over PCIe to the FPGA on-board
    	//DDR memory.
    	q2.enqueueMigrateMemObjects({buffer2_inA, buffer2_inB, buffer2_inC, buffer2_inD},0/* 0 means from host*/);

    	//Set the kernel arguments
    	int narg2 = 0;
    	kernel2.setArg(narg2++, buffer2_inA);
    	kernel2.setArg(narg2++, buffer2_inB);
    	kernel2.setArg(narg2++, buffer2_inC);
    	kernel2.setArg(narg2++, buffer2_inD);
    	kernel2.setArg(narg2++, buffer2_outB);
    	kernel2.setArg(narg2++, buffer2_outC);
    	kernel2.setArg(narg2++, buffer2_outA);
    	kernel2.setArg(narg2++, halfsize);

    	//Launch the kernel
    	q2.enqueueTask(kernel2, NULL, &event);

    	//The result of the previous kernel execution will need to be retrieved in
    	//order to view the results. This call will write the data from the
    	//buffer2_outB, buffer2_outC and buffer2_outA objects to the B, C and A vectors
    	q2.enqueueMigrateMemObjects({buffer2_outB, buffer2_outC, buffer2_outA},CL_MIGRATE_MEM_OBJECT_HOST);
    	q2.finish();

	kernel_duration += get_duration_ns(event);
    }
    return kernel_duration;
}

Thanks,

0 Kudos