failed to extend data width of a stream design


I have a streaming design that works fine.

To further improve bandwidth utilization, I tried to extend the design with a wider data width. The extended design runs correctly in sw_emu mode, but it never completes in hw_emu or hw mode; it appears to be stuck in an infinite loop.

In hw_emu mode, the kernel fails to get any data from gmem, and I got the following information. I also checked the timeline file, but it is empty.

INFO: [SDx-EM 22] [Wall clock time: 00:21, Emulation time: 12.12 ms] Data transfer between kernel(s) and global memory(s)
BANK0 RD = 0.000 KB WR = 0.000 KB

 

The streaming design consists of three functions: read_input(), compute(), and write_back(). Both read_input() and write_back() have been used in similar designs before. Any suggestions on this problem would be really appreciated.

 

The kernel code is attached here.

//Includes
#include <hls_stream.h>
#include <ap_int.h>
#include <stdio.h>

typedef ap_uint<1> uint1_dt;
typedef ap_int<512> int512_dt;

#define BUFFER_SIZE 128
#define WORD_NUM 16 // number of 32-bit integers in one 512-bit word

// Stage 1: read len 512-bit words from global memory into in_stream.
static void read_input(
        int512_dt *in,
        hls::stream<int512_dt> &in_stream,
        int len)
{
    seq_depth: for (int i = 0; i < len; i++){
#pragma HLS pipeline
        in_stream << in[i];
    }
}

// Stage 2: split each 512-bit word into WORD_NUM 32-bit items, forward
// item + 10 for every item > 10, and send one done token after the last item.
static void compute(
        hls::stream<int512_dt> &in_stream,
        hls::stream<uint1_dt> &done_stream,
        hls::stream<int> &out_stream,
        int len)
{
    int512_dt data;
    int item;
    compute: for (int i = 0; i < len; i++){
#pragma HLS pipeline
        data = in_stream.read();
        for(int j = 0; j < WORD_NUM; j++){
            item = data.range((j+1)*32-1, j*32);
            if(item > 10){
                out_stream << (item + 10);
            }
            if((i == len - 1) && (j == WORD_NUM - 1)){
                done_stream << 1;
            }
        }
    }
}

// Stage 3: poll out_stream and done_stream without blocking, collect results
// in a local buffer, and flush the buffer to global memory when it is full or
// when the done token has arrived. Exit once done is seen and out_stream is empty.
static void write_back(
        hls::stream<int> &out_stream,
        hls::stream<uint1_dt> &done_stream,
        int *out)
{
    int idx = 0;
    int count = 0;
    uint1_dt done = 0;
    uint1_dt done_empty = 0;
    uint1_dt stream_empty = 0;
    int buffer[BUFFER_SIZE];

    while((stream_empty != 1) || (done != 1)){
        stream_empty = out_stream.empty();
        done_empty = done_stream.empty();
        if(stream_empty != 1){
            buffer[count++] = out_stream.read();
        }

        if(done_empty != 1){
            done = done_stream.read();
        }

        if((count == BUFFER_SIZE) || ((count > 0) && (count < BUFFER_SIZE) && (done == 1))){
            for(int i = 0; i < count; i++){
#pragma HLS pipeline
                out[idx + i] = buffer[i];
            }
            idx += count;
            count = 0;
        }
    }
}

extern "C" {
void cnd_stream(int512_dt *in, int *out, int size){
// in and out share the same m_axi bundle (gmem)
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=in  bundle=control
#pragma HLS INTERFACE s_axilite port=out bundle=control
// the scalar argument is named size (not len)
#pragma HLS INTERFACE s_axilite port=size bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    hls::stream<int512_dt> in_stream;
    hls::stream<int> out_stream;
    hls::stream<uint1_dt> done_stream;

    int len = size / WORD_NUM; // number of 512-bit words to process

#pragma HLS STREAM variable=in_stream depth=16
#pragma HLS STREAM variable=out_stream depth=64
#pragma HLS STREAM variable=done_stream depth=16

#pragma HLS dataflow
    // The dataflow pragma instructs the compiler to run the following three functions concurrently
    read_input(in, in_stream, len);
    compute(in_stream, done_stream, out_stream, len);
    write_back(out_stream, done_stream, out);
}
}
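To make the handshake between compute() and write_back() easier to follow, here is a plain C++ model of those two stages that runs without the Xilinx headers. It is a sketch under stated assumptions: std::queue stands in for hls::stream, each 512-bit word is modelled as 16 consecutive ints, and compute_model/write_back_model are hypothetical names. Because the stages run one after another and the queues are unbounded, this model cannot show FIFO-depth or AXI interface problems, which is consistent with the design passing sw_emu while stalling in hw_emu. One thing it does make visible: the done token is only sent after the last lane of the last word, so if size < WORD_NUM (len == 0) write_back() would never exit.

```cpp
#include <queue>
#include <vector>

// Values copied from the kernel.
constexpr int BUFFER_SIZE = 128;
constexpr int WORD_NUM = 16;

// compute() model: `in` holds len * WORD_NUM ints, i.e. the 16 lanes of each
// 512-bit word stored one after another (an assumed flat layout).
void compute_model(const std::vector<int>& in, int len,
                   std::queue<int>& out_stream, std::queue<bool>& done_stream) {
    for (int i = 0; i < len; i++) {
        for (int j = 0; j < WORD_NUM; j++) {
            int item = in[i * WORD_NUM + j];
            if (item > 10) out_stream.push(item + 10);
            // As in the kernel, done is only pushed after the very last lane,
            // so len == 0 never produces a done token.
            if (i == len - 1 && j == WORD_NUM - 1) done_stream.push(true);
        }
    }
}

// write_back() model: same polling loop and flush conditions as the kernel,
// using std::queue empty()/front()/pop() in place of hls::stream empty()/read().
int write_back_model(std::queue<int>& out_stream, std::queue<bool>& done_stream,
                     std::vector<int>& out) {
    int idx = 0, count = 0;
    bool done = false, stream_empty = false;
    int buffer[BUFFER_SIZE];
    while (!stream_empty || !done) {
        stream_empty = out_stream.empty();
        if (!stream_empty) {
            buffer[count++] = out_stream.front();
            out_stream.pop();
        }
        if (!done_stream.empty()) {
            done = done_stream.front();
            done_stream.pop();
        }
        // Flush when the local buffer is full, or when done has arrived and data is pending.
        if (count == BUFFER_SIZE || (count > 0 && done)) {
            for (int i = 0; i < count; i++) out[idx + i] = buffer[i];
            idx += count;
            count = 0;
        }
    }
    return idx;  // number of results written to `out`
}
```

For example, feeding two words holding the values 0..31 yields 21 results (21..41), since only 11..31 pass the filter.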

 

Accepted Solution

Re: failed to extend data width of a stream design (Moderator)

Although using arguments of different bit widths is common in software design, it currently does not work in hardware if those arguments are bundled into the same AXI master interface. In this case, port in (ap_int<512>) and port out (32-bit int) are bundled together. As a result, neither port works, no data is read in, and hw_emu stalls.
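A minimal sketch of the fix, assuming the usual SDAccel INTERFACE pragma syntax: give each m_axi port its own bundle (gmem0 and gmem1 are arbitrary, hypothetical names). To keep the sketch compilable with a plain C++ compiler, a hypothetical struct wide512_t stands in for ap_int<512>, and the dataflow body is replaced by a sequential loop; a regular compiler simply ignores the #pragma HLS lines.

```cpp
#include <cstdint>

// Hypothetical stand-in for ap_int<512> (16 x 32-bit lanes) so this builds
// without <ap_int.h>; the real kernel keeps int512_dt.
struct wide512_t { int32_t lane[16]; };

extern "C" {
void cnd_stream(wide512_t *in, int *out, int size) {
// The 512-bit input and the 32-bit output now use separate AXI masters.
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=in     bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=size   bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    // Sequential body for illustration only; the real kernel keeps the
    // read_input/compute/write_back dataflow structure.
    int idx = 0;
    for (int i = 0; i < size / 16; i++) {
        for (int j = 0; j < 16; j++) {
            int item = in[i].lane[j];
            if (item > 10) out[idx++] = item + 10;
        }
    }
}
}
```

Only the two m_axi pragma lines differ from the original interface; everything else about the kernel can stay as it was.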

 

Another problem is the stream depth. With the wider data width, read_input() reads far more data per call than before, and an inadequate stream depth will also stall the writing function whenever the stream is full.
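A rough sizing check for the depth point, assuming the worst case in which every item passes the filter: each 512-bit entry of in_stream can then produce WORD_NUM = 16 writes to out_stream, so the depth-64 out_stream holds the results of only 4 input words. If write_back() falls behind, compute() blocks on its next write after that. The constants are copied from the kernel; the helper names are hypothetical.

```cpp
// FIFO sizing arithmetic for cnd_stream; constants copied from the kernel.
constexpr int WORD_NUM = 16;          // 32-bit items per 512-bit word
constexpr int WORD_BITS = 512;        // width of one in_stream entry
constexpr int IN_STREAM_DEPTH = 16;   // STREAM depth pragma for in_stream
constexpr int OUT_STREAM_DEPTH = 64;  // STREAM depth pragma for out_stream

// Worst case: every item is > 10, so one word produces WORD_NUM out_stream writes.
constexpr int worst_case_writes_per_word() { return WORD_NUM; }

// Number of input words whose results fit in out_stream before compute() would block.
constexpr int words_out_stream_can_absorb() {
    return OUT_STREAM_DEPTH / worst_case_writes_per_word();
}

// Data buffered by a full in_stream, in bytes.
constexpr int in_stream_capacity_bytes() { return IN_STREAM_DEPTH * WORD_BITS / 8; }

static_assert(words_out_stream_can_absorb() == 4, "64 entries / 16 per word");
static_assert(in_stream_capacity_bytes() == 1024, "16 words x 64 bytes");
```

By the same arithmetic, absorbing N words of worst-case output would take an out_stream depth of 16 * N.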

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------

Re: failed to extend data width of a stream design (Moderator)

Hi Liucheng,

 

Can you also attach the host code and makefile?

 

Regards,

Sean

Re: failed to extend data width of a stream design (Explorer)

Hi @seanz,

Thanks for the reply.

Here are the host code and the Makefile.

 

//host code

#include <iostream>
#include <cstring>
#include <cstdlib>

//OpenCL utility layer include
#include "xcl.h"

#define DATA_SIZE (16*1024*1024)
#define INC 10

int main(int argc, char** argv)
{
    //Allocate Memory in Host Memory
    size_t vector_size_bytes = sizeof(int) * DATA_SIZE;

    int *source_input       = (int *) malloc(vector_size_bytes);
    int *source_hw_results  = (int *) malloc(vector_size_bytes);
    int *source_sw_results  = (int *) malloc(vector_size_bytes);

    // Create the test data and Software Result 
    for(int i = 0 ; i < DATA_SIZE ; i++){
        source_input[i] = rand()%100;
        source_sw_results[i] = -1;
        source_hw_results[i] = -1;
    }

    // Software reference: keep item + INC for every item > INC
    int idx = 0;
    for(int i = 0; i < DATA_SIZE; i++){
        if(source_input[i] > INC){
            source_sw_results[idx] = source_input[i] + INC;
            idx++;
        }
    }
    std::cout << "# of write back: " << idx << std::endl;

//OPENCL HOST CODE AREA START
    //Create Program and Kernel
    xcl_world world = xcl_world_single();
    cl_program program = xcl_import_binary(world, "cnd_stream");
    cl_kernel krnl_cnd_stream = xcl_get_kernel(program, "cnd_stream");

    //Allocate Buffer in Global Memory
    cl_mem buffer_input  = xcl_malloc(world, CL_MEM_READ_ONLY, vector_size_bytes);
    cl_mem buffer_output = xcl_malloc(world, CL_MEM_READ_WRITE, vector_size_bytes);

    //Copy input data to device global memory
    xcl_memcpy_to_device(world,buffer_input,source_input,vector_size_bytes);
    xcl_memcpy_to_device(world,buffer_output,source_hw_results,vector_size_bytes);

    int size = DATA_SIZE;
    //Set the Kernel Arguments
    xcl_set_kernel_arg(krnl_cnd_stream,0,sizeof(cl_mem),&buffer_input);
    xcl_set_kernel_arg(krnl_cnd_stream,1,sizeof(cl_mem),&buffer_output);
    xcl_set_kernel_arg(krnl_cnd_stream,2,sizeof(int),&size);

    std::cout << "start launching the program." << std::endl;
    //Launch the Kernel
    unsigned long duration = xcl_run_kernel3d(world,krnl_cnd_stream,1,1,1);

    //Copy Result from Device Global Memory to Host Local Memory
    xcl_memcpy_from_device(world, source_hw_results, buffer_output,vector_size_bytes);
    clFinish(world.command_queue);

    double bandwidth = ((DATA_SIZE + idx) * sizeof(int) / 1024.0 / 1024.0) / (duration * 1.0 / 1000000000);
	std::cout << "Measured bandwidth is " << bandwidth << " MB/s" << std::endl; 

    //Release Device Memories and Kernels
    clReleaseMemObject(buffer_input);
    clReleaseMemObject(buffer_output);
    clReleaseKernel(krnl_cnd_stream);
    clReleaseProgram(program);
    xcl_release_world(world);
//OPENCL HOST CODE AREA END
    
    // Compare the results of the Device to the simulation
    int match = 0;
    for (int i = 0 ; i < DATA_SIZE ; i++){
        if (source_hw_results[i] != source_sw_results[i]){
            std::cout << "Error: Result mismatch" << std::endl;
            std::cout << "i = " << i << " CPU result = " << source_sw_results[i]
                << " Device result = " << source_hw_results[i] << std::endl;
            match = 1;
            break;
        }
    }

    /* Release Memory from Host Memory*/
    free(source_input);
    free(source_hw_results);
    free(source_sw_results);

    if (match){
        std::cout << "TEST FAILED." << std::endl; 
        return EXIT_FAILURE;
    }
    std::cout << "TEST PASSED." << std::endl; 
    return EXIT_SUCCESS; 
}
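The bandwidth figure in the host code counts every input int read once (DATA_SIZE) plus every kept int written once (idx), and divides by the kernel duration, which the code treats as nanoseconds (it divides by 1e9). The same arithmetic as a standalone helper (bandwidth_mb_per_s is a hypothetical name):

```cpp
#include <cstddef>

// MB/s as computed in the host code: (ints read + ints written) * sizeof(int)
// bytes, converted to MB (1024 * 1024 bytes), over a duration given in ns.
double bandwidth_mb_per_s(std::size_t ints_read, std::size_t ints_written,
                          unsigned long duration_ns) {
    double mbytes = (ints_read + ints_written) * sizeof(int) / 1024.0 / 1024.0;
    double seconds = duration_ns * 1.0 / 1000000000;
    return mbytes / seconds;
}
```

Because of the two divisions by 1024.0, the "MB/s" printed by the host is really MiB/s (2^20 bytes per MB).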

Here is the Makefile

COMMON_REPO := ../../../

include $(COMMON_REPO)/utility/boards.mk
include $(COMMON_REPO)/libs/xcl/xcl.mk
include $(COMMON_REPO)/libs/opencl/opencl.mk

# Host Application
host_SRCS=./src/host.cpp $(xcl_SRCS)
host_HDRS=$(xcl_HDRS)
host_CXXFLAGS=-I./src/ $(xcl_CXXFLAGS) $(opencl_CXXFLAGS) --debug 
host_LDFLAGS=$(opencl_LDFLAGS) 
EXES=host

# Kernel
cnd_stream_SRCS=./src/cnd_stream.cpp
cnd_stream_CLFLAGS= --kernel cnd_stream 
XOS=cnd_stream

# xclbin
cnd_stream_XOS=cnd_stream

XCLBINS=cnd_stream

# check
check_EXE=host
check_XCLBINS=cnd_stream
DEVICES=xilinx:adm-pcie-7v3:1ddr:3.0
TARGETS=sw_emu

CHECKS=check

include $(COMMON_REPO)/utility/rules.mk

Regards,

Cheng Liu

Re: failed to extend data width of a stream design (Explorer)
Hi @seanz,

Thank you very much for the help.

Yes, the problem was caused by ports of different data widths being bundled into the same AXI master.

After assigning the two arguments to separate AXI master bundles, the design works as expected.

Sure, I will be more careful when setting the stream FIFO depths. In this design, the current setup works fine.

 

Regards,

Cheng Liu
