Contributor
9,719 Views
Registered: ‎02-27-2014

How to use Vivado HLS hls::Mat with AXI-Stream interfaces (not AXI4 video stream)?


  Hello, everyone. I am trying to design an image processing IP core with Vivado HLS 2014.4. From XAPP1167, I learned that the video functions provided by Vivado HLS are meant to be used with AXI4 video streams and VDMA. However, for some special reasons, I want to write image data to and read it from the IP core through AXI-Stream interfaces and AXI-DMA.

  To verify feasibility, I designed a test IP core named detectTest as follows. It reads a 320x240 8-bit grayscale image (bits 7-0 of INPUT_STREAM_TDATA) from the AXI-Stream port "INPUT_STREAM" and outputs it unchanged. I built a Vivado project for the ZedBoard and tested the IP core with an AXI-DMA. The results show that the IP core works correctly, so it seems possible to use hls::Mat with AXI-Stream.

 

#include "hls_video.h"
#include "hls_math.h"

typedef ap_axiu<32, 1, 1, 1> AXI_VAL;
typedef hls::Scalar<HLS_MAT_CN(HLS_8U), HLS_TNAME(HLS_8U)> GRAY_PIXEL;
typedef hls::Mat<240, 320, HLS_8U> GRAY_IMAGE;

#define HEIGHT 240
#define WIDTH  320
#define COMPRESS_SIZE 2

template<typename T, int U, int TI, int TD>
inline T pop_stream(ap_axiu<sizeof(T) * 8, U, TI, TD> const &e) {
#pragma HLS INLINE off
	assert(sizeof(T) == sizeof(int));
	union {
		int ival;
		T oval;
	} converter;
	converter.ival = e.data;
	T ret = converter.oval;

	volatile ap_uint<sizeof(T)> strb = e.strb;
	volatile ap_uint<sizeof(T)> keep = e.keep;
	volatile ap_uint<U> user = e.user;
	volatile ap_uint<1> last = e.last;
	volatile ap_uint<TI> id = e.id;
	volatile ap_uint<TD> dest = e.dest;

	return ret;
}

template<typename T, int U, int TI, int TD>
inline ap_axiu<sizeof(T) * 8, U, TI, TD> push_stream(T const &v, bool last =
		false) {
#pragma HLS INLINE off
	ap_axiu<sizeof(T) * 8, U, TI, TD> e;

	assert(sizeof(T) == sizeof(int));
	union {
		int oval;
		T ival;
	} converter;
	converter.ival = v;
	e.data = converter.oval;

	// set it to sizeof(T) ones
	e.strb = -1;
	e.keep = 15; //e.strb;
	e.user = 0;
	e.last = last ? 1 : 0;
	e.id = 0;
	e.dest = 0;
	return e;
}

GRAY_IMAGE mframe(HEIGHT, WIDTH);

void detectTest(AXI_VAL INPUT_STREAM[HEIGHT * WIDTH], AXI_VAL RESULT_STREAM[HEIGHT * WIDTH]) {
#pragma HLS INTERFACE ap_fifo port=RESULT_STREAM
#pragma HLS INTERFACE ap_fifo port=INPUT_STREAM

#pragma HLS RESOURCE variable=RESULT_STREAM core=AXI4Stream metadata="-bus_bundle RESULT_STREAM"
#pragma HLS RESOURCE variable=INPUT_STREAM core=AXI4Stream metadata="-bus_bundle INPUT_STREAM"
#pragma HLS RESOURCE variable=return core=AXI4LiteS metadata="-bus_bundle CONTROL_STREAM"

	int i, j;

	for (i = 0; i < HEIGHT * WIDTH; i++) {
		unsigned int instream_value = pop_stream<unsigned int, 1, 1, 1>(INPUT_STREAM[i]);
		hls::Scalar<HLS_MAT_CN(HLS_8U), HLS_TNAME(HLS_8U)> pixel_in;
		*(pixel_in.val) = (unsigned char) instream_value;
		mframe << pixel_in;

		hls::Scalar<HLS_MAT_CN(HLS_8U), HLS_TNAME(HLS_8U)> pixel_out;
		mframe >> pixel_out;
		unsigned int outstream_value = (unsigned int) *(pixel_out.val);
		RESULT_STREAM[i] = push_stream<unsigned int, 1, 1, 1>(
				(unsigned int) outstream_value, i == HEIGHT * WIDTH - 1);
	}
	return;
}

  Then I modified detectTest as follows. The modified IP core resizes the input image and then restores it to its original size. However, it did not work in the AXI-DMA test. The waveform captured with ChipScope shows that the ready signal of INPUT_STREAM was deasserted after receiving several pixels.

 

GRAY_IMAGE mframe(HEIGHT, WIDTH);
GRAY_IMAGE mframe_resize(HEIGHT / COMPRESS_SIZE, WIDTH / COMPRESS_SIZE);

void detectTest(AXI_VAL INPUT_STREAM[HEIGHT * WIDTH], AXI_VAL RESULT_STREAM[HEIGHT * WIDTH]) {
#pragma HLS INTERFACE ap_fifo port=RESULT_STREAM
#pragma HLS INTERFACE ap_fifo port=INPUT_STREAM

#pragma HLS RESOURCE variable=RESULT_STREAM core=AXI4Stream metadata="-bus_bundle RESULT_STREAM"
#pragma HLS RESOURCE variable=INPUT_STREAM core=AXI4Stream metadata="-bus_bundle INPUT_STREAM"
#pragma HLS RESOURCE variable=return core=AXI4LiteS metadata="-bus_bundle CONTROL_STREAM"

	int i, j;

	for (i = 0; i < HEIGHT * WIDTH; i++) {//receiving block
		unsigned int instream_value = pop_stream<unsigned int, 1, 1, 1>(INPUT_STREAM[i]);
		hls::Scalar<HLS_MAT_CN(HLS_8U), HLS_TNAME(HLS_8U)> pixel_in;
		*(pixel_in.val) = (unsigned char) instream_value;
		mframe << pixel_in;
	}
	hls::Resize(mframe, mframe_resize);
	hls::Resize(mframe_resize, mframe);

	for (i = 0; i < HEIGHT * WIDTH; i++) {//transmitting block
		hls::Scalar<HLS_MAT_CN(HLS_8U), HLS_TNAME(HLS_8U)> pixel_out;
		mframe >> pixel_out;
		unsigned char outstream_value = *(pixel_out.val);
		RESULT_STREAM[i] = push_stream<unsigned int, 1, 1, 1>(
				(unsigned int) outstream_value, i == HEIGHT * WIDTH - 1);
	}
	return;
}

  I also tried deleting or modifying the following 2 lines in the modified IP core, but the transmission problem remained. It seems that the IP core cannot work correctly if the receiving block and the transmitting block are in different "for" loops. But unless I solve this problem, I cannot add image processing functions to the IP core. XAPP1167 mentions that "the hls::Mat<> datatype used to model images is internally defined as a stream of pixels". Does that cause the problem? And how can I solve it? Thanks a lot!

hls::Resize(mframe, mframe_resize);
hls::Resize(mframe_resize, mframe);

 

 

Accepted Solutions
Xilinx Employee
17,192 Views
Registered: ‎08-17-2011

Hello @pan_shaowu

 

 

 

So the major concept that you need to learn/remember is that hls::Mat<> is basically "only" an HLS stream (hls::stream<>). It's actually an array of N streams, one per channel (and you have N=1).

 

Next, streams are FIFOs; in software they are modeled as infinite queues, but in HW they have a finite size.

The default depth is 2 (IIRC).

In your first code you do:

for all pixels loop {
   .. something to read pixel_in
   mframe takes pixel_in
   pixel_out is read from mframe
   .. write out pixel_out
} // end loop

 

Notice that mframe never has more than one pixel inside, since as soon as you write to it, you unload it. In other words, mframe never contains a full frame of pixels (but a full frame flows through it!).

 

In your second code, mframe actually has to contain all the pixels, since you have 2 for loops and you don't start unloading the pixels until the first loop is complete.

Since your FIFO has a depth of 2, you never actually read more than 3 pixels in.

That's why you see the ready signal of the input stream drop after a few pixels; that's the back pressure being applied by the VHLS block.
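To make the back-pressure argument concrete, here is a minimal plain C++ sketch (not HLS code, and not taken from the thread). BoundedFifo, run_single_loop and run_two_loops are hypothetical names invented for the illustration. The model only counts FIFO entries and ignores any input register in front of the FIFO, so it stalls after 2 pixels rather than the "3" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical stand-in for a hardware FIFO: unlike the software model of a
// stream, a write is refused (the producer stalls) once `depth` entries are held.
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) {}
    bool try_write(unsigned char v) {
        if (q_.size() >= depth_) return false;  // full: back pressure
        q_.push(v);
        return true;
    }
    bool try_read(unsigned char &v) {
        if (q_.empty()) return false;
        v = q_.front();
        q_.pop();
        return true;
    }
private:
    std::size_t depth_;
    std::queue<unsigned char> q_;
};

// Single loop: every pixel is written and read back in the same iteration,
// so the FIFO never holds more than one entry.
// Returns the number of pixels that passed through.
std::size_t run_single_loop(std::size_t depth, std::size_t n_pixels) {
    BoundedFifo fifo(depth);
    std::size_t done = 0;
    for (std::size_t i = 0; i < n_pixels; ++i) {
        unsigned char out;
        if (!fifo.try_write(static_cast<unsigned char>(i))) break;
        if (!fifo.try_read(out)) break;
        ++done;
    }
    return done;
}

// Two sequential loops: the read loop cannot start before the write loop has
// finished, so the write loop stalls for good once the FIFO is full.
// Returns the number of pixels accepted before the stall.
std::size_t run_two_loops(std::size_t depth, std::size_t n_pixels) {
    BoundedFifo fifo(depth);
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < n_pixels; ++i) {
        if (!fifo.try_write(static_cast<unsigned char>(i))) return accepted;
        ++accepted;
    }
    return accepted;
}
```

In this model, run_single_loop(2, 320 * 240) passes the whole frame, while run_two_loops(2, 320 * 240) stops after 2 pixels; only a depth equal to the full frame size lets the two-loop version finish.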

 

Where to go from there?

 

Well, first, stop doing FPGA tests and ChipScope unless you have run cosim first and it passed.

Had you run cosim, it would have failed (or got stuck), and you would have debugged there rather than waiting for a bitstream to implement.

 

Check UG902 about cosim and self-checking testbenches. Maybe for video you can't have self-checking, so at the least you need visual checks of the generated pictures; you can adapt XAPP1167 for that.
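As a rough illustration of what "self-checking" means here, the sketch below compares the output of a design against the expected frame and reduces the result to a pass/fail code. It is plain C++ under stated assumptions: dut_passthrough is a hypothetical stand-in for the synthesized function (the real testbench would call detectTest with its stream arguments), and count_mismatches and self_check are made-up helper names. A Vivado HLS testbench main() would return the value of self_check, with 0 meaning pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the design under test: copies the frame unchanged.
void dut_passthrough(const std::vector<unsigned char> &in,
                     std::vector<unsigned char> &out) {
    out = in;
}

// Counts differing pixels; a length difference counts as that many mismatches.
std::size_t count_mismatches(const std::vector<unsigned char> &got,
                             const std::vector<unsigned char> &expected) {
    const std::size_t common =
        got.size() < expected.size() ? got.size() : expected.size();
    std::size_t bad = (got.size() > expected.size())
                          ? got.size() - expected.size()
                          : expected.size() - got.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (got[i] != expected[i]) ++bad;
    }
    return bad;
}

// Builds a synthetic test frame, runs the DUT, and returns 0 on success and
// 1 on failure, which is the value a testbench main() would return.
int self_check(std::size_t height, std::size_t width) {
    std::vector<unsigned char> in(height * width);
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        in[i] = static_cast<unsigned char>((i * 7) % 256);  // arbitrary pattern
    }
    dut_passthrough(in, out);
    return count_mismatches(out, in) == 0 ? 0 : 1;
}
```

For a pass-through design the expected frame is simply the input; for the resize case an exact reference is harder to produce, which is where the visual check mentioned above comes in.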

 

For your design, you could increase the depth of the stream (XAPP1167 explains that), but here it's impractical, and sometimes impossible, to buffer a full-size frame.

If you check the XAPP carefully, the design operates in "dataflow" mode; check UG902 as to what this means.

In short, dataflow means that the HW functions operate in parallel, and here the second loop will start executing as soon as data has been generated by the first loop. The link between the loops is a stream / FIFO, so as soon as a data item is generated in the first loop, the second loop can process it; this is possible because the processing happens in sequential order.
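A small software model may help picture this. The sketch below is plain C++, not HLS code, and run_dataflow is a hypothetical name. It steps a producer and a consumer together, one simulated cycle at a time, with a FIFO of limited depth between them. It only illustrates the scheduling idea; in a real design the tool generates the concurrency once dataflow is enabled as described in UG902.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Models two concurrently running processes joined by a FIFO of `depth`
// entries. In each simulated cycle the consumer drains one entry if one is
// available, and the producer adds one entry if there is room.
// Returns the pixels in the order the consumer received them.
std::vector<unsigned char> run_dataflow(const std::vector<unsigned char> &in,
                                        std::size_t depth) {
    std::deque<unsigned char> fifo;
    std::vector<unsigned char> out;
    std::size_t next = 0;  // index of the next pixel the producer will send

    while (out.size() < in.size()) {
        bool progressed = false;
        if (!fifo.empty()) {  // consumer side
            out.push_back(fifo.front());
            fifo.pop_front();
            progressed = true;
        }
        if (next < in.size() && fifo.size() < depth) {  // producer side
            fifo.push_back(in[next++]);
            progressed = true;
        }
        if (!progressed) break;  // nothing can move (only when depth is 0)
    }
    return out;
}
```

With a depth of 2, a full 320x240 frame passes through in order, because the FIFO only ever has to hold the few pixels that are in flight between the two loops.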

 

Well I leave you to read more.

 

I hope this helps....

- Hervé

SIGNATURE:
* New Dedicated Vivado HLS forums* http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/bd-p/hls
* Readme/Guidance* http://forums.xilinx.com/t5/New-Users-Forum/README-first-Help-for-new-users/td-p/219369

* Please mark the Answer as "Accept as solution" if information provided is helpful.
* Give Kudos to a post which you think is helpful and reply oriented.

Contributor
9,609 Views
Registered: ‎02-27-2014

I guess I found the cause of the problem. I checked the Analysis perspective of the Vivado HLS project.

With a single "for" loop, the performance diagram of the exporting module is as shown in the left image; with two "for" loops, it is as shown in the right image.

[Images: 1.jpg (left, single loop), 2.jpg (right, two loops)]

We can see that the read and write operations are executed at the same time when two "for" loops are employed. That causes conflicts and finally causes the AXI-Stream problem. I think this is caused by my improper design optimization settings. Maybe I have to use some directives (such as "HLS PIPELINE off") to force the compiler to generate a module that executes sequentially. Can anyone give me some advice? Thanks a lot!

Contributor
9,560 Views
Registered: ‎02-27-2014

Thanks, the problem has been solved.

Visitor
9,227 Views
Registered: ‎07-16-2013

Hi,

I wonder whether you have used some 3rd party imaging toolkits to help you resize the input image?
