Observer mh9840
Observer
641 Views
Registered: ‎01-14-2019

Bit of confusion - Designing a Stream data processing IP


Hello all. I've recently started using Vivado HLS, and I'm a bit confused about how to design an IP that processes streaming data.

So far the design is as follows:

 

void top_func (
    hls::stream<> &in,
    hls::stream<> &out
) {
    array A[], B[];

    // loop n times of in.read(), storing to array A

    process(A, B);     // this would look like a simple C++ function, doing only arithmetic operations

    // loop n times of out.write(), writing array B to output stream
}

 

So the IP reads in a large amount of data through a stream interface, processes it with a simple function, then writes the output array through a stream interface.

 

The question is: is this the right way to design a streaming IP when I want to run it multiple times?

Designing such an IP in Verilog usually involves an FSM with an infinite loop. But the C++ implementation above 'looks like' it runs a single time and that's it.

I will eventually port the design to an FPGA, write an application program in SDK, and run it. (I'm still learning how to connect streaming interfaces to the processor.)

 

I would appreciate it if someone could explain how top functions should be designed in HLS, and whether I need any specific implementation methods to run the exported IP from an application program.

 

+) One more question. I have some global variables declared in a header file of the top function. The values remain even if I 'call' the top function multiple times, correct?


Accepted Solutions
Scholar u4223374
Scholar
561 Views
Registered: ‎04-26-2015

Re: Bit of confusion - Designing a Stream data processing IP

HLS doesn't really do infinite loops, because obviously that makes running a simulation difficult. For streams, I'd normally do a loop until you detect a stream element with the TLAST bit set - and then stop. For a test you can just set TLAST at the end of your test data; on the FPGA you tie TLAST to 0 and Vivado will happily synthesize away all of the associated logic.
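
A minimal sketch of the "loop until TLAST" idea, written so it compiles with an ordinary C++ compiler. `Stream` and `Beat` are hypothetical stand-ins for `hls::stream` and `ap_axiu`, and `MAX_BEATS` and `scale_until_last` are names assumed for the illustration; none of them come from the Xilinx headers.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Software-only stand-in for hls::stream<T> (hypothetical helper, not the Xilinx class).
template <typename T>
class Stream {
public:
    void write(const T &v) { q_.push(v); }
    T read() {
        // A blocking read on an empty stream would hang in hardware; flag it here.
        assert(!q_.empty() && "read from empty stream");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Simplified AXI4-Stream beat: payload plus the TLAST flag (stand-in for ap_axiu).
struct Beat {
    uint32_t data;
    bool last;
};

// Assumed upper bound so the loop has a finite trip count.
const int MAX_BEATS = 1 << 16;

// Doubles every sample and stops after forwarding the beat that carries TLAST.
// Returns the number of beats processed.
int scale_until_last(Stream<Beat> &in, Stream<Beat> &out) {
    int count = 0;
    for (int i = 0; i < MAX_BEATS; ++i) {
        Beat b = in.read();
        b.data *= 2;
        out.write(b);
        ++count;
        if (b.last) break;  // end of packet: return so the block can be restarted
    }
    return count;
}
```

In a testbench, setting `last` on the final element of the test data is what lets the call return.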

 

Your understanding of the read behaviour is correct, as is your understanding of the testbench and multiple calls. For my work (image processing), I normally set up my blocks so that they process exactly one image worth of data and then stop. Having the processor restart the block 30 times per second (for 30 FPS video) is not a problem.
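
As a rough illustration of "one frame per call, called repeatedly", the sketch below runs a top function twice from testbench-style code. `Stream` is a hypothetical software stand-in for `hls::stream`, and `FRAME_LEN` and `frames_done` are assumed names. The file-scope counter keeps its value between calls in C simulation; my understanding is that HLS maps such a variable to a register that likewise persists between starts of the block, but check the reset settings of your tool version.

```cpp
#include <cassert>
#include <queue>

// Software-only stand-in for hls::stream<T> (hypothetical helper, not the Xilinx class).
template <typename T>
class Stream {
public:
    void write(const T &v) { q_.push(v); }
    T read() {
        assert(!q_.empty() && "read from empty stream");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Assumed number of samples handled per invocation.
const int FRAME_LEN = 4;

// File-scope state: not re-initialised when top_func is called again.
static int frames_done = 0;

// Processes exactly one frame, then returns. The caller (testbench or
// processor) starts it again for the next frame.
void top_func(Stream<int> &in, Stream<int> &out) {
    for (int i = 0; i < FRAME_LEN; ++i) {
        out.write(in.read() + frames_done);
    }
    ++frames_done;
}
```

Calling `top_func` once per frame in the testbench mirrors the processor restarting the block once per frame on the FPGA.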

 

Generally with streams I wouldn't use two arrays (e.g. A and B). The ideal case is that you just read from one stream, do the processing, and output to the other stream, with essentially no storage in the block. You can do this for simple operations like colour conversion in an image. For more complex operations you might need a line (or several lines) of data stored before you can process it. In this case, copy the data to array A, but then process from A directly to the output stream (i.e. don't write to an intermediate array). This saves both time and resources.
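
The two cases can be sketched as follows, assuming a fixed line length. `Stream` is a hypothetical software stand-in for `hls::stream`, and `LINE_LEN`, `invert` and `mirror_line` are names made up for the illustration. In Vivado HLS one would typically also pipeline these loops (for example with `#pragma HLS PIPELINE II=1`), which is left out here so the code builds with a plain compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Software-only stand-in for hls::stream<T> (hypothetical helper, not the Xilinx class).
template <typename T>
class Stream {
public:
    void write(const T &v) { q_.push(v); }
    T read() {
        assert(!q_.empty() && "read from empty stream");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Assumed line length for the example.
const int LINE_LEN = 4;

// Case 1, no storage: each sample is converted and forwarded immediately.
void invert(Stream<uint8_t> &in, Stream<uint8_t> &out) {
    for (int i = 0; i < LINE_LEN; ++i) {
        out.write(static_cast<uint8_t>(255 - in.read()));
    }
}

// Case 2, one line of storage: mirroring needs the whole line before the
// first output, so the line is buffered in A and then written from A
// straight to the output stream, with no second array B.
void mirror_line(Stream<uint8_t> &in, Stream<uint8_t> &out) {
    uint8_t A[LINE_LEN];
    for (int i = 0; i < LINE_LEN; ++i) {
        A[i] = in.read();
    }
    for (int i = 0; i < LINE_LEN; ++i) {
        out.write(A[LINE_LEN - 1 - i]);
    }
}
```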

6 Replies
Explorer
Explorer
606 Views
Registered: ‎07-18-2018

Re: Bit of confusion - Designing a Stream data processing IP

hi mh9840,

    There are examples of streams and AXI streams in the tool that might be helpful places to start. Also check out UG 902. It's the best reference for understanding what the different HLS constructs are trying to do.

So the basics of hls::stream: instead of the top-level interface being implemented as a RAM that you read from and write to, it's a FIFO.

You write into the stream or read from the stream. The rule is that the data has to be accessed in sequential order. Once you read a value, you can't access it again. It's basically going to be the same as an ap_fifo interface on the top.

While it is running, your IP block should be able to keep reading data from the input stream, doing a calculation, and writing the result to the output stream. If this takes multiple cycles, and you want it to return a value every cycle after a certain amount of data has been fed into it, you will likely need to pipeline or dataflow the block.

Something that is helpful is to implement the HLS block and then look at the generated RTL code. It shows what the interface will look like and how the control signals start, stop, and reset the block.

 

Observer mh9840
Observer
584 Views
Registered: ‎01-14-2019

Re: Bit of confusion - Designing a Stream data processing IP

Thank you for the help. I found 'Using HLS Streams' in UG 902, along with the '2D_convolution_with_linebuffer' example in the Xilinx directory.

To check that I understand how it works: if I want the function to read 100 data values and then process them, looping a blocking read (in.read()) 100 times would make it work in the proper sequence, correct?

+) And I have to call the top function multiple times in the testbench, otherwise I can only read output from the first call. Is this normal behavior?

Contributor
Contributor
482 Views
Registered: ‎02-22-2008

Re: Bit of confusion - Designing a Stream data processing IP

When you say you process one image and then stop, how do you do that? I'm trying to generate a histogram of a single image and then stop; I don't need to process all 30 frames per second. So I tried to add a local enable that gets set over AXI-Lite, and then stop. But the block just keeps processing image data. Code below:

void generate_histogram(hls::stream< ap_axiu<W,1,1,1> >& _src, ap_uint<12> height, ap_uint<1> enable, uint32_t red[256], uint32_t green[256], uint32_t blue[256])
{
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE axis register both  port=_src
#pragma HLS INTERFACE s_axilite port=height
#pragma HLS INTERFACE s_axilite port=enable
#pragma HLS INTERFACE s_axilite port=red
#pragma HLS INTERFACE s_axilite port=green
#pragma HLS INTERFACE s_axilite port=blue


    ap_axiu<W, 1, 1, 1> axis;
    ap_uint<12> h = 0;
    ap_uint<1> local_enable = 0;

	while (h < height)
	{
		_src >> axis;

		uint8_t b1, b2;
		uint8_t g1, g2;
		uint8_t r1, r2;

		b1 = axis.data(7,  0);
		g1 = axis.data(15, 8);
		r1 = axis.data(23, 16);
		b2 = axis.data(31, 24);
		g2 = axis.data(39, 32);
		r2 = axis.data(47, 40);

		if (axis.user)
		{
			local_enable = enable;
		}
		if ((h < height) && local_enable)
		{
			blue[b1]++;
			blue[b2]++;
			green[g1]++;
			green[g2]++;
			red[r1]++;
			red[r2]++;
		}

		if (axis.last)
		{
			h++;
		}
	}
}
Scholar u4223374
Scholar
465 Views
Registered: ‎04-26-2015

Re: Bit of confusion - Designing a Stream data processing IP

@nlbutts That's odd, I've used a similar layout before with no problems.

 

A few ideas:

- You're setting the ap_start bit, not auto_restart? Obviously auto_restart will cause the block to automatically restart every time it finishes.

- Where is this data coming from? Is it correctly setting TLAST?

 

In my blocks I eventually decided that just having TLAST on each line wasn't enough, so I used a TUSER bit to indicate "end of frame". That helps if there's a chance of getting partial frames, since it means that you fully re-synchronize at the end of each frame. Alternatively you can use TUSER to indicate start of frame - or you can do both.
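
A rough sketch of the start-of-frame variant: discard data until a beat with TUSER set arrives, then accumulate until the expected number of TLAST-terminated lines has been seen. `Stream` and `Pixel` are hypothetical software stand-ins for `hls::stream` and `ap_axiu`, and `histogram_one_frame` and `MAX_BEATS` are assumed names; a single 8-bit channel is used to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Software-only stand-in for hls::stream<T> (hypothetical helper, not the Xilinx class).
template <typename T>
class Stream {
public:
    void write(const T &v) { q_.push(v); }
    T read() {
        assert(!q_.empty() && "read from empty stream");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Simplified video beat: user = start of frame (TUSER), last = end of line (TLAST).
struct Pixel {
    uint8_t value;
    bool user;
    bool last;
};

// Assumed upper bound so the loop has a finite trip count.
const int MAX_BEATS = 1 << 20;

// Accumulates a histogram over exactly one frame of `height` lines.
// Beats that arrive before the first start-of-frame marker are consumed
// and ignored, which re-synchronises after a partial frame.
// Returns the number of lines accumulated.
int histogram_one_frame(Stream<Pixel> &src, int height, uint32_t hist[256]) {
    bool in_frame = false;
    int lines = 0;
    for (int i = 0; i < MAX_BEATS && lines < height; ++i) {
        Pixel p = src.read();
        if (p.user) in_frame = true;  // start of frame seen
        if (!in_frame) continue;      // still in the tail of an earlier frame
        hist[p.value]++;
        if (p.last) ++lines;          // one full line done
    }
    return lines;
}
```

Once the function returns, nothing reads `src`, so in hardware the upstream source would be stalled until the block is started again.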

Contributor
Contributor
457 Views
Registered: ‎02-22-2008

Re: Bit of confusion - Designing a Stream data processing IP

The data is coming from an image sensor. I've been testing with the TPG. TUSER gets set at the start of a frame and TLAST gets set at the end of each line, so I feed in the height of the image and then count TLAST to determine when it is done.

When I just set the start bit, it processes one frame and then backs up the AXI-Stream. I'm going to try moving the

		_src >> axis;

outside the while loop. I think that should continue to consume AXI-Stream data from the FIFO but not push it through the histogram block.
