Observer mh9840
Observer
1,673 Views
Registered: ‎01-14-2019

Bit of confusion - Designing a Stream data processing IP


Hello all. I've recently started using Vivado HLS, and I'm a bit confused about how to design an IP that processes streaming data.

So far the design is as follows:

 

void top_func (

    hls::stream<> &in,

    hls::stream<> &out

) {

    array A[], B[];

 

    //loop n times of  in.read(), storing to array A

   

    process(A,B);     //this would look like a simple c++ function, doing only arithmetic operations

 

    //loop n times of out.write(), writing array B to output stream

}

 

So the IP reads a large amount of data through a stream interface, processes it with a simple function, then writes the output array through a stream interface.

 

Question: is this the right way to design a streaming IP when I want to run it multiple times?

Designing such an IP in Verilog usually involves an FSM with an infinite loop. But the above C++ implementation looks like it runs a single time and that's it.

I will eventually port the design to an FPGA, write an application program in SDK, and run it. (I'm still learning how to connect streaming interfaces to the processor.)

 

I would appreciate it if someone could explain how top functions should be designed in HLS, and whether I need any specific implementation method to run the exported IP from an application program.

 

+) One more question. I have some global variables declared in a header file of the top function. The values persist even if I call the top function multiple times, correct?

1 Solution

Accepted Solutions
Scholar u4223374
Scholar
1,593 Views
Registered: ‎04-26-2015


HLS doesn't really do infinite loops, because obviously that makes running a simulation difficult. For streams, I'd normally do a loop until you detect a stream element with the TLAST bit set - and then stop. For a test you can just set TLAST at the end of your test data; on the FPGA you tie TLAST to 0 and Vivado will happily synthesize away all of the associated logic.

 

Your understanding of the read behaviour is correct, as is your understanding of the testbench and multiple calls. For my work (image processing), I normally set up my blocks so that they process exactly one image worth of data and then stop. Having the processor restart the block 30 times per second (for 30 FPS video) is not a problem.

 

Generally with streams I wouldn't use two arrays (e.g. A and B). The ideal case is that you just read from one stream, do the processing, and output to the other stream, with essentially no storage in the block. You can do this for simple operations like colour conversion in an image. For more complex operations you might need a line (or several lines) of data stored before you can process it. In this case, copy the data to array A, but then process from A directly to the output stream (i.e. don't write to an intermediate array). This saves both time and resources.
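
A minimal sketch of the read-until-TLAST pattern described above, written as plain C++ so it compiles without the HLS headers. `Beat` and `Stream` are hypothetical stand-ins for `ap_axiu<...>` and `hls::stream<...>`, `send_packet` is a made-up testbench helper, and the `+ 1` is only a placeholder for the real per-element arithmetic. In an actual design you would use the HLS types instead.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for an AXI-Stream beat (ap_axiu in Vivado HLS).
struct Beat {
    uint32_t data;
    bool last;  // models TLAST
};

// Hypothetical stand-in for hls::stream<Beat>: a FIFO with read() and write().
struct Stream {
    std::queue<Beat> q;
    void write(const Beat &b) { q.push(b); }
    Beat read() {
        assert(!q.empty());  // a hardware read would block here instead
        Beat b = q.front();
        q.pop();
        return b;
    }
    bool empty() const { return q.empty(); }
};

// One call handles one packet: read, process, write, stop after TLAST.
// No intermediate array is needed because each output depends on one input.
void top_func(Stream &in, Stream &out) {
    Beat b;
    do {
        b = in.read();
        b.data = b.data + 1;  // placeholder for the real arithmetic
        out.write(b);         // TLAST is passed through unchanged
    } while (!b.last);
}

// Testbench-style helper: push n values as one packet, TLAST on the final beat.
void send_packet(Stream &s, uint32_t first, int n) {
    for (int i = 0; i < n; ++i) {
        s.write(Beat{first + static_cast<uint32_t>(i), i == n - 1});
    }
}
```

A testbench would call `send_packet` and then `top_func` once per packet, which mirrors the processor restarting the block for each frame.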

12 Replies
Explorer
Explorer
1,638 Views
Registered: ‎07-18-2018


hi mh9840,

There are examples of streams and AXI streams in the tool that might be helpful places to start. Also check out UG902. It's the best reference for understanding what the different HLS constructs are trying to do.

The basic idea of hls::stream is that the top-level interface is implemented as a FIFO, instead of as a RAM that you read from and write to.

You write into the stream or read from the stream. The rule is that the data has to be accessed in sequential order. Once you have read a value, you can't access it again. It's basically going to be the same as an ap_fifo interface on the top.

While it is running, your IP block should be able to keep reading data from the input stream, doing a calculation, and writing the result to the output stream. If this takes multiple cycles, and you want it to return a value every cycle once a certain amount of data has been fed in, you will likely need to pipeline or dataflow the block.

Something that is helpful is to implement the HLS block, and then look at the generated RTL code. It shows what the interface will look like and how the control signals start, stop, and reset the block.
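
The "read once, in order" rule can be illustrated in plain C++. This is only a sketch: `Fifo` is a hypothetical stand-in for `hls::stream<int>`, `N` is an arbitrary block size, and subtracting the block minimum is a made-up operation, chosen because it needs the whole block before it can produce any output.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>

// Hypothetical stand-in for hls::stream<int>: reads are destructive and in order.
struct Fifo {
    std::queue<int> q;
    void write(int v) { q.push(v); }
    int read() {
        assert(!q.empty());  // real hardware would stall until data arrives
        int v = q.front();
        q.pop();
        return v;
    }
};

const int N = 4;  // block size, chosen only for illustration

// Reads N values, then writes each value minus the block minimum.
void top_func(Fifo &in, Fifo &out) {
    int A[N];

    // A value read from the stream is gone, so keep a local copy of anything
    // that has to be visited more than once.
    for (int i = 0; i < N; ++i) {
        A[i] = in.read();
    }

    int lowest = *std::min_element(A, A + N);

    // Process from A straight to the output stream, with no second array.
    for (int i = 0; i < N; ++i) {
        out.write(A[i] - lowest);
    }
}
```

Looping the blocking read `N` times is what guarantees the whole block is present before the processing step starts.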

 

Observer mh9840
Observer
1,616 Views
Registered: ‎01-14-2019


Thank you for the help. I found 'Using HLS Streams' in UG902, along with the '2D_convolution_with_linebuffer' example in the Xilinx directory.

To check that I understand how it works: if I want the function to read 100 data items and then process them, looping a blocking read (in.read()) 100 times would make it work in the proper sequence, correct?

+) Also, I have to call the top function multiple times in the testbench, otherwise I can only read the output from the first call. Is this normal behavior?

Contributor
Contributor
1,514 Views
Registered: ‎02-22-2008


When you say you process one image and stop, how do you do that? I'm trying to generate a histogram of a single image and then stop. I don't need to process all 30 frames per second. So I tried adding a local enable that gets set over AXI-Lite and then stops. But it just keeps processing image data. Code below:

#include <stdint.h>
#include <ap_int.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>

// W (the TDATA width) is defined elsewhere; the bit slices below need W >= 48.
void generate_histogram(hls::stream< ap_axiu<W,1,1,1> >& _src, ap_uint<12> height, ap_uint<1> enable, uint32_t red[256], uint32_t green[256], uint32_t blue[256])
{
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE axis register both  port=_src
#pragma HLS INTERFACE s_axilite port=height
#pragma HLS INTERFACE s_axilite port=enable
#pragma HLS INTERFACE s_axilite port=red
#pragma HLS INTERFACE s_axilite port=green
#pragma HLS INTERFACE s_axilite port=blue

	ap_axiu<W, 1, 1, 1> axis;
	ap_uint<12> h = 0;            // lines (TLAST pulses) seen so far
	ap_uint<1> local_enable = 0;  // copy of enable, latched when TUSER is set

	// Consume beats until `height` lines have been seen.
	while (h < height)
	{
		_src >> axis;

		uint8_t b1, b2;
		uint8_t g1, g2;
		uint8_t r1, r2;

		// Each beat carries two pixels, 8 bits per colour channel.
		b1 = axis.data(7,  0);
		g1 = axis.data(15, 8);
		r1 = axis.data(23, 16);
		b2 = axis.data(31, 24);
		g2 = axis.data(39, 32);
		r2 = axis.data(47, 40);

		// TUSER set: sample the enable register.
		if (axis.user)
		{
			local_enable = enable;
		}
		if ((h < height) && local_enable)
		{
			blue[b1]++;
			blue[b2]++;
			green[g1]++;
			green[g2]++;
			red[r1]++;
			red[r2]++;
		}

		// TLAST set: one more line is complete.
		if (axis.last)
		{
			h++;
		}
	}
}
Scholar u4223374
Scholar
1,497 Views
Registered: ‎04-26-2015


@nlbutts That's odd, I've used a similar layout before with no problems.

 

A few ideas:

- You're setting the ap_start bit, not auto_restart? Obviously auto_restart will cause the block to automatically restart every time it finishes.

- Where is this data coming from? Is it correctly setting TLAST?

 

In my blocks I eventually decided that just having TLAST on each line wasn't enough, so I used a TUSER bit to indicate "end of frame". That helps if there's a chance of getting partial frames, since it means that you fully re-synchronize at the end of each frame. Alternatively you can use TUSER to indicate start of frame - or you can do both.
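
As a rough sketch of the start-of-frame variant: the block throws away beats until it sees the frame marker, then handles exactly one frame and returns. This is plain C++, not HLS code. `Pixel` and `PixelStream` are hypothetical stand-ins for `ap_axiu<...>` and `hls::stream<...>`, summing the pixel values is a placeholder for the real processing, and `height` is assumed to be at least 1.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for a video AXI-Stream beat.
struct Pixel {
    int data;
    bool user;  // models TUSER: set on the first pixel of a frame
    bool last;  // models TLAST: set on the last pixel of each line
};

// Hypothetical stand-in for hls::stream<Pixel>.
struct PixelStream {
    std::queue<Pixel> q;
    void write(const Pixel &p) { q.push(p); }
    Pixel read() {
        assert(!q.empty());  // hardware would block here
        Pixel p = q.front();
        q.pop();
        return p;
    }
};

// Discards beats until a start-of-frame marker is seen, then sums the pixel
// values of exactly `height` lines and returns that sum.
long sum_one_frame(PixelStream &src, int height) {
    Pixel p = src.read();
    while (!p.user) {  // leftover data from a partial frame: drop it
        p = src.read();
    }

    long sum = 0;
    int lines = 0;
    while (true) {
        sum += p.data;
        if (p.last && ++lines == height) {
            break;  // frame complete, do not read any further
        }
        p = src.read();
    }
    return sum;
}
```

Because the function re-synchronizes on the marker at every call, a partial frame left in the stream only costs the discarded beats rather than shifting every later frame.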

Contributor
Contributor
1,489 Views
Registered: ‎02-22-2008


The data is coming from an image sensor. I've been testing with the TPG. TUSER gets set at the start of a frame and TLAST gets set at the end of each line. So I feed in the height of the image and then count TLAST pulses to determine when it is done.

When I just set the start bit, it processes one frame and then backs up the AXI-Stream. I'm going to try moving the

		_src >> axis;

outside the while loop. I think that should continue to consume AXI-Stream data from the FIFO but not push it through the histogram block.

Contributor
Contributor
995 Views
Registered: ‎02-24-2019

If you really need to read two arrays and process them together to produce a single output, how would you do it?
Scholar u4223374
Scholar
980 Views
Registered: ‎04-26-2015


@eewse Probably better to start a new thread for this question.

 

When you say "arrays", what exactly do you mean? Are these two streams coming in together? Two arrays stored in system RAM? Or two arrays that the CPU will program via AXI Lite?

 

In the first case, the key challenge is keeping the streams synchronized. If they're coming from two sources that don't do flow control (eg. camera sensors) then you are going to need to buffer at least one. If they're both coming from RAM, then you have a potential deadlock condition. If one is from a sensor and one is from RAM, then you're in a good position.

 

In the second case, it's easy. An AXI Master can pull that data out of RAM in whatever order you want, although linear reads/writes are most efficient.

 

In the third case it's also very easy. Block RAM has no penalty for non-linear read/write.

Contributor
Contributor
969 Views
Registered: ‎02-24-2019


u4223374,

Thank you for the prompt reply.

I'm referring to the first case, two streams coming in together. What is the proper way to buffer one of them? A FrameBuffer, a LineBuffer, or VDMA?

Contributor
Contributor
964 Views
Registered: ‎02-24-2019


How can I ensure that sgbm_filter_simple can consume the two input streams, sf_c3lr_to_c1_l and c3lr_to_c1_r, simultaneously?

Screenshot from 2019-05-30 04-27-55.png

 

Scholar u4223374
Scholar
921 Views
Registered: ‎04-26-2015


@eewse That looks like a situation where it would make sense to just stick an AXI FIFO on both of the streams (in the block design). An N-element FIFO will compensate for up to N elements of offset between the streams. If, for example, one of your blocks is expected to delay its output by a line, and the other delays it by 10 pixels, then setting the FIFO length to just about one line would be appropriate.

Similarly, if the HLS block reads a lot from one stream before reading from the other, the FIFOs will need to be adjusted to compensate.

 

With that done, the HLS block should be able to read the two streams freely, with no special care paid to them.
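
For illustration only, here is what "read the two streams freely" could look like once the FIFOs are in place, modelled in plain C++. `IntStream` is a hypothetical stand-in for `hls::stream<int>`, `combine` is a made-up name, and the absolute difference is just a placeholder, not what sgbm_filter_simple actually computes.

```cpp
#include <cassert>
#include <cstdlib>
#include <queue>

// Hypothetical stand-in for hls::stream<int>.
struct IntStream {
    std::queue<int> q;
    void write(int v) { q.push(v); }
    int read() {
        assert(!q.empty());  // hardware would block until the FIFO has data
        int v = q.front();
        q.pop();
        return v;
    }
};

// Reads one element from each input per iteration and writes one result.
// The external FIFOs absorb any offset between the two producers, so the
// loop body does not need to know which stream arrives first.
void combine(IntStream &left, IntStream &right, IntStream &out, int count) {
    for (int i = 0; i < count; ++i) {
        int l = left.read();
        int r = right.read();
        out.write(std::abs(l - r));  // placeholder operation
    }
}
```

In hardware each `read()` simply stalls until its FIFO has data, which is why the FIFO depth has to cover the worst-case offset between the two sources.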

Contributor
Contributor
901 Views
Registered: ‎02-24-2019


u4223374,

How about

#pragma HLS dataflow
function1   // latency 300,000 ns
function2   // latency 300,000 ns
function3   // latency 300,000 ns
function4   // latency 600,000 ns
function5   // latency 300,000 ns