Visitor fimmerzeel
Registered: 10-26-2018

Initiation Interval (II) is 1, but the hardware cannot process a pixel every clock cycle

Hello,

I'd like to have a continuous pixel stream in which an output pixel is generated every time a pixel is received. Each output is a combination of a few pixels, so it is delayed relative to the input. The delay is handled by a row/column counter, which sets the user/last bits after some delay.

The code does work (also in hardware), but in hardware it cannot process new data every clock cycle (the valid/ready bits stay low). According to the synthesis report, timing is met and II=1.

What am I doing wrong?

void top_function(hls::stream<ap_axiu<8,1,1,1> >& S_AXIS,  hls::stream<ap_axiu<8,1,1,1> >& M_AXIS)
{
#pragma HLS INTERFACE axis port=S_AXIS
#pragma HLS INTERFACE axis port=M_AXIS
#pragma	HLS INTERFACE ap_ctrl_none port=return

	static hls::stream<ap_axiu<8,1,1,1> > output_f1;

#pragma HLS DATAFLOW
	
	function1(S_AXIS, output_f1);
	function2(output_f1, M_AXIS);
}


void function1(hls::stream<ap_axiu<8,1,1,1> >& S_AXIS,  hls::stream<ap_axiu<8,1,1,1> >& M_AXIS)
{
	static ap_uint<13> row = 0;
	static ap_uint<13> col = 0;
	static ap_uint<1> in_last_prev = 1;
	ap_axiu<8,1,1,1> in, out;

#pragma HLS PIPELINE II=1
	in = S_AXIS.read();
	
	/* some code to buffer pixels and calculate output 'DELAY' times later */
	
	out.data = calculated_output;
	
	if(col == DELAY-1)
		out.last = 1;
	else
		out.last = 0;

	if(row == DELAY and col == DELAY)
		out.user = 1;
	else
		out.user = 0;

	out.keep = in.keep;
	out.strb = in.strb;
	out.id   = in.id;
	out.dest = in.dest;
	M_AXIS.write(out);
	
	/* update column counter and row counter */
	if(in.user == 1)
		row = 0;
	else if(in_last_prev == 1)
		row += 1;

	if(in_last_prev == 1)
		col = 0;
	else
		col += 1;

	if(in.last == 1)
		in_last_prev = 1;
	else
		in_last_prev = 0;
}


void function2(hls::stream<ap_axiu<8,1,1,1> >& S_AXIS, hls::stream<ap_axiu<8,1,1,1> >& M_AXIS)
{
	ap_axiu<8,1,1,1> in, out;

#pragma HLS PIPELINE II=1
	in = S_AXIS.read();

	out.data = 255-in.data;
	out.last = in.last;
	out.user = in.user;
	out.keep = in.keep;
	out.strb = in.strb;
	out.id   = in.id;
	out.dest = in.dest;
	M_AXIS.write(out);
}


/* testbench */
int main (int argc, char** argv)
{
	/* load image and convert to stream */

	L1: for(int row = 0; row < size_in.height+DELAY; row++) {
		L2: for(int col = 0; col < size_in.width; col++) {
			top_function(src_axi, dst_axi);
		}
	}

	return 0;
}
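
The row/column counters in function1 can be traced on a desktop compiler without the HLS headers. The following is only a sketch under stated assumptions: `Beat` and `trace_counters` are hypothetical names, `unsigned` stands in for `ap_uint<13>` (so 13-bit wrap-around is not modelled), and only the `user`/`last` sideband bits are represented.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the sideband bits of one AXI4-Stream beat.
struct Beat {
    bool user;  // start of frame
    bool last;  // end of line
};

// Mirrors the counter updates of function1 in plain C++.
// Returns the (row, col) pair that is visible when each output is written,
// i.e. the counter values before they are updated for that beat.
std::vector<std::pair<unsigned, unsigned>> trace_counters(const std::vector<Beat>& beats)
{
    unsigned row = 0;
    unsigned col = 0;
    bool in_last_prev = true;  // same initial value as in function1
    std::vector<std::pair<unsigned, unsigned>> seen;

    for (const Beat& in : beats) {
        seen.emplace_back(row, col);  // values used for out.last / out.user

        // update row counter
        if (in.user)
            row = 0;
        else if (in_last_prev)
            row += 1;

        // update column counter
        if (in_last_prev)
            col = 0;
        else
            col += 1;

        in_last_prev = in.last;
    }
    return seen;
}
```

For a 3x2 frame (user set on the first beat, last set on every third beat), this model yields (0,0), (0,0), (0,1), (0,2), (1,0), (1,1). In this model the counters seen at write time trail the input position by one beat, which may matter when comparing them against DELAY.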

Explorer
Registered: 07-18-2018

Can you run a co-sim of it? The waveform should show whether, once it is fed a new pixel each cycle after the initial latency, the block outputs a pixel each clock cycle. That should give you some confidence that the description is correct. If that all appears to work, I would check how you are talking to the block in the full design. It's possible you are stopping and starting the block, which could be stalling the dataflow pipeline.

Start by sharing the co-sim results, to verify that the II of 1 in the reports reflects what you believe it does.

Moderator
Registered: 11-09-2015

Hi @fimmerzeel,

You might want to have a look at my Xilinx Video Series 14, Xilinx Video Series 15, Xilinx Video Series 17 and Xilinx Video Series 18. I am basically doing what you are trying to achieve.

Hope that helps,

Regards,


Florent
Product Application Engineer - Xilinx Technical Support EMEA

Visitor fimmerzeel
Registered: 10-26-2018

Hi @evant_nq,

Thanks for your reply, appreciate it!

The top function has '#pragma HLS INTERFACE ap_ctrl_none port=return' in it, so I assume the block is not starting/stopping (except when ap_rst is high), right? I've also tried ap_ctrl_hs with ap_start forced high, but got the same result.

I can't run co-simulation, because of compiler errors. Do you have any idea what I'm doing wrong?

   Build using "C:/Xilinx/Vivado/2018.3/msys64/mingw64/bin/g++"
   Compiling example.cpp_pre.cpp.tb.cpp
cosim.tv.mk:68: recipe for target 'obj/example.cpp_pre.cpp.tb.o' failed
In file included from C:/Xilinx/Vivado/2018.3/include/etc/ap_private.h:119:0,
                 from C:/Xilinx/Vivado/2018.3/include/ap_common.h:641,
                 from C:/Xilinx/Vivado/2018.3/include/ap_int.h:54,
                 from C:/Xilinx/Vivado/2018.3/include/ap_axi_sdata.h:86,
                 from C:/Xilinx/Vivado/2018.3/include/hls/hls_axi_io.h:39,
                 from C:/Xilinx/Vivado/2018.3/include/hls_video.h:48,
                 from C:/fpga/HLS/example/example.h:11,
                 from C:/fpga/HLS/example/example.cpp:1:
C:/Xilinx/Vivado/2018.3/msys64/mingw64/include/c++/6.2.0/iomanip:462:65: error: 'quoted' function uses 'auto' type specifier without trailing return type
     _CharT __delim = _CharT('"'), _CharT __escape = _CharT('\\'))
                                                                 ^
C:/Xilinx/Vivado/2018.3/msys64/mingw64/include/c++/6.2.0/iomanip:462:65: note: deduced return type only available with -std=c++14 or -std=gnu++14
C:/Xilinx/Vivado/2018.3/msys64/mingw64/include/c++/6.2.0/iomanip:471:65: error: 'quoted' function uses 'auto' type specifier without trailing return type
     _CharT __delim = _CharT('"'), _CharT __escape = _CharT('\\'))
                                                                 ^
C:/Xilinx/Vivado/2018.3/msys64/mingw64/include/c++/6.2.0/iomanip:471:65: note: deduced return type only available with -std=c++14 or -std=gnu++14
C:/Xilinx/Vivado/2018.3/msys64/mingw64/include/c++/6.2.0/iomanip:481:65: error: 'quoted' function uses 'auto' type specifier without trailing return type
     _CharT __delim = _CharT('"'), _CharT __escape = _CharT('\\'))

Visitor fimmerzeel
Registered: 10-26-2018

Hi @florentw,

Thanks for your reply and the tutorial :)

Your tutorial uses row/col for loops. Because of the DELAY, it takes DELAY extra cycles at the end of a row or frame to produce all outputs. The idea behind my code is that every input pixel also produces an output pixel, so when the next frame starts with its first pixels, the block outputs the last pixels of the previous frame. Is this right?

for(int row = 0; row < height; row++) {
	for(int col = 0; col < width + DELAY; col++) {
#pragma HLS PIPELINE
		// read only while the input row still has pixels
		if (col < width)
			pixel_in = src.read();

		// extra cycles due to the delay when col >= width
		if (col >= DELAY)
			dst.write(pixel_out);
	}
}
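
The alternative described in this post, where every input produces exactly one output and the tail of a frame only appears once the next frame starts, can be modelled in plain C++ (no HLS headers). This is a minimal sketch, not the actual design: `DelayLine`, `step`, and the fill value are hypothetical, and a `std::deque` stands in for whatever buffering the real function uses.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical model of a block that emits one output per input,
// with the output lagging the input by `delay` samples.
class DelayLine {
public:
    // The first `delay` outputs are `fill`, since no real data is available yet.
    explicit DelayLine(std::size_t delay, int fill = 0) : fifo_(delay, fill) {}

    // Consume one input sample and return the sample received `delay` calls earlier.
    int step(int in)
    {
        fifo_.push_back(in);
        int out = fifo_.front();
        fifo_.pop_front();
        return out;
    }

private:
    std::deque<int> fifo_;
};
```

With a delay of 2, feeding a first frame 1, 2, 3, 4 returns 0, 0, 1, 2, and feeding a second frame 5, 6, 7, 8 returns 3, 4, 5, 6. The last two pixels of the first frame only come out while the second frame is arriving, so the final frame of a stream is never flushed unless more input follows.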

 

 

Moderator
Registered: 11-09-2015

Hi @fimmerzeel,

I am not sure why you are trying to add a delay in your code. The delay should be inserted automatically by the core if it is doing some processing. Just stop reading the data while processing.

Regards,


Florent
Product Application Engineer - Xilinx Technical Support EMEA

Visitor fimmerzeel
Registered: 10-26-2018

Hi @florentw,

The delay is caused by a line buffer: the output for the first pixel can only be produced DELAY pixels later. For example, suppose the output for the second pixel is the average of the first, second and third pixels. In that case the algorithm has to wait until the third pixel has arrived before it can compute the output for the second pixel. The for loop has to run one extra cycle to compute the last pixel, and during that extra cycle it won't read new data (the output stream is delayed by 1 pixel relative to the input stream). In my case the delay is not 1 pixel but a few pixel rows, so it takes a lot of processing (loop iterations) after the last pixel is read.
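
The 3-pixel average example can be sketched in plain C++ (no HLS headers) to show where the extra iteration comes from. The function name `average3_row` is hypothetical, and replicating the edge pixel at the row borders is an assumption made only so the sketch is complete; the post does not say how borders are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: 3-pixel horizontal average over one row.
// The output for pixel i needs pixel i+1, so the output lags the input by
// DELAY = 1 and the loop runs one extra iteration in which nothing is read.
std::vector<int> average3_row(const std::vector<int>& row)
{
    const std::size_t DELAY = 1;
    const std::size_t width = row.size();
    assert(width >= 1);

    std::vector<int> out;
    out.reserve(width);

    // Sliding window; the left border is replicated (assumption).
    int left = row[0];
    int centre = row[0];

    for (std::size_t col = 0; col < width + DELAY; ++col) {
        // Read only while input remains; otherwise replicate the last pixel.
        int incoming = (col < width) ? row[col] : centre;

        // From col == DELAY onwards, `centre` holds pixel col-1 and both
        // of its neighbours are known, so its output can be written.
        if (col >= DELAY)
            out.push_back((left + centre + incoming) / 3);

        left = centre;
        centre = incoming;
    }
    return out;
}
```

For the row 3, 6, 9, 12 this returns 4, 6, 9, 11. The output for the last pixel is written in the extra iteration, in which no input is read.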

Visitor fimmerzeel
Registered: 10-26-2018

I've made some progress and it's now running with II=1. I did an RTL export for every function, so function1 and function2 are now separate IP blocks in Vivado, and I connected those blocks there (AXI streams). So it looks like something is going wrong in the top function with the DATAFLOW pragma.

Does anybody know why it works when every function is its own IP block, but not when all functions are combined in a DATAFLOW region?
