Reducing the latency and having output valid signal for array ?

Participant
Posts: 43
Registered: ‎10-19-2015

Hi All,

 

The following code takes 1024-bit-wide data and outputs it as 64-bit chunks over 16 cycles.

Basically, a new 1024-bit input arrives every 16 cycles, so I have 16 cycles to split the data and output the 64-bit chunks. The code is shown below.

Now I have two problems.

1) When I synthesize this code with Vivado HLS, I get the latency and II shown in latency.PNG: a latency of 20 clock cycles and an II of 21 clock cycles. However, example_top should output sixteen 64-bit words in 16 clock cycles. As the report shows, the trip count is 16 and the loop is pipelined. Why am I getting 21 for the II and 20 for the latency? How can I reduce both to 16?

2) I also have a problem with the interfaces. example_top needs a 64-bit output with a data valid signal, because the downstream module is written in VHDL and accepts 64-bit data with a data valid. How can I add a data valid signal? I know that if I use a pointer (*pointer) as the C argument, a data valid signal is generated. If I use an array (ap_uint<64> OUT_DATA_TEST[16]), the interfaces shown in interfaces.PNG are generated.

 

#include <ap_int.h>

// Splits a 1024-bit input into sixteen 64-bit words, one per loop iteration.
void example_top(
        ap_uint<1> IN_DATA_VALID,
        ap_uint<1024> IN_DATA,
        ap_uint<64> OUT_DATA_TEST[16])
{
#pragma HLS STREAM variable=OUT_DATA_TEST depth=1 dim=1
#pragma HLS INTERFACE ap_ctrl_none port=return

    if (IN_DATA_VALID == 1) {
        for (ap_uint<8> i = 0; i < 16; i++) {
#pragma HLS PIPELINE II=1
            // Word i is bits [64*i+63 : 64*i] of the input.
            OUT_DATA_TEST[i] = IN_DATA.range(64*i + 63, 64*i);
        }
    }
}
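
For checking results in a C testbench, the slicing above can be mirrored by a small reference model in standard C++. This is only a sketch under assumptions, not the poster's code: std::bitset<1024> stands in for ap_uint<1024> so that it compiles without the Vivado HLS headers, and slice_bits and slice_reference are hypothetical helper names that mimic which bits IN_DATA.range(64*i+63, 64*i) selects.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: software stand-in for ap_uint<1024>::range(hi, lo),
// limited to slices of at most 64 bits.
std::uint64_t slice_bits(const std::bitset<1024>& word,
                         std::size_t hi, std::size_t lo) {
    assert(hi >= lo && hi < 1024 && hi - lo < 64);
    std::uint64_t out = 0;
    for (std::size_t b = lo; b <= hi; ++b) {
        if (word[b]) {
            out |= (std::uint64_t{1} << (b - lo));  // bit b lands at position b - lo
        }
    }
    return out;
}

// Hypothetical golden model: word i is bits [64*i+63 : 64*i] of the input.
void slice_reference(const std::bitset<1024>& in, std::uint64_t out[16]) {
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = slice_bits(in, 64 * i + 63, 64 * i);
    }
}
```

A testbench could fill the same bit pattern into the real ap_uint<1024> input and compare each of the 16 output words against slice_reference.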

latency.PNG
interfaces.PNG
Voyager
Posts: 714
Registered: ‎06-24-2013

Re: Reducing the latency and having output valid signal for array ?


Hey @akboken,

 

Running your example through Vivado 2017.2 gives me the expected 16/16 latency as well as an II of 1.

So everything is fine there; I am not sure why your results differ.

 

Regarding the output interface: As can be seen in Figure 1-39 of UG902, the ap_(o)vld is not supported for arrays at the moment, so you will have to change the output to a pointer and request the appropriate INTERFACE.

 

[Screenshot: Performance Estimates]

 

Hope this helps,

Herbert

-------------- Yes, I do this for fun!
Participant
Posts: 43
Registered: ‎10-19-2015

Re: Reducing the latency and having output valid signal for array ?

Hi Hpoetzl, and thank you for the prompt response.

1) Yes, if I use a 10 ns clock period, I get the same results as you did (please see the attached latency_10ns_period.PNG). However, my target is a 5 ns period, and as you can see, the code is simple: it just extracts 64 bits from the input data. I do not see why I cannot achieve 16 clock cycles at 5 ns.

2) I also changed the code to use a pointer. However, Vivado HLS 2015.3 fails to synthesize it; please see error_when_using_pointer.PNG. I am definitely accessing the data in sequential order.

 

Thanks in advance for your help.

 

latency_10ns_period.PNG
error_when_using_pointer.PNG
Voyager
Posts: 714
Registered: ‎06-24-2013

Re: Reducing the latency and having output valid signal for array ?

Hey @akboken,

 

Try this then:

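#include <ap_int.h>

void slice(ap_uint<1> IN_DATA_VALID,
           ap_uint<1024> IN_DATA,
           ap_uint<64> *OUT_DATA_TEST)
{
    // ap_ovld requests an output valid signal alongside the data port.
    #pragma HLS INTERFACE ap_ovld port=OUT_DATA_TEST
    #pragma HLS INTERFACE ap_ctrl_none port=return

    if (IN_DATA_VALID == 1) {
        for (int i = 0; i < 16; i++) {
            #pragma HLS PIPELINE II=1
            // Each iteration writes the next 64-bit word to the same port.
            *OUT_DATA_TEST = IN_DATA.range(64*i + 63, 64*i);
        }
    }
}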

Best,

Herbert

Participant
Posts: 43
Registered: ‎10-19-2015

Re: Reducing the latency and having output valid signal for array ?

Thanks again, Hpoetzl.

 

Well, I believe the following line

*OUT_DATA_TEST = IN_DATA.range(64*i+63, 64*i);

should be changed to the line below, right? With the line above, my testbench breaks. Why output through just a single pointer? The line above might work in hardware, but what about my testbench? I also want to validate my code from a C testbench. How would you write a testbench for the code above? My testbench expects sixteen 64-bit words.

*(OUT_DATA_TEST + i) = IN_DATA.range(64*i+63, 64*i);
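
The testbench concern can be reproduced in plain C++ without any HLS headers. The sketch below is an illustration under assumptions: Wide is a hypothetical stand-in for the 1024-bit input (16 words, word 0 holding bits 63..0), and slice_single and slice_indexed are hypothetical names for the two variants of the assignment. It only models what a C simulation observes after the call; what the synthesized port does with these writes is not modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the 1024-bit input: 16 words, word 0 = bits 63..0 (assumption).
using Wide = std::array<std::uint64_t, 16>;

// Single-pointer variant: every iteration writes the same location,
// so in software each write overwrites the previous chunk.
void slice_single(const Wide& in, std::uint64_t* out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < 16; ++i) {
        *out = in[i];
    }
}

// Indexed variant: iteration i writes element i, so the caller
// must provide storage for 16 words and can inspect all of them.
void slice_indexed(const Wide& in, std::uint64_t* out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < 16; ++i) {
        *(out + i) = in[i];
    }
}
```

After slice_single returns, the caller can see only the last chunk, which is why a testbench expecting sixteen words cannot check the single-pointer form directly.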

 

 

Voyager
Posts: 714
Registered: ‎06-24-2013

Re: Reducing the latency and having output valid signal for array ?

Hey @akboken,

 

How would you write a testbench for above code?

I think that is an excellent question but should be answered in a separate thread.

I can imagine a number of solutions to this problem.
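
One such solution, sketched here only as an assumption and not something proposed in this thread, is to give the function a FIFO-like output so that every write stays visible to the C testbench. In the sketch below std::queue is a plain C++ stand-in for a streaming output, Wide is a hypothetical stand-in for the 1024-bit input, and slice_stream and check_slice are hypothetical names; the interface that a real HLS tool would generate for such a port is not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Stand-in for the 1024-bit input: 16 words, word 0 = bits 63..0 (assumption).
using Wide = std::array<std::uint64_t, 16>;
// Stand-in for a streaming output port; keeps every word that was written.
using WordFifo = std::queue<std::uint64_t>;

// Pushes the sixteen 64-bit words in order when the input is flagged valid.
void slice_stream(bool in_valid, const Wide& in, WordFifo& out) {
    if (!in_valid) {
        return;
    }
    for (std::size_t i = 0; i < 16; ++i) {
        out.push(in[i]);
    }
}

// Testbench helper: returns the number of mismatching words,
// or -1 if the function did not produce exactly 16 words.
int check_slice(const Wide& in) {
    WordFifo fifo;
    slice_stream(true, in, fifo);
    if (fifo.size() != 16) {
        return -1;
    }
    int errors = 0;
    for (std::size_t i = 0; i < 16; ++i) {
        if (fifo.front() != in[i]) {
            ++errors;
        }
        fifo.pop();
    }
    return errors;
}
```

A testbench main would call check_slice with a few input patterns and return a nonzero exit code if any call reports errors.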

 

Best,

Herbert
