Adventurer
1,246 Views
Registered: ‎10-19-2015

Why do I always get a longer latency (II) for HLS loops?

Hi All,

 

I am having trouble with a very simple copy loop in HLS. The current design is required to finish copying the data to the output in fewer than 33 cycles. I thought HLS could do this because it only copies processed 64-bit data to the output. However, HLS reports a latency of 36 cycles for the job, which is too long.

To see the problem, please look at the Copy2Output loop, which iterates 32 times (trip count 32). It just copies 64-bit data to a 64-bit axis interface. This loop takes 35 cycles (the dataflow II is 36, and I do not know why). Why can it not be completed within 32, or at least 33, clock cycles? This is just a simple operation.

 

void simple_core(const in_point_t* din, apuint64 dout[32],
                 ap_uint<12> *threshold) {

#pragma HLS DATAFLOW
#pragma HLS INTERFACE axis register depth=1 port=dout
// Note: depth>1 for din breaks Co-Sim.
#pragma HLS INTERFACE axis register depth=1 port=din
#pragma HLS DATA_PACK variable=din field_level
#pragma HLS DATA_PACK variable=dout field_level

// Comment out the following line to test Co-Sim.
#pragma HLS INTERFACE ap_ctrl_none port=return

    ....

    apuint64 result64[ITERATIONS*2];

    copy2local(din, buffer);
    unsigned int index = 0;
    // Processing block
    ProcessingBlock: for (ap_uint<8> i = 0; i < ITERATIONS; i++) {

        .................
        result64[index] = tmp_out.range(63, 0);
        index = index + 1;
        result64[index] = tmp_out.range(127, 64);
        index = index + 1;
        temp_result.range(PACK_WIDTH*i+PACK_WIDTH_ONE_LESS, PACK_WIDTH*i) = tmp_out;
    }

    Copy2Output: for (ap_uint<6> i = 0; i < ITERATIONS*2; i = i + 1) {
        #pragma HLS PIPELINE II=1
        dout[i] = result64[i];
    }

}

I also attached the report.

xilinx_case_study.PNG
4 Replies
Scholar hbucher
1,237 Views
Registered: ‎03-22-2016

Re: Why do I always get a longer latency (II) for HLS loops?

@akboken  Try 

#pragma HLS unroll

https://www.xilinx.com/html_docs/xilinx2017_4/sdsoc_doc/uyd1504034366571.html
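
As a rough sketch of where that pragma could sit, here is a plain C++ stand-in for the Copy2Output loop. It makes several assumptions that are not in the original post: uint64_t replaces apuint64, ITERATIONS is taken to be 16 (so that ITERATIONS*2 matches the stated trip count of 32), and copy2output is a hypothetical helper name. The reply does not say whether the unroll pragma should replace or accompany the existing PIPELINE pragma; this sketch shows it on its own.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: ITERATIONS is 16, so ITERATIONS*2 equals the trip count of 32.
constexpr int ITERATIONS = 16;

// Hypothetical plain-C++ stand-in for the Copy2Output loop.
// uint64_t is used here in place of the original apuint64 type.
void copy2output(const uint64_t result64[ITERATIONS * 2],
                 uint64_t dout[ITERATIONS * 2]) {
Copy2Output:
    for (int i = 0; i < ITERATIONS * 2; i++) {
        // An ordinary compiler ignores this pragma; the HLS tool reads it.
#pragma HLS UNROLL
        dout[i] = result64[i];
    }
}
```

Compiled with a normal C++ compiler, this only checks that the copy is functionally correct; any effect on latency would have to be read from the HLS synthesis report.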

 

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
1,234 Views
Registered: ‎10-17-2017

Re: Why do I always get a longer latency (II) for HLS loops?

This is probably because it takes two cycles to read from result64[] and then two cycles to write to dout[] (3 cycles when the two overlap).

That means you won't get an output until the first 3 cycles have passed, so the complete loop takes 32 + 3 = 35 cycles in total.
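
The arithmetic in this reply can be written as a small helper. This is only a sketch of the reply's reasoning, not the tool's actual scheduling model: estimated_loop_cycles and its parameter names are hypothetical, and the 3-cycle fill figure is the reply's own estimate.

```cpp
#include <cassert>

// Cycle estimate following the reply's reasoning: no output appears during
// the first `fill` cycles, then one result is produced every `ii` cycles.
// All names here are illustrative assumptions, not part of any HLS API.
constexpr int estimated_loop_cycles(int trip_count, int ii, int fill) {
    return fill + (trip_count - 1) * ii + 1;
}
```

With a trip count of 32, II=1 and a fill of 3 cycles this gives 35, matching the reported loop latency. Under the same model, a 33-cycle total would leave room for only a single fill cycle.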

Adventurer
1,185 Views
Registered: ‎10-19-2015

Re: Why do I always get a longer latency (II) for HLS loops?

The unroll does help, but it is not enough.

With unroll, it takes 34 cycles to complete (as opposed to 36 previously). I still need to reduce it to 33. I am completely baffled as to why this loop cannot finish in 32 cycles. There are no operations; it is just an assignment.

 

 

Scholar hbucher
1,177 Views
Registered: ‎03-22-2016

Re: Why do I always get a longer latency (II) for HLS loops?

@akboken At the top right of the HLS window there is an "Analysis" button. It shows a graph with all the steps.

You can drill into the loops and see what is going on.

My hunch is that it is mixing the contents of the two loops. Yes, it can do that.
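
To illustrate what mixing the contents of two loops means in general, here is a minimal plain C++ sketch. It does not show what the tool does in this particular design, which remains a hunch; two_loops, merged_loop, and the stand-in computation 3*i + 1 are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two separate loops: fill a buffer, then copy it to the output.
std::vector<uint64_t> two_loops(int n) {
    std::vector<uint64_t> buffer(n), out(n);
    for (int i = 0; i < n; i++) buffer[i] = 3u * i + 1u;  // stand-in processing
    for (int i = 0; i < n; i++) out[i] = buffer[i];       // copy loop
    return out;
}

// The same work with both loop bodies merged into a single loop.
std::vector<uint64_t> merged_loop(int n) {
    std::vector<uint64_t> buffer(n), out(n);
    for (int i = 0; i < n; i++) {
        buffer[i] = 3u * i + 1u;  // stand-in processing
        out[i] = buffer[i];       // copy happens in the same iteration
    }
    return out;
}
```

Both functions return the same data, but the merged form has a different schedule, which is why the per-loop numbers in a report may not line up with the loops as written in the source.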
