Observer yhy.xilinx
Registered: ‎06-20-2019

Deceleration with HLS pragmas on SDSoc

Hello everyone,

I am trying to use HLS pragmas to accelerate my code. However, what I am observing is quite odd: when I add the pragmas, the latency of my code increases. What could be the reason? Here is an example code snippet:

void trying(const int noutput_items, unsigned char *input_items, unsigned char *output_items, unsigned char *d_in_buff, unsigned char *d_out_buff, const int d_noutput, const int d_out_bs,
		  const int d_in_bs, const int d_k, const int d_n,
		  const int d_m, int d_reg, unsigned int size, unsigned int size_out)
{
	const int c_size = noutput_items * d_noutput/d_out_bs;
	 #pragma HLS dataflow
	 unsigned char in[1512];
	 for(int i=0; i<1512; i++){
		 in[i] = input_items[i];
	 }

	 unsigned char out[1512*8];

	 for (int k = 0; k < (noutput_items * d_noutput / d_out_bs); k++) {
		#pragma HLS loop_tripcount min=c_size max=c_size
	    for (int i = 0; i < d_in_bs; i++) {
		#pragma HLS loop_tripcount min=d_in_bs max=d_in_bs
	        for (int j = 0; j < 8; j++) {
			#pragma HLS PIPELINE
	        	  d_in_buff[8*i + j] = (in[k*d_in_bs + i] >> (7 - j)) & 1;
	        }
	    }

	    for (int in_bit = 0, out_bit = 0; in_bit < (8 * d_in_bs); in_bit += d_k, out_bit += d_n) {
			#pragma HLS unroll
			#pragma HLS PIPELINE
	      	    another_func(gr::a, &d_in_buff[in_bit], &d_out_buff[out_bit], d_reg);
	    }

	    for (int i = 0; i < d_out_bs; i++) {
	       #pragma HLS loop_tripcount min=d_out_bs max=d_out_bs
	       #pragma HLS unroll
	       unsigned char c = 0;
	       for (int j = 0; j < d_m; j++) {
	           #pragma HLS loop_tripcount min=d_m max=d_m
	           #pragma HLS PIPELINE
	           c |= d_out_buff[d_m*i + j] << (d_m - 1 - j);
	       }
	       out[k*d_out_bs + i] = c;
	    }
	 }
	 for(int i=0; i<12096; i++){
		#pragma HLS unroll factor=1512
		 output_items[i] = out[i];
	 }
}
Scholar u4223374
Registered: ‎04-26-2015

Re: Deceleration with HLS pragmas on SDSoc

Where are the delays coming from? You should be able to see what is taking all the time, in the HLS analysis view.

 

The big thing that stands out for me is that you've got a couple of loops unrolled to very high levels (1512 writes in a single cycle?) and no indication that the memory can actually support that. In this case HLS will be forced to build an extremely large and relatively slow state machine to control memory access, which will add much more latency than just pipelining the loop.
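
To make that concrete, here is a minimal sketch of the final copy loop written with a plain pipeline instead of `unroll factor=1512`. It is an illustration, not a tested fix for your design: the names `copy_out`, `min_cycles` and `kOutBytes` are hypothetical, and the port count of 2 used below is an assumption (typical of a dual-port block RAM; your actual interface may allow fewer accesses per cycle). The helper only computes a lower bound: however far a loop is unrolled, a memory that accepts `ports` accesses per cycle cannot finish `accesses` transfers in fewer than `accesses / ports` cycles, rounded up.

```cpp
#include <cassert>

// Size taken from the question: 1512 input bytes expanded to 8 outputs each.
constexpr int kOutBytes = 1512 * 8;  // 12096

// Lower bound on the cycles needed for `accesses` reads or writes to a single
// memory that accepts `ports` accesses per cycle. The unroll factor does not
// appear here because it cannot improve on this bound.
int min_cycles(int accesses, int ports) {
    assert(ports > 0);
    return (accesses + ports - 1) / ports;  // ceiling division
}

// Copy the local buffer to the output one element per iteration.
// PIPELINE II=1 requests one write per clock cycle, which a single memory
// port can sustain, so no unrolling or array partitioning is needed.
// (An ordinary C++ compiler ignores the HLS pragma.)
void copy_out(const unsigned char out[kOutBytes],
              unsigned char output_items[kOutBytes]) {
copy_loop:
    for (int i = 0; i < kOutBytes; i++) {
#pragma HLS PIPELINE II=1
        output_items[i] = out[i];
    }
}
```

With the assumed two ports, `min_cycles(12096, 2)` is 6048, so the eight-cycle schedule implied by an unroll factor of 1512 is out of reach unless the arrays are partitioned to match, and the tool has to serialize the 1512 writes in control logic instead. A pipelined loop gets close to the port limit with far simpler hardware.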