Adventurer
571 Views
Registered: ‎04-14-2016

Behaviour of HLS synthesis regarding FIR filter

Hello, I have a question regarding the HLS output for a plain FIR filter (4th order, 5 integer coefficients).

The code is simple:

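int fir_filter_opt1(int input) {
#pragma HLS PIPELINE
	int i;
	int res = 0;
	// Walk from the oldest tap down to tap 0 so each sample shifts by one position.
	for (i = N_FILTER-1; i >= 0; i--) {
		if (i == 0) {
			temp[i] = input;      // newest sample enters at tap 0
		} else {
			temp[i] = temp[i-1];  // shift the delay line
		}
		res += temp[i]*coeffs[i]; // multiply-accumulate with the matching coefficient
	}
	return res;
}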

N_FILTER, temp and coeffs are defined outside the function in the corresponding header file. If I synthesize this without the PIPELINE directive, the result is as expected:

30 clock cycles latency for the loop, which is executed 5 times with 6 clock cycles per iteration. 

Usage: 3 DSPs, 214 FF, 202 LUT, 0 BRAM

But after adding the PIPELINE directive, I expected significantly more DSPs to be used and a loop latency of 1.

The actual result was 0 DSPs, 194 FF, 222 LUT, and a latency of 1.

So why did HLS fail to unroll the loop? I did not find any useful information in the report. To me it looks as if HLS optimized everything away, but why?

Thank you!

1 Reply
Moderator
500 Views
Registered: ‎10-04-2011

Re: Behaviour of HLS synthesis regarding FIR filter

Hello @m2b821,

I think there are a couple of things going on here. The first is that I don't see the definitions of temp and coeffs. For sure, temp needs to be defined as a static variable, since its values must be maintained across function calls.
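
As a rough sketch of what such definitions could look like: the header was not posted, so the coefficient values and the function name fir_step below are assumptions chosen only for illustration. The point is that the delay line has static storage, so the samples shifted in by one call are still there on the next call.

```cpp
// Assumed definitions; the original header was not posted.
const int N_FILTER = 5;                               // 4th order -> 5 taps
static const int coeffs[N_FILTER] = {1, 2, 3, 2, 1};  // illustrative values only
static int temp[N_FILTER] = {0, 0, 0, 0, 0};          // delay line, persists between calls

// Hypothetical name; same structure as fir_filter_opt1 in the question.
int fir_step(int input) {
    int res = 0;
    for (int i = N_FILTER - 1; i >= 0; i--) {
        temp[i] = (i == 0) ? input : temp[i - 1];  // shift, newest sample at tap 0
        res += temp[i] * coeffs[i];                // multiply-accumulate
    }
    return res;
}
```

Feeding a single 1 followed by zeros should return the coefficients one after another, which only works because temp keeps its contents between calls.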

When you run synthesis on the pipelined function, you should see a message indicating that the loop was unrolled, which is the default behavior for loops inside pipelined regions. I labeled the loop "L1" for clarity.

INFO: [XFORM 203-502] Unrolling all loops for pipelining in function 'fir_filter_opt1' (test_pipeline/test_ip.cpp:1).
INFO: [XFORM 203-501] Unrolling loop 'L1' (test_pipeline/test_ip.cpp:29) in function 'fir_filter_opt1' completely.

Also, the temp array will be partitioned so that multiple memory accesses per cycle are possible, which results in FFs for these array locations.


INFO: [XFORM 203-102] Partitioning array 'temp' automatically.
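
Taken together, the unrolling and partitioning messages above amount to roughly the following hand-written form. This is only a sketch of the idea, not what the tool emits: the coefficient values, the scalar names t0..t4 and c0..c4, and the function name fir_unrolled are all assumptions for illustration.

```cpp
// Assumed coefficient values; the original header was not posted.
static const int c0 = 1, c1 = 2, c2 = 3, c3 = 2, c4 = 1;

// Delay line after complete partitioning: five independent registers
// instead of one array, so all of them can be accessed in the same cycle.
static int t0 = 0, t1 = 0, t2 = 0, t3 = 0, t4 = 0;

// Hypothetical hand-unrolled equivalent of the loop for N_FILTER = 5.
int fir_unrolled(int input) {
    // Shift, oldest tap first, in the same order as the loop (i = 4 down to 0).
    t4 = t3;
    t3 = t2;
    t2 = t1;
    t1 = t0;
    t0 = input;
    // The five products no longer depend on a loop counter and can be
    // evaluated side by side.
    return t4 * c4 + t3 * c3 + t2 * c2 + t1 * c1 + t0 * c0;
}
```

With no loop and no shared memory left, nothing prevents the whole body from being scheduled as one pipelined stage, which matches the latency of 1 reported in the question.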

Since coeffs is constant, HLS can optimize the resulting MACC operations according to the coefficient values. It may be that a combinatorial multiplier in LUTs is the most efficient implementation of your function. For example, if I use single-digit values for coeffs, I get the same result as you. If I use values in the 100s or 1000s, I get DSP48s.
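
To illustrate why small constant coefficients may need no DSP at all: a multiplication by a small constant can be rewritten as a few shifts and additions, which map naturally onto LUTs. The sketch below shows this for an assumed coefficient value of 5 with a hypothetical function name. It demonstrates the general strength-reduction idea, not the exact logic Vivado HLS generates.

```cpp
#include <cstdint>

// x * 5 rewritten as (x * 4) + x, i.e. one shift and one addition.
// The shift is done on the unsigned representation to avoid undefined
// behaviour when shifting negative signed values.
int32_t mul5_shift_add(int32_t x) {
    uint32_t ux = static_cast<uint32_t>(x);
    return static_cast<int32_t>((ux << 2) + ux);
}
```

A constant with many set bits needs many such partial terms, and at some point a dedicated multiplier becomes the cheaper choice, which is consistent with the larger coefficients ending up in DSP48s.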

A few final thoughts: the DSPs listed in the HLS synthesis resources, whether inferred or directed with resource directives, are suggestions to Vivado synthesis. Vivado synthesis can and will override these if it determines that a different resource gives a better implementation. The best thing to do is to export with Vivado synthesis checked to see the true resource usage. And finally, verify the operation in CoSim to ensure the resulting RTL still meets your needs after directives are applied.
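
Since co-simulation reuses the C test bench, a self-checking test is what makes that last step meaningful. Here is a minimal sketch, assuming illustrative coefficient values and a hypothetical helper named run_impulse_test. A stand-in copy of the filter is included only so the example is self-contained. It feeds an impulse into the filter and checks that the outputs reproduce the coefficients in order.

```cpp
const int N_FILTER = 5;                                // assumed: 4th order -> 5 taps
static const int coeffs[N_FILTER] = {3, -1, 4, 1, 5};  // illustrative values only
static int temp[N_FILTER] = {0, 0, 0, 0, 0};           // delay line

// Stand-in for the design under test, same structure as in the question.
int fir_filter_opt1(int input) {
    int res = 0;
    for (int i = N_FILTER - 1; i >= 0; i--) {
        temp[i] = (i == 0) ? input : temp[i - 1];
        res += temp[i] * coeffs[i];
    }
    return res;
}

// Hypothetical self-checking test: returns 0 on success, otherwise the
// number of mismatching output samples.
int run_impulse_test() {
    int errors = 0;
    for (int n = 0; n < 2 * N_FILTER; n++) {
        int in = (n == 0) ? 1 : 0;                      // single impulse at n = 0
        int expected = (n < N_FILTER) ? coeffs[n] : 0;  // impulse response = coefficients
        if (fir_filter_opt1(in) != expected) {
            errors++;
        }
    }
    return errors;
}
```

In an actual test bench, main would return this value, so that a non-zero result marks both C simulation and co-simulation as failed.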

OK, hope this helps,

Scott
