raisinill
Visitor
770 Views
Registered: ‎06-05-2019

Apply function over streaming array

 

Hello!

 

I've implemented a function that takes one sample as input and processes it. Synthesis reports II=1. A much simplified version is shown below:

 

int foo(int x)
{
#pragma HLS INLINE off
#pragma HLS LATENCY min=3
#pragma HLS PIPELINE

	static int s[3];

	// shift by one position; stop at i < 2 so that s[i+1] stays inside s[3]
	for (int i = 0; i < 2; ++i)
		s[i] = s[i+1];
	s[2] = x;
	return s[0];
}

 

Now I would like to apply it over each element of an array. When foo has no static variable inside I get the desired II=10.

void bar(const int x[10], int y[10])
{
#pragma HLS INTERFACE axis port=x
#pragma HLS INTERFACE axis port=y
#pragma HLS PIPELINE II=10

	for (int i = 0; i < 10; ++i)
		y[i] = foo(x[i]);
}

 

However, if foo has a static variable, Vivado HLS issues a warning:

  • WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 10, distance = 1, offset = 1)
    between 'call' operation ('tmp_1_2', test/test.cpp:83) to 'foo' and 'call' operation ('tmp_1', test/test.cpp:83) to 'foo'.

And after synthesis I get II=40.

 

Because foo itself synthesises successfully with II=1, I think there must be a way to apply it over an array with the desired II. Could you please help me with this?

 

5 Replies
nithink
Xilinx Employee
741 Views
Registered: ‎09-04-2017

@raisinill when you use a static variable and call the function multiple times, we now have a dependency on the previous call. So the II=40 you are seeing is due to this.

Your code depicts a shift register. Do you want parallel shift registers for each of the inputs? What is your intention?
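
To make the dependency concrete, here is a rough software model of the code from the question. It is plain C++ with the HLS pragmas left out, and the zero initialisation of s is an assumption made only for this illustration. Because every call reads and writes the same s, call i cannot start before call i-1 has updated the state.

```cpp
// Plain C++ model of the 3-element delay line (HLS pragmas omitted).
int foo(int x)
{
    static int s[3] = {0, 0, 0};   // one state, shared by every call to foo

    for (int i = 0; i < 2; ++i)    // shift by one position
        s[i] = s[i + 1];
    s[2] = x;
    return s[0];                   // the sample from two calls earlier
}

// Iteration i reads the state written by iteration i-1,
// so the ten calls form a chain and cannot be reordered.
void bar(const int x[10], int y[10])
{
    for (int i = 0; i < 10; ++i)
        y[i] = foo(x[i]);
}
```

In this model the samples of one array pass through the same delay line one after another, so y[i] is x[i-2] once the line has filled.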

Thanks,

Nithin

raisinill
Visitor
649 Views
Registered: ‎06-05-2019

Hello @nithink ,

 

when you use a static variable and call the function multiple times, we now have a dependency on the previous call. So the II=40 you are seeing is due to this.

Got it. Could you please advise me on how to modify the code?


Your code depicts a shift register. Do you want parallel shift registers for each of the inputs? what's your intention?


I'm trying to implement a multi-channel systolic FIR filter. A similar one was proposed in xapp1236; however, I need a much simpler, single-rate variant. For this I use a slightly modified version of the MAC block shown on page 8.

This example works on a sample-by-sample basis. When I synthesise the FIR filter as a separate IP I get the desired II = 1.

However, the function that produces data for the FIR outputs the data for all channels as an array: int x[NUM_CHANS]. I use a streaming pragma on it.

 

This leads to the problem initially stated in the topic.

 

Regards,
Ilya



 

nithink
Xilinx Employee
643 Views
Registered: ‎09-04-2017

@raisinill One approach I can think of is to use a templated function. Each distinct template argument produces a separate instantiation with its own static variable:

template <int id>
int foo(int x)
{
#pragma HLS INLINE off
#pragma HLS LATENCY min=3
#pragma HLS PIPELINE

	static int s[3];

	// shift by one position; stop at i < 2 so that s[i+1] stays inside s[3]
	for (int i = 0; i < 2; ++i)
		s[i] = s[i+1];
	s[2] = x;

	return s[0];
}

void bar(const int x[10], int y[10])
{
#pragma HLS INTERFACE axis port=x
#pragma HLS INTERFACE axis port=y
#pragma HLS PIPELINE II=10

	y[0] = foo<0>(x[0]);
	y[1] = foo<1>(x[1]);
	y[2] = foo<2>(x[2]);
	y[3] = foo<3>(x[3]);
	y[4] = foo<4>(x[4]);
	y[5] = foo<5>(x[5]);
	y[6] = foo<6>(x[6]);
	y[7] = foo<7>(x[7]);
	y[8] = foo<8>(x[8]);
	y[9] = foo<9>(x[9]);
}
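
The ten hand-written calls can also be generated by a recursive template. The following is only a sketch in plain C++ with the pragmas left out; apply_foo is a hypothetical helper name made up for this example, and whether a particular tool version flattens it as intended would have to be checked in synthesis.

```cpp
// One delay line per template argument: each foo<id> owns its own static s.
template <int id>
int foo(int x)
{
    static int s[3] = {0, 0, 0};
    for (int i = 0; i < 2; ++i)
        s[i] = s[i + 1];
    s[2] = x;
    return s[0];
}

// Hypothetical helper: expands at compile time to
// y[0] = foo<0>(x[0]); ... y[N-1] = foo<N-1>(x[N-1]);
template <int N>
struct apply_foo {
    static void run(const int x[], int y[])
    {
        apply_foo<N - 1>::run(x, y);
        y[N - 1] = foo<N - 1>(x[N - 1]);
    }
};

// End of the recursion: nothing left to call.
template <>
struct apply_foo<0> {
    static void run(const int[], int[]) {}
};

void bar(const int x[10], int y[10])
{
    apply_foo<10>::run(x, y);
}
```

Each channel now has an independent delay line, so y[i] returned by the third call to bar is x[i] from the first call.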

Going a step further, you can create a class and make your function a member of it, so that the state lives in the object instead of in a static variable.
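
A minimal sketch of the class idea, again in plain C++ without pragmas. The names DelayLine, step and lines are invented for this example and do not come from any Xilinx header.

```cpp
// The state that used to be a function-local static becomes a data member,
// so every object carries its own delay line.
class DelayLine {
public:
    DelayLine()
    {
        for (int i = 0; i < 3; ++i)
            s[i] = 0;
    }

    // Same behaviour as foo: push x, return the sample from two calls ago.
    int step(int x)
    {
        for (int i = 0; i < 2; ++i)
            s[i] = s[i + 1];
        s[2] = x;
        return s[0];
    }

private:
    int s[3];
};

void bar(const int x[10], int y[10])
{
    static DelayLine lines[10];    // one independent object per channel
    for (int i = 0; i < 10; ++i)
        y[i] = lines[i].step(x[i]);
}
```

Compared with the template version, the number of channels here is just an array size, so the call site does not need one instantiation per channel.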

Thanks,

Nithin

raisinill
Visitor
636 Views
Registered: ‎06-05-2019

@nithink I need all variables to go through a single static variable. The architecture from the WP looks as follows:

fir.png

 

This is done as:

dout_t multi_channel_fir(const din_t x, const coeff_t coeff[NUM_TAPS])
{
#pragma HLS PIPELINE

	din_t x_r;
	// shift registers are used in every mac unit to store data for different channels
	static ap_shift_reg<din_t, NUM_CHANS> shift_reg[NUM_TAPS];

#pragma HLS ARRAY_PARTITION variable=coeff complete
#pragma HLS ARRAY_PARTITION variable=shift_reg complete

	dout_t acc(0,0);

mac_loop:
	for (int i = 0; i < NUM_TAPS; ++i)
	{
		if (i == 0)
			x_r = x;
		else
			// propagation of data between mac units
			x_r = shift_reg[i].shift(x_r, NUM_CHANS-1);

		dout_t mult = x_r * coeff[i];
		acc += mult;
	}

	return acc;
}

 

So the static variable looks like this:

 

static ap_shift_reg<din_t, NUM_CHANS> shift_reg[NUM_TAPS];

 

If I use a separate instance of foo for each channel, I need NUM_TAPS*NUM_CHANS DSP blocks. However, the above function can do the same using only NUM_TAPS DSP blocks.
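
To show why NUM_TAPS MAC units are enough, here is a plain C++ model of the function above. Several things in it are assumptions made for illustration: ShiftRegModel is a made-up stand-in for ap_shift_reg that assumes shift(din, addr) returns the element at addr and then pushes din in at position 0, din_t and dout_t are replaced by int, and NUM_TAPS = 4 and NUM_CHANS = 10 are arbitrary values.

```cpp
const int NUM_TAPS = 4;    // example value, chosen for this sketch
const int NUM_CHANS = 10;  // example value, chosen for this sketch

// Stand-in for ap_shift_reg<int, NUM_CHANS> (assumed read-then-shift behaviour).
struct ShiftRegModel {
    int data[NUM_CHANS];

    ShiftRegModel()
    {
        for (int i = 0; i < NUM_CHANS; ++i)
            data[i] = 0;
    }

    int shift(int din, int addr)
    {
        int out = data[addr];                  // read before shifting
        for (int i = NUM_CHANS - 1; i > 0; --i)
            data[i] = data[i - 1];
        data[0] = din;                         // newest element at position 0
        return out;
    }
};

// One call per sample; channels arrive interleaved: ch0, ch1, ..., ch0, ch1, ...
int multi_channel_fir(int x, const int coeff[NUM_TAPS])
{
    static ShiftRegModel shift_reg[NUM_TAPS];  // shift_reg[0] is unused, as in the original
    int x_r = x;
    int acc = 0;

    for (int i = 0; i < NUM_TAPS; ++i) {
        if (i > 0)
            // delay by NUM_CHANS calls: the same channel, one frame earlier
            x_r = shift_reg[i].shift(x_r, NUM_CHANS - 1);
        acc += x_r * coeff[i];
    }
    return acc;
}

// Applying the filter over one frame of NUM_CHANS samples.
void bar(const int x[NUM_CHANS], int y[NUM_CHANS], const int coeff[NUM_TAPS])
{
    for (int c = 0; c < NUM_CHANS; ++c)
        y[c] = multi_channel_fir(x[c], coeff);
}
```

Each shift register holds NUM_CHANS samples, so tap i sees the sample of the current channel from i frames ago. The calls inside bar still have to run in channel order, since they all go through the same shift registers.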

 

Regards,
Ilya

nithink
Xilinx Employee
583 Views
Registered: ‎09-04-2017

@raisinill There is an array of static variables in the example that you showed.  In your case it's just one.

Thanks,

Nithin
