Visitor
791 Views
Registered: ‎07-09-2018

## OCCURRENCE in a pipelined function


Hello!

I would like to understand a bit better how the OCCURRENCE pragma works in my example.

Suppose we have a function foo that is to be pipelined.
It has an input stream x and an output stream z, and applies the function z = bar2(x, A), where A is a static parameter. This parameter A depends on another stream y. Each time there's data in y, A has to be recalculated as A = bar1(y). The design guarantees that new y samples appear no more often than every 20 x samples, so the OCCURRENCE directive should be applicable here.

In code it looks like this:

```cpp
#include "hls_stream.h"
using hls::stream;

int bar1(int y);
int bar2(int x, int A);

void foo(stream<int> &xs,
         stream<int> &ys,
         stream<int> &zs)
{
#pragma HLS PIPELINE

    static int A = 0;
    int x, y;

    if (!ys.empty()) {
#pragma HLS OCCURRENCE cycle=20
        ys >> y;
        A = bar1(y);
    }

    if (!xs.empty()) {
        xs >> x;
        zs << bar2(x, A);
    }
}

int bar1(int y)
{
    int r;
    for (int i = 0; i < 10; ++i) {
        /* iteratively compute r from y */
    }
    return r;
}

int bar2(int x, int A)
{
    int z;
    for (int i = 0; i < 10; ++i) {
        /* iteratively compute z from x & A */
    }
    return z;
}
```

In this example I expect Vivado HLS to unroll the loop in bar2 due to the PIPELINE pragma, but to keep the loop in bar1 rolled because of the OCCURRENCE.
However, HLS unrolls both loops and simply sets II=20 for the bar1 block, so no resources are saved.
How can this behaviour be changed to keep the bar1 loop rolled?

Ilya

7 Replies
Teacher
774 Views
Registered: ‎10-23-2018

This might give some insight... https://www.xilinx.com/support/answers/57710.html

Hope that helps!
If so, please mark as solution accepted. Kudos also welcome. :-)

Visitor
766 Views
Registered: ‎07-09-2018

Hi.

Thank you for the answer; however, I've visited that link a few times already, and it doesn't cover my question because it doesn't mention resource usage in any way.

Xilinx Employee
745 Views
Registered: ‎09-05-2018

Hey @sinill57 ,

I think you have said the answer in your post. A is produced by bar1() and consumed by bar2(). HLS thinks that when bar1() executes, you want that result available on the next clock cycle so it can be used in bar2().

That's why I believe it unrolls bar1(). Even though you've told it that it's going to run only every 20th cycle or less often, that's not logically enough for HLS to assume bar1() can have a greater II. And that's a good assumption, because otherwise it could change the result.

This is just my intuition, though; you could check it by removing the dependency between the two and comparing the result.

Nicholas Moellers

Xilinx Worldwide Technical Support
Visitor
689 Views
Registered: ‎07-09-2018

Hello, @nmoeller !

> HLS thinks that when bar1() executes, you want that result available on the next clock cycle so it can be used in bar2().

HLS states that the latency of bar1() is 11, so how can it produce data on the next clock cycle? The loop in the function has an internal dependency in its calculation, so it cannot be performed in parallel, only pipelined.

> That's why I believe it unrolls bar1(). Even though you've told it that it's going to run only every 20th cycle or less often, that's not logically enough for HLS to assume bar1() can have a greater II. And that's a good assumption, because otherwise it could change the result.

I would totally agree with you if HLS kept II = 1 on bar1(). But, as I mentioned in the original post, it does set an II of 20 while still keeping the loop unrolled.

A solution I've found so far is to separate these two if blocks into different functions inside a dataflow region. This way, I can apply PIPELINE only to bar2(), and I no longer need OCCURRENCE at all.
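
For reference, a minimal sketch of the dataflow restructuring described above. The function names update_A and process_x are hypothetical, and passing A between the stages through an internal stream is just one possible way to do it; the exact handshaking would need to be verified in C/RTL co-simulation:

```cpp
#include "hls_stream.h"
using hls::stream;

int bar1(int y);
int bar2(int x, int A);

// Stage 1: no PIPELINE pragma here, so the loop inside bar1() can stay rolled.
void update_A(stream<int> &ys, stream<int> &As)
{
    int y;
    if (!ys.empty()) {
        ys >> y;
        As << bar1(y);
    }
}

// Stage 2: PIPELINE applies only to this function, i.e. only to bar2().
void process_x(stream<int> &xs, stream<int> &As, stream<int> &zs)
{
#pragma HLS PIPELINE
    static int A = 0;
    int x;
    if (!As.empty())
        As >> A;   // pick up a new A whenever stage 1 has produced one
    if (!xs.empty()) {
        xs >> x;
        zs << bar2(x, A);
    }
}

void foo(stream<int> &xs, stream<int> &ys, stream<int> &zs)
{
#pragma HLS DATAFLOW
    stream<int> As;   // internal channel carrying updated A values
    update_A(ys, As);
    process_x(xs, As, zs);
}
```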

However, it would still be great to understand the logic of HLS in this scenario.

Ilya

Accepted Solution
Xilinx Employee
675 Views
Registered: ‎09-05-2018

Hey @sinill57 ,

I see that now, I missed those details originally.

In that case, I think HLS is taking the simplest approach possible. It sees the PIPELINE pragma and therefore assumes all loops below must be unrolled. Then, separately, it sees the OCCURRENCE pragma and applies II=20 to that block.

It could be possible to compare the argument of the OCCURRENCE pragma to the number of iterations of the loop to see whether it's okay to save resources by leaving the loop rolled, but HLS does not make this optimization. Instead, it looks like HLS takes the conservative approach to ensure functionality at the expense of some resources.

It's a good optimization that the development team could make to HLS in the future, but I like your solution to separate the two blocks within the pipeline.

Nicholas Moellers

Xilinx Worldwide Technical Support
Visitor
660 Views
Registered: ‎07-09-2018

> It could be possible to compare the argument to the OCCURRENCE pragma to the number of iterations of the loop to see if it's possible to unroll it, but HLS does not make this optimization.

I think you meant "if it's possible to roll it" :)

In this case the situation is clear, thank you.

Xilinx Employee
636 Views
Registered: ‎09-05-2018

Hey @sinill57 ,

Yep, thanks for pointing that out. I've edited the post to correct it.

Nicholas Moellers

Xilinx Worldwide Technical Support