Visitor
4,243 Views
Registered: ‎10-01-2016

Trying to achieve a PIPELINE interval of 1

Hello, I want to achieve a PIPELINE initiation interval (II) of 1, but the current interval is 8.

For simplicity, I have only included the inner loop of my code here.

```#pragma HLS PIPELINE
#define C 2
#define PARA 16
for(int l=0;l<PARA;++l){
int ker_num = l;
if(init&&j==0){
sum = 0;
} else {
sum = out_buf[ker_num][i];
}
for(int m=0;m<C;++m){
}
}
out_buf[ker_num][i] = sum;
}```

The data type of buf_line and f_buf is ap_uint<64>.

The data type of out_buf is ap_int<16>.

The mul_bi function is just a pop_count.

I've partitioned f_buf, buf_line, and out_buf so that they can be fetched in one cycle.

```#pragma HLS ARRAY_PARTITION variable=buf_line complete dim=1
#pragma HLS ARRAY_PARTITION variable=f_buf cyclic factor=16 dim=1
#pragma HLS ARRAY_PARTITION variable=f_buf complete dim=2
#pragma HLS ARRAY_PARTITION variable=out_buf cyclic factor=16 dim=1```
1 Solution

Accepted Solutions
Visitor
7,088 Views
Registered: ‎12-02-2016

First of all, consider moving the topic to HLS.

I do not know how expensive it is to have so many logic branches in your code when compiling to a bitstream; this might blow up your design by implementing large multiplexers.

Now to the problem in question:

The PIPELINE pragma unrolls all loops nested inside it by default. I think what you might want to consider is the PIPELINE REWIND pragma in the innermost loop. This tells HLS that there is no need to flush the pipeline between operations, so the pipeline can stay full. If this doesn't work, you might have dependencies in your code.

If HLS cannot rule out an inter-iteration dependency between the data used in the pipeline, the synthesized design will simply flush the pipeline and fill it again, which defeats the REWIND option. The minimum distance between dependencies required to keep the pipeline always full depends on the depth of your pipeline; e.g. if your depth is 24, make sure that the minimum distance between actual dependencies is larger. If you have made sure that there is no inter-iteration dependency (or that the distance is sufficient) but still receive a warning about it, consider telling HLS so with the AP DEPENDENCE INTER FALSE pragma.
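To illustrate the dependence-distance point, here is a minimal sketch (not taken from my design) of an accumulation that spreads the running sum over several partial sums, so that each partial sum is only touched every `kLanes` iterations. The names `kLanes` and `accumulate_interleaved` are made up for this example, and the value 8 is just an assumed number that would have to be at least as large as the latency of your adder. Whether HLS actually reaches II=1 with it depends on the tool version and the target device.

```cpp
#include <cstddef>

// Hypothetical constant: number of interleaved partial sums.
// Assumption: it is chosen >= the latency (in cycles) of the adder.
constexpr int kLanes = 8;

// Sums n floats. Consecutive iterations update different partial sums,
// so the read-after-write distance on each partial sum is kLanes iterations.
float accumulate_interleaved(const float *data, int n) {
    float partial[kLanes];
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1

    for (int k = 0; k < kLanes; ++k) {
#pragma HLS UNROLL
        partial[k] = 0.0f;
    }

    for (int idx = 0; idx < n; ++idx) {
#pragma HLS PIPELINE II=1
        // Iteration idx only depends on iteration idx - kLanes.
        partial[idx % kLanes] += data[idx];
    }

    // Final reduction of the partial sums (short, runs once).
    float total = 0.0f;
    for (int k = 0; k < kLanes; ++k) {
        total += partial[k];
    }
    return total;
}
```

A plain C++ compiler ignores the HLS pragmas, so the function can be checked in C simulation before synthesis. Note that the summation order differs from a single running sum, so floating point results may differ slightly in the last bits.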

Below is the core of my convolutional layer. A few remarks concerning the code and its surroundings:

- padding is done in software together with some other dataset transformations

- currently single precision floating point data

- multiply accumulate buffer (mac) allows for larger distance between dependencies

- everything is stored in bram

```for(nin = 0; nin < _NIN_; nin++)
{
    for(wkern = 0; wkern < _WK0_; wkern++)
    {
        /* inter independence of mac no longer true above this point */
        for(nout = 0; nout < _NOUT_PW_; nout++)
        {
            for(hkern = 0; hkern < _HK0_; hkern++)
            {
                for(wout = 0; wout < _WEFF_; wout++)
                {
                    #pragma HLS PIPELINE II=1 REWIND

                    win = wkern + wout;
                    i = in[nin][win];

                    /* bram shift register => circular access pattern */
                    mac_height = (shreg + _HK0_ - hkern - 1) % _HS_;

                    for(nworker = 0; nworker < _NWORKER_; nworker++)
                    {
                        #pragma HLS UNROLL
                        #pragma AP dependence variable=in inter false
                        #pragma AP dependence variable=mac inter false
                        #pragma AP dependence variable=kern inter false

                        k[nworker] = kern[nworker][nin][nout][hkern][wkern];
                        p[nworker] = i * k[nworker];
                        mac[nworker][nout][mac_height][wout] += p[nworker];
                    }
                }
            }
        }
    }
}```

problem in the above code

5 Replies
Visitor
4,233 Views
Registered: ‎10-01-2016

I think maybe I should post a simplified version of my whole design. I will do that later.

4,176 Views
Registered: ‎04-26-2015

In my experience, HLS can have trouble with conditions within loops when those conditions affect whether or not a RAM access has to occur. You're better off always doing the RAM access, and then using a condition to determine whether the resulting value gets used.

Give this a try:

```#pragma HLS PIPELINE
#define C 2
#define PARA 16
for(int l=0;l<PARA;++l){
    int ker_num = l;

    int tmp = out_buf[ker_num][i]; // RAM is always accessed.
    if(init&&j==0){
        sum = 0;
    } else {
        sum = tmp;
    }
    for(int m=0;m<C;++m){
        int tmp_result = mul_bi(buf_line[m][i+j-PAD],f_buf[ker_num][m][j]); // RAM is always accessed.
        sum += tmp_result;
    }
    out_buf[ker_num][i] = sum;
}```
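Before re-running synthesis, it may be worth confirming in C simulation that the "always read, then select" form computes the same values as the conditional read. Here is a simplified, self-contained sketch of such a check. It uses a one-dimensional buffer and plain `int16_t` in place of `ap_int<16>`, and the names `update_conditional_read`, `update_unconditional_read`, `variants_agree` and the `add` input are made up for the example, not taken from your design.

```cpp
#include <cstdint>

constexpr int kPara = 16;  // mirrors PARA from the thread

// Variant A: the buffer is read only when the condition requires it.
void update_conditional_read(int16_t out_buf[kPara],
                             const int16_t add[kPara], bool reset) {
    for (int l = 0; l < kPara; ++l) {
        int16_t sum;
        if (reset) {
            sum = 0;
        } else {
            sum = out_buf[l];
        }
        out_buf[l] = static_cast<int16_t>(sum + add[l]);
    }
}

// Variant B: the buffer is always read; the condition only selects the value.
void update_unconditional_read(int16_t out_buf[kPara],
                               const int16_t add[kPara], bool reset) {
    for (int l = 0; l < kPara; ++l) {
#pragma HLS PIPELINE II=1
        const int16_t loaded = out_buf[l];                   // unconditional read
        const int16_t base = reset ? int16_t{0} : loaded;    // select afterwards
        out_buf[l] = static_cast<int16_t>(base + add[l]);
    }
}

// Runs both variants on identical data and compares the results.
bool variants_agree(bool reset) {
    int16_t a[kPara], b[kPara], add[kPara];
    for (int l = 0; l < kPara; ++l) {
        a[l] = b[l] = static_cast<int16_t>(3 * l - 7);
        add[l] = static_cast<int16_t>(l + 1);
    }
    update_conditional_read(a, add, reset);
    update_unconditional_read(b, add, reset);
    for (int l = 0; l < kPara; ++l) {
        if (a[l] != b[l]) return false;
    }
    return true;
}
```

Calling `variants_agree(true)` and `variants_agree(false)` from the testbench covers both branches of the condition.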
Visitor
4,159 Views
Registered: ‎10-01-2016

@u4223374 Thank you for your reply. It doesn't solve the problem, but it helps get rid of the warning:

`CRITICAL WARNING: [SDSoC 0-0] Timing constraints were not met`