fvandesa
Contributor
Registered: ‎05-17-2018

[SCHED 204-68] HLS: Unable to enforce a carried dependency constraint (II = 1, distance = 1)

Hello,

Unless I completely partition the variable rRAM in the code below into registers, I can't get the design scheduled.

I don't see why this is necessary if dual-ported memories are used, and the warning is not particularly helpful.

Is there really a constraint in the code that explains the violation, or am I missing a pragma?

Thanks in advance.

 

7 Replies
nmoeller
Xilinx Employee
Registered: ‎09-05-2018

Hey @fvandesa,

I don't see any code. Can you try attaching it again?

Nicholas Moellers

Xilinx Worldwide Technical Support
u4223374
Advisor
Registered: ‎04-26-2015

Normally this warning is because HLS can't be sure that you won't access the same element multiple times in a row. Reading/writing to block RAM effectively takes two cycles, so if you did access the same element multiple times then the first operation wouldn't have finished before the next one hits the same memory location.

 

The solutions are:

- If that is actually the correct result (i.e. you could hit the same element multiple times) - rewrite your code so that it doesn't happen. Sorry.

- If that is not the correct result (i.e. you can never hit the same element multiple times) - either use the dependence pragma to tell HLS that it's OK, or (preferably) rewrite the code in a way that allows HLS to properly understand the behaviour.
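
To illustrate the "rewrite your code" option, here is a minimal sketch of a commonly used pattern: keep the most recently touched element in a register so that back-to-back hits on the same address never go through the RAM. The function name histogram_cached, the histogram use case and the plain integer types are assumptions made for illustration and are not taken from the original design. Depending on the tool version, HLS may still need a dependence pragma on hist to accept this at II=1.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: count occurrences of each index value.
// Consecutive equal indices are accumulated in a register (acc), so the
// memory is only read or written when the index changes.
void histogram_cached(const std::uint8_t *idx, std::size_t n,
                      std::uint32_t hist[256]) {
    if (n == 0) return;
    std::uint8_t  last = idx[0];      // address currently held in the register
    std::uint32_t acc  = hist[last];  // running count for that address
    for (std::size_t i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        const std::uint8_t cur = idx[i];
        if (cur == last) {
            ++acc;                    // same element again: no RAM access
        } else {
            hist[last] = acc;         // write back the previous element
            acc  = hist[cur] + 1;     // fetch the new element and count it
            last = cur;
        }
    }
    hist[last] = acc;                 // flush the register at the end
}
```

In plain C++ this gives the same result as a direct `hist[idx[i]]++` loop; the difference is only in how the memory accesses are spread over iterations.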

fvandesa
Contributor
Registered: ‎05-17-2018

Something went wrong when attaching the code.

The code outputs 50%-overlapped blocks of samples (to feed into an FFT).

Note that when rRAM is partitioned completely, I get II = 1 but latency = 0 (what does that mean?).

I'd prefer to avoid partitioning RAMs completely, because in reality they are much larger than in the attached example code.

 

#include <iostream>
#include <ap_fixed.h>
#include <ap_int.h>
#include <math.h>

const int c_N_FFT = 32;
const int c_N_OVLP = c_N_FFT/2;
const int c_N_ISS = 1;
const int c_N_OSS = 2 * c_N_ISS;
const int c_N3_OVLP = c_N_ISS;
const int c_N2_OVLP = c_N_OSS / c_N_ISS;
const int c_N1_OVLP = c_N_OVLP / (c_N2_OVLP * c_N3_OVLP);
// Number of bits needed to represent n (at least 1).
constexpr size_t Log2(size_t n)
{
  return (n < 2) ? 1 : 1 + Log2(n / 2);
}
const int c_NLOG2_OVLPWORDS = Log2(c_N_OVLP / c_N3_OVLP - 1);

const int c_W_InSample = 10;
const int c_W_OutSample = 10;
const int c_W_Window = 10;
typedef ap_fixed<c_W_InSample, c_W_InSample> t_InSample;
typedef ap_fixed<c_W_OutSample, c_W_OutSample> t_OutSample;
typedef ap_fixed<c_W_Window, 1> t_Window;

typedef ap_fixed<c_N_ISS * c_W_InSample, c_N_ISS * c_W_InSample> t_LongSample;

void PreProcess(t_InSample InSample[c_N_ISS], t_OutSample OutSample[c_N_OSS]) {
//#pragma HLS INTERFACE ap_none port=OutSample

#pragma HLS PIPELINE II=1
#pragma HLS ARRAY_PARTITION variable=InSample complete
#pragma HLS ARRAY_PARTITION variable=OutSample complete

 static ap_uint<c_NLOG2_OVLPWORDS> rWrAddress;
 static t_LongSample rRAM[c_N1_OVLP][c_N2_OVLP];
//#pragma HLS ARRAY_PARTITION variable=rRAM complete

 ap_uint<c_NLOG2_OVLPWORDS-1> RdAddress = rWrAddress;
 for (int u = 0; u < c_N_ISS; ++u) {
  t_LongSample *LongSample = rRAM[RdAddress];
#pragma HLS UNROLL
  OutSample[0 * c_N_ISS + u] = LongSample[0].range(
    (u + 1) * c_W_InSample - 1, u * c_W_InSample);
  OutSample[1 * c_N_ISS + u] = LongSample[1].range(
    (u + 1) * c_W_InSample - 1, u * c_W_InSample);
 }

 t_LongSample LongSample;
 for (int u = 0; u < c_N_ISS; ++u) {
#pragma HLS UNROLL
  LongSample.range((u + 1) * c_W_InSample - 1, u * c_W_InSample) =
    InSample[u];
 }
 rRAM[rWrAddress.range(c_NLOG2_OVLPWORDS-1,1)][rWrAddress[0]] = LongSample;

 rWrAddress++;
}

u4223374
Advisor
Registered: ‎04-26-2015

Latency of zero means that data is output on the same clock cycle that it reaches the input. II=1 is a normal result; if the output is ready on the same clock cycle as the input, then on the next clock cycle it'll be able to accept another input.

 

Well, you're going to need at least three RAM ports to make this work, since you read two locations (LongSample[0] and LongSample[1], both of which point to rRAM) and write one in every cycle.

 

As for the dependency - I think it's just because the access pattern is pretty messy, including sometimes writing and reading the same element. You could try using the dependence pragma with the "WAR" option (write after read; from a quick look I don't think you ever read an element immediately after writing it).
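
On the port count: one thing that might be worth trying, short of partitioning everything, is to partition only the second dimension of rRAM so that each of the two columns gets its own memory. Below is a plain-integer stand-in for the posted function with c_N_ISS = 1. The name PreProcessSketch, the int16_t/uint8_t types and the masks that emulate the ap_uint widths are assumptions so that it compiles without the HLS headers. I haven't checked whether this alone clears the II violation.

```cpp
#include <cstdint>

const int kRows = 8;  // c_N1_OVLP for the constants in the posted code
const int kCols = 2;  // c_N2_OVLP

// Plain-integer stand-in for PreProcess with c_N_ISS = 1.
void PreProcessSketch(std::int16_t in, std::int16_t out[kCols]) {
#pragma HLS PIPELINE II=1
    static std::uint8_t wr_addr = 0;  // models the 4-bit rWrAddress
    static std::int16_t ram[kRows][kCols] = {};
    // Assumption: splitting only dimension 2 yields one memory per column.
#pragma HLS ARRAY_PARTITION variable=ram complete dim=2

    const int rd_row = wr_addr & 0x7;         // RdAddress: low 3 bits
    out[0] = ram[rd_row][0];
    out[1] = ram[rd_row][1];

    const int wr_row = (wr_addr >> 1) & 0x7;  // rWrAddress.range(3, 1)
    const int wr_col = wr_addr & 0x1;         // rWrAddress[0]
    ram[wr_row][wr_col] = in;

    wr_addr = (wr_addr + 1) & 0xF;            // wrap like a 4-bit counter
}
```

With this split, each column memory sees one read and at most one write per call, which fits in two ports.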

fvandesa
Contributor
Registered: ‎05-17-2018

Thanks for the reply.

 

I think my example was too complicated and obfuscated my actual problem, which is very basic I'm afraid.

Below is a reduced example. What is keeping HLS from using a memory (FIFO) while still keeping II=1?

Thanks in advance.

 

void Overlap_Window(t_InSample InSample,  t_OutSample &OutSample_0) {
#pragma HLS PIPELINE II=1

 static ap_uint<7> rAddr;
 static t_InSample rRam[128];
 OutSample_0 = rRam[rAddr];
 rRam[rAddr] = InSample;
 ++rAddr;
}

 

WARNING: [SCHED 204-68] The II Violation in module 'Overlap_Window': Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 1)
   between 'store' operation (../cpp/design/PreProcess.cpp:59) of variable 'op.V', ../cpp/design/PreProcess.cpp:53 on array 'rRam_V' and 'load' operation ('rRam_V_load', ../cpp/design/PreProcess.cpp:58) on array 'rRam_V'.

u4223374
Advisor
Registered: ‎04-26-2015

HLS should have spotted that, but it might just be concerned about corner cases - for example, what if you run the block once (so it writes to element 0), then reset the block and immediately run it again (so it has to read from element 0)? Now it's reading from an element that was written on the last cycle...

 

Luckily, it's easy to prevent. Just add:

#pragma HLS DEPENDENCE variable=rRam inter false
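
For placement, here is a sketch of the reduced function with the pragma inside the function body, next to the array it refers to. The sample_t typedef and the uint8_t address with an explicit 7-bit mask are stand-ins (assumptions) for t_InSample and ap_uint<7>, so that it also compiles as ordinary C++. Keep in mind that the pragma is a promise to the tool rather than something it checks: here the same address only comes around again 128 calls later, which I'm assuming is far more than the pipeline depth.

```cpp
#include <cstdint>

typedef std::int16_t sample_t;  // stand-in for t_InSample (assumption)

void Overlap_Window(sample_t InSample, sample_t &OutSample_0) {
#pragma HLS PIPELINE II=1
    static std::uint8_t rAddr = 0;   // stand-in for ap_uint<7>
    static sample_t rRam[128] = {};
    // Tell HLS there is no dependence on rRam between pipeline iterations.
#pragma HLS DEPENDENCE variable=rRam inter false

    OutSample_0 = rRam[rAddr];       // read the sample stored 128 calls ago
    rRam[rAddr] = InSample;          // overwrite it with the new sample
    rAddr = (rAddr + 1) & 0x7F;      // wrap at 128 like a 7-bit counter
}
```

Run as plain C++, this behaves as a 128-sample delay line: the first 128 outputs are the initial zeros, and call 129 returns the first input.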
fvandesa
Contributor
Registered: ‎05-17-2018

If it is reading immediately after a reset, shouldn't it read "0" (i.e. the reset value of the array elements)?
