Registered: ‎02-20-2019

Synthesis, Cosimulation, and Waveform II all disagree

I'm working on two different pipelined HLS cores that each process a packet that is 256 samples long. Roughly speaking, they look something like:

void top(myaxisin_t input[256], myaxisout_t output[256], mycfg_t cfgdata[CONFIGLEN]) {
#pragma HLS INTERFACE axis port=input
#pragma HLS INTERFACE axis port=output
#pragma HLS DATA_PACK variable=cfgdata
//#pragma HLS INTERFACE s_axilite port=cfgdata
    for (int i = 0; i < 256; i++) {
#pragma HLS pipeline rewind
        // do stuff with input[i] to generate output[i]
        output[i].last = (i == 255);
    }
}

With varying success I've gotten these cores to pass c-sim, synthesis, and cosim. I'll revisit why I'm including the commented axilite interface pragma later. Great.


As I understand it, with the above pragmas the core will use ap_ctrl_hs control lines but should be able to stream packets of data through uninterrupted, provided an II of 256 is achieved. This is what I want: I'm streaming ADC samples through, and the 256-sample packet length is tied to algorithmic framing. From some posts on here I've gathered that, to confirm this holds in hardware, I need to ignore both the synthesis and implementation reports and instead look at the ap_ready signal, making sure my testbench calls top() at least twice.
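The "call top() at least twice" point can be sketched as a C testbench along these lines. The struct layouts, the CONFIGLEN value, the gain-scaling body, and the run_testbench helper are all placeholders assumed for illustration, not the real cores. The HLS pragmas are left out because they do not affect C simulation. The idea is simply that two or more back-to-back transactions give cosim a second ap_ready assertion to measure an interval against.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder types and sizes, assumed for illustration only.
const int PACKET_LEN = 256;
const int CONFIGLEN = 4;
struct myaxisin_t  { int16_t data; bool last; };
struct myaxisout_t { int16_t data; bool last; };
struct mycfg_t     { int16_t gain; };

// Stand-in for the core: scales each sample and marks the final one.
void top(myaxisin_t input[PACKET_LEN], myaxisout_t output[PACKET_LEN],
         mycfg_t cfgdata[CONFIGLEN]) {
    for (int i = 0; i < PACKET_LEN; i++) {
        output[i].data = (int16_t)(input[i].data * cfgdata[0].gain);
        output[i].last = (i == PACKET_LEN - 1);
    }
}

// Calls top() num_packets times back to back; returns the mismatch count.
int run_testbench(int num_packets) {
    static myaxisin_t in[PACKET_LEN];
    static myaxisout_t out[PACKET_LEN];
    mycfg_t cfg[CONFIGLEN] = {{2}, {0}, {0}, {0}};
    int errors = 0;
    for (int p = 0; p < num_packets; p++) {
        // Fill one packet of test samples.
        for (int i = 0; i < PACKET_LEN; i++) {
            in[i].data = (int16_t)(p + i);
            in[i].last = (i == PACKET_LEN - 1);
        }
        top(in, out, cfg);
        // Check data and TLAST placement for this packet.
        for (int i = 0; i < PACKET_LEN; i++) {
            if (out[i].data != (int16_t)(2 * (p + i))) errors++;
            if (out[i].last != (i == PACKET_LEN - 1)) errors++;
        }
    }
    return errors;
}
```

In a real testbench, main() would return run_testbench(3) (or any count of two or more) so that a nonzero value flags a failure.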

When I've done all this I'm seeing the following:

Core A)

  • Synthesis: Latency 260 - 261 Interval 256 - 256 Type loop rewind(delay=0 initiation interval(s)) //Looks good
  • Implementation: Latency 262 Interval 257  (min max and avg all the same) //Hmm, maybe ok, I don't understand though!
  • Waveforms: It takes 264 clocks between successive ap_ready assertions and 257 clocks between successive TLAST assertions on my output (TLAST matches ap_done) //This is a problem, right?!

Core B)

  • Synthesis: Latency 288 - 289 Interval 256 - 256 Type loop rewind(delay=0 initiation interval(s))   //Looks good
  • Implementation: Latency 289 Interval 257  (min max and avg all the same)  //Hmm, maybe ok.
  • Waveforms: It takes 291 clocks between successive ap_ready assertions, and the same between successive TLAST assertions on my output (TLAST is in sync with ap_done, out of phase with ap_ready) //This is a problem. It also is different from A.
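One way to make the waveform numbers above precise is to count clocks between successive rising edges of a signal (ap_ready or TLAST) and compare that count against the target interval of 256. The sketch below does this for a signal sampled once per clock. The function names and the synthetic pulse train are hypothetical and are not taken from the cosim output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the clock counts between successive rising edges of a signal
// that was sampled once per clock cycle.
std::vector<int> rising_edge_intervals(const std::vector<bool>& sig) {
    std::vector<int> intervals;
    int last_edge = -1;  // sample index of the previous rising edge, if any
    for (std::size_t t = 1; t < sig.size(); t++) {
        if (sig[t] && !sig[t - 1]) {
            if (last_edge >= 0) intervals.push_back((int)t - last_edge);
            last_edge = (int)t;
        }
    }
    return intervals;
}

// Extra cycles per packet relative to the target interval.
int bubble_cycles(int measured_interval, int target_interval) {
    return measured_interval - target_interval;
}

// Builds a synthetic train of one-cycle pulses with the given period.
std::vector<bool> pulse_train(int period, int pulses) {
    std::vector<bool> sig(period * pulses + 1, false);
    for (int k = 0; k < pulses; k++) sig[k * period + 1] = true;
    return sig;
}
```

Applied to the figures quoted above, a 257-clock TLAST interval is 1 cycle over the 256-clock target, a 264-clock ap_ready interval is 8 cycles over, and a 291-clock interval is 35 cycles over.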

As if that wasn't confusing enough, what I really want to do is use axilite (or possibly an AXI master) for the cfgdata port. If I uncomment that line, Core A breaks utterly and Core B's implementation (and waveform) II jumps to ~9500. I've tried playing with DATAFLOW and various code restructures without much success, and I think part of the problem is that I don't really understand what these various numbers mean.

Any insight on this would be much appreciated.


1 Reply
Registered: ‎02-20-2019

Just a note that, by looking more carefully at the waveforms, I was able to verify that there is indeed a bubble (in the cosim waveforms, anyway) in the packet approach, so it isn't doing what I need.

I've also tried an alternate vector-based approach where the main loop is in the testbench. There I'm able to achieve II=1 in synthesis, and Implementation reports 1 as well, but again the waveforms don't agree: there is more than 1 clock between outputs. I note that HLS advises:

WARNING: [HLS 200-626] This design is unable to schedule all read ports in the first II cycle. The RTL testbench may treat the design as non-pipelined
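For reference, the arrangement described above, with the core handling one sample per call and the 256-sample loop living in the testbench, might look roughly like the following. The sample_t type, the top_vec name, the gain operation, and the run_vector_testbench helper are assumptions for illustration, and the actual vector branch may be structured differently. The pipeline pragma is shown only as a comment marking where it would go.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sample type; the real design's types are not shown in the post.
struct sample_t { int16_t data; bool last; };

// One sample in, one sample out per call.
void top_vec(sample_t in, sample_t* out, int16_t gain) {
    // #pragma HLS pipeline II=1   (would go here in the HLS source)
    out->data = (int16_t)(in.data * gain);
    out->last = in.last;
}

// The packet loop lives in the testbench: 256 calls make up one packet.
// Returns the number of mismatches.
int run_vector_testbench() {
    int errors = 0;
    for (int i = 0; i < 256; i++) {
        sample_t in = { (int16_t)i, i == 255 };
        sample_t out = { 0, false };
        top_vec(in, &out, 3);
        if (out.data != (int16_t)(3 * i)) errors++;
        if (out.last != (i == 255)) errors++;
    }
    return errors;
}
```

With this structure every call is its own transaction, so the interval that cosim reports is measured per sample rather than per packet.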

The project is here, in case someone wants to give it a look. Master is packet-based; the vector branch is what I'm referencing in this reply.

