Adventurer
656 Views
Registered: ‎05-19-2014

[Vivado HLS 2019.1] BRAM Contents Streaming at II=1 Using write_nb()

I am trying to dump the contents of a BRAM memory to a stream after filling it with some data. I have to use a non-blocking `write_nb()` rather than a plain `write()` to avoid deadlocking.

I have not been able to identify a code pattern that makes Vivado HLS 2019.1 produce a solution at II=1. It keeps complaining about dependencies that it cannot meet. In the version below, it cannot schedule the read-out pointer `y_ptr`.

#include <hls_stream.h>
#include <ap_int.h>

using K = ap_uint<10>;
using V = ap_uint<18>;
struct KV {
	K key;
	V val;
};

static unsigned const  FILL = 0;
static unsigned const  SETTLE1 = 1;
static unsigned const  SETTLE2 = 2;
static unsigned const  DUMP = 3;

void top(
	hls::stream<KV> &src,
	hls::stream<V>  &dst
) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;

	switch(state) {
	case FILL: {
		KV x;
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  state = SETTLE1;
		}
	}
	break;

	case SETTLE1: state = SETTLE2; break;
	case SETTLE2: state = DUMP;    break;

	case DUMP: {
		V const mv = mem[y_ptr];
		if(dst.write_nb(mv)) {
			// K{0-1} is the all-ones key; K{0}-1 widens to a signed -1 that never equals y_ptr
			if(y_ptr == K{0-1})  state = FILL;
			y_ptr++;
		}
	}
	break;
	}

} // top()

What readout coding pattern would help me achieve an II of 1?

Xilinx Employee
630 Views
Registered: ‎09-04-2017

@preusser With 2019.2, HLS schedules this code with II=1. Would you be able to work with 2019.2 or 2020.1?

Thanks,

Nithin

Adventurer
560 Views
Registered: ‎05-19-2014

Thanks for checking and confirming, @nithink!

While moving up to Vivado HLS 2019.2 is not completely impossible, it is a major hassle with all the system IP that we have around this module.
If there were a way to recode the solution so that it works with 2019.1, this would be a major relief.

Xilinx Employee
542 Views
Registered: ‎09-04-2017

@preusser I have recoded it slightly. With this, we get II=1 in 2019.1. Please check.

#include <hls_stream.h>
#include <ap_int.h>

using K = ap_uint<10>;
using V = ap_uint<18>;
struct KV {
	K key;
	V val;
};

static unsigned const  FILL = 0;
static unsigned const  SETTLE1 = 1;
static unsigned const  SETTLE2 = 2;
static unsigned const  DUMP = 3;

void top(
	hls::stream<KV> &src,
	hls::stream<V>  &dst
) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;
    K cur_ptr = 0;
	switch(state) {
	case FILL: {
		KV x;
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  state = SETTLE1;
		}
	}
	break;

	case SETTLE1: state = SETTLE2; break;
	case SETTLE2: state = DUMP;    break;

	case DUMP: {
		V const mv = mem[cur_ptr];
		cur_ptr = y_ptr;
		y_ptr++;
		if(dst.write_nb(mv)) {
			if(cur_ptr == K{0-1})  state = FILL;
			//y_ptr++;
		}
	}
	break;
	}

} // top()

 

Thanks,

Nithin

Adventurer
535 Views
Registered: ‎05-19-2014

Thanks for the suggestion, Nithin.
Unfortunately, this approach will allow `y_ptr` and, hence, also `cur_ptr` to leap ahead when there is backpressure. So, the output will be gapped and incomplete.
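
To make the gap concrete, here is a minimal plain C++ sketch of the read-out loop. It is not HLS code: `dump`, `ready` and `unconditional_increment` are hypothetical names, and `ready[c]` merely stands in for the return value of `dst.write_nb()` in cycle `c`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ stand-in for the DUMP state (assumption: one loop iteration
// models one clock cycle; ready[c] models the result of dst.write_nb()).
std::vector<int> dump(std::vector<int> const &mem,
                      std::vector<bool> const &ready,
                      bool unconditional_increment) {
	std::vector<int> out;
	std::size_t ptr = 0;
	for(std::size_t c = 0; c < ready.size() && ptr < mem.size(); c++) {
		int  const mv       = mem[ptr];
		bool const accepted = ready[c];
		if(accepted)  out.push_back(mv);
		// Either step only after a successful write, or step every cycle.
		if(accepted || unconditional_increment)  ptr++;
	}
	return out;
}
```

With `mem = {10, 11, 12, 13}` and a single stalled cycle in `ready`, the conditional increment returns the complete memory contents, whereas the unconditional increment drops the value that was addressed during the stall.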

Xilinx Employee
519 Views
Registered: ‎09-04-2017

@preusser When transitioning to the FILL state, we can set a flag, and in the FILL state you can then decrement `y_ptr`. That should work, shouldn't it?

Thanks,

Nithin

Adventurer
508 Views
Registered: ‎05-19-2014

Thanks, Nithin. I appreciate your involvement.
`y_ptr` has to scan through the entire address space so that each memory value is included in the memory dump performed in the `DUMP` state. It is incremented upon a successful `write_nb()` to step to the next memory cell. If it helps, restoring its starting condition `y_ptr = 0` could be moved to any other state. Even additional settling states before or after the memory dump would be acceptable.

Xilinx Employee
505 Views
Registered: ‎09-04-2017

This is what I meant. If you have a testbench, we can quickly validate whether there is any discrepancy.

#include <hls_stream.h>
#include <ap_int.h>

using K = ap_uint<10>;
using V = ap_uint<18>;
struct KV {
	K key;
	V val;
};

static unsigned const  FILL = 0;
static unsigned const  SETTLE1 = 1;
static unsigned const  SETTLE2 = 2;
static unsigned const  DUMP = 3;

void top(
	hls::stream<KV> &src,
	hls::stream<V>  &dst
) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;
    K cur_ptr = 0;
    bool dec;
    dec = false;
	switch(state) {
	case FILL: {
		KV x;
		if (dec)
		{
		 y_ptr = y_ptr - 1;
		 dec = false;
		}
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  state = SETTLE1;
		}
	}
	break;

	case SETTLE1: state = SETTLE2; break;
	case SETTLE2: state = DUMP;    break;

	case DUMP: {
		V const mv = mem[cur_ptr];
		cur_ptr = y_ptr;
		y_ptr++;
		if(dst.write_nb(mv)) {
			if(cur_ptr == K{0-1}) {
				state = FILL;
				dec = true;
			}
			//y_ptr++;
		}
	}
	break;
	}

} // top()

 

Thanks,

Nithin

Adventurer
356 Views
Registered: ‎05-19-2014

Dear Nithin,
I happily share my testbench.

test.hpp:

#include <hls_stream.h>
#include <ap_int.h>

using K = ap_uint<8>;
using V = ap_uint<16>;
struct KV { K key; V val; };

void top(hls::stream<KV> &src, hls::stream<V>  &dst);

test_tb.cpp:

#include "test.hpp"

#include <iostream>
#include <random>

int main() {
	// Interface Streams
	hls::stream<KV> src;
	hls::stream<V>  dst;

	// Generate Input Data mapping: key -> key*key+1
	for(unsigned i = (1<<K::width); i-- > 0;) {
		src.write(KV{ .key = i, .val = i*i+1 });
	}

	// Execute Design for 1000 cycles
	//  - randomly skip consuming generated output in one of seven cycles
	std::default_random_engine  gen;
	std::uniform_int_distribution<unsigned>  rnd(0,6);

	bool err = false;
	for(unsigned clk = 0, i = 0; clk < 1000; clk++) {
		top(src, dst);
		if(rnd(gen) > 0) {
			V  y;
			if(dst.read_nb(y)) {
				// Validate produced output
				bool const  ok = y == V{i*i+1};
				std::cout << i << '\t' << y << '\t' << (ok? "OK" : "ERROR") << std::endl;
				err |= !ok;
				i++;
			}
		}
	}

	return  err? 1 : 0;
}

test.cpp:

#include "test.hpp"

static unsigned const  FILL = 0;
static unsigned const  SETTLE1 = 1;
static unsigned const  SETTLE2 = 2;
static unsigned const  DUMP = 3;

// Original: II=2
void top(hls::stream<KV> &src, hls::stream<V>  &dst) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;

	switch(state) {
	case FILL: {
		KV x;
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  state = SETTLE1;
		}
	}
	break;

	case SETTLE1: state = SETTLE2; break;
	case SETTLE2: state = DUMP;    break;

	case DUMP: {
		V const mv = mem[y_ptr];
		if(dst.write_nb(mv)) {
			if(y_ptr == K{0-1})  state = FILL;
			y_ptr++;
		}
	}
	break;
	}

} // top()

// Modified Proposal: II=1
void top_(
	hls::stream<KV> &src,
	hls::stream<V>  &dst
) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;
    K cur_ptr = 0;
    bool dec;
    dec = false;
	switch(state) {
	case FILL: {
		KV x;
		if (dec)
		{
		 y_ptr = y_ptr - 1;
		 dec = false;
		}
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  state = SETTLE1;
		}
	}
	break;

	case SETTLE1: state = SETTLE2; break;
	case SETTLE2: state = DUMP;    break;

	case DUMP: {
		V const mv = mem[cur_ptr];
		cur_ptr = y_ptr;
		y_ptr++;
		if(dst.write_nb(mv)) {
			if(cur_ptr == K{0-1}) {
				state = FILL;
				dec = true;
			}
			//y_ptr++;
		}
	}
	break;
	}

} // top_()

Currently, my original `top()` function is used. Rename `top` <-> `top_` to activate your proposal.

The active original `top()`:

  • passes csim,
  • synthesizes with II=2, and
  • passes C/RTL cosim.

Your proposal would probably require `cur_ptr` and `dec` to become static state in order to pass the C simulation. Note that you cannot validate the proper behavior of a design under backpressure in plain C simulation as the software model of `hls::stream` assumes an infinite depth and, hence, cannot assert backpressure. Only C/RTL co-simulation may provide some insight under these conditions.
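
For illustration, a test-only stream model with a finite depth could look roughly like the sketch below. `BoundedStream` is a hypothetical class, not part of the HLS libraries; it models only the non-blocking calls used in this thread, and its capacity is an arbitrary constructor argument.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical test-only stand-in for hls::stream with a finite depth.
// Only the non-blocking calls used in this thread are modeled.
template<typename T>
class BoundedStream {
	std::queue<T>  fifo;
	std::size_t    capacity;

public:
	explicit BoundedStream(std::size_t cap) : capacity(cap) {}

	// Fails and leaves the FIFO untouched when it is already full.
	bool write_nb(T const &v) {
		if(fifo.size() >= capacity)  return false;
		fifo.push(v);
		return true;
	}

	// Fails and leaves `v` untouched when the FIFO is empty.
	bool read_nb(T &v) {
		if(fifo.empty())  return false;
		v = fifo.front();
		fifo.pop();
		return true;
	}

	bool full()  const { return fifo.size() >= capacity; }
	bool empty() const { return fifo.empty(); }
};
```

To use such a model in C simulation, the design under test would have to accept the stream type as a template parameter or be compiled against this class in place of `hls::stream`. Both are assumptions about the test setup rather than anything the tool provides.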

Xilinx Employee
220 Views
Registered: ‎09-04-2017

@preusser I have recoded it slightly and run it with your testbench. C/RTL co-simulation passes, and II=1 is achieved with this modification:

#include "test.hpp"

static unsigned const  FILL = 0;
static unsigned const  SETTLE1 = 1;
static unsigned const  SETTLE2 = 2;
static unsigned const  DUMP = 3;

// Recoded: `state` is updated once, at the end, from `nxt_state`
void top(hls::stream<KV> &src, hls::stream<V>  &dst) {
#pragma HLS pipeline II=1
	static unsigned state = FILL;
	static V mem[1<<K::width] = { 0, };
	static K y_ptr = 0;
  static unsigned nxt_state;


	switch(state) {
	case FILL: {
		KV x;
		if(src.read_nb(x)) {
			mem[x.key] = x.val;
			if(x.key == 0)  nxt_state = SETTLE1;
		}
	}
	break;

	case SETTLE1: nxt_state = SETTLE2; break;
	case SETTLE2: nxt_state = DUMP;    break;

	case DUMP: {
		V const mv = mem[y_ptr];
		if(dst.write_nb(mv)) {
			if(y_ptr == K{0-1})  nxt_state = FILL;
			y_ptr++;
		}
	}
	break;
	}
  state = nxt_state;
} // top()

Adventurer
205 Views
Registered: ‎05-19-2014

Thanks for your efforts, @nithink.

I think we should settle the case here and conclude that I have to move on to a newer HLS version.

I still see an increased II=2 carried by the static `state` variable. I also cannot make the pointer increment `y_ptr++` unconditional, as this would skip memory locations under backpressure. That such a design nonetheless passes my testbench is certainly a deficiency on my side. However, I would have to replace the `hls::stream` abstraction with something custom to be able to model backpressure at all. Note that the `hls::stream` software model shipping with HLS is simply incapable of asserting backpressure as it assumes an infinite FIFO depth.

Adventurer
146 Views
Registered: ‎05-19-2014

There is a workaround for the problematic II performance when using `write_nb()` in Vivado HLS 2019.1. The purpose of the non-blocking write is to decouple the initiation synchronization of the stages within a dataflow region. This can also be achieved through pragmas:

#pragma HLS dataflow disable_start_propagation
#pragma HLS interface ap_ctrl_none port=return

With these two pragmas in place in the encompassing dataflow region, a blocking `write()` can be used. The code then synthesizes with II=1, and its implementation operates properly on the device even in the presence of backpressure.

Unfortunately, these pragmas prevent the cosimulation engine from synchronizing with the RTL simulation properly, so cosimulation may sometimes break. This is confirmed by the documentation. In any case, the documentation of these two pragmas, and of the difference between them, appears to be exceptionally vague.
