Contributor
Registered: ‎07-08-2019

Describe Off-chip Memory with Specific Bandwidth


Hi,

I am implementing a module in Vivado HLS using C++. The input data of my top-level function is stored in off-chip memory (e.g. some sort of DDR). I load the off-chip data part by part into on-chip memories of type BRAM. After the data in BRAM is processed, the result, which is also in BRAM, is stored back into off-chip memory. Then the next part of the off-chip input data is loaded, and so on.

void my_top_module(
	float din[10][20][100],
	float dout[10][20][100],
	...
)
{
	float A[20][100], B[20][100];
	#pragma HLS array_partition variable=A complete dim=0
	#pragma HLS array_partition variable=B complete dim=0
	#pragma HLS RESOURCE variable=A core=RAM_2P_BRAM
	#pragma HLS RESOURCE variable=B core=RAM_2P_BRAM

	for (int i=0; i<10; i++) {
		// pseudocode: load din[i] into A
		// pseudocode: process A, with the results written to B
		// pseudocode: store B into dout[i]
	}
}

My module frequency is 100 MHz.

I expect the off-chip memory bandwidth to be near (and no more than) a specific amount, for example 1 GB/s or 1.5 GB/s.

How can I do this? What pragmas and interfaces must be used? Is a specific loop structure needed to perform the load(A) and store(B) operations?

Thanks in advance,

Ali


Accepted Solutions
Scholar u4223374
Registered: ‎04-26-2015

Re: Describe Off-chip Memory with Specific Bandwidth


HLS doesn't know or care about external memory bandwidth. It just knows that there's something on the AXI bus that it can read from (or write to). For throughput calculations it assumes that it'll always be able to read/write immediately, although the synthesized hardware is happy to wait if that's required.

 

For loading data into A, the standard approach would be either memcpy or a pipelined loop:

for (int i = 0; i < LENGTH_OF_A; i++) {
#pragma HLS PIPELINE II=1
	A[i] = din[x][y][i];
}
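
To make the load/process/store pattern concrete, here is a minimal sketch that uses memcpy for the load and a pipelined loop for the store. The function name copy_row_scaled, the constant ROW_LEN, the bundle name gmem, and the doubling used as a stand-in for the real processing are all assumptions for illustration and do not come from the original post. An ordinary C++ compiler ignores the HLS pragmas, so the function can also be run in a plain testbench.

```cpp
#include <string.h>

// Hypothetical row length, chosen only for this sketch.
const int ROW_LEN = 100;

// Load one row from off-chip memory, process it on chip, store it back.
void copy_row_scaled(const float *din, float *dout) {
#pragma HLS INTERFACE m_axi port=din offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=dout offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return
    float A[ROW_LEN];  // on-chip input buffer
    float B[ROW_LEN];  // on-chip result buffer

    // Load: a memcpy from an m_axi pointer is normally turned into a burst read.
    memcpy(A, din, ROW_LEN * sizeof(float));

    // Placeholder processing step (assumption): double every element.
    for (int i = 0; i < ROW_LEN; i++) {
#pragma HLS PIPELINE II=1
        B[i] = 2.0f * A[i];
    }

    // Store: pipelined loop writing one element per cycle.
    for (int i = 0; i < ROW_LEN; i++) {
#pragma HLS PIPELINE II=1
        dout[i] = B[i];
    }
}
```

With 32-bit floats and II=1, each of the load and store loops moves at most 4 bytes per cycle, i.e. at most 400 MB/s at 100 MHz.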

If you really need throughput, you can potentially partition A and perform wider reads. For example, you might read four 32-bit elements at once, which at 100MHz will be 1.6GB/s. However, keep in mind that this increases hardware within the block (because A will require a wider BRAM) and also increases resources for all your AXI infrastructure (because you're now using a 128-bit AXI bus).
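
A rough sketch of the wider-read idea follows. The struct Float4, the function load_wide, and the helper peak_bandwidth are hypothetical names introduced here. Whether the tool actually produces a 128-bit AXI port from such a struct depends on the tool version and may require packing the struct (for example with the DATA_PACK pragma), so treat this as an illustration of the data movement rather than a verified interface recipe. It assumes LENGTH_OF_A is a multiple of 4.

```cpp
// Four 32-bit floats grouped into one 128-bit word (hypothetical type).
struct Float4 {
    float v[4];
};

const int LENGTH_OF_A = 100;  // assumed value; must be a multiple of 4 here

// Read one 128-bit word per cycle and scatter it into four banks of A.
void load_wide(const Float4 *din_wide, float A[LENGTH_OF_A]) {
#pragma HLS ARRAY_PARTITION variable=A cyclic factor=4 dim=1
    for (int i = 0; i < LENGTH_OF_A / 4; i++) {
#pragma HLS PIPELINE II=1
        Float4 w = din_wide[i];
        for (int k = 0; k < 4; k++) {
#pragma HLS UNROLL
            A[4 * i + k] = w.v[k];
        }
    }
}

// Peak bandwidth in bytes per second for one bus word per clock cycle.
constexpr double peak_bandwidth(int bus_bits, double clock_hz) {
    return bus_bits / 8.0 * clock_hz;
}
```

For the numbers in this post, peak_bandwidth(128, 100e6) gives 1.6e9 bytes per second, matching the 1.6GB/s figure above, while a single 32-bit read per cycle gives 400 MB/s.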

 

Normally, the preferred option is to simply read one element per cycle. It's not the fastest, but it's normally fast enough - and it does greatly simplify the design.


Contributor
Registered: ‎07-08-2019

Re: Describe Off-chip Memory with Specific Bandwidth


Hi,

Thanks @u4223374 for your response.

But, if I want to model a real off-chip memory (e.g., DDR2 with specific B.W.) then what can I do?

I mean, how can I specify an exact amount of bandwidth or an upper bound on it?

In other words, I need off-chip communications with higher bandwidth and as you said, I can achieve it by partitioning arrays of on-chip memories and performing more parallel load-store operations. But, it is not unlimited and I need an upper limit on bandwidth during my simulations and experiments.

Thanks,

Ali

Xilinx Employee
Registered: ‎09-04-2017

Re: Describe Off-chip Memory with Specific Bandwidth


Hi Ali,

If you are running with HLS alone, I don't think we have a way to mimic the external interfaces. You can use the SDAccel flow, which might serve your purpose: use HLS to generate the .xo file and import it into the SDAccel environment. There you can profile your code to see how much latency and bandwidth the I/O transactions and your kernel take.

Thanks,

Nithin

Scholar u4223374
Registered: ‎04-26-2015

Re: Describe Off-chip Memory with Specific Bandwidth


@akokha In HLS, all you can do is design your system so it will definitely stay within the available bandwidth. For example, with 1GB/s bandwidth and a 100MHz clock, you might choose to do 64-bit transfers - which will use 800MB/s.
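
The arithmetic behind that choice can be written as a small helper. This is only a sketch: the function name max_bus_bits is made up here, it assumes one transfer per clock cycle, power-of-two bus widths from 8 to 1024 bits, and 1 GB = 10^9 bytes.

```cpp
// Largest power-of-two bus width (in bits) whose peak rate, at one
// transfer per cycle, does not exceed the given bandwidth cap.
// Returns 0 if even an 8-bit bus would exceed the cap.
int max_bus_bits(double cap_bytes_per_s, double clock_hz) {
    int best = 0;
    for (int w = 8; w <= 1024; w *= 2) {
        double rate = w / 8.0 * clock_hz;  // bytes per second at this width
        if (rate <= cap_bytes_per_s) {
            best = w;
        }
    }
    return best;
}
```

With a 1 GB/s cap and a 100 MHz clock this returns 64, i.e. 800 MB/s peak. A 1.5 GB/s cap still gives 64, because the next width up (128 bits) would peak at 1.6 GB/s.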