Observer witxilinx

Optimize BRAM_18k utilization for a partitioned array

My current design uses more resources than are available on the FPGA board. In the baseline implementation, an array is declared as:

ap_uint<512> out_mem[2048];

which uses a total of 57 BRAM_18k.

After applying the following pragma, BRAM_18k utilization rises to 29*4 = 116, even though the total number of bits (262144*4 = 1048576) is the same as before.

ap_uint<512> out_mem[2048];
#pragma HLS array_partition variable=out_mem cyclic factor=4 dim=1

Question

Is there any way to reduce the utilization of BRAM_18k in this case?

Voyager

Re: Optimize BRAM_18k utilization for a partitioned array

Unfortunately, I don't see a good solution at first glance. The unpartitioned RAM fills its 57 BRAMs almost completely. The 4 partitioned RAMs fill theirs far less efficiently. The data width appears to be the issue: a 512-bit data width requires 29 RAMs, each with an 18-bit data bus.
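To put rough numbers on that difference, assuming each BRAM_18k holds 18432 bits (18 x 1024, parity bits included), the fill ratio of the two layouts from the original post works out as follows:

```latex
\[
\text{fill} = \frac{\text{depth} \times \text{width}}{N_{\text{BRAM}} \times 18432}
\]
\[
\text{unpartitioned: } \frac{2048 \times 512}{57 \times 18432}
  = \frac{1048576}{1050624} \approx 99.8\%
\]
\[
\text{cyclic, factor 4: } \frac{4 \times (512 \times 512)}{116 \times 18432}
  = \frac{1048576}{2138112} \approx 49.0\%
\]
```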

You might have to look at your algorithm and see if you can make changes at that level.

Ted Booth - Tech. Lead FPGA Design Engineer
www.designlinxhs.com
Scholar u4223374

Re: Optimize BRAM_18k utilization for a partitioned array

As @tedbooth has said, the 512-bit width requires each sub-array to use 29 block RAMs. The widest each RAM can be in TDP mode is 18 bits, which gives a depth of 1024 elements, twice what you actually need.

Two options from here:

- Only partition it with a factor of 2, taking advantage of the fact that each RAM has two ports (i.e. you can still read four values per cycle). This should reduce the total number of RAMs to 58.

- Keep the partition factor of 4, but put the RAM in SDP mode. This means you only get a single read port and a single write port from each sub-array, but in SDP mode the RAMs can operate in a 512x36-bit configuration. This should reduce the number required for each sub-array to 15, for a total of 60 RAMs used.
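The counts quoted in this thread (57, 116, 58 and 60) can be reproduced with a simple width-by-depth tiling estimate. The sketch below is only a rough model, not what Vivado HLS actually runs. It assumes the 7-series RAMB18 aspect ratios, assumes the 512x36 configuration is available only in SDP mode, and uses the made-up function names `bram18k_estimate` and `partitioned_estimate` purely for illustration.

```cpp
#include <cassert>

// One RAMB18 aspect ratio: depth (words) x width (bits, parity included).
struct Aspect { int depth; int width; };

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Rough estimate of the BRAM_18k primitives needed for one depth x width memory.
// Tries every aspect ratio and keeps the cheapest tiling.
// allow_sdp_x36 = true also allows 512x36, assumed here to exist only in SDP mode.
int bram18k_estimate(int depth, int width, bool allow_sdp_x36) {
    assert(depth > 0 && width > 0);
    static const Aspect kAspects[] = {
        {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}, {512, 36}
    };
    const int n = allow_sdp_x36 ? 6 : 5;
    int best = -1;
    for (int i = 0; i < n; ++i) {
        // Primitives stacked in depth times primitives placed side by side in width.
        int count = ceil_div(depth, kAspects[i].depth) *
                    ceil_div(width, kAspects[i].width);
        if (best < 0 || count < best) best = count;
    }
    return best;
}

// Cyclic partitioning on dim 1 splits the depth into `factor` equal sub-arrays.
// This sketch assumes factor divides depth exactly.
int partitioned_estimate(int depth, int width, int factor, bool allow_sdp_x36) {
    assert(factor > 0 && depth % factor == 0);
    return factor * bram18k_estimate(depth / factor, width, allow_sdp_x36);
}
```

Under these assumptions, `bram18k_estimate(2048, 512, false)` gives 57, `partitioned_estimate(2048, 512, 4, false)` gives 116, a factor of 2 gives 58, and a factor of 4 with the SDP configuration allowed gives 60. The real tool may pack memories differently, so treat the model as a sanity check only.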

Observer witxilinx

Re: Optimize BRAM_18k utilization for a partitioned array

@tedbooth @u4223374 I agree that there seems to be no good solution; the array is both read and written, and the design needs it in TDP mode.

However, I was wondering whether there is any way, or any HLS pragma, to tell Vivado HLS to configure the BRAM_18k so that it trades address space for bit width. The array case is acceptable, but I am more concerned about implementing a shallow but wide FIFO queue, such as:

hls::stream<ap_uint<512> > queue;
#pragma HLS stream variable=queue depth=4 dim=1

This needs only 2048 bits of storage plus some logic, but right now it consumes ~29 BRAM_18k.

Question: Is there any way in HLS to configure the BRAM_18k to trade address space for data width?

Scholar u4223374

Re: Optimize BRAM_18k utilization for a partitioned array

@witxilinx It's not a question of what HLS can do - the BRAM hardware cannot support a port wider than 18-bit in TDP mode.

Observer witxilinx

Re: Optimize BRAM_18k utilization for a partitioned array

@u4223374 Thank you for your answer. I did some searching and found this blog where they use Xilinx Parameterized Macros (XPM) in place of the Block Memory Generator.

However, I do not know whether it is applicable in HLS, or how I would use it there. I am not very familiar with this advanced topic, so I will take some time to look into it.

Voyager

Re: Optimize BRAM_18k utilization for a partitioned array

@witxilinx 

The XPM that you mentioned is not applicable to HLS.  That is for RTL development.

HLS does allow the user to specify how an operation or array is implemented by way of the RESOURCE directive.  Search "set_directive_resource" in UG902 for more details.

In the case of a shallow but wide memory, you might want to look at distributed memory instead of Block RAM.
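As a rough sketch of the distributed-memory suggestion, the fragment below binds a shallow, wide local array to LUT RAM with the RESOURCE pragma. `RAM_2P_LUTRAM` is one of the core names listed in UG902 for Vivado HLS; check it against your tool version. The function `reverse_block`, the constant `DEPTH` and the host-only `word_t` stand-in are assumptions made for illustration and are not from the original design.

```cpp
#include <stdint.h>

#ifdef __SYNTHESIS__
#include <ap_int.h>              // Xilinx header, available inside Vivado HLS
typedef ap_uint<512> word_t;
#else
// Host-only stand-in (an assumption of this sketch) so the code also builds
// with a plain C++ compiler that has no Xilinx headers.
struct word_t { uint64_t limb[8]; };
#endif

const int DEPTH = 4;  // shallow but wide: 4 x 512 bits = 2048 bits

// Hypothetical top function: stages DEPTH words in a small local buffer and
// writes them back out in reverse order.
void reverse_block(const word_t in[DEPTH], word_t out[DEPTH]) {
    word_t buf[DEPTH];
    // Ask HLS to build buf from LUT RAM instead of BRAM_18k.
    // Ordinary compilers ignore this pragma.
#pragma HLS RESOURCE variable=buf core=RAM_2P_LUTRAM

    for (int i = 0; i < DEPTH; ++i) {
        buf[i] = in[i];
    }
    for (int i = 0; i < DEPTH; ++i) {
        out[i] = buf[DEPTH - 1 - i];
    }
}
```

For an hls::stream such as the 4-deep queue above, UG902 lists FIFO cores (for example FIFO_LUTRAM and FIFO_SRL) that can be named in the same directive. Whether that removes the BRAM usage in a given design is something to confirm in the synthesis report.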

Ted Booth - Tech. Lead FPGA Design Engineer
www.designlinxhs.com