naz_rb
Adventurer
Registered: ‎11-10-2019

Unexpected BRAM utilization

I am trying to optimize BRAM usage for the array const INT9 arr[96][16]. According to UG573 (bottom of p. 6), a simple dual-port (SDP) memory configuration gives me a 512 x 72-bit block:

When used as RAMB36 SDP memory, one port width is fixed (i.e., 512 x 64 or 512 x 72).
The other port width can then be 32K x 1 through 512 x 72. When used as RAMB18 SDP
memory, one port width is fixed (i.e., 512 x 36). The other port width can then be
16K x 1 through 512 x 36.

When I reshape my array with factor 16 in dimension 2, the "new" array should be 96 x 144 bits, that is, two 72-bit-wide blocks in parallel. However, HLS uses 4 blocks.

#pragma HLS ARRAY_RESHAPE variable=arr block factor=16 dim=2
#pragma HLS RESOURCE variable=arr core=RAM_S2P_BRAM

The array is accessed in a pipelined II=1 loop (i < 96). Is it possible that HLS fails to configure the memory as 512 x 72 bits and instead uses four 1K x 36-bit blocks? I am using the ZCU104 evaluation board (xczu7ev-ffvc1156-2-e).

UPDATE:

A similar scenario is shown below, where a UINT9 [50][8] array is reshaped by factor 8 in dimension 2. This time it should be a single 50 x 72-bit BRAM, but HLS uses two. I noticed that the bit width in the table shows only 71 bits, so the 72nd bit is probably in the second memory. I thought one bit was reserved for parity, but the spec says that "The parity bits are only available for the x9, x18, and x36 port widths" (p. 8), so I assume that is not the case.

bram.png


11 Replies
drjohnsmith
Teacher
Registered: ‎07-09-2009
Which device are you targeting?

The BRAMs in a device have constraints on how many I/O pins they have, so you need to look at the data sheet for the device you have.
naz_rb
Adventurer
Registered: ‎11-10-2019
I am using the ZCU104 evaluation board. I have also updated my question.
nithink
Xilinx Employee
Registered: ‎09-04-2017

@naz_rb Did you check whether the core=RAM_S2P_BRAM pragma is being honored? If HLS can implement the memory with a single address port, it will prefer that.

The x72 mode requires separate read and write addresses. In single-port mode, a BRAM cannot be configured in x72 mode. Please check the RTL generated for the RAM to see if that is the case.

Thanks,

Nithin

 

nithink
Xilinx Employee
Registered: ‎09-04-2017

@naz_rb Here is a simple example I wrote. It shows a utilization of 2 RAMB18, which is the same as 1 RAMB36:

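#include <ap_cint.h>

void top(uint9 din[50], int addr, uint9 dout[50])
{
    uint9 arr[50][8];
    /* Pack the 8 elements of dimension 2 into one 72-bit word. */
#pragma HLS ARRAY_RESHAPE variable=arr block factor=8 dim=2

    /* Write one element of each row. */
    L1: for (int i = 0; i < 50; i++)
    {
#pragma HLS PIPELINE II=1
        arr[i][addr] = din[i];
    }

    /* Read the same element of each row back. */
    L2: for (int j = 0; j < 50; j++)
    {
#pragma HLS PIPELINE II=1
        dout[j] = arr[j][addr];
    }
}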

You can modify this to reproduce your issue and send it back.

Thanks,

Nithin

naz_rb
Adventurer
Registered: ‎11-10-2019

Hi @nithink ,

My array is declared as "const", so HLS gives me the following message during synthesis:

 

WARNING: [RTMG 210-274] Memory 'L_02_PRO_32_16_wei' is read-only, switch it to a ROM.
INFO: [RTMG 210-279] Implementing memory 'L_02_PRO_32_16_wei_rom' using block ROMs.

Does this mean that the ROM does not support 72-bit ports? How do I know whether it uses 18K or 36K BRAM?

I guess the more important question is this:

If I declared this array as a function argument, to be written only externally and read only internally, would the 72-bit-wide simple dual-port configuration still work? The purpose is to fill the RAM from the PS and then read from it in the function. I do not intend to read and write at the same time, if that matters.

Thank you,

 

nithink
Xilinx Employee
Registered: ‎09-04-2017

@naz_rb HLS reports utilization in units of RAMB18K; see the column headings in the utilization estimates. Can you share your test case? That would make it easier to take a look and understand the issue.

Thanks,

Nithin

naz_rb
Adventurer
Registered: ‎11-10-2019

@nithink If HLS reports BRAM utilization in 18K blocks, then the report in the attached picture makes sense. Please check whether my reasoning is correct:

1. The const INT9 wei[50][8], when reshaped with factor 8 in dimension 2, is made of two 18K BRAMs in parallel to create a 512 x 72-bit memory.

2. The INT32 acc[960] uses two stacked 18K BRAMs to make a 1K x 36-bit memory.
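
The counts in the two points above, and the 4 blocks reported for the 96 x 144-bit array in the original question, can be reproduced with a rough model. This is only a sketch: it assumes the reported number is the smallest RAMB18 count over the simple dual-port aspect ratios listed in UG573 (16K x 1 through 512 x 36), which is not necessarily how HLS computes its estimate. The names `ramb18_estimate` and `RAMB18_SDP` are made up for illustration.

```c
#include <stddef.h>

/* RAMB18 aspect ratios (depth x width) in simple dual-port mode, per UG573. */
struct bram_cfg { unsigned depth; unsigned width; };

static const struct bram_cfg RAMB18_SDP[] = {
    {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}, {512, 36},
};

static unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

/* Hypothetical helper: smallest number of RAMB18 blocks needed to tile a
 * memory of width_bits x depth_words, trying every aspect ratio above. */
unsigned ramb18_estimate(unsigned width_bits, unsigned depth_words)
{
    unsigned best = 0;
    size_t n = sizeof RAMB18_SDP / sizeof RAMB18_SDP[0];
    for (size_t i = 0; i < n; i++) {
        unsigned cols = ceil_div(width_bits, RAMB18_SDP[i].width);   /* blocks side by side */
        unsigned rows = ceil_div(depth_words, RAMB18_SDP[i].depth);  /* blocks stacked */
        unsigned count = cols * rows;
        if (best == 0 || count < best)
            best = count;
    }
    return best;
}
```

Under this model, `ramb18_estimate(72, 50)` and `ramb18_estimate(32, 960)` both give 2, and `ramb18_estimate(144, 96)` gives 4, which matches the numbers discussed in this thread when they are read as RAMB18 counts.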

The only thing that does not make sense is why the report for wei_U says Bits P0 = 71 (not 72).

I will provide the code later today.

P.S. It was sad to learn that the total available memory is reported in 18K BRAMs. This means that the memory is actually half of what I was counting on. :/

nithink
Xilinx Employee
Registered: ‎09-04-2017

@naz_rb Regarding 71 bits vs. 72: since you are using it as a ROM, if one of the bits is the same across all addresses, HLS can optimize it away.
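
To see how a constant bit would show up in the ROM contents, the following sketch counts the bit positions of the reshaped word that actually change between addresses. It is an illustration only and does not claim to be what HLS does internally. The function name `varying_bits` and the layout (8 elements of 9 bits per row, each stored in a `uint16_t`) are assumptions chosen to mirror the [50][8] example.

```c
#include <stddef.h>
#include <stdint.h>

#define COLS 8       /* elements packed into one reshaped word */
#define ELEM_BITS 9  /* bits per element */

/* Hypothetical helper: count the bit positions of the COLS*ELEM_BITS-wide
 * word that differ between at least two rows of the ROM. */
unsigned varying_bits(const uint16_t rom[][COLS], size_t rows)
{
    unsigned total = 0;
    for (size_t c = 0; c < COLS; c++) {
        uint16_t diff = 0;
        /* Accumulate every bit that differs from row 0 in this column. */
        for (size_t r = 1; r < rows; r++)
            diff |= (uint16_t)(rom[r][c] ^ rom[0][c]);
        diff &= (1u << ELEM_BITS) - 1u;  /* keep only the 9 element bits */
        for (; diff != 0; diff >>= 1)
            total += diff & 1u;
    }
    return total;
}
```

If this returns 71 for the initializer of the constant array, one bit position holds the same value at every address, which would be consistent with a 71-bit ROM in the report.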

 

Thanks,

Nithin

drjohnsmith
Teacher
Registered: ‎07-09-2009
For reference,
your board has the XCZU7EV part.
Page 8 of
https://www.xilinx.com/support/documentation/selection-guides/zynq-ultrascale-plus-product-selection-guide.pdf

lists 11 Mb of block RAM and 27 Mb of URAM.

Note that URAM cannot be used as a ROM.
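
As a quick sanity check on the 11 Mb figure, the same capacity can be counted in either 18 Kb or 36 Kb units. The sketch below assumes the part has 312 RAMB36 blocks, each of which can be split into two RAMB18, so a report in RAMB18 units would show 624 available. Those two counts are an assumption that should be checked against the selection guide or the HLS report, and the helper names are made up.

```c
/* Capacity in Kb of n block RAM primitives (18 Kb per RAMB18, 36 Kb per RAMB36). */
unsigned ramb18_kbits(unsigned n) { return n * 18u; }
unsigned ramb36_kbits(unsigned n) { return n * 36u; }
```

With the assumed counts, `ramb18_kbits(624)` and `ramb36_kbits(312)` both give 11232 Kb, roughly 11 Mb. Counting in RAMB18 units changes the number of blocks, not the total capacity.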

naz_rb
Adventurer
Registered: ‎11-10-2019

@nithink  If everything is built of 18k BRAM, then what's the point of having 36k BRAM? What are the differences?

nithink
Xilinx Employee
Registered: ‎09-04-2017

@naz_rb These are estimates from HLS synthesis. The actual inference happens during logic synthesis, where RAMB36 or RAMB18 primitives are chosen according to the width and depth being used. Please take a look at the utilization report after Vivado synthesis.

Thanks,

Nithin 
