01-01-2020 07:20 AM - edited 01-01-2020 03:03 PM
I am trying to optimize BRAM usage for the const INT9 arr[96][16] array. According to UG573 (p. 6, bottom), if I use the simple dual-port (SDP) memory configuration, I can get a 512 x 72-bit block:
When used as RAMB36 SDP memory, one port width is fixed (i.e., 512 x 64 or 512 x 72). The other port width can then be 32K x 1 through 512 x 72. When used as RAMB18 SDP memory, one port width is fixed (i.e., 512 x 36). The other port width can then be 16K x 1 through 512 x 36.
When I reshape the array by a factor of 16 in dimension 2, the "new" array should be 96 x 144 bits (16 elements x 9 bits per row), that is, two 72-bit-wide blocks in parallel. However, HLS uses 4 blocks.
#pragma HLS ARRAY_RESHAPE variable=arr block factor=16 dim=2
#pragma HLS RESOURCE variable=arr core=RAM_S2P_BRAM
The array is accessed in a pipelined loop with II=1 (i < 96). Is it possible that HLS fails to configure the memory as 512 x 72 bits and instead uses four 1K x 36-bit blocks? I am using the ZCU104 evaluation board (xczu7ev-ffvc1156-2-e).
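As a sketch of the arithmetic behind the question, the helper below tiles a depth x width memory with fixed-size blocks, assuming the SDP shapes quoted from UG573 (512 x 72 for a RAMB36, 512 x 36 for a RAMB18) and no cascading. The function names are hypothetical, for illustration only.

```c
#include <assert.h>

/* Ceiling division for positive integers. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Blocks needed to tile a depth x width memory with blocks configured as
   block_depth x block_width (simplified: no cascading, one shape only). */
static unsigned blocks_needed(unsigned depth, unsigned width,
                              unsigned block_depth, unsigned block_width)
{
    return ceil_div(depth, block_depth) * ceil_div(width, block_width);
}
```

For the reshaped 96 x (16 x 9) array, `blocks_needed(96, 144, 512, 72)` gives 2 RAMB36, while `blocks_needed(96, 144, 512, 36)` gives 4 RAMB18. Both describe the same amount of silicon, so "4 blocks" is consistent with two 72-bit blocks if the report counts RAMB18 units.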
UPDATE:
A similar scenario is shown below, where a UINT9 [50][8] array is reshaped by a factor of 8 in dimension 2. This time it should be a single 50 x 72-bit BRAM; however, HLS uses two. I also noticed that the bitwidth in the table shows only 71 bits, so the 72nd bit is probably in the second memory. I thought one bit was reserved for parity, but the spec says that "The parity bits are only available for the x9, x18, and x36 port widths" (p. 8). So I assume that is not the case.
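To see where the "72nd bit" lives, here is a plain-C sketch that packs eight 9-bit values into one 72-bit word, the way a reshape with factor=8 on dim 2 concatenates a row. It assumes element k occupies bits [9k, 9k+8] (element 0 in the low bits); that ordering is an assumption for illustration, and `word72_t`, `pack8x9` and `unpack_lane` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* A 72-bit word: bits 0..63 in lo, bits 64..71 in hi. */
typedef struct {
    uint64_t lo;
    uint8_t  hi;
} word72_t;

/* Set bit pos (0..71) of w to v (0 or 1). */
static void set_bit72(word72_t *w, unsigned pos, unsigned v)
{
    assert(pos < 72);
    if (pos < 64) {
        w->lo = (w->lo & ~(UINT64_C(1) << pos)) | ((uint64_t)(v & 1u) << pos);
    } else {
        unsigned p = pos - 64;
        w->hi = (uint8_t)((w->hi & ~(1u << p)) | ((v & 1u) << p));
    }
}

/* Read bit pos (0..71) of w. */
static unsigned get_bit72(const word72_t *w, unsigned pos)
{
    assert(pos < 72);
    return pos < 64 ? (unsigned)((w->lo >> pos) & 1u)
                    : (unsigned)((w->hi >> (pos - 64)) & 1u);
}

/* Pack eight 9-bit lanes; lane k occupies bits [9k, 9k+8] (assumed order). */
static word72_t pack8x9(const uint16_t lanes[8])
{
    word72_t w = {0, 0};
    for (unsigned k = 0; k < 8; k++)
        for (unsigned b = 0; b < 9; b++)
            set_bit72(&w, 9 * k + b, (lanes[k] >> b) & 1u);
    return w;
}

/* Extract lane k (0..7) back out of the packed word. */
static uint16_t unpack_lane(const word72_t *w, unsigned k)
{
    assert(k < 8);
    uint16_t v = 0;
    for (unsigned b = 0; b < 9; b++)
        v |= (uint16_t)(get_bit72(w, 9 * k + b) << b);
    return v;
}
```

Under this ordering, bit 71 is the most significant bit of element 7 (9 x 7 + 8 = 71), so a 71-bit width means one whole bit position of the row was dropped, not that a parity bit was reserved.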
01-01-2020 10:07 AM
01-01-2020 03:04 PM
01-01-2020 09:27 PM
@naz_rb Did you check whether the pragma core=RAM_S2P_BRAM is being honored? If HLS can implement the memory with a single address port, it will prefer that.
For x72 mode, you need separate read and write addresses; in single-port mode, a BRAM cannot be configured as x72. Please check the RTL for the RAM to see if that is the case.
Thanks,
Nithin
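The port-mode point above can be sketched numerically. Per UG573, a RAMB18 is at most 1K x 18 in single-port or true dual-port mode, and 512 x 36 only in simple dual-port mode. The sketch below counts RAMB18s when every block uses its widest shape for the given mode; it is a simplification (it ignores the other aspect ratios), and the names are hypothetical.

```c
#include <assert.h>

typedef enum { SINGLE_PORT, SIMPLE_DUAL_PORT } port_mode_t;

/* Widest RAMB18 shape per mode: 1K x 18 (single/true dual port),
   512 x 36 (simple dual port only). */
static unsigned ramb18_max_width(port_mode_t m)
{
    return m == SIMPLE_DUAL_PORT ? 36u : 18u;
}

static unsigned ramb18_depth_at_max_width(port_mode_t m)
{
    return m == SIMPLE_DUAL_PORT ? 512u : 1024u;
}

static unsigned ceil_div(unsigned a, unsigned b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* RAMB18 count when every block uses its widest configuration. */
static unsigned ramb18_at_max_width(unsigned depth, unsigned width,
                                    port_mode_t m)
{
    return ceil_div(depth, ramb18_depth_at_max_width(m)) *
           ceil_div(width, ramb18_max_width(m));
}
```

For a shallow 50 x 72-bit memory this gives 4 RAMB18 with a single address port versus 2 RAMB18 (one RAMB36) in simple dual-port mode, which is why checking the generated RTL for separate read and write addresses matters.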
01-01-2020 09:55 PM
@naz_rb Here is a simple example I wrote. It shows 2 RAMB18 in the utilization, which is the same as 1 RAMB36:
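#include <ap_cint.h>  // Vivado HLS arbitrary-precision C types (uint9)

void top(uint9 din[50], int addr, uint9 dout[50])
{
    uint9 arr[50][8];
    // Pack all 8 elements of dim 2 into one 72-bit word per row (8 x 9 bits)
#pragma HLS ARRAY_RESHAPE variable=arr block factor=8 dim=2

    // addr selects one of the 8 packed elements, so it must be in 0..7
L1: for (int i = 0; i < 50; i++)
    {
#pragma HLS PIPELINE II=1
        arr[i][addr] = din[i];
    }

L2: for (int j = 0; j < 50; j++)
    {
#pragma HLS PIPELINE II=1
        dout[j] = arr[j][addr];
    }
}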
You can modify this to reproduce your issue and send it back.
Thanks,
Nithin
01-02-2020 07:00 AM
Hi @nithink ,
My array is declared as "const", so HLS gives me the following message during synthesis:
WARNING: [RTMG 210-274] Memory 'L_02_PRO_32_16_wei' is read-only, switch it to a ROM.
INFO: [RTMG 210-279] Implementing memory 'L_02_PRO_32_16_wei_rom' using block ROMs.
Does this mean that the ROM does not support 72-bit ports? How do I know whether it uses 18K or 36K BRAMs?
I guess the more important question is this:
If I declared this array as an argument to the function, written only externally and read only internally, would the 72-bit-wide simple dual-port configuration still work? The purpose is to fill the RAM from the PS and then read from it in the function. I do not intend to read and write at the same time, if that matters.
Thank you,
01-02-2020 07:36 AM
@naz_rb HLS reports in terms of RAMB18k. You can look at the headings in utilization estimates. Can you share your test. Might be easier to take a look and understand the issue
Thanks,
Nithin
01-02-2020 09:04 AM - edited 01-02-2020 09:13 AM
@nithink If HLS reports BRAM utilization in 18K blocks, then the report in the attached picture makes sense. Please check whether my reasoning is correct:
1. The const INT9 wei[50][8], reshaped with factor=8 in dim 2, is made of two 18K BRAMs in parallel to create a 512 x 72-bit memory.
2. The INT32 acc[960] uses two 18K BRAMs stacked to make a 1K x 36-bit memory.
The only thing that does not make sense is why the report for wei_U says Bits P0 = 71 (not 72).
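Both counts in the list can be checked with a rough estimator that tries each RAMB18 aspect ratio from UG573 (16K x 1 through 1K x 18, plus 512 x 36 in simple dual-port mode) and keeps the cheapest uniform tiling. This is a simplified model under that assumption, not the algorithm HLS or Vivado actually use; `estimate_ramb18` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned depth, width;
} aspect_t;

/* RAMB18 aspect ratios from UG573; 512 x 36 requires simple dual-port mode. */
static const aspect_t RAMB18_ASPECTS[] = {
    {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}, {512, 36},
};

static unsigned ceil_div(unsigned a, unsigned b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Smallest RAMB18 count when the whole array uses one aspect ratio. */
static unsigned estimate_ramb18(unsigned depth, unsigned width)
{
    unsigned best = 0;
    size_t n_aspects = sizeof RAMB18_ASPECTS / sizeof RAMB18_ASPECTS[0];
    for (size_t i = 0; i < n_aspects; i++) {
        unsigned n = ceil_div(depth, RAMB18_ASPECTS[i].depth) *
                     ceil_div(width, RAMB18_ASPECTS[i].width);
        if (best == 0 || n < best)
            best = n;
    }
    return best;
}
```

With this model, `estimate_ramb18(50, 72)` and `estimate_ramb18(960, 32)` both come out as 2, matching the report; dropping one bit (`estimate_ramb18(50, 71)`) does not change the count.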
I will provide the code later today.
P.S. It was sad to learn that the total available memory is reported in 18K BRAMs. It means the memory is actually half of what I was counting on. :/
01-02-2020 09:57 AM
@naz_rb Regarding 71 bits vs. 72: since you are using it as a ROM, if one of the bits is the same across all addresses, HLS can optimize it away.
Thanks,
Nithin
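One way to check whether your ROM contents allow that optimization is to look for bit positions that never change across rows: a bit is constant when its AND and its OR over all rows agree. The sketch below assumes the reshaped row layout of 8 lanes x 9 bits; it only inspects the data and does not claim to reproduce HLS's exact trimming rule. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8
#define LANE_BITS 9
#define LANE_MASK 0x1FFu

/* For each lane, a mask of bits that hold the same value in every row. */
static void constant_bit_masks(const uint16_t rows[][LANES], size_t nrows,
                               uint16_t masks[LANES])
{
    assert(nrows > 0);
    for (unsigned k = 0; k < LANES; k++) {
        unsigned all_and = LANE_MASK, any_or = 0;
        for (size_t r = 0; r < nrows; r++) {
            unsigned v = rows[r][k] & LANE_MASK;
            all_and &= v;
            any_or |= v;
        }
        /* Constant bits: AND and OR across all rows agree. */
        masks[k] = (uint16_t)(~(all_and ^ any_or) & LANE_MASK);
    }
}

/* Population count. */
static unsigned count_bits(unsigned x)
{
    unsigned n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

/* Row width left if every constant bit position were removed. */
static unsigned live_width(const uint16_t rows[][LANES], size_t nrows)
{
    uint16_t masks[LANES];
    unsigned constant = 0;
    constant_bit_masks(rows, nrows, masks);
    for (unsigned k = 0; k < LANES; k++)
        constant += count_bits(masks[k]);
    return LANES * LANE_BITS - constant;
}
```

For example, if the top bit of the last element is 0 in every row (all values below 256), `live_width` reports 71 of 72 bits, which is the same width shown for wei_U.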
01-02-2020 10:03 AM
01-02-2020 12:50 PM - edited 01-02-2020 12:52 PM
@nithink If everything is built from 18K BRAMs, then what is the point of having 36K BRAMs? What are the differences?
01-02-2020 09:04 PM
@naz_rb These are estimates from HLS synthesis. The actual inference happens during logic synthesis, so you will see RAMB36 or RAMB18 chosen appropriately based on the width and depth used. Please look at the utilization report after Vivado synthesis.
Thanks,
Nithin
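Counting in RAMB18 units changes the unit, not the capacity: one RAMB36 site (36 Kb including parity) can hold two independent RAMB18s (18 Kb each). The sketch below converts an HLS RAMB18 estimate into occupied RAMB36 sites and kilobits, assuming two RAMB18s can always share a site; as far as I know, Vivado's utilization report likewise counts a RAMB18 as half a Block RAM Tile. Function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define KBITS_PER_RAMB18 18u /* includes parity bits */
#define KBITS_PER_RAMB36 36u /* includes parity bits */

/* RAMB36 sites occupied by n18 RAMB18s, assuming two share one site. */
static unsigned ramb36_sites_from_ramb18(unsigned n18)
{
    return n18 / 2u + n18 % 2u;
}

static unsigned kbits_from_ramb18(unsigned n18)
{
    assert(n18 <= UINT_MAX / KBITS_PER_RAMB18);
    return n18 * KBITS_PER_RAMB18;
}

static unsigned kbits_from_ramb36(unsigned n36)
{
    assert(n36 <= UINT_MAX / KBITS_PER_RAMB36);
    return n36 * KBITS_PER_RAMB36;
}
```

For the two arrays discussed above (2 RAMB18 for wei, 2 for acc), the 4 RAMB18 occupy 2 RAMB36 sites, and both views come to 72 Kb.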