Inferred RAM Mapping to BRAM Inefficiently

My design infers a 36-bit-wide dual-port RAM, and I'm using ISE to target a Spartan-6 part.

 

When I make the RAM 2048 addresses deep (i.e., a 2048x36 dual-port RAM), ISE maps it to 4 RAMB16BWERs, each configured as 512x36. This is as expected.

 

I would expect a 4096x36 dual-port RAM to be mapped to 8 RAMB16BWERs (each 512x36), but for some reason ISE maps it to 9 RAMB16BWERs configured as 4096x4.

 

When I use coregen to generate a 4096x36 dual-port RAM, it selects 8 RAMB16BWERs, as expected.

 

This is clearly a bug in ISE. Short of using coregen (which I don't want to use because I need the flexibility of RAM inference), is there any way I can force ISE to map the 4096x36 RAM correctly?
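
The post doesn't include the HDL, but a minimal Verilog sketch of the kind of RAM being inferred, assuming one write port and registered reads (module and signal names are illustrative), looks something like this:

// 4096 x 36 dual-port RAM inference template (sketch; names illustrative).
// Port A reads and writes, port B reads only; reads are registered so the
// memory can map to block RAM.
module dp_ram_4096x36 (
    input  wire        clk,
    input  wire        we,
    input  wire [11:0] addr_a,   // 2^12 = 4096 words deep
    input  wire [11:0] addr_b,
    input  wire [35:0] din_a,
    output reg  [35:0] dout_a,
    output reg  [35:0] dout_b
);
    reg [35:0] mem [0:4095];

    always @(posedge clk) begin
        if (we)
            mem[addr_a] <= din_a;
        dout_a <= mem[addr_a];   // read-first behavior with nonblocking writes
        dout_b <= mem[addr_b];
    end
endmodule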

Accepted Solution

Re: Inferred RAM Mapping to BRAM Inefficiently

Did you look at the technology schematic to see if the 2K-deep version is really using 512 x 36 BRAMs? I would have expected it to use 4 BRAMs of 2K x 9 to avoid cascading by depth. Unfortunately there is no equivalent 4K x 4.5-bit configuration, so the tools will use 4K x 4 BRAMs to build your deeper memory, and that requires 9 BRAMs. Core Generator would allow you to choose the memory primitive to use; 2K x 9 might be best to reduce multiplexer usage while still using only 8 BRAMs. I don't know of a way to control the primitive used for inference. You might consider inferring two 2K x 36 memories and coding the multiplexing and write-enable logic externally.

-- Gabor
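
A minimal Verilog sketch of the two-bank approach Gabor suggests, under the same assumptions as the template above (one write port, registered reads, illustrative names):

// 4096 x 36 built from two inferred 2048 x 36 banks. The write enable is
// decoded from the top address bit, and the read data is muxed externally.
// Each bank can then map to four 2K x 9 (or 512 x 36) primitives, 8 total.
module dp_ram_4096x36_banked (
    input  wire        clk,
    input  wire        we,
    input  wire [11:0] addr_a,
    input  wire [11:0] addr_b,
    input  wire [35:0] din_a,
    output wire [35:0] dout_a,
    output wire [35:0] dout_b
);
    reg [35:0] bank0 [0:2047];
    reg [35:0] bank1 [0:2047];
    reg [35:0] d0_a, d1_a, d0_b, d1_b;
    reg        sel_a, sel_b;     // bank selects, aligned with the read latency

    always @(posedge clk) begin
        if (we && !addr_a[11]) bank0[addr_a[10:0]] <= din_a;
        if (we &&  addr_a[11]) bank1[addr_a[10:0]] <= din_a;
        d0_a <= bank0[addr_a[10:0]];
        d1_a <= bank1[addr_a[10:0]];
        d0_b <= bank0[addr_b[10:0]];
        d1_b <= bank1[addr_b[10:0]];
        sel_a <= addr_a[11];
        sel_b <= addr_b[11];
    end

    // External read mux in LUTs: this is the clock-to-Q cost discussed in
    // the next reply.
    assign dout_a = sel_a ? d1_a : d0_a;
    assign dout_b = sel_b ? d1_b : d0_b;
endmodule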

Re: Inferred RAM Mapping to BRAM Inefficiently

It isn't actually a bug, it's a choice.

 

When a memory is larger than one physical BRAM, it needs to be constructed from multiple RAMs. When this occurs, there are two ways of doing it - width expansion and depth expansion.

 

If you do width expansion, then all the RAMs get placed in parallel: they all get the same addresses and control signals, but each RAM is responsible for a different slice of the wide word. This has the advantage that each DIN bit goes directly to one RAM and, often more importantly for speed, each DOUT bit comes directly from one RAM. So this has the fastest clock-to-Q.

 

If you do depth expansion, then you do the opposite. Each RAM is responsible for a portion of the address space; all 36 bits of the lowest address range go to/come from the lowest RAM, the next address range comes from the next, and so on. The disadvantage of this is that you need a multiplexer (implemented in LUTs) to generate the final data output, essentially selecting between the output of the lowest RAM, the next RAM, and so on. This will slow down your effective clock-to-Q, since the read data needs to go through the multiplexer. The (normal) advantage of this is that it consumes less power, since only one RAM is cycling at a time.

 

However, due to the "parity" bits of the RAM, there is an extra cost to doing width expansion. When you go below x9 for a BRAM, you lose the use of the parity bits (as Gabor mentioned, there is no RAM with a width of 4.5). So, while the memory would fit in 8 BRAMs using depth expansion (8 x 512 x 36), it needs more BRAMs using width expansion: it really wants to be 4096 x (8 x 4.5), so it ends up as 4096 x (9 x 4).
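
Working through the counts for the 4096 x 36 case makes the trade-off concrete:

4096 x 36 = 147,456 bits of storage

Width expansion: 36 bits / 4 bits per BRAM = 9 BRAMs, each 4096 x 4
                 (9 x 16,384 = 147,456 data bits; the parity bits go unused)
Depth expansion: 4096 words / 512 words per BRAM = 8 BRAMs, each 512 x 36
                 (8 x 18,432 = 147,456 bits, parity bits used), plus an
                 8:1, 36-bit-wide output mux in LUTs on each read port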

 

So, while less efficient in terms of resource utilization, the inferred RAM will have a faster clock-to-Q.

 

Avrum

Re: Inferred RAM Mapping to BRAM Inefficiently

Thanks for the explanation, Gabor and Avrum. In the interest of time, I haven't gone back to check whether the 2Kx36 RAM actually uses the BRAMs in the 512x36 or 2Kx9 configuration, but it really helps to understand the theory behind the issue. Because my design needs efficient RAM packing (it will end up using every single BRAM on the part), I ended up re-coding the RAM module so that it infers the 512x36 configuration.
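
The re-coded module isn't shown in the thread, but a generate-based sketch along the same lines as the earlier ones, splitting the memory into eight 512-deep banks so each can map to a single 512 x 36 primitive, might look like this:

// 4096 x 36 from eight inferred 512 x 36 banks (sketch; names illustrative).
module dp_ram_4096x36_8banks (
    input  wire        clk,
    input  wire        we,
    input  wire [11:0] addr_a,
    input  wire [11:0] addr_b,
    input  wire [35:0] din_a,
    output wire [35:0] dout_a,
    output wire [35:0] dout_b
);
    wire [8*36-1:0] rd_a_flat, rd_b_flat;   // registered read data, all banks
    reg  [2:0]      sel_a, sel_b;           // bank selects, aligned with reads

    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : bank
            reg [35:0] mem [0:511];          // one 512 x 36 bank
            reg [35:0] ra, rb;
            always @(posedge clk) begin
                if (we && addr_a[11:9] == i)
                    mem[addr_a[8:0]] <= din_a;
                ra <= mem[addr_a[8:0]];
                rb <= mem[addr_b[8:0]];
            end
            assign rd_a_flat[i*36 +: 36] = ra;
            assign rd_b_flat[i*36 +: 36] = rb;
        end
    endgenerate

    always @(posedge clk) begin
        sel_a <= addr_a[11:9];
        sel_b <= addr_b[11:9];
    end

    // 8:1 x 36-bit read muxes in LUTs on each port.
    assign dout_a = rd_a_flat[sel_a*36 +: 36];
    assign dout_b = rd_b_flat[sel_b*36 +: 36];
endmodule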
