
Prevent Block Ram Cascade Chain?

Synthesis is building a block RAM cascade chain for a RAM on a very fast clock, and the design consequently fails timing in place and route (P&R):

INFO: [Synth 8-5555] Implemented Block Ram Cascade chain of height 8 and width 8 for RAM U0/bufmem_0_reg

Is there any flag to prevent synthesis from doing this?

-bisector

Re: Prevent Block Ram Cascade Chain?

How big is the RAM you are trying to implement? If the RAM is deep and narrow and you are not allowing for pipelining of the output, then the cascade path is the best way to implement it. If, however, synthesis is implementing the RAM inefficiently (for example, using depth expansion rather than width expansion), then there may be a way to change that.

So this all depends on the RAM: exactly what size (width and depth) are you trying to implement?

Avrum

Re: Prevent Block Ram Cascade Chain?

type Tbufmem is array(0 to 32767) of std_logic_vector(7 downto 0);
attribute ram_style: string;
signal bufmem_0: Tbufmem;
attribute ram_style of bufmem_0: signal is "block";

 

This same code works fine (and meets timing) on Kintex-7, where no such hardware cascade is available. The problem only arises now that we are targeting Kintex-U.
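The fragment above is only the declaration. For reference, a complete version might look like the following sketch. The entity and port names (`bufmem_32kx8`, `waddr`, `raddr`, `rd_q`, `rd_qq`) are hypothetical, and it assumes a simple dual-port RAM with one write port and one read port. Following the point about output pipelining in the first reply, it registers the read data twice (two-cycle read latency) so the tools have a register stage after the RAM; whether that changes how synthesis maps the RAM is not guaranteed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper around the bufmem_0 declaration: 32K x 8 simple
-- dual-port RAM with a registered read plus one extra pipeline register.
entity bufmem_32kx8 is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(14 downto 0);
    wdata : in  std_logic_vector(7 downto 0);
    raddr : in  std_logic_vector(14 downto 0);
    rdata : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of bufmem_32kx8 is
  type Tbufmem is array(0 to 32767) of std_logic_vector(7 downto 0);
  signal bufmem_0 : Tbufmem;
  attribute ram_style : string;
  attribute ram_style of bufmem_0 : signal is "block";
  signal rd_q  : std_logic_vector(7 downto 0);  -- synchronous read register
  signal rd_qq : std_logic_vector(7 downto 0);  -- extra output pipeline stage
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        bufmem_0(to_integer(unsigned(waddr))) <= wdata;
      end if;
      rd_q  <= bufmem_0(to_integer(unsigned(raddr)));
      rd_qq <= rd_q;
    end if;
  end process;

  rdata <= rd_qq;  -- read data appears two clocks after raddr
end architecture;
```

The second register costs one clock of read latency; any logic that consumes `rdata` has to account for that.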

Re: Prevent Block Ram Cascade Chain?

So this needs to be implemented as 8 block RAMs, since each block RAM holds 32 Kbit if you aren't using the "parity" bits.
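The sizing arithmetic behind the "8 block RAMs" figure, assuming 36 Kbit block RAMs used without their parity bits (32 Kbit = 32768 bits of data each), and how each of the two organizations splits that total:

$$
32768 \times 8 = 262\,144 \text{ bits} = 256 \text{ Kbit}, \qquad \frac{256 \text{ Kbit}}{32 \text{ Kbit per block RAM}} = 8 \text{ block RAMs}
$$

$$
\text{Depth expansion: } \frac{32768 \text{ words}}{8} = 4096 \text{ words per RAM} \;\Rightarrow\; 4\text{K} \times 8, \qquad \text{Width expansion: } \frac{8 \text{ bits}}{8} = 1 \text{ bit per RAM} \;\Rightarrow\; 32\text{K} \times 1
$$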

 

There are two ways to implement this:

Depth expansion: 8 RAMs, each 4K x 8
   - this requires multiplexing the data from the 8 RAMs on readback, which in UltraScale is done using the cascade paths

Width expansion: 8 RAMs, each 32K x 1
   - each RAM is responsible for one data bit of the entire RAM, so there is no multiplexing/cascading

 

With UltraScale, it looks like synthesis is choosing depth expansion, which is costing you timing. All other things being equal, this is actually the better implementation in terms of power (only one of the 8 RAMs is enabled on each access, so it consumes roughly 1/8th of the power of width expansion), but it costs you in timing.
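To make the depth-expansion option concrete, here is a behavioral VHDL sketch of what it amounts to; the entity, port and signal names are hypothetical. The top three address bits select one of eight 4K x 8 banks, and the 8:1 read multiplexer after the bank outputs is the logic that UltraScale implements with the block RAM cascade path. It is written to show the structure, not as a recommended way to code a RAM for Vivado; read latency is one clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Depth expansion (hypothetical names): eight 4K x 8 banks = 32K x 8.
entity depth_exp_32kx8 is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(14 downto 0);
    wdata : in  std_logic_vector(7 downto 0);
    raddr : in  std_logic_vector(14 downto 0);
    rdata : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of depth_exp_32kx8 is
  subtype Tword is std_logic_vector(7 downto 0);
  type Tbank  is array(0 to 4095) of Tword;   -- one 4K x 8 bank
  type Tbanks is array(0 to 7) of Tbank;      -- eight banks
  type Twords is array(0 to 7) of Tword;
  signal banks  : Tbanks;
  signal bank_q : Twords;                     -- registered output of each bank
  signal sel_q  : integer range 0 to 7 := 0;  -- raddr(14:12), delayed one clock
begin
  process (clk)
    variable wbank : integer range 0 to 7;
    variable woff  : integer range 0 to 4095;
  begin
    if rising_edge(clk) then
      if we = '1' then
        -- Top 3 address bits pick the bank, low 12 bits the word inside it.
        wbank := to_integer(unsigned(waddr(14 downto 12)));
        woff  := to_integer(unsigned(waddr(11 downto 0)));
        banks(wbank)(woff) <= wdata;
      end if;
      -- Every bank is read at the same offset; only one result is used.
      for i in 0 to 7 loop
        bank_q(i) <= banks(i)(to_integer(unsigned(raddr(11 downto 0))));
      end loop;
      sel_q <= to_integer(unsigned(raddr(14 downto 12)));
    end if;
  end process;

  -- 8:1 read mux after the RAM outputs: the deep read path.
  rdata <= bank_q(sel_q);
end architecture;
```

In this sketch the mux sits between the bank output registers and `rdata` with no register after it, which illustrates why this organization can lengthen the read path at a high clock rate.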

 

Now the question is how to force Vivado to perform width expansion instead of depth expansion. Unfortunately, the answer is "I don't know...". I would have thought that Vivado would make the right choice based on the timing requirements; are you sure your design is properly constrained at synthesis time? If there are no constraints on the output paths during synthesis, it would probably select depth expansion to save power.

I don't see any attribute that controls how the larger RAM is built (depth vs. width expansion). There are probably lots of ways of forcing it to use width expansion. You could implement a "bit enable" on writes: since each RAM has only one write enable per byte, this would force the tools to use the RAMs in parallel (though they will try to optimize this out if it is trivially redundant). You could also break your RAM into 8 parallel RAMs in RTL; you could even go so far as to implement a 32Kx1 RAM in a submodule and instantiate it 8 times in parallel with a generate statement. If you have "flatten_hierarchy" set to "none", synthesis probably won't be able to merge the RAMs.
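A sketch of the generate-based workaround, assuming a simple dual-port RAM; the entity, port and label names (`ram_32kx1`, `width_exp_32kx8`, `g_bits`, `u_slice`) are hypothetical. Each slice stores one data bit for all 32K addresses, so the read data needs no multiplexer. Instead of setting flatten_hierarchy to none for the whole run, it puts Vivado's KEEP_HIERARCHY attribute on the slice architecture, which, as we understand the Vivado synthesis documentation, keeps that level of hierarchy from being flattened; check that your tool version honors it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One bit-slice: a 32K x 1 simple dual-port RAM (hypothetical names).
entity ram_32kx1 is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(14 downto 0);
    wbit  : in  std_logic;
    raddr : in  std_logic_vector(14 downto 0);
    rbit  : out std_logic
  );
end entity;

architecture rtl of ram_32kx1 is
  type Tmem is array(0 to 32767) of std_logic;
  signal mem : Tmem;
  attribute ram_style : string;
  attribute ram_style of mem : signal is "block";
  -- Ask synthesis to keep this hierarchy level so the eight slices are not
  -- flattened and merged back into a single 32K x 8 RAM.
  attribute keep_hierarchy : string;
  attribute keep_hierarchy of rtl : architecture is "yes";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(waddr))) <= wbit;
      end if;
      rbit <= mem(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- Width expansion: eight 32K x 1 slices side by side, one per data bit.
entity width_exp_32kx8 is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(14 downto 0);
    wdata : in  std_logic_vector(7 downto 0);
    raddr : in  std_logic_vector(14 downto 0);
    rdata : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of width_exp_32kx8 is
begin
  g_bits : for i in 0 to 7 generate
    u_slice : entity work.ram_32kx1
      port map (clk => clk, we => we, waddr => waddr, wbit => wdata(i),
                raddr => raddr, rbit => rdata(i));
  end generate;
end architecture;
```

Whether each slice then lands in exactly one 32K x 1 block RAM is still up to the tool; the synthesis log (like the INFO message in the original question) shows what it actually built.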

 

But you have hit an interesting issue. There probably needs to be an attribute added to synthesis to let the user control this...

Avrum

Re: Prevent Block Ram Cascade Chain?

Thanks @avrumw. I can certainly work around it using one of the methods you described; I was hoping for an "easy way out". When I open the synthesized design, it certainly seems to know about the clock. I have even increased it to 500 MHz, to no avail.

Re: Prevent Block Ram Cascade Chain?

Related to this issue, I am interested in being able to force a cascade chain implementation.

In my design, I was consistently getting a cascade implementation for module A, which seemed efficient and appropriate. I locked down the placement of the module A block RAMs for reliable timing closure. Then, in an unrelated part of the design (module B), I replaced a distributed RAM with a block RAM. Now, for no obvious reason, the cascade implementation of module A (cascade heights 8, 4, 2) has been replaced by a larger implementation with no cascade, so my placement constraints were ignored.

I know I can write the constraints so that they are less sensitive to naming, but with 2 additional BRAMs to place, that is much harder to accommodate.

I don't understand how a minor change that adds one more block RAM to one module would cause such a dramatic change in the implementation of an unrelated module (target device: VU7P). Ideally, I should be able to force these inferred block RAMs to be cascaded, maybe through something like a ram_style setting.

Thanks for your help,

Mike