Visitor
Registered: ‎05-08-2018

Cannot use output registers of BRAM

I am inferring a block RAM with SystemVerilog code. I want the BRAM's optional output registers to be used. Synthesis gets me exactly that, but during implementation Vivado pulls the registers out of the BRAM into separate flip-flops, which kills timing.

I am adhering to the recommended language template, initializing to 0 and not using a reset. I also read several threads discussing similar problems, but none of them offers a workable solution for an inferred block RAM. How can I force Vivado to keep the output registers in the BRAM?

module wram #(parameter
  adr_width      = 11,
  data_width     = 16,
  pipeline_delay = 2  // 1 = only synchronous ram, 2 = additional output register
)(
  input                     clk,
  input                     we,
  input  [adr_width-1  : 0] wadr,
  input  [data_width-1 : 0] di,
  input  [adr_width-1  : 0] radr1,
  output [data_width-1 : 0] do1
);

if ((pipeline_delay != 1) && (pipeline_delay != 2)) $error("pipeline_delay must be 1 or 2");

reg   [data_width-1 : 0] mem_do;      // ram data output
reg   [data_width-1 : 0] outreg = 0;  // output register (or wire to mem_do)

logic [data_width-1:0] mem [(2**adr_width)-1:0];

always_ff @(posedge clk) begin
  if (we) mem[wadr] <= di;   // write operation
  mem_do <= mem[radr1];      // read operation (synchronous)
end

always_ff @(posedge clk) begin
  outreg <= mem_do;          // output register
end

generate
  if (pipeline_delay == 2) assign do1 = outreg;  // use output register
  else                     assign do1 = mem_do;  // bypass output register
endgenerate

endmodule
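
Before chasing the implementation issue, it may be worth confirming in simulation that the module really has a read latency of two clocks with pipeline_delay = 2. The following is only a minimal testbench sketch: the name tb_wram, the 10 ns clock period, and the test value are assumptions made for illustration and are not part of the original post.

```systemverilog
// Hypothetical testbench: tb_wram, the clock period and the test data are
// illustrative assumptions, not part of the original post.
module tb_wram;
  localparam int ADR_W  = 11;
  localparam int DATA_W = 16;

  logic              clk   = 0;
  logic              we    = 0;
  logic [ADR_W-1:0]  wadr  = '0;
  logic [ADR_W-1:0]  radr1 = '0;
  logic [DATA_W-1:0] di    = '0;
  logic [DATA_W-1:0] do1;

  // Device under test: the inferred RAM with the output register enabled.
  wram #(.adr_width(ADR_W), .data_width(DATA_W), .pipeline_delay(2)) dut (
    .clk(clk), .we(we), .wadr(wadr), .di(di), .radr1(radr1), .do1(do1)
  );

  always #5 clk = ~clk;  // rising edges at 5, 15, 25, ... ns

  initial begin
    // Write 16'hBEEF to address 3 (captured on the next rising edge).
    @(negedge clk); we = 1; wadr = 3; di = 16'hBEEF;
    @(negedge clk); we = 0;

    // Present the read address, then wait for the two pipeline stages.
    radr1 = 3;
    @(negedge clk);  // one rising edge later: data is in mem_do only
    if (do1 === 16'hBEEF) $error("data appeared after a single clock");
    @(negedge clk);  // two rising edges later: data is in outreg
    if (do1 !== 16'hBEEF) $error("expected 16'hBEEF, got %h", do1);
    else                  $display("read latency of 2 clocks confirmed");
    $finish;
  end
endmodule
```

The stimulus changes on falling edges so that it is stable around the rising edges on which the RAM samples it.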

 

Teacher
Registered: ‎07-09-2009

What's the device you're targeting?

Can you check that the output register of the BRAM can be initialised to 0 at start up? I thought only LUT registers could be initialised like this.

Can you check whether you get the expected result if you put a few more registers on the output? Otherwise it could be the tools pulling the register into the IOB.

Visitor
Registered: ‎05-08-2018

What's the device you're targeting?

XC7Z010

Can you check that the output register of the bram can be initialised to 0 at start up

Yes, in fact AR#64049 states that the output registers SHOULD be initialized with zeros.

Can you check that if you put a few more registers on the output

That is not a workable solution since I cannot afford to make the pipeline longer.
Guide
Registered: ‎01-23-2009

during implementation Vivado pulls the registers out of the BRAM into separate flip-flops

Are you sure? I have never seen (or heard of) the tool doing this before. I thought that once the flip-flops are pulled in to the BRAM in synthesis, nothing could pull them back out.

And how is it breaking timing? The tool is timing driven, so even if it could pull the FFs out, it would only do so to improve timing. If this made the BRAM to FF timing fail, then you can only assume that the FF to "other logic" timing path is even worse (or similarly bad). You should check this before investing lots of time diagnosing what's happening here - if both the path to and from the FFs are violating, then you have a bigger problem - the tools can't find a combination of placement of these intermediate pipeline flip-flops that can pass timing.

Avrum

Visitor
Registered: ‎05-08-2018

Are you sure?

Yes: in the schematic of the synthesized design, the data out of the RAM is connected directly to the multiplexer in the following stage, i.e. the FF is in the BRAM. However, in the schematic of the implemented design, the RAM is connected to a FF in the fabric and the mux comes only after that FF. So it seems pretty clear the tool is pulling the registers out of the BRAM.

I have never seen (or heard of) the tool doing this before. I thought that once the flip-flops are pulled in to the BRAM in synthesis, nothing could pull them back out.

The tool is doing this, and other people reported the same before, such as here: Vivado Pulls Registers out of BRAM. That thread was never resolved.

And how is it breaking timing? The tool is timing driven, so even if it could pull the FFs out, it would only do so to improve timing. If this made the BRAM to FF timing fail, then you can only assume that the FF to "other logic" timing path is even worse (or similarly bad).

It is breaking timing because the output delay of the RAM plus the route to the FF is 3 ns combined when the output register is not used. I am aiming at 2.5 ns. In the following pipeline stage I only have one level of logic and then a DSP slice with enabled input registers. I suppose that the path from a BRAM output register through one LUT to a DSP input register would not take 2.5 ns as long as the LUT is reasonably placed. I would try it, but I cannot do anything as long as that output register is not being used, so this issue needs to be fixed first.

You are suggesting that the tool makes the best choice (timing-wise) because it is timing-driven. I don't believe it does. In this case, it's doing more harm than good. In the linked thread, several users agree that the placer makes bad choices regarding output registers and one user reports: "Unless one instantiates the BRAMs with a keep_hierarchy property or uses coregen IP, Vivado does this for you if it thinks pulling the register into the fabric is going to be beneficial timing-wise".

I don't mind that the placer sometimes makes wrong choices as long as I have a way to correct them. But how can I possibly tell the tool to keep the registers in the BRAM? I am already using the suggested template and it does not achieve what it's supposed to. Please don't tell me I have to instantiate primitives like Ken Chapman did 15 years ago because the tool is still not smart enough and at the same time too stubborn to accept help from a user trying to guide it :-(
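
For reference, the keep_hierarchy idea from the quoted post could be tried without instantiating primitives by wrapping the inferred RAM in its own level of hierarchy. This is only a sketch: the wrapper name wram_keep is hypothetical, and whether the attribute actually stops the register from being moved during implementation is the quoted user's claim, not something confirmed in this thread.

```systemverilog
// Hypothetical wrapper: wram_keep is an illustrative name, not from the thread.
// KEEP_HIERARCHY asks Vivado synthesis to preserve this module boundary.
(* keep_hierarchy = "yes" *)
module wram_keep #(parameter
  adr_width  = 11,
  data_width = 16
)(
  input                     clk,
  input                     we,
  input  [adr_width-1  : 0] wadr,
  input  [data_width-1 : 0] di,
  input  [adr_width-1  : 0] radr1,
  output [data_width-1 : 0] do1
);

// The inferred RAM from the first post, with the output register enabled.
wram #(
  .adr_width     (adr_width),
  .data_width    (data_width),
  .pipeline_delay(2)
) u_wram (
  .clk  (clk),
  .we   (we),
  .wadr (wadr),
  .di   (di),
  .radr1(radr1),
  .do1  (do1)
);

endmodule
```

The implemented schematic would still have to be checked to see whether the output register stays inside the BRAM.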

Teacher
Registered: ‎07-09-2009

@dbemman
As far as I read it, AR# 64049 does not say the output register needs to be initialised to zero, only that it cannot be initialised to anything else.

As you have seen, the tools are very fickle when it comes to inferring BRAMs.

You mention you want to hook this into a DSP block, but you can't because of the timing.

Have you tried it?

The BRAM and DSP blocks are co-located in most if not all Xilinx FPGAs, so the route between them is faster than the normal long-line timing.

I'm guessing you're doing some form of filtering or correlating. The DSP and BRAM blocks are arranged like this so that structures such as FFTs and FIR filters can be implemented efficiently, for example:
https://zipcpu.com/dsp/2017/09/29/cheaper-fast-fir.html
Xilinx Employee
Registered: ‎07-16-2008

If you enabled post-place phys_opt_design, it performs the following optimizations by default.

* high-fanout optimization
* placement-based optimization of critical paths
* rewire
* critical-cell optimization
* DSP register optimization
* BRAM register optimization
* URAM register optimization
* a final fanout optimization

For the BRAM register optimization, it improves critical path delay by moving registers from slices to block RAMs, or from block RAMs to slices.

BRAM and DSP are dedicated hardware resources, and their placement is not as flexible as that of fabric logic, especially when routing is congested. I don't think a LUT in between is a good idea for a design running at up to 400 MHz.

The tool looks at the overall timing and tries to improve WNS and TNS.

If you would like to prevent the optimization, you may add the DONT_TOUCH property to these registers.

(*DONT_TOUCH="TRUE"*) reg   [data_width-1 : 0] outreg = 0;  // output register (or wire to mem_do)
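
In context, the attribute would sit on the declaration of the output register. The sketch below is a cut-down variant of the module from the first post with the pipeline fixed at two stages; the module name wram_dt is hypothetical. Because the attribute is applied before inference, it is worth confirming in the post-synthesis schematic that the register is still absorbed into the BRAM.

```systemverilog
// Hypothetical variant of the original module (wram_dt is an illustrative
// name) with DONT_TOUCH on the output register and a fixed 2-clock latency.
module wram_dt #(parameter
  adr_width  = 11,
  data_width = 16
)(
  input                     clk,
  input                     we,
  input  [adr_width-1  : 0] wadr,
  input  [data_width-1 : 0] di,
  input  [adr_width-1  : 0] radr1,
  output [data_width-1 : 0] do1
);

reg [data_width-1 : 0] mem_do;                                 // ram data output
(* DONT_TOUCH = "TRUE" *) reg [data_width-1 : 0] outreg = 0;   // output register

logic [data_width-1:0] mem [(2**adr_width)-1:0];

always_ff @(posedge clk) begin
  if (we) mem[wadr] <= di;   // write operation
  mem_do <= mem[radr1];      // read operation (synchronous)
  outreg <= mem_do;          // output register
end

assign do1 = outreg;

endmodule
```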