Registered: ‎07-02-2019

Timing fails for inferred Block RAM

Hi,

I am developing a design for a Zynq UltraScale+ (xczu9eg) in VHDL using Vivado v2018.3 (the older version is required for compatibility with the rest of the project).

Currently it does not meet timing. Several paths have negative slack, with sources and destinations similar to:

Source:     i_system_wrapper/system_i/sc_logen_1/U0/sel/CLKBWRCLK

Destination:  i_system_wrapper/system_i/sc_logen_1/U0/psdsp_16/D

Neither is directly related to a signal of the design. Both are inferred by Vivado from a clock-triggered process containing a for loop and a multi-stage pipeline doing some math implemented in DSPs. sc_logen_1/U0/sel is implemented as a RAMB18E2. So far I have not found out what it does. The problem might be related to a message issued during synthesis:

INFO: [Synth 8-6837] The timing for the instance i_system_wrapper/system_i/i_5/sc_logen_1/U0i_0/sel (implemented as a Block RAM) might be sub-optimal as no optional output register could be merged into the block ram. Providing additional output register may help in improving timing.

What options do I have? Any help is greatly appreciated. Thank you.

Best regards,

Erik

 

4 Replies
Registered: ‎01-22-2015

Re: Timing fails for inferred Block RAM

@e_heinz 

..... inferred by Vivado from a clock-triggered process containing a for loop and a multi-stage pipeline doing some math implemented in DSPs....

Did you or someone on your team write this VHDL? And do you want the VHDL to infer BRAM? If so, Vivado is very particular about how you write the VHDL to infer BRAM. See "RAM HDL Coding Guidelines" on page 118 of Xilinx document UG901(v2019.2). Or, you can use the Xilinx Block Memory Generator IP (see document PG058) to set up and instantiate BRAM in your design.

 

The problem might be related to some messages during synthesis:

The "INFO: [Synth 8-6837]" message is suggesting that you add more pipelining to the design.  Specifically, it suggests that you use VHDL to place registers on the output of the inferred BRAM.  The BRAM will then automatically "pull-in" these registers and pipeline itself.  This additional pipelining could help your design pass timing analysis.
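A minimal sketch of what such an output register could look like, assuming the inferred memory is a small read-only lookup table. The entity name lut_pipelined, the port names, the table size, and the placeholder contents are all hypothetical and not taken from the original design. Note that neither register stage has an asynchronous reset: block RAM output registers only support synchronous reset, so an asynchronous reset on these stages can keep Vivado from merging them into the BRAM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical lookup table followed by two register stages.
-- All names and sizes are illustrative, not from the original design.
entity lut_pipelined is
  port (
    clk  : in  std_logic;
    addr : in  unsigned(7 downto 0);
    data : out signed(15 downto 0)
  );
end entity lut_pipelined;

architecture rtl of lut_pipelined is
  type rom_t is array (0 to 255) of signed(15 downto 0);

  -- Placeholder contents; a real design would hold sine/cosine samples.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in rom_t'range loop
      r(i) := to_signed(i, 16);
    end loop;
    return r;
  end function;

  constant ROM : rom_t := init_rom;

  signal rd_reg  : signed(15 downto 0) := (others => '0');  -- synchronous read
  signal out_reg : signed(15 downto 0) := (others => '0');  -- extra output stage
begin
  -- No reset branch on purpose: an asynchronous reset here could
  -- prevent the second register from being absorbed by the BRAM.
  process (clk)
  begin
    if rising_edge(clk) then
      rd_reg  <= ROM(to_integer(addr));  -- stage 1: memory read
      out_reg <= rd_reg;                 -- stage 2: candidate BRAM output register
    end if;
  end process;

  data <= out_reg;
end architecture rtl;
```

The extra stage adds one clock cycle of latency, so any signals that travel alongside the table output in the surrounding pipeline would need a matching delay.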

Cheers,
Mark

Registered: ‎07-02-2019

Re: Timing fails for inferred Block RAM

I wrote the VHDL. At least the part in question, and the adjacent blocks. The rest and most of the project, however, is inherited from a reference design by Analog Devices.

I did not intend to infer a Block RAM at this point. The code looks like this:

lo_gen : process (clk_in)
begin
  if (not gdt_ok) then
    for i in 0 to CH_NUM-1 loop
      -- [...]
      -- reset logic here
    end loop;
  elsif rising_edge(clk_in) then
    for i in 0 to CH_NUM-1 loop
      -- [...]
      -- lengthy calculations here
      -- about 50 lines, 13 pipeline stages, and 40 signal arrays
    end loop;
  end if;
end process lo_gen;

It is a local oscillator for n frequencies, i.e. a sin/cos calculation using a small lookup table and a Taylor approximation. The calculations are implemented by 6 DSP modules per channel, which absorb most of the signals.

There is no signal named "sel" in the VHDL code. It is created by Vivado, and I have no idea what actually goes into the Block RAM. Therefore I do not know how to add more pipelining, since I do not know which signals of the design are affected.

I tried to prohibit the use of Block RAM with constraints of the form "set_property RAM_STYLE DISTRIBUTED ...", but this does not work. Probably the cell in question does not exist yet when the XDC file is evaluated.
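One alternative to the XDC approach is to attach the style attribute in the VHDL source, where it does not depend on a cell name that only exists after synthesis. The sketch below assumes that the Block RAM originates from the small lookup table, which the thread does not confirm. The entity name lut_distributed, the table size, and the placeholder contents are hypothetical. For read-only tables Vivado uses the rom_style attribute rather than ram_style.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical read-only table steered towards LUT-based memory.
-- All names and sizes are illustrative, not from the original design.
entity lut_distributed is
  port (
    clk  : in  std_logic;
    addr : in  unsigned(3 downto 0);
    data : out signed(15 downto 0)
  );
end entity lut_distributed;

architecture rtl of lut_distributed is
  type rom_t is array (0 to 15) of signed(15 downto 0);

  -- Placeholder contents standing in for the real table values.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in rom_t'range loop
      r(i) := to_signed(i * 100, 16);
    end loop;
    return r;
  end function;

  -- Declared as a signal that is never written, so it behaves as a ROM.
  signal ROM : rom_t := init_rom;

  -- Ask synthesis for distributed (LUT) memory instead of Block RAM.
  attribute rom_style : string;
  attribute rom_style of ROM : signal is "distributed";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      data <= ROM(to_integer(addr));
    end if;
  end process;
end architecture rtl;
```

If the table in the real design is written as a case statement or a constant indexed inside the process, moving it into a separately declared array like this would give the attribute something to attach to.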

I might mention that the clock clk_in, that drives the process, is the output of a BUFGMUX that is used to switch between two clocks of 250 MHz and 500 MHz. This could be the actual origin of the problem. If I remove the BUFGMUX and feed the process by either 250 MHz or 500 MHz, the problem vanishes.

Still I wonder what the Block RAM is used for and how it could be manipulated by the code.

 

Best regards,

Erik

Registered: ‎01-22-2015

Re: Timing fails for inferred Block RAM

Erik,

It is a local oscillator for n frequencies, i.e. a sin/cos calculation using a small lookup table and a Taylor approximation. The calculations are implemented by 6 DSP modules per channel, which absorb most of the signals.

This sounds like one of those situations where we are smart enough to understand the VHDL but we may never be smart enough to understand what synthesis is doing.  Sometimes we must just trust the tools.  You could probably keep throwing pipeline registers at it and eventually the BRAM will absorb some of them and automatically pipeline itself,

but…

…the output of a BUFGMUX that is used to switch between two clocks of 250 MHz and 500 MHz. This could be the actual origin of the problem. If I remove the BUFGMUX and feed the process by either 250 MHz or 500 MHz, the problem vanishes.

Very nice test!  You have what are called logically exclusive clocks.  A timing constraint is usually needed to make timing analysis work properly for logically exclusive clocks.  UG903(v2019.2) on page 43 talks about the needed constraint but page 170 of UG949(v2019.2) has a better description.  As you’ll find, the needed constraint depends upon how/if the clocks interact.  However, if the clocks don’t interact (which is probably your situation) then the needed constraint looks something like the following.

set_clock_groups -logically_exclusive -group clk250 -group clk500

Based on what you said about “remove the BUFGMUX”, I suspect that a proper set_clock_groups constraint will solve your timing analysis problem.

Cheers,
Mark

Registered: ‎07-02-2019

Re: Timing fails for inferred Block RAM

Mark,

Good point, but I already have a "set_clock_groups" in the constraints. Without it, the BUFGMUX does not work at all. With it, it works in general, except for the strange Block RAM thing.

I gather from your valuable comments that there might be no straightforward solution to this problem. Moreover, I get the impression that 500 MHz is already a challenge for the Zynq UltraScale+ when it comes to complex signal processing. So for the moment I will abandon the idea of switching clocks at run time. It would have been a nice-to-have feature anyway.

Best regards,

Erik

 
