ryanjohnson8
Explorer
Explorer
404 Views
Registered: ‎05-30-2017

BUFGCE CE Input path keeps failing

Hi,

I keep getting a similar timing failure from build to build, where the CE input to a BUFGCE keeps failing. One example is in the attached picture. Look how close the driving register is to the destination BUFGCE. How is it failing when it's placed so close?

timing_fail.png
schematic.png
path_report.png
0 Kudos
8 Replies
373 Views
Registered: ‎01-22-2015

@ryanjohnson8 

Clock path skew is the problem.  Try routing things so the left BUFGCE feeds both the right BUFGCE and Q2_reg/C.

What I propose is also called balancing the clock distribution.  

Cheers,
Mark

joe306
Scholar
Scholar
355 Views
Registered: ‎12-07-2018

Hello, timing closure newbie here. How did you glean that from the images, and which one should I focus on? I'm trying to learn from other people's posts.

Thank you

Joe

0 Kudos
joe306
Scholar
Scholar
352 Views
Registered: ‎12-07-2018

Hi, if you solve this problem could you share your solution and, if you are using Block Design, include an image? I'm trying to figure out Timing Closure by reading the forums.

Thanks

 

 

0 Kudos
avrumw
Guide
Guide
332 Views
Registered: ‎01-23-2009

@joe306 , this is a pretty esoteric timing problem...

As opposed to a simple clock buffer, a BUFGCE can gate the clock; when the CE is asserted, the clock is passed through, but when it is not asserted the clock is suppressed. You can do all kinds of things with this:

  • Clock gating to turn off a bunch of logic when it is not needed to save power
  • Clock gating to turn off a domain around reset (which is a whole other topic)
  • Generating a "divided" clock by enabling it periodically 
    • Although in UltraScale/UltraScale+ this is less useful due to the existence of the BUFGCE_DIV which can do the divide for you
  • (and more)

To do this deterministically, the BUFGCE has to gate the clock synchronously - if the CE is asserted, the "next" rising edge goes through, otherwise it doesn't. This means that there is the equivalent of a setup check done between the arrival time of the CE and the "next" rising clock edge (to determine if it goes through or not).
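The synchronous gating behaviour described above can be sketched with a small behavioural model. This is only an idealized illustration, not a model of the actual BUFGCE primitive: the function name `gate_clock`, the `setup_ns` parameter and all the numbers are hypothetical, and a CE change that arrives inside the setup window is simply treated as missing that edge (in real hardware that would be a timing violation with an undefined result).

```python
def gate_clock(rising_edges_ns, ce_changes, setup_ns=0.0):
    """Return the rising edges that pass an idealized synchronous clock gate.

    rising_edges_ns: sorted times of rising edges at the buffer's clock input.
    ce_changes: sorted (time_ns, value) pairs for CE; CE is low before the first.
    setup_ns: hypothetical setup requirement of CE before a rising edge.
    """
    passed = []
    for edge in rising_edges_ns:
        ce = False
        for t, value in ce_changes:
            # Only CE changes that arrive at least setup_ns before the edge count.
            if t <= edge - setup_ns:
                ce = value
            else:
                break
        if ce:
            passed.append(edge)
    return passed


if __name__ == "__main__":
    edges = [0, 10, 20, 30]  # hypothetical 10 ns clock
    # CE asserted well before the edge at 20 ns: that edge and later ones pass.
    print(gate_clock(edges, [(12, True)], setup_ns=1))
    # CE arrives too late for the edge at 20 ns, so only the edge at 30 ns passes.
    print(gate_clock(edges, [(19.5, True)], setup_ns=1))
    # Toggling CE every cycle passes every other edge (a "divided" clock).
    print(gate_clock(edges, [(5, True), (15, False), (25, True)]))
```

The second call is the interesting one for this thread: the CE is "close" in time to the edge, but it still misses the check, so the edge is not passed.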

The problem here is the clock tree. For all other static timing paths, the source and destination cell are on the leaves of the clock tree; the clock tree fans out to all of these cells with low skew so that synchronous paths are possible. In a normal clock tree, the skew is less than 300ps or so. This means the setup path must be no more than Tck_period - 300ps (this is a simplification). But, in order to do this on a large die, there is significant delay through the clock network (called clock insertion delay). In this case, the clock insertion looks to be around 3ns.

The setup check at the CE, however, is different. The CE comes from a flip-flop that is clocked from a leaf of the ungated clock domain ("unsafe_clock_buffer"), but the destination is the other BUFGCE ("safe_clock_buffer"). As I mentioned above, the setup check is with respect to the clock at the input pin (.I) of the clock buffer. This clock buffer is at the root of the clock tree, not at a leaf of the clock tree. Therefore the source clock goes through a complete clock tree, but the destination clock does not. Since the delay through the clock tree is on the order of 3ns, this 3.33ns path cannot be met.

This isn't really a "timing closure" problem, but an architecture problem. 

In some systems, this arrangement will work. If the clock is slower then instead of having 3.33ns - 3ns for this path, at (for example) 200MHz it would have 5.00ns - 3ns, which would pass. 
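The arithmetic in the last few paragraphs can be written out as a rough sketch. It uses the same simplification as the text (budget = clock period minus clock insertion delay) and ignores uncertainty, the buffer's own setup requirement and so on. The function names and the 1.0 ns CE path delay in the last example are assumptions for illustration, not values from the timing report.

```python
def ce_path_budget_ns(clock_mhz, insertion_delay_ns):
    """Time left for the flip-flop-to-CE path, in ns (simplified)."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - insertion_delay_ns


def max_gated_clock_mhz(insertion_delay_ns, ce_path_delay_ns):
    """Rough upper bound on frequency for a given insertion and CE path delay."""
    return 1000.0 / (insertion_delay_ns + ce_path_delay_ns)


if __name__ == "__main__":
    for mhz in (300.0, 200.0):
        budget = ce_path_budget_ns(mhz, insertion_delay_ns=3.0)
        print(f"{mhz:.0f} MHz: {budget:.2f} ns left for the CE path")
    # Hypothetical 1.0 ns for clock-to-Q, routing and setup on the CE path.
    print(f"{max_gated_clock_mhz(3.0, 1.0):.0f} MHz upper bound")
```

With about 3ns of insertion delay, the 300MHz case leaves roughly a third of a nanosecond for the whole CE path, while the 200MHz case leaves 2ns.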

Similarly, if the clock tree is "shallower", the delay on the clock tree will be less than the 3ns it is here. In UltraScale/UltraScale+ the tool builds the clock tree to fan out to only those clock regions that need the clock. In all cases, the tool builds a clock with low skew, but if the clock is needed in only a small number of clock regions, there is less clock insertion delay from routing, so a path like this might pass. If the clock is needed in many clock regions (or regions that are far away) the clock tree needs to be deeper (have more delay) in order to reach all points with low skew.

It is likely that neither of these is easily changed in this design - the clock needs to run at 300MHz, and the clock is needed throughout the design. So these aren't options.

What markg@prosensing.com is proposing is to drive the .I of "safe_clock_buffer" with the output of "unsafe_clock_buffer". In this case, the .I of "safe_clock_buffer" has an additional delay - the delay of "unsafe_clock_buffer".

I don't know if this will work, or if it is recommended (it is an interesting approach though). In general, cascaded clock buffers are not a good idea - specifically because their delays cascade. If you do this, then the two clocks end up skewed with respect to each other; a flip-flop on "unsafe_clock_buffer" goes through one BUFGCE, but a flip-flop on "safe_clock_buffer" goes through two BUFGCEs. This can make it difficult or impossible to cross synchronously between these two domains.

But, when you cascade BUFGCEs, it probably doesn't go through the clock tree - that connection (from BUFG to BUFG) takes a dedicated cascade path. This is both good and bad. This will minimize the skew (so it will be easier to cross between the domains), but doesn't accomplish the goal of delaying the input of "safe_clock_buffer" enough to meet the required timing. So, it will likely be "better" but maybe not good enough...

That being said, I don't know how to solve this. With a deep clock tree there ends up being an inherent maximum frequency that you can use a BUFGCE in synchronous mode.

Avrum

ryanjohnson8
Explorer
Explorer
271 Views
Registered: ‎05-30-2017

Thank you all for your time. @avrumw That explanation is extremely helpful and also made my head explode.

Maybe I should take a step back here... What I'm really trying to do here is implement the "safe clock startup" feature of the Clocking Wizard IP. All I did was build an equivalent circuit to what it builds when you enable that feature. The Clocking Wizard also has this same timing failure when I let it build the safe clock startup itself. Essentially this feature gates the output clock with a BUFGCE until "locked" goes high.

I have had no end of trouble with the Xilinx Clocking Wizard in my latest design. And it all revolves around this "safe clock startup" feature. The reason I'm using it in the first place is because their documentation says that it gates off the output clock until it is "stable". Well, what in the barnacles does that mean? Does that mean that if I don't turn it on, I'm going to run the risk of metastability in my design? I don't know, but there's no way I'm going to have some mysteriously "unstable" clock running through my design wreaking havoc and metastability! Of course I'm going to turn on that feature, right?

So now I feel like I'm in a corner and have to choose between an "unstable" clock (still don't know what that means...), or a clock with a timing failure to the CE clock buffer input. I'm in the corner and I don't know how to escape.

avrumw
Guide
Guide
253 Views
Registered: ‎01-23-2009

When the MMCM is locking, the frequencies of the output clocks and, probably more importantly, the relative frequencies and phase differences between the different output clocks are not guaranteed to be stable.

The "normal" way of dealing with this is to keep your design in reset during the time that the MMCM is not locked - this is certainly the easiest mechanism if your design uses a global reset methodology.

If your design does not use a global reset methodology, then I guess you may have to consider something like this. But, as you see, this is not an easy mechanism to implement...
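The reset-while-unlocked approach can be sketched as a simple cycle-by-cycle behavioural model. This is only an illustration under assumptions, not the Clocking Wizard's logic or a recommended implementation: the function name `reset_sequence` and the `release_delay` parameter are hypothetical, and in real hardware the release would normally be done with flip-flops in the destination clock domain rather than a counter like this.

```python
def reset_sequence(locked_samples, release_delay=4):
    """Return, per clock cycle, True while the design should be held in reset.

    locked_samples: the MMCM "locked" value observed on each cycle.
    release_delay: hypothetical number of consecutive locked cycles required
                   before reset is released.
    """
    resets = []
    stable_cycles = 0
    for locked in locked_samples:
        # Any loss of lock restarts the count, so reset is reasserted at once.
        stable_cycles = stable_cycles + 1 if locked else 0
        resets.append(stable_cycles < release_delay)
    return resets


if __name__ == "__main__":
    locked = [False, False, True, True, True, True, True]
    for cycle, (lk, rst) in enumerate(zip(locked, reset_sequence(locked, 3))):
        print(f"cycle {cycle}: locked={lk} reset={rst}")
```

The point of the model is only that the design never leaves reset while the clocks may still be changing frequency or phase, so the clock itself does not need to be gated.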

Avrum

View solution in original post

0 Kudos
ryanjohnson8
Explorer
Explorer
243 Views
Registered: ‎05-30-2017

I see. Well I do have a global reset mechanism. So I'm going to back out of this path and keep things in reset until it has locked.

0 Kudos
joe306
Scholar
Scholar
187 Views
Registered: ‎12-07-2018

Wow! Thank you very much for the detailed response. I have so much to learn.

0 Kudos