Sarg_Nagel
Visitor
Registered: ‎10-16-2020

Description of "never use a logic generated clock"


Hi,

I am searching for a description of where the rule for FPGA designers "never use a logic generated clock" is derived from. I've searched through user guides, white papers and Google, but couldn't find any related explanation.

Can someone please point me to a document where it is described, for example a Xilinx user guide or similar?

regards

 


Accepted Solutions
avrumw
Guide
Registered: ‎01-23-2009

First, let's describe "logic generated". Even in this category we can break it down to "generated by a flip-flop" and "generated by combinatorial logic".

The latter is a definite no-no. Nowhere does the Xilinx documentation promise that the output of a LUT is glitch free, regardless of what function it performs. So if the "clock" comes from a LUT it may have glitches on it, and glitches on a clock are fatal. So that one is easy.
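
To make the glitch risk concrete, here is a minimal Python sketch of a toy model (not a Xilinx timing model). It assumes a two-input XOR whose inputs toggle at the same instant at their sources but reach the LUT after different routing delays. The function names, the delay values and the time-step units are all hypothetical.

```python
def simulate_lut_output(delay_a, delay_b, steps=12, switch_at=4):
    """Return the XOR output per time step.

    At the source, a goes 0->1 and b goes 1->0 at `switch_at`, so an ideal
    XOR output stays at 1 the whole time. Each input reaches the LUT only
    after its own routing delay (arbitrary time steps, assumed values).
    """
    waveform = []
    for t in range(steps):
        a = 1 if t >= switch_at + delay_a else 0
        b = 0 if t >= switch_at + delay_b else 1
        waveform.append(a ^ b)
    return waveform


def count_glitch_pulses(waveform):
    """Count falling edges in a waveform that should be a constant 1."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:])
               if prev == 1 and cur == 0)


matched = simulate_lut_output(delay_a=2, delay_b=2)  # equal input delays
skewed = simulate_lut_output(delay_a=3, delay_b=1)   # unequal input delays

print("matched:", matched, "glitches:", count_glitch_pulses(matched))
print("skewed: ", skewed, "glitches:", count_glitch_pulses(skewed))
```

With equal delays the output never leaves 1. With unequal delays it drops to 0 for the two steps during which only one input has arrived, and a flip-flop clocked by that signal would see an extra edge.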

As for a clock that is coming from a flip-flop, this is more subtle.

Even for this, there are two cases: The clock goes directly to flip-flops (without a clock buffer - i.e. is a locally routed clock) or the clock goes to a clock buffer before being fanned out to flip-flops. 

Using locally routed clocks is possible, but is highly discouraged. The local routing of a clock introduces clock skew: the larger the domain (the more flip-flops clocked by the clock), the more skew there will be, which means there will be hold time failures to fix. With very small domains (a handful of flip-flops) this can be managed, but with anything larger it will likely fail timing.
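
The link between skew and hold failures can be sketched with the standard hold check: the new data must not reach the capturing flip-flop before that flip-flop's hold window (measured from its own, later clock edge) has closed. The Python below is only an illustration; the function name and every number in it are assumed values in picoseconds, not figures for any real device.

```python
def hold_slack_ps(clk_to_q_ps, data_path_min_ps, skew_ps, hold_ps):
    """Hold slack for one flip-flop to flip-flop path.

    skew_ps is (capture clock arrival) - (launch clock arrival).
    Negative slack means the new data arrives before the capture
    flip-flop's hold window has closed, i.e. a hold violation.
    """
    return clk_to_q_ps + data_path_min_ps - skew_ps - hold_ps


# Assumed example values, identical data path in both cases.
CLK_TO_Q_PS = 300
DATA_PATH_MIN_PS = 200
HOLD_PS = 100

balanced = hold_slack_ps(CLK_TO_Q_PS, DATA_PATH_MIN_PS, skew_ps=50, hold_ps=HOLD_PS)
local = hold_slack_ps(CLK_TO_Q_PS, DATA_PATH_MIN_PS, skew_ps=800, hold_ps=HOLD_PS)

print("balanced clock network slack (ps):", balanced)
print("locally routed clock slack (ps): ", local)
```

In this sketch the only thing that changes between the two cases is the skew, and that alone moves the path from positive to negative hold slack.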

So that leaves a clock that is generated by a flip-flop and is routed to a clock buffer before being used by flip-flops.

For this clock, there are a couple of issues, some of which may be acceptable in some designs. 

The first issue is the clock jitter. The clock is generated from general purpose digital logic and is (at least initially) routed through general fabric routing resources. The power grid to the fabric has noise on it from all the digital logic switching, which gets coupled into your generated clock causing jitter. The local routing will make this worse, as the clock signal is routed beside other digital signals and through switch fabrics that are running on the same noisy power domain. Furthermore, there is no way of measuring or even estimating the magnitude of this jitter, so you don't know what to specify for the tools for static timing analysis. Conversely, the dedicated clock logic in the FPGA is isolated from these noise sources to varying degrees, and the jitter propagation from the clock inputs is characterized by Xilinx (so specifying an input jitter gives the tool what it needs to know).

But if you can tolerate this jitter, then the clock itself is a viable clock.

The other big problem with it is that you cannot know the phase difference between this clock and the clock that generated it (the clock that drove the flip-flop that generated the clock), or in fact any other clock.

The base clock (the one driving the flip-flop), or in fact any "normal" clock, comes through dedicated clocking resources (a clock-capable input, possibly an MMCM/PLL, and a dedicated clock buffer), and the timing characteristics of these resources are fixed and known by the tool. Furthermore, when you use the MMCM, much of the process/voltage/temperature (PVT) variation of these delays is cancelled by the MMCM. Coupled with the fact that the dedicated clock networks are skew balanced, this means that the arrival time of the clock at the actual flip-flops of the fabric is within a known range with respect to the arrival time of the clock at the input pin of the device. This allows you to use this clock as the capture clock for synchronous inputs coming into the FPGA and to generate system synchronous outputs from the FPGA with this clock. Also, since all clocks that use dedicated logic have similarly known behavior, you can cross between multiple related clocks without asynchronous clock domain crossing circuits.

For a clock generated by the fabric, none of this is true. The clock pin of the flip-flop generating your derived clock has a known phase (it comes from a dedicated clock network). But from there, we start accumulating delays that are relatively large, not all predictable, and not PVT compensated:

  • The clock-to-output time of the flip-flop is predictable but not PVT controlled
  • The local routing from the flip-flop to the clock buffer can vary from run to run (so is unpredictable), is potentially quite large and is not PVT controlled
  • The delay through the clock buffer and the clock network is predictable, but large and not PVT controlled
    • The network itself is balanced, though, so all flip-flops on the domain get the clock with acceptable skew with respect to each other, and synchronous logic within the domain will function normally
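
As a rough illustration of how these three contributions stack up, the Python sketch below adds assumed minimum and maximum delays for each stage to get the window in which the derived clock edge can arrive relative to the base clock edge. The class name, the stage delays and the 4000 ps clock period are all made-up values for the example, not Xilinx data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayRange:
    """Min/max delay of one stage across PVT and across runs (assumed values)."""
    name: str
    min_ps: int
    max_ps: int


def arrival_window(stages):
    """Earliest and latest arrival of the derived clock after the base clock edge."""
    earliest = sum(stage.min_ps for stage in stages)
    latest = sum(stage.max_ps for stage in stages)
    return earliest, latest


stages = [
    DelayRange("flip-flop clock-to-output", 200, 450),
    DelayRange("local routing to clock buffer", 300, 1500),
    DelayRange("clock buffer and clock network", 1200, 2600),
]

earliest, latest = arrival_window(stages)
uncertainty = latest - earliest
PERIOD_PS = 4000  # hypothetical 250 MHz derived clock

print("arrival window (ps):", earliest, "to", latest)
print("phase uncertainty (ps):", uncertainty)
print("fraction of period:", uncertainty / PERIOD_PS)
```

With these assumed numbers the uncertainty alone consumes most of a clock period, which is why the phase relationship to the base clock cannot be relied on.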

As a result, the clock at the clock pin of any flip-flop on this domain has an unpredictable, large and PVT-variable delay with respect to the base clock. This means that it is virtually impossible to:

  • use this clock as the clock for input capture or system synchronous output generation
  • cross synchronously between this clock and any other clock

If you can tolerate these two limitations (as well as the jitter above) then this clock is usable.

But in many cases, these restrictions make the clock somewhat useless. Furthermore, these issues are subtle and easy to get wrong, particularly if you don't have a lot of experience with this kind of stuff. As a result, all this is generally simplified to "don't use fabric generated clocks".

Avrum


8 Replies
joancab
Mentor
Registered: ‎05-11-2015

Experience writes that down in your genes, which is probably why I cannot include a link.

Sarg_Nagel
Visitor
Registered: ‎10-16-2020

I thought so, but I am now writing my thesis and need a proper explanation and source for this.

joancab
Mentor
Registered: ‎05-11-2015

Understandable. I'd suggest googling for "FPGA good practices" and things like that. Xilinx has the UltraFast Design Methodology Guide ug1046, but I'm not sure it goes into that much detail.

Sarg_Nagel
Visitor
Registered: ‎10-16-2020

Thank you for the hint, but I can't find anything related in this guide.

joancab
Mentor
Registered: ‎05-11-2015

And you can't seriously write "because all those folks have it in their genes after a Darwinian evolutionary process" in your thesis...

maps-mpls
Mentor
Registered: ‎06-20-2017

>I thought so, but I am now writing my thesis and need a proper explanation and source for this.

@avrumw gave a great explanation. You should also recognize that this is a rule of thumb based on current FPGA technology. It could change in the future (e.g., in a hypothetical future FPGA architecture where PVT variation from a FF.Q through a BUFG is as tight as or tighter than it is from an MMCM.Ox through a BUFG). Rules of thumb are generally tied to best practices for the state of the art, and it is important to note not only the rule of thumb ("It is generally a bad idea to have Q to CK paths in an FPGA design") but also the why ("min/max variations due to PVT effects on timing parameters reduce your timing budget and may make meeting your timing requirements more difficult than necessary, and at high frequencies, impossible").

You might consider, too, since you're working on an academic paper, that some academics who are experts might not even realize what @avrumw and others have shared, or that as a practical matter it won't be an academic topic of research or even be deemed worth a mention in their publications. Some practical issues merely need to be stated and backed by a quick explanation. You might have a hard time finding a citation to a paper that says it is a bad idea to submerge a powered Xilinx devCard in salt water, but a little thinking and a quick explanation should be sufficient for your thesis paper, unless your thesis is best practices for FPGA clocking. And if that is your thesis, you can run some experiments and include a table illustrating everything that @avrumw predicted, even quantifying the adverse effect at different frequencies and at different levels of congestion.

*** Destination: Rapid design and development cycles ***
Sarg_Nagel
Visitor
Registered: ‎10-16-2020

@avrumw Thank you for your detailed answer!
