melkin
Explorer
5,595 Views
Registered: ‎02-13-2012

set_max_delay on a control set signal?

We have a reset synchronizer with very high final fanout.  The synchronized reset signal typically ties to the R, or sometimes the P, input of the destination FFs.  Because of the high fanout and wide distribution (anywhere in the SLR) of the reset signal, the implementation directives allow replication of the synchronizer's final stages.  

 

Because there are many different clocks in the overall FPGA, I cannot dedicate a BUFG to distribute this synchronized reset.  So I let Vivado replicate as many final FF stages in the synchronizer as needed.  

 

I want to make sure the reset is deasserted to all the end-of-line FF's synchronously.  I thought I could use set_max_delay on the final distributed reset signal to instruct the tools to place each replicated final stage in proximity to the FF's it controls.

 

A couple figures below show a small sample of the circuit.

 

But it turns out that set_max_delay only accepts data path pins as destinations, not control set pins as destinations.  Quoting the online help for "set_max_delay":

... A valid endpoint is a primary output or inout port, or the data pin of a sequential element. If a clock is specified then all the primary output and inout ports related to that clock as well as all the data pins of the registers connected to that clock are used as endpoints.

 

Are there other timing constraints or implementation directives that can be used to ensure that the reset is deasserted synchronously at all (>100K) destination FFs?

 

FYI:  FPGA=VU440.  Vivado 2016.1.  And no, I cannot redesign the whole thing.  The synchronizer is a common library element used in multiple FPGA designs.  I am just the lucky guy who is pushing it to new limits.

 

Here is the "big picture".  The common 250 MHz clock, distributed by a BUFG, is on the left.  

The reset synchronizer is in the middle.  It is an 8 FF pipeline, with the P pins driven by the incoming async resets, the first D input grounded, and the remaining D pins daisy-chained.  The final stage is massively replicated (and trimmed down here for illustration clarity).  

One of those final stage outputs is then sent to the application FFs.  (Again, I trimmed the destination FFs down to a single one to de-clutter the illustration.)

reset-syncr-big-picture.jpg

 

Here's a close-up of the final stage reset and the destination.  The incoming replicated synchronized reset (fanout ~= 130) is OR-ed with a local reset in the LUT.  The LUT drives the R pins of the local FFs in this block.

This specific destination FF is reported as failing timing at the R input, with a slack of about -80 ps.

reset-syncr-end-FF.jpg

 

avrumw
Guide
5,583 Views
Registered: ‎01-23-2009

the implementation directives allow replication of the synchronizer's final stages

 

This is a bit of an oxymoron. If the flip-flop is the final stage of the synchronizer then it cannot be replicated. If the flip-flop can be replicated then it isn't part of a synchronizer...

 

So let's make sure we are using the correct terminology...

 

A reset synchronizer is a set of back to back flip-flops - there are a number of ways of implementing them

   - using the asynchronous preset of the sync flops then clocking a 0 through (a reset bridge)

     - from your later description it seems this is what you are using

   - using a regular synchronizer chain (which introduces latency on the assertion as well as the deassertion)

 

Regardless, all the flip-flops that form the synchronizer itself cannot be replicated. This is enforced in the tool by setting the ASYNC_REG property on all the flip-flops in the chain - among other things, the ASYNC_REG prevents replication.

 

If you wish to do replication on this (now synchronized) reset signal, you need an extra flip-flop on the output. This is a pure pipeline flip-flop, and is not part of the synchronization chain (and hence does not have the ASYNC_REG property set).

 

So, if your MTBF dictates that you need 3 sync flip-flops, then you would have to have 3 with the ASYNC_REG property set, and one more without that allows replication (but, again, this is not part of the "synchronizer", but is a pure pipeline delay on the synchronized reset).

 

So to be clear, since you have 8 back to back FFs, 7 of them are the synchronizer (with ASYNC_REG set), and the 8th one is a pure pipeline flip-flop (and, yes, it is acceptable to have this also use the asynchronous preset input for reset assertion, but you might want to consider changing it to a pure pipeline register if you can tolerate the one clock of latency on reset assertion).
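As a rough sketch of how that 7 + 1 split could be expressed as constraints: the instance names below (`rst_sync/ff_reg[0]` through `rst_sync/ff_reg[7]`) are hypothetical, since the real hierarchy is not given in this thread. The property could equally be set as an attribute in the RTL.

```tcl
# Hypothetical instance names; substitute the real hierarchy of the reset bridge.
# Stages 0..6 form the synchronizer proper: ASYNC_REG keeps them from being
# replicated and asks the tools to place them close together.
set_property ASYNC_REG true [get_cells {
    rst_sync/ff_reg[0] rst_sync/ff_reg[1] rst_sync/ff_reg[2] rst_sync/ff_reg[3]
    rst_sync/ff_reg[4] rst_sync/ff_reg[5] rst_sync/ff_reg[6]
}]

# Stage 7 is deliberately left without ASYNC_REG, so it remains an ordinary
# pipeline flip-flop that the tools are free to replicate.
```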

 

I want to make sure the reset is deasserted to all the end-of-line FF's synchronously.  I thought I could use set_max_delay on the final distributed reset signal to instruct the tools to place each replicated final stage in proximity to the FF's it controls.

 

An additional constraint is unnecessary. The reset net is driven by a flip-flop and drives either the R or P input of a number of other flip-flops. From Vivado's point of view, these are normal static timing paths; they start at a clocked element and end at a clocked element. By default, they will end up with a requirement of less than one period of the clock. No "set_max_delay" is required.
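One way to confirm that such a path is already being timed is to ask for a timing report ending at the reset pin. This is only a sketch; the pin name `app_block/data_reg[0]/R` is a made-up placeholder for one of the destination flip-flops.

```tcl
# Hypothetical destination pin; replace with a real reset pin from the design.
# If a path with a requirement and slack is reported, the reset deassertion
# is already covered by the clock constraint and needs no extra constraint.
report_timing -to [get_pins {app_block/data_reg[0]/R}] -delay_type max -max_paths 1
```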

 

But it turns out that set_max_delay only accepts data path pins as destinations, not control set pins as destinations.

 

The nomenclature "data pin" here is confusing (and a little inconsistent). From the point of view of static timing, there is no such thing as a "control pin" - that's a placement/routing concept; the Q pin is an output, the C pin is a "clock pin" and all the other inputs are "data pins". Therefore, the CE and R/S/C/P are all "data pins" of the flip-flops, and hence are valid static timing path endpoints.

 

Are there other timing constraints or implementation directives that can be used to ensure that the reset is deasserted synchronously at all (>100K) destination FFs?

 

Again, no special constraint is needed. The deassertion of reset is always timed in Vivado. This is different than it was in ISE (at least in some technologies). In ISE (AND I NEED TO STRESS THIS IS ISE NOT VIVADO) the rules were somewhat messed up. The R and S pins of a flip-flop are synchronous pins, and hence are always timed (covered by the PERIOD constraint). The C and P pins of a flip-flop are asynchronous pins, and therefore they may or may not be timed.

 

The timing of the C and P pins is called a "reset recovery check", and IN ISE it was configurable as to whether the reset recovery check was enabled. This was controlled by the UCF constraint

 

ENABLE = "reg_sr_r"

or

DISABLE = "reg_sr_r"

 

To make things worse, the default value of this thing was traditionally incorrect - the default was "DISABLE" (which is a HUGE problem for anyone using asynchronous preset/clear). Xilinx eventually realized this and changed the default to "ENABLE", but the decision as to whether it is enabled or disabled is determined by device family - for Virtex-5 and Spartan-3 and earlier, the default is DISABLE, for Virtex-6 and Spartan-6 and later, the default is ENABLE.

 

(So, for anyone using Spartan-3, or Virtex-5 or earlier, who is using asynchronous preset/clears, you MUST MUST MUST put the

ENABLE = "reg_sr_r"

in your UCF, otherwise you will get occasional reset deassertion failures)

 

BUT - none of this applies in Vivado. In Vivado the reset recovery arc is always enabled (and cannot be changed).

 

This specific destination FF is reported as failing timing at the R input, with a slack of about -80 ps.

 

So, by this alone, you know that no additional constraints are required. If the tools say that the path fails timing, then it knows that it has a timing requirement to meet. The fact that it is failing timing therefore is not a problem with constraints, but simply a problem with placement/routing/replication.

 

Be sure that you have enabled phys_opt_design (ideally after both placement and routing) - it is this stage that will perform additional replication and placement optimizations of the replicated reset flip-flop.
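For a non-project (scripted) flow, the ordering described above might look roughly like the following. It assumes a synthesized design is already open, and the report file name is just an illustrative choice.

```tcl
# Sketch of an implementation script with physical optimization run twice.
opt_design
place_design
phys_opt_design   ;# post-place pass: timing-driven replication and re-placement
route_design
phys_opt_design   ;# post-route pass: further fixes on the routed design
# Illustrative file name; check the reset paths in the resulting summary.
report_timing_summary -file post_route_timing.rpt
```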

 

Avrum

melkin
Explorer
5,566 Views
Registered: ‎02-13-2012

Thanks for the reply and the details, Avrum.

 

Starting with the last first.  Yes, the reported timing failure mentioned is after phys_opt_design, which was run after both place and route.  In fact, it was phys_opt_design that replicated the final stage of the "reset bridge".  

 

Now, back to the beginning...

 

I did not design the reset circuit.  It is something our RTL lead designer created years ago and continues to reuse in each new project.  Because it has worked so well for him in the past, he doesn't want it modified.  

 

I called it a synchronizer because he called it a synchronizer.  But apparently it really is a reset bridge built with FDPE primitives.  He explicitly used the FDPE primitive and fixed the length to 8 stages.  He apparently depends on the asynchronous nature of P to handle the asynchronous reset assertion, and on the Q->D clocking in the destination clock domain to synchronously negate the "reset_out" signal.  

 

Since none of the FDPEs in it have the ASYNC_REG property, the tool is free to replicate the final stage.  I do see a Synopsys synthesis attribute called "syn_maxfan" in the VHDL on the output of the block.  Undoubtedly, that limits the fanout to 100.  But since we have switched to doing synthesis in Vivado, that attribute is ignored.  I can add "MAX_FANOUT" using set_property afterward.

 

I was getting an XDC error message when I tried to apply set_max_delay that basically said that the R or P pin of the FF was not a valid endpoint.  But I knew the D pin is a valid endpoint.  And based on the wording of the set_max_delay discussion of endpoints, I reached a conclusion that R or P could not be included in a set_max_delay constraint.  Glad to know it does not need to be explicitly declared.  So I will remove those from my XDC and give it a whirl.

 

BTW, I actually do have a spare BUFG.  I am going to see which of the resets needs it the most and try that as well.

Regards,

Mark

 

avrumw
Guide
5,556 Views
Registered: ‎01-23-2009

I called it a synchronizer because he called it a synchronizer.  But apparently it really is a reset bridge built with FDPE primitives.  He explicitly used the FDPE primitive and fixed the length to 8 stages.  He apparently depends on the asynchronous nature of P to handle the asynchronous reset assertion, and on the Q->D clocking in the destination clock domain to synchronously negate the "reset_out" signal.

 

This is a reset bridge, which is a specific type of synchronizer for resets. It has the advantage over a typical N-stage synchronizer that the asserting condition propagates through the chain instantaneously - it has no latency (other than combinatorial delay) and it will operate even if the clock is not running (which is one of the main reasons for using an asynchronous preset/clear methodology). However, the deasserting edge goes through the chain in N clocks, and hence comes out synchronous to the clock and metastability reduced (it needs to be both).

 

Many people use reset bridges even for synchronous reset methodologies - there is no harm in doing so...

 

Since none of the FDPE's in it have the ASYNC_REG property.

 

This is not a good idea. In order for any synchronizer to be correct, there are a number of conditions that must be met by the synthesis and place and route tools. While there are other ways of doing it, the best way is to set the ASYNC_REG property on all flip-flops in the synchronizer chain (be it a conventional synchronizer or a reset bridge). This should not be viewed as "optional". In your case, since you want it to replicate the last stage, you can leave the ASYNC_REG property off the last one - but you need to realize that this means that it is a seven stage synchronizer with a flip-flop after it (and not an 8 stage synchronizer). But this should be fine - even at the fastest speeds 7 stages is more than enough.

 

 I can add "MAX_FANOUT" using set_property afterward.

 

Controlling the maximum fanout rarely does any good. Both synthesis and phys_opt_design are timing driven, and will manage fanout as required to meet timing; if a path is failing timing due to some net having too much fanout, then the tools will replicate that net and reduce fanout as needed. Artificially setting a maximum fanout will not be "better" than that, and will waste resources for no reason.
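Rather than capping fanout up front, it may be more useful to look at which high-fanout nets actually sit on failing paths. A minimal sketch, run on the placed or routed design (the limit of 20 nets is an arbitrary choice):

```tcl
# List the highest-fanout nets together with their worst slack and the kinds
# of loads they drive (data, clock enable, set/reset).
report_high_fanout_nets -timing -load_types -max_nets 20
```

A reset net that shows up here with negative slack after phys_opt_design is a hint that something is preventing the tools from replicating its driver.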

 

I was getting an XDC error message when I tried to apply set_max_delay that basically said that the R or P pin of the FF was not a valid endpoint.

 

They definitely are - I just tried it to be certain. It's possible you simply made a mistake on the name; the pin on the FDPE is called PRE, not P (the FDRE is R - gotta love the symmetry...)
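For illustration only, here is a sketch of checking the pin names on a cell and then writing the constraint against the correct one. The cell names and the 3.0 ns value are assumptions (the thread only states a 250 MHz clock), and as noted earlier the constraint itself should not be needed.

```tcl
# Hypothetical cell name; lists every pin on the flip-flop so the real
# preset/reset pin name (PRE on an FDPE, R on an FDRE) can be confirmed.
get_pins -of_objects [get_cells {app_block/state_reg[0]}]

# Hypothetical source cell and an assumed 3.0 ns limit, targeting the PRE pin.
set_max_delay -from [get_cells {rst_sync/ff_reg[7]}] \
              -to   [get_pins {app_block/state_reg[0]/PRE}] 3.0
```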

 

I actually do have a spare BUFG

 

Using a BUFG will probably work, but it is kind of like hitting the problem with a sledge hammer...

 

Reset methodology is very tricky. If you think about it, a full (i.e. global) reset methodology takes TONS of routing resources - every flip-flop in the design is connected to the reset network, and, usually, the source of the network (at least for a single clock domain) is ultimately one flip-flop. This has a terrible effect on both routing (used for routing the reset to all the FFs), but also placement, since the tool is constantly struggling to meet the static timing requirements on tons of paths that all go through this one logical net - this leads the placer to cluster logic around this one flip-flop, which then generally draws all logic toward the middle of the die (which, in turn, increases congestion and generally messes up timing on "normal" paths).

 

So, the best solution is to eliminate as many flip-flops from the reset tree as possible. In most systems, only a very few flip-flops really need to be reset after FPGA power on - remember, all flip-flops are initialized in an FPGA on power on. So, if you can identify those that absolutely must come out of reset synchronously and only reset those, you can remove tons of routing and make placement much easier. BUT, correctly identifying those FFs that need reset from those that don't can be tricky, and if you get it wrong, you end up with an unreliable system...

 

One approach that I use often is to pipeline the reset signal once more as it enters each "major" block (I leave it up to you to define "major"). Often the latency of reset is irrelevant - if all the FFs come out of reset one clock later, it almost never matters. It often does matter that they come out on the same clock, but if that clock is delayed by one, it doesn't matter. Doing it this way, you break the reset tree into N sub-trees - one for each major block. Not only does this reduce the difficulty in getting timing to be met on this huge network, it actually helps placement - your "major" blocks tend to cluster around their "local" reset, which is a good thing...

 

While, in theory, this shouldn't be different than allowing the tools to replicate your "extra" flop, I find that, in practice, doing it manually makes a huge difference...

 

But (as with everything) your mileage may vary...

 

Avrum

melkin
Explorer
5,475 Views
Registered: ‎02-13-2012

Just an update.  The BUFG did not significantly help.  So I went back to figuring out why some of the reset signals had 14K+ fanouts.  Finally found that there were >100K "DONT_TOUCH" settings in the linked design.  Many were due to replication of a basic building block here.  

 

After removing all but the most critical or unremovable DONT_TOUCH instances, the latest implementation run was orders of magnitude better for timing (both in terms of TNS and #FailingNets).  Still have a few to solve, but the tool is now taking care of >99% of the prior problems.
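For anyone wanting to run a similar audit, something along these lines could be tried from the Tcl console on the opened design. The `*basic_block*` name pattern is a hypothetical stand-in for the replicated building block, and whether a given DONT_TOUCH can be cleared this way depends on where it was originally set.

```tcl
# Count the cells that carry DONT_TOUCH, which blocks replication and
# other optimizations on them.
set dt_cells [get_cells -hierarchical -filter {DONT_TOUCH == TRUE}]
puts "Cells with DONT_TOUCH: [llength $dt_cells]"

# Hypothetical name pattern: clear the property only on the non-critical instances.
set_property DONT_TOUCH false \
    [get_cells -hierarchical -filter {DONT_TOUCH == TRUE && NAME =~ "*basic_block*"}]
```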
