Demystifying Resets: Synchronous, Asynchronous and other Design Considerations... Part 2

Xilinx Employee

In my previous blog post we discussed why proper planning is needed for resets. Let us continue the discussion. In this blog we will examine techniques for combining multiple resets, sequencing resets across hierarchies and clock domains, and the different types of flip-flops available in Xilinx FPGAs. Finally, we will look at a few tips and tricks for handling resets in Vivado.

 

Combining Multiple Resets

Sometimes it is necessary to combine resets, especially when multiple resets could be active at the same time or at different times. For example, a state machine may need to return to its default state if any one of a software reset, the system reset or the power-on reset is active. These resets may be synchronous or asynchronous. The first recommendation is to synchronize the resets if they are asynchronous. Please refer to Part 1 of this blog to learn about synchronizing asynchronous resets. Remember to synchronize all asynchronous resets in the destination clock domain of interest first. The second recommendation is to drive the reset to the destination from a flip-flop instead of a LUT. A LUT output may glitch when one of the resets is active, so the design may not work reliably. Driving the reset from a flip-flop also allows the tool to replicate the register if timing is critical or if the fanout is large.

Combining_resets_bad.png

Fig. 1: Bad way to combine resets, as the destination is driven from a LUT output

 

Combining_resets_good.png

Fig. 2: The recommended way to combine resets

 

Fig. 1 shows three resets being combined in a LUT that directly drives the reset of a register. Fig. 2 shows the recommended method. Once again, if the three resets are asynchronous, first synchronize them, then combine them, and finally register the combined reset before connecting it to the rest of the design.

 

Reset Sequencing and Clock Domain Crossing

At the planning stage, clearly spell out how you want your system (multiple chips) or your design (all the hierarchies within the FPGA) to behave when the reset is active. Do you want all the chips (or all the design hierarchies in the FPGA) to go into the reset state and come out of it simultaneously? Or do the chips (or design hierarchies) require some kind of reset sequencing? Design the reset sequencing according to what the system architecture demands.

Assume that there is one system wide reset that is asynchronously asserted. Let us see how to handle the reset in the two scenarios above:

  • All the design hierarchies are reset simultaneously
    In this case, assume that your design has three clock domains: a 100 MHz core clock domain in Block A, a 166 MHz system clock domain in Block B and a 66 MHz PCI clock domain in Block C. A single asynchronous reset, areset, resets the three clock domains simultaneously. Fig. 3 shows the block diagram and Fig. 4 shows the schematic of the reset architecture. At the top level, since areset is asynchronous, it is synchronized in each of the three clock domains. Once synchronized, it drives the reset in that hierarchy. At the hierarchical level, the synchronized reset signal should be registered one or more times if the reset has a large fanout. The added register(s) give the tool a flop it can replicate in order to meet timing. At the top level the asynchronous reset, areset, is constrained as a false path. At the hierarchical level, since the reset is synchronous, no timing constraints are needed.

Block_Dia_for_Reset_Simultaneous_Fig_4.png

Fig. 3: Block Diagram for an asynchronous reset driving the entire design simultaneously

 

Schematic_for_Reset_Simultaneous_Fig_6.png

Fig. 4: Schematic for an asynchronous reset driving the entire design simultaneously

 

 

  • The reset needs to be sequenced based on clock domains 

In this case, let us assume that the 100 MHz core clock in Block A needs to come out of reset first, followed by the 166 MHz system clock in Block B and finally the 66 MHz PCI clock in Block C. The block diagram is shown in Fig. 5 and the schematic is shown in Fig. 6.

Block_Dia_for_Reset_Sequencing_Fig_6.png

Fig. 5: Block Diagram for reset sequencing

Schematic_for_Reset_Sequencing_Fig_8.png

Fig. 6: Schematic for reset sequencing

 

Notice that at the top level the asynchronous reset signal drives only Block A, yet all the blocks will be in the reset state simultaneously. The blocks come out of reset sequentially: first Block A (after the number of clock cycles determined by its synchronizer chain), followed by Block B and finally Block C. Depending on the fanout of the reset signal within each block, add registers to the synchronizer chains for easy replication. Within each clock domain, once the asynchronous reset has been synchronized, the reset has to be used as a synchronous reset. The asynchronous reset input port, areset, can be constrained as a false path. Timing within each clock domain is analyzed normally, since the reset is synchronous there.
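The constraints mentioned in both scenarios can be sketched in XDC roughly as follows. The port name areset comes from the text. The synchronizer register name pattern is purely hypothetical and depends on how your RTL names those flops. Marking the synchronizer flops with ASYNC_REG is a common companion practice, not something the discussion above prescribes.

```tcl
# The asynchronous reset input is not timed; constrain it as a false path.
set_false_path -from [get_ports areset]

# Assumption: the synchronizer flops in each block are named *rst_sync_reg*.
# ASYNC_REG asks the tools to keep the synchronizer chain together.
set_property ASYNC_REG TRUE \
    [get_cells -hierarchical -filter {NAME =~ "*rst_sync_reg*"}]
```

Adjust the name filter to your own hierarchy before using anything like this.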

 

The types of Flip-flops available in Xilinx FPGAs

In Xilinx FPGAs, there are primarily four types of flip-flops. Each has a clock, a data input and a clock enable, and they differ in the remaining control pin, which is a synchronous reset, a synchronous set, an asynchronous clear or an asynchronous preset:

  1. FDCE: This flip-flop has four input pins: the clock pin, the D pin, the enable pin and an asynchronous clear pin. Use this flop in asynchronous reset synchronizers for active-low resets.
  2. FDRE: This flip-flop has four input pins: the clock pin, the D pin, the enable pin and a synchronous reset pin. Use this flop for synchronous resets.
  3. FDSE: This flip-flop has four input pins: the clock pin, the D pin, the enable pin and a synchronous set pin. Use this flop for synchronous sets.
  4. FDPE: This flip-flop has four input pins: the clock pin, the D pin, the enable pin and an asynchronous preset pin. Use this flop in asynchronous reset synchronizers for active-high resets.

Handling Resets in Vivado

Control Sets: A Primer 

Disclaimer: This blog is not intended to be a comprehensive guide to control sets. Please refer to UG949: The UltraFast Design Methodology Guide for more details. The aim of this blog is to introduce a few important concepts.

In FPGAs, control sets are the clocks, resets, enables and sets (presets). For two flops to be placed within the same slice, their control sets have to match. If the control sets don't match, the placer will have to spread the logic over the FPGA. Spreading the logic might result in timing closure challenges. In some cases a high number of control sets might result in global, long or short congestion, which might also result in timing closure challenges.

While designing FPGAs, it is recommended to limit the total number of control sets in the design. In Xilinx devices the acceptable number of control sets is between 7.5-15% of the total number of available slices in the device. Please refer to UG949: The Ultrafast Design Methodology guide for more information and your specific Device User Guide to find out the total number of available slices. In this blog we will only focus on resets.
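As a rough illustration of the percentage guideline quoted above, the following Tcl sketch computes the 7.5% and 15% marks for a given slice count and then asks Vivado for the control sets actually used. The slice count of 50000 is a made-up placeholder; take the real figure from your device user guide.

```tcl
# Hypothetical slice count; replace with the value for your device.
set total_slices 50000

# 7.5% and 15% of the available slices, using integer arithmetic.
set low_mark  [expr {$total_slices * 75 / 1000}]
set high_mark [expr {$total_slices * 15 / 100}]
puts "Control set guideline range: $low_mark to $high_mark"

# In Vivado, report the control sets in the open design for comparison.
report_control_sets -verbose
```

The last command only works inside a Vivado session with a synthesized or implemented design open.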

 

As I discussed in the previous blog, resets and clock enables tend to be the two control signals (apart from clocks, of course) that dominate designs, and they are the most likely culprits for high-fanout nets and possibly congestion. If there are too many high-fanout nets in the design (greater than 15% of the available slices in the device), they tend to cause global congestion, which can manifest as a timing closure problem. One way to mitigate global congestion is to promote non-timing-critical high-fanout nets to global clock routing. Promoting a high-fanout net like a reset (resets are typically active for many clock periods and aren't timing critical in general) to global clock routing by inserting a BUFG not only frees up routing resources for the timing-critical signals in your design but also reduces congestion. The recommendation is to promote non-timing-critical nets that have a fanout greater than 25k. You can promote a non-timing-critical high-fanout signal to a global buffer using the following XDC constraint:

set_property CLOCK_BUFFER_TYPE BUFG [get_nets netName]

 

Timing critical high-fanout nets can be replicated with a KEEP attribute to aid in timing closure. It might be necessary to replicate the high-fanout net in each SLR.
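To decide which nets are candidates for a BUFG and which need replication instead, it helps to list the high-fanout nets first. A minimal sketch using Vivado's report_high_fanout_nets command follows; the limit of 20 nets is an arbitrary choice for illustration, while the 25000 threshold mirrors the 25k figure above.

```tcl
# Show the 20 highest-fanout nets with their worst slack and load types,
# to separate timing-critical nets from non-timing-critical ones.
report_high_fanout_nets -timing -load_types -max_nets 20

# Restrict the report to nets above the 25k fanout guideline.
report_high_fanout_nets -fanout_greater_than 25000
```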

Another trick to reduce control sets in a design is to merge the reset or clock enable into the datapath. Suppose you have two flops, one with a reset and another without, and both flops are in the same clock domain. Since the control sets don't match, the two flops cannot be placed in the same slice. Merging the reset (or clock enable) into the datapath makes the control sets match, which allows the flops to be placed in the same slice. You can add RTL attributes as shown in Fig. 9 below:

RTL_Attribute_fig_9.png

Fig. 9: RTL Attribute to control whether reset is hooked to the 'R' or 'CLR' pin or merged with the datapath

 

You can also merge the reset (or enable) to the datapath by using the XDC constraint as shown below:

set_property extract_reset "no" [get_cells top/synced_reset]

 

Handling High-fanout Nets

Earlier in the blog, I mentioned that replication will be needed depending on the fanout of the net: the higher the fanout and the more critical the timing, the greater the need to replicate the driver of the high-fanout net. Having said that, aggressive replication by the user at the synthesis stage can be counter-productive. Because placement, routing and timing are still unknown at that point, aggressive replication will in all likelihood hurt more than it helps. The recommendation is to avoid a global MAX_FANOUT constraint at the synthesis stage. It is perfectly fine to have a few really targeted MAX_FANOUT constraints in a few blocks with a value of 512/1024 if absolutely necessary, and only on nets that are expected to have a high fanout. In the opt_design stage, placement, routing and timing are again not known, so limited tool-driven replication is recommended. In the placer, we know exactly where the cells/hierarchies are placed and the timing is a little more realistic, so tool-driven mid-grained fanout optimization is recommended. After the router has finished, the timing is accurate, so at this stage you want tool-driven fine-grained replication based on where the driver is placed, where the destination register is placed and what the routing topology of the net is.
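A targeted constraint of the kind described above might look like the sketch below. The net name blockA/sync_rst is hypothetical, and the second command is just one way to request replication after placement; neither is taken from the original figures.

```tcl
# Targeted MAX_FANOUT on one known high-fanout reset net (hypothetical name),
# rather than a global setting at synthesis.
set_property MAX_FANOUT 512 [get_nets blockA/sync_rst]

# After place_design, replication of a specific net's driver can be forced
# during physical optimization if timing analysis shows it is needed.
phys_opt_design -force_replication_on_nets [get_nets blockA/sync_rst]
```

The first line belongs in the constraints used for synthesis. The second is a flow command to run only once placement results are available.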

 

Fanout Optimization in Placer 

In Vivado version 2018.1, we introduced a new feature in the placer that automatically manages high-fanout nets. The new algorithm automatically inserts BUFGs during global placement on non-timing-critical nets, depending on BUFG availability. The placer also replicates high-fanout nets and control signals to DSPs/BRAMs. The placer replication is based on the placement and the distance that the net has to drive. The main advantage for the user is that no guesswork is needed at the RTL/synthesis stages to figure out which high-fanout nets need to be manually replicated. The control set utilization will be optimal, resulting in fewer congested designs. The new algorithm addresses timing-critical high-fanout nets like resets early in the placement stage and reduces the need for post-place or post-route physical optimization iterations.

Fig. 10 below shows one example of a really high-fanout net that was automatically replicated by the placer. The immediate benefit can be seen by the nearly 0.500 ns WNS improvement:

 

Fanout_Opt.JPG

Fig. 10: High-fanout net automatic replication in the placer

 

Conclusion

In Part 1 of this blog I wrote that resets cannot and shouldn't be ad hoc. They need to be planned very carefully early in the design phase. We also talked about the impact of resets on power, the choice of active-high and active-low resets and whether to use asynchronous or synchronous resets. Here is a summary of recommendations for resets:

  1. Plan the reset architecture early in the design phase
  2. At the board or design or hierarchical level decide the different type and number of resets (power-on reset, system reset, software reset etc.) that will be active
  3. Determine whether the incoming resets are asynchronous or synchronous
  4. If the resets are asynchronous, ensure that they are synchronized in each clock domain
  5. Ensure that at the hierarchical level, only synchronous reset is used
  6. Establish a guideline for one type of reset for the entire design - either active-high (recommended) or active-low
  7. Establish guidelines for the min/max number of clock cycles the reset needs to be active
  8. Determine the minimum number of elements that the reset will drive (state-machine registers, pipeline registers, datapath etc.) in every hierarchy of the design and the initial state of those registers (should the flop be reset or set)
  9. Ensure that the details from 8. above are documented in the micro-architecture specification for each hierarchy/chip/board/system.
  10. Avoid resets to big blocks like DSPs, URAMs, BRAMs and LUTRAMs, if possible
  11. Avoid aggressive MAX_FANOUT constraints during the synthesis stage
  12. MAX_FANOUT constraints with a value of 512/1024 on targeted high-fanout nets in a particular hierarchy are fine
  13. When combining resets, ensure that the resets being merged are synchronized in the destination clock domain, merged and then registered before driving the destination registers
  14. Always drive reset from a flop
  15. At the planning stage determine if all chips or all hierarchies within a design can be reset simultaneously or need to be sequenced based on specific clock domains
  16. If necessary, replicate resets with the KEEP attribute and in each SLR
  17. Finally, use Vivado tool features like replication and auto-promotion of high-fanout nets to a global clock buffer throughout the flow (synthesis, opt_design, place_design, route_design and phys_opt_design)

 

I am certain that there are many other subtleties and 'gotchas' that haven't been discussed here. Reset is a Ph.D. thesis in itself, in my opinion. If you have any further insights, please feel free to share them with the community by commenting on this post (the comments are moderated by me, but rest assured that if your comment is relevant to the topic you will see it posted).