In my previous blog post we discussed why proper planning is needed for resets. Let us continue the discussion. In this blog we will examine techniques for combining multiple resets, sequencing resets across hierarchies and clock domains, and the different types of flops available in Xilinx FPGAs. Finally, we will look at a few tips and tricks for handling resets in Vivado.
Combining Multiple Resets
Sometimes it is necessary to combine resets, especially when multiple resets can be active at the same time or at different times. For example, a state machine may need to return to its default state whenever any one of a software reset, the system reset or the power-on reset is active. These resets may be synchronous or asynchronous. The first recommendation is to synchronize the resets if they are asynchronous. Please refer to Part 1 of this blog to learn about synchronizing asynchronous resets. Remember to synchronize all asynchronous resets in the destination clock domain of interest first. The second recommendation is to drive the reset to the destination from a flip-flop instead of a LUT. A LUT output can glitch when its inputs change, for instance when one of the resets asserts or deasserts, so the design may not work reliably. Driving the reset from a flip-flop removes the glitches and also allows the tool to replicate the register if timing is critical or if there is a huge fanout.
Fig. 1: Bad way to combine resets, as the destination is driven from a LUT output
Fig. 2: The recommended way to combine resets
Fig. 1 shows three resets being combined and driving the reset of a register from a LUT. Fig. 2 shows the recommended method to combine resets. Once again, if the three resets are asynchronous, synchronize them first, then combine them, and register the combined reset before hooking it up to the rest of the design.
Reset Sequencing and Clock Domain Crossing
At the planning stage, you must clearly spell out how you want your system (multiple chips) or your design (all the hierarchies within the FPGA) to behave when the reset is active. Do you want all the chips (or all the design hierarchies in the FPGA) to go into the reset state and come out of the reset state simultaneously? Or do the multiple chips (or design hierarchies in the FPGA) require some kind of reset sequencing? The reset sequencing needs to be designed according to what the system architecture demands.
Assume that there is one system wide reset that is asynchronously asserted. Let us see how to handle the reset in the two scenarios above:
Fig. 3: Block Diagram for an asynchronous reset driving the entire design simultaneously
Fig. 4: Schematic for an asynchronous reset driving the entire design simultaneously
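Figs. 3 and 4 cover the first scenario, where the whole design goes into and comes out of reset together. For the second scenario, reset sequencing, let us assume that the 100 MHz core clock in Block A needs to come out of reset first, followed by the 166 MHz system clock in Block B and finally the 66 MHz PCI clock in Block C. The block diagram is shown in Fig. 5 and the schematic is shown in Fig. 6.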
Fig. 5: Block Diagram for reset sequencing
Fig. 6: Schematic for reset sequencing
Notice that at the top level the asynchronous reset signal drives only Block A, yet all the blocks will be in the reset state simultaneously. The blocks will, however, come out of reset sequentially. First Block A comes out of reset (after the number of clock cycles determined by the synchronizer chain), followed by Block B and finally Block C. Depending on the fanout of the reset signal within each block, add registers to the synchronizer chains for easy replication. Within each clock domain, once the asynchronous reset has been synchronized, the reset has to be a synchronous style reset. The asynchronous reset input port, areset, can be covered by a false path constraint. Timing within each clock domain will then be analyzed as synchronous.
The types of Flip-flops available in Xilinx FPGAs
In Xilinx FPGAs, there are primarily four types of flip-flops. Each has a clock, a data input and a clock enable, and they differ in the remaining control pin, which is a set, reset, preset or clear: FDRE (synchronous reset), FDSE (synchronous set), FDCE (asynchronous clear) and FDPE (asynchronous preset).
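The false path on the asynchronous reset port could be written in XDC roughly as follows. This is only a sketch: the port name areset comes from the text above, while the synchronizer cell names are hypothetical and need to be replaced with the names in your own design.

```tcl
# Cut timing on paths that start at the asynchronous reset input port.
# "areset" is the port name used in this blog; adjust to your top level.
set_false_path -from [get_ports areset]

# Mark the synchronizer flops so they are treated as a synchronizer chain
# and placed close together.
# The cell name pattern below is hypothetical.
set_property ASYNC_REG TRUE [get_cells blockA/areset_sync_reg*]
```

After the synchronizer, the reset is an ordinary synchronous signal in that clock domain, so no further exception is needed for it.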
Handling Resets in Vivado
Control Sets: A Primer
Disclaimer: This blog is not intended to be a comprehensive guide to control sets. Please refer to UG949: The UltraFast Design Methodology Guide for more details. The aim of this blog is to introduce a few important concepts.
In FPGAs, a control set is a unique combination of clock, reset, enable and set (preset) signals. For two flops to be placed within the same slice, their control sets have to match. If the control sets don't match, the placer will have to spread the logic over the FPGA. Spreading the logic might result in timing closure challenges. In some cases a high number of control sets might result in Global, Long or Short congestion, which might also result in timing closure challenges.
While designing FPGAs, it is recommended to limit the total number of control sets in the design. In Xilinx devices the acceptable number of control sets is between 7.5% and 15% of the total number of available slices in the device. Please refer to UG949: The UltraFast Design Methodology Guide for more information, and to your specific Device User Guide to find out the total number of available slices. In this blog we will only focus on resets.
As I discussed in the previous blog, resets and clock enables tend to be the two control signals (apart from clocks, of course) that dominate designs, and they are the most likely culprits for high-fanout nets and possibly congestion. If there are too many high-fanout nets in the design (greater than 15% of the available slices in the device), they tend to cause global congestion, which can manifest as a timing closure problem. One way to mitigate global congestion is to promote non-timing critical high-fanout nets to global clocks. Promoting a high-fanout net like a reset (resets are typically active for many clock periods and aren't timing critical in general) to a global clock (by inserting a BUFG) not only frees up routing resources for the timing critical signals in your design but also reduces congestion. The recommendation is to promote non-timing critical nets that have a fanout greater than 25k to a global clock. You can promote a non-timing critical high-fanout signal to a global clock using the following XDC constraint:
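To check where a design stands against this guideline, something like the following could be run in the Vivado Tcl console on an open synthesized or implemented design. The helper proc and the report file name are my own illustrations, not part of Vivado. The slice count has to come from your Device User Guide.

```tcl
# Write out the control sets in the current design.
report_control_sets -verbose -file control_sets.rpt

# Hypothetical helper: control sets as a percentage of available slices.
# Both arguments are numbers you supply yourself (the first from the report,
# the second from the Device User Guide).
proc control_set_pct {num_control_sets num_slices} {
    return [expr {100.0 * $num_control_sets / $num_slices}]
}
```

A result well below 7.5% suggests control sets are unlikely to be a concern, while a result approaching or exceeding 15% is a signal to start reducing them.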
set_property CLOCK_BUFFER_TYPE BUFG [get_nets netName]
Timing critical high-fanout nets can be replicated with a KEEP attribute to aid in timing closure. It might be necessary to replicate the high-fanout net in each SLR.
Another trick to reduce control sets in the design is to merge the reset/clock enable into the datapath. Suppose you have two flops, one with a reset and another without a reset, and both flops are in the same clock domain. Since the control sets don't match, the two flops cannot be placed in the same slice. Merging the reset (and clock enables) into the datapath makes the control sets match, which enables the placement of both flops in the same slice. You can add RTL attributes as shown in Fig. 9 below:
Fig. 9: RTL Attribute to control whether reset is hooked to the 'R' or 'CLR' pin or merged with the datapath
You can also merge the reset (or enable) to the datapath by using the XDC constraint as shown below:
set_property extract_reset "no" [get_cells top/synced_reset]
Handling High-fanout Nets
Earlier in the blog, I mentioned that replication will be needed depending on the fanout of the net - the higher the fanout and the more critical the timing, the greater the need to replicate the driver of the high-fanout net. Having said that, in the synthesis stage, aggressive replication by the user can be counter-productive. This is because the placement, routing and timing are unknown at that point, so aggressive replication will in all likelihood hurt more than it helps. The recommendation is to avoid a MAX_FANOUT constraint on a global level in the synthesis stage. It is perfectly fine to have a few really targeted MAX_FANOUT constraints in a few blocks with a value of 512/1024 if absolutely necessary, and only on nets that are expected to have a high fanout. In the opt_design stage, again since placement, routing and timing are not known, limited tool-driven replication is recommended. In the placer, since we know exactly where the cells/hierarchies are placed and the timing is a little more realistic, tool-driven mid-grained fanout optimization is recommended. After the router has finished, the timing is accurate, so in this stage you want tool-driven fine-grained replication based on where the driver is placed, where the destination register is placed and what the routing topology for the net is.
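As a rough sketch of what a targeted approach could look like, you might first list the worst offenders and then constrain only a specific net. The net name below is hypothetical, and the value of 512 is simply one of the two values suggested above.

```tcl
# List the highest-fanout nets together with their timing and load types,
# to decide which ones (if any) deserve attention.
report_high_fanout_nets -timing -load_types -max_nets 25

# A targeted fanout limit on a single net that is expected to have a
# high fanout. "blockA/sync_reset" is a hypothetical net name.
set_property MAX_FANOUT 512 [get_nets blockA/sync_reset]
```

Keeping such constraints to a handful of known nets leaves the finer-grained replication decisions to the placer and to physical optimization, where placement and timing are known.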
Fanout Optimization in Placer
In Vivado version 2018.1, we introduced a new feature in the placer that automatically manages high-fanout nets. The new and improved algorithm automatically inserts BUFGs during global placement on non-timing critical nets, based on resource (BUFG) availability. The placer also replicates high-fanout nets and control signals to DSPs/BRAMs. The placer replication is based on the placement and the distance that the net has to drive. The main advantage for the user is that no guesswork is needed at the RTL/synthesis stages to figure out which high-fanout nets need to be manually replicated. The control set utilization will be optimal, resulting in fewer congested designs. The new algorithm addresses timing critical high-fanout nets like resets early in the placement stage and reduces the need for post-place or post-route physical optimization iterations.
Fig. 10 below shows one example of a really high-fanout net that was automatically replicated by the placer. The immediate benefit can be seen by the nearly 0.500 ns WNS improvement:
Fig. 10: High-fanout net automatic replication in the placer
In part 1 of this blog I posted that resets cannot and shouldn't be ad-hoc. They need to be planned very carefully early in the design phase. We also talked about the impact of resets on power, the choice of active-high and active-low resets and whether to use asynchronous or synchronous resets. Here is a summary of recommendations for resets:
I am certain that there are many other subtleties and 'gotchas' that haven't been discussed here. Reset is a Ph.D. thesis in itself in my opinion. If you have any further insights, please feel free to share them with the community by commenting on this post (the comments are moderated by me, but rest assured that if your comment is relevant to the topic you will see it posted).