08-15-2018 03:01 PM - edited 08-21-2018 09:24 AM
Today's designs are really complicated, with many clock domains, embedded processors, IPs, complex state machines and sometimes even HLS-tool-generated RTL from high-level languages like C/C++. This complexity is exacerbated by the many types of resets: processor, system, core, software and IP resets. Further complicating the reset architecture is the choice between synchronous and asynchronous resets and between active-high and active-low resets, which complicates the RTL coding style too. Unfortunately, reset architecture is often not thought about early in the design cycle, so every designer decides the fate of resets in their own blocks. The result is an ad hoc, poorly planned and poorly implemented reset strategy that leads to many iterations, debug and sometimes even product recalls.
The general recommendation is to use synchronous resets as much as possible. Asynchronous resets are perfectly acceptable as long as you are aware of their limitations. In fact, asynchronous resets are especially useful if a clock cannot be guaranteed. Of course, in a system where a stable clock cannot be guaranteed, a design with synchronous resets can also be held in reset until the PLLs or MMCMs are locked and a stable clock is generated, and only then is the reset released.
In this blog I will attempt to demystify resets as they apply to Xilinx FPGA designs. In Part 1 I will address a few design considerations that impact the reset architecture, the power impact of resets, coding styles for synchronous resets and how to synchronize an asynchronous reset.
The Need for Resets and Reset Planning
The benefit of resets in a system is that they force the design into a known state for simulation and ensure that every chip in the system is in a known state after power-up.
Reset architecture needs to be planned well, early in the design phase. Early planning avoids surprises when teams spread all over the world are separately responsible for RTL design, verification, synthesis, implementation, timing closure, design validation and generation of the production bitstream. If not adequately planned, reset issues may show up long after the product has gone into production. A case in point: I own a smart refrigerator from a leading manufacturer. Its LCD panel will sometimes refuse to respond to touch, and the only way to get it to respond is a hard reboot, just like the Windows PCs of yesteryear. Calling the factory gets you the same resolution: "Have you power cycled the refrigerator?" Power cycling the refrigerator every time I want to change a setting is a painful process, don't you agree? Sometimes the LCD won't respond even after power cycling. I suspect a reset architecture issue. I'm sure there are other examples too.
The first step of the planning process is to decide whether a reset is even necessary. Xilinx FPGAs come with a Global Reset that resets everything in the device. It might be worthwhile to eliminate resets in your design and use the Global Reset instead. Eliminating resets is not an easy solution, though: it leads to coding issues (discussed later in this blog), and removing resets from legacy RTL code and third-party IP is not a trivial task. The ideal situation is one where you can get rid of the resets in your own part of the design and connect the third-party IP resets to the Global Reset.
Once it has been decided that the system or chip design architecture absolutely requires resets, the next steps are to decide what to reset and what not to reset, how to deal with synchronous and asynchronous resets, whether to use active-high or active-low resets, and finally how to handle reset clock domain crossings.
Reset and the impact to system/chip power
Resets have a significant impact on power, which can be attributed to two primary factors. First, excessive reset use generally creates 3-5% more logic (FFs and LUTs), which you can think of as roughly 3-5% more power (it can be more). Second, resets tend to have high fanout and to be timing critical. As a result, they often consume the more valuable routing resources, leaving the less timing-critical data paths to use less optimal routes. Since resets generally do not toggle often, a reset net consumes much the same power whether it sits on a short route or a long one, because dynamic power is driven by switching rates. Data paths, however, do switch, and if they have longer wire lengths they will consume more power even if they meet timing. So if wire lengths increase by, say, 15%, signal power increases by about the same amount. Reducing resets in your design will therefore save power as well.
What to reset within a design and how long should the reset be active
One of the decisions to be made at the planning stage is what does and does not get reset in the design. As a general guideline, resetting state machines and FIFO read and write pointers should be enough. Data paths seldom need to be reset if the state machine controlling the data path is properly designed. In most cases, only the first pipeline stage needs to be reset; the rest of the pipeline stages will be flushed in subsequent clock cycles.
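To make the guideline concrete, here is a minimal sketch of a feed-forward pipeline in which only the valid bit entering the pipeline is reset. The module name, port names and the three-stage depth are all assumptions chosen for illustration, not taken from any particular design.

```verilog
// Hypothetical 3-stage feed-forward pipeline (all names are illustrative).
// Only the first valid bit is reset; later stages flush on their own.
module pipe_reset_sketch #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire             rst,        // synchronous, active-high
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output wire             out_valid,
    output wire [WIDTH-1:0] out_data
);
    reg             v1, v2, v3;
    reg [WIDTH-1:0] d1, d2, d3;

    // Control entering the pipeline: the only register with a reset
    always @(posedge clk) begin
        if (rst) v1 <= 1'b0;
        else     v1 <= in_valid;
    end

    // Remaining valid bits and all data registers have no reset at all
    always @(posedge clk) begin
        v2 <= v1;
        v3 <= v2;
        d1 <= in_data;
        d2 <= d1;
        d3 <= d2;
    end

    assign out_valid = v3;
    assign out_data  = d3;
endmodule
```

In this sketch the reset has to stay asserted for at least three clock cycles so that the cleared valid bit reaches the last stage. Downstream logic is assumed to ignore out_data whenever out_valid is low.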
Another consideration is the duration of the reset pulse. Ideally the reset should be active long enough for all the pipeline registers to be flushed before valid data flows through the design. The pulse should typically last 20-50 clock cycles, or until a few cycles after the PLLs or MMCMs lock, depending on the number of elements being reset: the more registers (flip-flops) being reset, the longer the reset pulse. This ensures that recovery and removal times are met for registers placed far away from the reset register.
A common practice is to take the 'locked' signal from a PLL or MMCM and combine it (in an AND/OR-type LUT) with the system, CPU or core reset before resetting all the flops in the design. Keep in mind that 'locked' is an asynchronous signal, and a LUT (OR/AND) in the path might produce a glitch. Such a glitch can cause a spurious reset. The recommendation is to register the output of the LUT and use the output of that register to reset the flops: the source of a reset should always be the output of a flip-flop, never the output of a LUT or other combinatorial block. Registering the LUT output also allows the tool to replicate the register if the fanout is very high.
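A minimal sketch of this idea might look as follows. The names are hypothetical, and the two extra flops that synchronize 'locked' before it is combined go beyond what the text above describes; they are an assumption added here because 'locked' is asynchronous. The clock is assumed to be free-running (for example the PLL/MMCM input clock), since the PLL/MMCM output clock is not stable before lock.

```verilog
// Hypothetical reset source: combine sys_reset with 'locked', then register.
module reset_source_sketch (
    input  wire clk,        // assumed free-running, e.g. the MMCM input clock
    input  wire sys_reset,  // active-high, assumed already synchronous to clk
    input  wire locked,     // PLL/MMCM locked output, asynchronous to clk
    output wire reset_out   // registered, active-high reset for the design
);
    // Assumption: synchronize 'locked' first (not required by the text)
    (* ASYNC_REG = "TRUE" *) reg [1:0] locked_sync = 2'b00;
    reg reset_q = 1'b1;     // starts asserted after configuration

    always @(posedge clk) begin
        locked_sync <= {locked_sync[0], locked};
        // The OR sits in a LUT; its output is registered so that no
        // combinatorial glitch can ever reach a reset pin downstream.
        reset_q <= sys_reset | ~locked_sync[1];
    end

    assign reset_out = reset_q;
endmodule
```

Because reset_out comes straight from a flip-flop, the tools remain free to replicate that flip-flop if its fanout turns out to be high.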
Synchronous and Asynchronous Resets
Another architectural decision that has to be made very early in the design process is whether to use a synchronous reset or an asynchronous reset. Xilinx's recommendation is to use synchronous resets throughout your design. Asynchronous resets are nevertheless very common in today's complex SoC designs. If there are any asynchronous resets in your design, the recommendation would be to synchronize the asynchronous reset. Xilinx also recommends the use of XPM CDC modules for all CDC topologies, including asynchronous resets. The XPM CDC modules are correct by construction and come with timing constraints. An asynchronous reset synchronizer allows the reset signal to be asserted asynchronously, but the de-assertion (or removal) will be synchronous.
Fig. 1 shows an active-high asynchronous reset synchronizer, and the corresponding RTL code is shown in Fig. 2. When the reset is asserted (1'b1), the Q pins of the synchronizer flops go high and stay high until the reset is de-asserted (goes low). Once the reset is de-asserted, the 1'b0 on the D pin of the first flop is captured on the next clock edge, and one cycle after that (two clock cycles in total) the output of the second flop in the synchronizer goes low. If there are more flops in the synchronizer chain than the two shown in Fig. 1, it takes that many more cycles before the reset is de-asserted at the last flop of the chain.
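As an illustration of the XPM route, the sketch below wraps the xpm_cdc_async_rst macro in a small module. The wrapper and its port names are hypothetical, and the parameter values are only one plausible choice; the parameter and port names of the macro should be checked against the XPM documentation for your Vivado version.

```verilog
// Hypothetical wrapper around the XPM asynchronous reset synchronizer.
// Needs the Xilinx XPM library (available in Vivado) to elaborate.
module xpm_reset_sync_sketch (
    input  wire clk,       // destination clock domain
    input  wire arst_in,   // asynchronous, active-high reset
    output wire rst_out    // asserted asynchronously, released synchronously
);
    xpm_cdc_async_rst #(
        .DEST_SYNC_FF    (2),  // number of synchronizer stages (assumed value)
        .INIT_SYNC_FF    (0),  // simulation init values disabled
        .RST_ACTIVE_HIGH (1)   // 1 = active-high, 0 = active-low
    ) u_rst_sync (
        .src_arst  (arst_in),
        .dest_clk  (clk),
        .dest_arst (rst_out)
    );
endmodule
```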
Fig. 1: Schematic for synchronizing an asynchronous reset
Fig. 2: Verilog code snippet for synchronizing an active-high asynchronous reset
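For readers without the figure, here is a minimal sketch of what an active-high asynchronous reset synchronizer of this kind typically looks like. It is not the original Fig. 2 code: the module name, the reset_a input and the sync_ff register are assumptions, and only the two-flop structure with the first D pin tied to GND follows the description in the text.

```verilog
// Sketch of a 2-flop active-high asynchronous reset synchronizer.
module async_reset_sync_high (
    input  wire clk,
    input  wire reset_a,         // asynchronous, active-high
    output wire reset_a_synced   // async assert, synchronous release
);
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_ff;

    always @(posedge clk or posedge reset_a) begin
        if (reset_a)
            sync_ff <= 2'b11;               // assert immediately, no clock needed
        else
            sync_ff <= {sync_ff[0], 1'b0};  // D pin of the first flop is GND
    end

    assign reset_a_synced = sync_ff[1];
endmodule
```

After reset_a goes low, reset_a_synced stays high for two rising clock edges and then goes low, matching the two-cycle release described for Fig. 1.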
If the asynchronous reset comes from a top level port and will feed multiple clock domains, remember to synchronize the reset in each of the respective clock domains.
Notice that the code snippet is for an active-high reset, and the D pin of the first flip-flop is connected to GND. If the reset is active low, the D pin of the first flip-flop is connected to VCC instead, and the asynchronous reset is connected to the CLR pins of the reset synchronizer flops (see Fig. 4 and Fig. 5 below).
Once the asynchronous reset has been synchronized in the top level module, the subsequent RTL coding style (the wire from the Q-pin of reset_a_synched_reg flip-flop in the Fig. 1 above) should ideally be for a synchronous reset as shown in the code snippet below:
Fig. 3: Code snippet for the rest of the design once the asynchronous reset has been synchronized in the top level.
In the code snippet of Fig. 3, the hierarchical modules below the top-level module use a synchronous reset coding style. Depending on its fanout, the reset register 'reset_a_synced' can be replicated as necessary in order to meet timing. Another thing to note is that reset_a_synced goes high as soon as the asynchronous reset is asserted. When it goes low (inactive) depends on how many synchronizer stages there are: with two FFs in the synchronizer chain the reset is inactive after two clock cycles, and with three FFs it is inactive after three clock cycles. If a register is added to aid replication, it adds another cycle of reset latency.
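A sketch of the synchronous reset coding style in a lower-level module could look like this. The counter and its port names are hypothetical and stand in for whatever logic the block actually contains; only the reset_a_synced name comes from the text.

```verilog
// Hypothetical leaf module using the synchronized reset synchronously.
module sync_reset_user_sketch (
    input  wire       clk,
    input  wire       reset_a_synced,  // from the top-level synchronizer
    input  wire       enable,
    output reg  [7:0] count
);
    // No reset in the sensitivity list: the reset is sampled on the
    // clock edge like any other data or control input.
    always @(posedge clk) begin
        if (reset_a_synced)
            count <= 8'd0;
        else if (enable)
            count <= count + 8'd1;
    end
endmodule
```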
The first reason for recommending synchronous resets is for big blocks like DSPs and block RAMs which by architecture support only synchronous resets. The inference of DSPs and block RAMs is possible if synchronous resets are used. Use of asynchronous resets might result in these structures getting inferred in the fabric which might hurt performance. In the DSP blocks, the pipeline registers only support synchronous resets. In block RAMs, the output registers support only synchronous resets and using output registers is an advantage as it reduces the clock-to-out (Tco).
The other reason for using synchronous resets is the flexibility for the tool to either hook the reset directly to the R or the CLR pin or merge the reset signal to the datapath. This flexibility reduces the number of control sets in the design and allows the placer to pack more flops into the same slices (refer to Chapter 5 of UG949: Ultrafast Design Methodology Guide for the Vivado Design Suite for control sets).
The final reason for recommending synchronous resets is that a synchronous reset is automatically timed and does not need any special timing constraints. Synchronous resets are predictable (they act on the clock edge), whereas the release of an asynchronous reset is not: it can happen at any time.
Active-high and Active-low Resets
For control signals in general and resets in particular, it is recommended to pick one polarity, either active-high or active-low, and use it throughout your design. I have observed RTL issues where one hierarchical module designer assumed active-high while another designer assumed active-low for the same reset signal. The simulations were all done at the hierarchical level by the RTL engineers, so they all passed. When the FPGA was being tested in the lab, it was noticed that one hierarchical block was always in reset. After a lot of debug the issue was traced to the inconsistent reset coding. The reset polarity needs to be decided at the planning stage to avoid surprises later in the design. The decision should be based on the system architecture, any legacy RTL code that will be reused and the reset styles in third-party IPs.
If possible, always use active-high resets when using Xilinx FPGAs, since active-low resets require an inversion, which adds a LUT in the path. If your design has active-low resets, the reset synchronizer and the RTL code snippet are as shown in Fig. 4 and Fig. 5 below:
Fig. 4: Active low asynchronous reset synchronizer
Fig. 5: RTL code snippet for active-low asynchronous reset synchronizer
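For readers without the figure, an active-low version of the synchronizer can be sketched as follows. Again this is not the original Fig. 5 code: the names are assumptions, and only the structure (first D pin tied to VCC, reset acting as an asynchronous clear) follows the text.

```verilog
// Sketch of a 2-flop active-low asynchronous reset synchronizer.
module async_reset_sync_low (
    input  wire clk,
    input  wire reset_n_a,        // asynchronous, active-low
    output wire reset_n_synced    // async assert (low), synchronous release
);
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_ff;

    always @(posedge clk or negedge reset_n_a) begin
        if (!reset_n_a)
            sync_ff <= 2'b00;               // clear both flops immediately
        else
            sync_ff <= {sync_ff[0], 1'b1};  // D pin of the first flop is VCC
    end

    assign reset_n_synced = sync_ff[1];
endmodule
```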
In the next blog I will cover a few tricks and tips on combining multiple asynchronous resets, sequencing resets across clock domains, an insight into how you can manage reset behavior in Vivado and finally how Vivado manages resets in your design.
08-21-2018 02:50 AM
Thanks for the great article, looking forward to the next blog!
Reset in modern FPGAs and SoCs is very poorly treated in books, if treated at all. It's a hugely underestimated subject in FPGA design.
It would be worth mentioning that Xilinx FPGAs have a few different FF primitives (as described in UG799) - it's a subtle difference in your block schematics. Also the meaning of 'preset' vs 'set', and 'clear' vs 'reset'. I learned this in this and this interesting forum thread on resets:
- preset - asynchronously load a 1 (FDPE)
- clear - asynchronously load with 0 (FDCE)
- set - synchronously load with a 1 (FDSE)
- reset - synchronously load with a 0 (FDRE)
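The four flavours can be sketched in RTL as follows. The module and signal names are hypothetical, and the primitive named in each comment is what synthesis would typically infer for that pattern rather than a guarantee.

```verilog
// Hypothetical module showing one coding pattern per flip-flop flavour.
module ff_flavours_sketch (
    input  wire clk,
    input  wire ce,
    input  wire d,
    input  wire ctrl,      // shared set/reset control, active-high
    output reg  q_preset,
    output reg  q_clear,
    output reg  q_set,
    output reg  q_reset
);
    // preset: asynchronously load a 1 (typically FDPE)
    always @(posedge clk or posedge ctrl)
        if (ctrl)    q_preset <= 1'b1;
        else if (ce) q_preset <= d;

    // clear: asynchronously load a 0 (typically FDCE)
    always @(posedge clk or posedge ctrl)
        if (ctrl)    q_clear <= 1'b0;
        else if (ce) q_clear <= d;

    // set: synchronously load a 1 (typically FDSE)
    always @(posedge clk)
        if (ctrl)    q_set <= 1'b1;
        else if (ce) q_set <= d;

    // reset: synchronously load a 0 (typically FDRE)
    always @(posedge clk)
        if (ctrl)    q_reset <= 1'b0;
        else if (ce) q_reset <= d;
endmodule
```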
If I may be so free, some ideas for future blog follow-ups:
* a (to me at least) confusing aspect of system reset is the 'initial' value that every FF gets from the bitstream loading, as in the first forum thread I mentioned above (look for the interesting answer of avrumw)
* also the 'STARTUP2' block is still something obscure to me, when and how to use it (I'm a Zynq user)
* the Zynq system reset IP would be interesting to explain a bit more in detail, and how to handle DCM locked, multiple clock domains, .. .see this thread for ideas
* how to correctly simulate the GSR, INIT values, reset CDC, ... would also be interesting, see for example this thread for ideas
thanks again for this great blog post!
08-21-2018 12:58 PM
Code snippets obviously add greatly to the readability.
Any chance of VHDL for those of us that don't write Verilog, please?
It should be easy to guess what's required, but if you could round off the article, please.
08-26-2018 03:14 PM
I would like to make a couple of observations/comments/follow ons...
But, first, let me say that I wholeheartedly agree with the concepts of this post. Reset methodology must be planned in advance. It should be part of the three absolute musts for any design:
- plan your clocking methodology
- plan your reset methodology
- plan your I/O interfaces
(in addition to actually architecting the rest of your design).
Ad-hoc reset style leads to problems...
And the basic questions
- do I need to reset everything (global vs. non global)
- and if not global, what do I reset and what do I leave for the INIT/GSR?
- do I use synchronous or asynchronous resets
- do I use active high or active low resets
are part of this planning.
And resets (particularly global resets) have a significant cost:
- they create one additional timing path for every flip-flop that is reset
- these paths often share the same startpoint (the "master reset")
- meeting timing on this can be very disruptive to the placer (which will cluster all logic around this master reset)
- they create lots of additional routing, which can prevent other (critical) paths from getting access to the routes they want
So - pay attention to resets.
Now a couple of comments.
Regarding active low resets. In most architectures there is no cost for active low resets - but not all. Take a look at this post on active low resets; aside from the more mundane problems (it's easier to make mistakes) and the fact that they don't make sense anymore (and haven't in decades), they do have a cost in the 7 series architecture.
Regarding not resetting datapaths: This is true as long as the datapath has no feedback. Many data paths do have some kind of feedback - anything that is doing summation, for example, or anything where a condition from "later" in the pipeline affects the behavior of the datapath "earlier" in the pipeline. If you are only resetting the first pipeline stage of the datapath, then the feedback input will be "unknown"; this unknown will combine with the reset value propagating down from the first pipeline stage and end up as another potential unknown. This can block the reset condition from propagating further down the pipeline.
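A running sum is perhaps the simplest example of such feedback. In the sketch below (all names hypothetical) the accumulator register feeds back into its own input, so it gets an explicit reset; without one, an unknown starting value in simulation would simply be added to forever.

```verilog
// Hypothetical accumulator: a datapath register with feedback.
module accumulator_sketch #(
    parameter WIDTH = 32
) (
    input  wire             clk,
    input  wire             rst,       // synchronous, active-high
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output reg  [WIDTH-1:0] sum
);
    always @(posedge clk) begin
        if (rst)
            sum <= {WIDTH{1'b0}};   // needed: the next value depends on sum itself
        else if (in_valid)
            sum <= sum + in_data;   // feedback path
    end
endmodule
```

Resetting only the logic that produces in_valid would stop new data from arriving, but it would not bring sum back to zero.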
Regarding asynchronous resets: the statement
If there are any asynchronous resets in your design, the recommendation would be to synchronize the asynchronous reset.
needs to be significantly strengthened. In most cases, failure to properly synchronize the deasserting edge of an asynchronous reset will create a system that is prone to reset failures. In such a system, the deassertion of reset has a finite (potentially small, but still finite) probability of causing the system to crash immediately. This is probably the most common error regarding asynchronous resets; they are called "asynchronous" since the reset action occurs immediately (as opposed to waiting for the next clock edge) - it does not mean that the reset can be deasserted "any time it wants" - it must be deasserted synchronously to the clock.
Let's illustrate this with an example. Let's say we have a flip-flop with an asynchronous clear (CLR). However, the D input of the flip-flop has the value 1. Now let's look at the behavior of this flip-flop under different conditions for the arrival of the deasserting edge of CLR and the rising edge of CLK:
- if the rising edge of CLK arrives before the deasserting edge of CLR, then the flip-flop remains in reset and hence the Q remains 0 through the next clock edge
- if the rising edge of CLK arrives after the deasserting edge of CLR, then the flip-flop is no longer in reset during the CLK edge, and hence the Q output goes to 1
- if the two are too close to each other then
  - the flip-flop may remain 0
  - the flip-flop may transition to 1
  - the flip-flop may go metastable
In the last instance, there is no way to know or predict which of these cases it is. Clearly if the FF goes metastable, your system can crash.
The only time you can consider using an asynchronously deasserted asynchronous reset (and I do not recommend this) is if there is absolutely no possibility that the D inputs of the flip-flops can be anything other than the reset condition during/on the first clock after reset. In this case, the flip-flop's output will be the same regardless of which event (the deassertion of CLR or the rising edge of CLK) occurs first. Again - this is not a recommended practice - I am only mentioning it because of what I am going to discuss below.
And it's even worse. Let's say we have a state machine that has 3 bits. On the "first" clock after the deassertion of CLR it is supposed to transition from the 000 state to the 111 state. If there is a race between CLR and CLK, then each of the 3 bits will independently resolve as above. Even if none of them goes metastable, if two of them go one way and the other goes the other way, you can literally end up in any state on the clock "after (really simultaneous with)" the deassertion of reset. Here, even though you don't have metastability, your system has crashed.
And furthermore, since the paths involved (from the source of the reset to the flip-flop CLR pins) are not timed (assuming you let them be asynchronous and hence use something like a set_false_path on them), the routing on them can be different between the three bits - maybe even significantly different - which will significantly increase the probability that your state machine crashes on the deassertion of reset.
So this kind of reset won't even work if the D input of the flip-flop is the reset value during/on the clock after reset. Since the paths aren't timed, the skew between when the CLR deasserts at the first flip-flop vs. when it deasserts at the last one can be more than one clock. So, for cases like this, you could only consider this kind of reset if there is some other condition that guarantees that the D inputs remain at their reset state for "some number" of clocks after the deassertion of the asynchronous CLR - essentially there is some other synchronous reset or enable that holds the flip-flops in their reset state until "long enough" after the CLR is deasserted. Again, this is not recommended, I am only mentioning it because of what I am going to discuss below.
Finally I want to point out that the "initialization" of flip-flops in Xilinx devices (the INIT value) is controlled by the internal configuration signal called GSR. The GSR is part of the architecture of the FPGA, and hence does not use any fabric resources (hence is free), but still goes to every flip-flop, forcing them to their INIT value. The GSR is automatically asserted during configuration, and is deasserted near the end of the configuration cycle, allowing the flip-flops to begin functioning normally. However, the GSR is a slow propagating signal, so
- it does not arrive at every flip-flop at the same time and
- it does not deassert synchronously with any user clock
For these reasons, it should be considered as an asynchronous reset that is deasserted asynchronously to your clock (with potentially large skew). It has all the same problems:
- if a flip-flop can potentially transition to its non-reset state immediately after GSR deasserts, the GSR is insufficient
- if a set of flip-flops is used to code a coherent value or state, it is required for all FFs to come out of reset at the same time, which the GSR cannot guarantee - again, the GSR is insufficient for this case
So this all leads to the fact that the GSR cannot be relied on in some cases; for those cases you need an explicit (i.e. coded) reset mechanism. It can be very hard to identify exactly which flip-flops can and cannot rely on the GSR. Getting this wrong can lead to devices that occasionally crash on reset. However, getting it right can significantly reduce the resources used for resets, giving designs that meet timing more easily (or run faster), fit in smaller devices and consume less power. So, back to the original point... Resets are complicated!
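One possible shape for such a coded mechanism is sketched below, under stated assumptions: a short chain of flip-flops whose INIT value is 1 and which shifts in zeros, so that a synchronous reset stays asserted for a few clocks after the GSR releases. The names and the chain length are hypothetical, and this is only one way to build it. Only the first flop sees a D input that differs from its INIT value at GSR release; the later flops act like synchronizer stages.

```verilog
// Hypothetical post-configuration reset generator (names are illustrative).
module gsr_holdoff_sketch (
    input  wire clk,
    output wire rst      // synchronous, active-high, released after a few clocks
);
    // Initializer becomes the INIT value loaded under GSR.
    // shreg_extract = "no" asks synthesis to keep discrete flip-flops
    // instead of packing the chain into a shift-register LUT.
    (* shreg_extract = "no" *) reg [3:0] por_shift = 4'b1111;

    always @(posedge clk)
        por_shift <= {por_shift[2:0], 1'b0};  // zeros shift in from the bottom

    assign rst = por_shift[3];
endmodule
```

Registers that hold a coherent state (such as a state machine) would then use rst as an ordinary synchronous reset, while everything else relies on its INIT value alone.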
09-04-2018 08:17 AM
thanks @avrumw for the additional remarks!
you wrote for a global reset, that the placer will cluster all logic around this master reset -> is there a way to solve this? Like inserting a kind of buffer to 'split' the reset, and repeat this several times, to create a 'tree' of reset signals, such that the placer can better spread the logic?
09-05-2018 09:08 AM
you wrote for a global reset, that the placer will cluster all logic around this master reset -> is there a way to solve this?
Conventional "buffering" won't make a difference - physical distance means delay, and the path delay (whether it is all routing or a combination of routing and LUT delay) must remain under one clock period for static timing. It is this that causes the clustering.
In some cases (although this is not a highly recommended solution), you can consider using a BUFG to distribute the reset signal. Since the global clock network is balanced, the placement of flip-flops won't matter (or at least matter much) to the static timing path from the master reset to all the leaf flip-flops - so it probably won't cluster. But, the delay through the BUFG is not short, and at very high clock frequencies it simply may fail timing on all paths... (And, of course, it uses a BUFG).
The only other way to "buffer" this is to pipeline the reset through a tree structure. For example, have a reset pipeline register in each major sub-block, and maybe in each sub-sub-block as well (depending on the size of the design). I prefer to do this manually (rather than have the tools replicate the pipelined resets) since it makes sense to have a sub-module "clustered" around its local copy of the reset (this can actually be beneficial for the placement). But you could also consider just inserting the pipelining and letting the tool do the replication (I haven't tried this)...
Of course, it now takes you one (or two) more clock periods to enter and leave reset.
But this is also not a widely adopted reset methodology - I have used it in the past (and it seems to produce good results), but it is not (at least to my knowledge) one that has been endorsed (or even analyzed) by Xilinx.
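A minimal sketch of one level of such a manual reset tree is shown below. The module, the two sub-block names and the use of DONT_TOUCH to stop the tools from merging the equivalent registers are all assumptions made for illustration.

```verilog
// Hypothetical one-level reset tree: one local reset register per sub-block.
module reset_tree_sketch (
    input  wire clk,
    input  wire rst_master,   // synchronous to clk, active-high
    output wire rst_block_a,  // local reset for hypothetical sub-block A
    output wire rst_block_b   // local reset for hypothetical sub-block B
);
    // The two registers are logically identical; DONT_TOUCH is used here
    // so that synthesis does not merge them back into a single flop.
    (* DONT_TOUCH = "TRUE" *) reg rst_a_q = 1'b1;
    (* DONT_TOUCH = "TRUE" *) reg rst_b_q = 1'b1;

    always @(posedge clk) begin
        rst_a_q <= rst_master;   // adds one clock of reset latency
        rst_b_q <= rst_master;
    end

    assign rst_block_a = rst_a_q;
    assign rst_block_b = rst_b_q;
endmodule
```

Each local register can then be placed near the logic it resets, so the long route is the single-cycle hop from the master reset to the local copy rather than from the master reset to every leaf flip-flop.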
01-21-2019 08:38 AM
@stevet: Apologies for the late response. I was traveling on business and then on vacation.
The clock for the FF should ideally be from the input clock. The PLL/MMCM output clock wouldn't be stable before the PLL/MMCM is locked.
06-25-2019 10:56 AM
@avrumw you mentioned that there are 3 aspects that need to be planned in advance:
- plan your clocking methodology
- plan your reset methodology
- plan your I/O interfaces
I understand the importance of the first two. Could you please explain why you think planning I/O interfaces should also be done in advance?