Scholar helmutforren
Scholar
3,245 Views
Registered: ‎06-23-2014

Confused about reset timing

I'm using Vivado 2017.1, SystemVerilog, Kintex 7 160T.

 

QUESTION ONE:

The following code doesn't behave as I would expect:

wire rst;
reg [63:0] rstpipe = 64'hffffffffffffffff;
assign rst = rstpipe[63];
always @(posedge SYS_CLK) rstpipe[63:0] <= {rstpipe[62:0], 1'b0};
wire rstbit0 = rstpipe[31];

The SYS_CLK is 200ns period.  I've connected rst and rstbit0 to output pins that drive LEDs.  When I probe the two LEDs and trigger on the config DONE signal, I see that rst is high for roughly 2 clock cycles (+/- 50ns) and rstbit0 is high for roughly 1 clock cycle (+/- 50ns).  I would have expected instead that rst was high for 64 clock cycles and rstbit0 high for 32 clock cycles.  

 

Furthermore, the original code set rstbit0 = rstpipe[0].  Amazingly, in this version, the timing for rstbit0 was identical, roughly one clock cycle (+/- 50ns).  That means the change from tapping bit 0 to tapping bit 31 made zero change.  I just plain don't understand it.

 

Is there some trivial error in this code?  Is the startup clock oscillating ridiculously fast before settling into 200ns?  What's up?
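
One way to separate a coding error from a measurement or clock-period issue is to simulate the snippet on its own. The testbench below is a minimal sketch: the module name, the two cycle counters, and the 5 ns clock period are assumptions added for illustration, while the five lines under test are copied from the post. In an ideal simulation the shift register should hold rst high for 64 clock edges and rstbit0 for 32, which would point away from the RTL as the cause.

```verilog
`timescale 1ns / 1ps

// Simulation-only sketch: counts how many clock edges see each signal high.
module rstpipe_tb;
    reg SYS_CLK = 1'b0;
    always #2.5 SYS_CLK = ~SYS_CLK;   // assumed 5 ns period (200 MHz)

    // Code under test, copied from the post
    wire rst;
    reg [63:0] rstpipe = 64'hffffffffffffffff;
    assign rst = rstpipe[63];
    always @(posedge SYS_CLK) rstpipe[63:0] <= {rstpipe[62:0], 1'b0};
    wire rstbit0 = rstpipe[31];

    // Hypothetical counters, not part of the original design
    integer rst_cycles     = 0;
    integer rstbit0_cycles = 0;
    always @(posedge SYS_CLK) begin
        if (rst)     rst_cycles     <= rst_cycles + 1;
        if (rstbit0) rstbit0_cycles <= rstbit0_cycles + 1;
    end

    initial begin
        #1000;  // 200 clock edges, enough for all 64 bits to shift out
        $display("rst high for %0d clocks, rstbit0 high for %0d clocks",
                 rst_cycles, rstbit0_cycles);
        $finish;
    end
endmodule
```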

 

QUESTION TWO:

Meanwhile, I remain confused about resets.  I read that local resets are better than global, and I tried to implement what I think this meant, but I think I messed up.  Be aware that my project has nearly 100 modules, each in a separate .sv file, with the "[call] invocation tree" being at least 6 levels deep.  The 160T is far more than 50% used up.

 

However, the references I find don't describe what local resets really means, in a manner that I'm able to understand and implement.


Specifically, I found several months ago this: https://www.xilinx.com/support/documentation/white_papers/wp272.pdf

Page 5 has two sections...

 

For 99.99% of the cases, they mention ram initialization.  But if I'm writing in SystemVerilog and have a "reg", such as "reg rst", then how do I implement what this PDF suggests?  Do I do "reg rst = 1;"?  Is that what it means?  If so, then this is in essence what I've done in question one above (were it a single bit rather than 64 bits, or for each of 64 separate bits).  If I'm doing this correctly with regard to WP272 page 5 top half, then could someone please tell me so?

 

For 00.01% of the cases, they mention a series of flip flops.  Aren't the four flip flops in figure 7 exactly analogous to my 64 bits for reg in question one above?  Can someone please tell me if I'm understanding this correctly or incorrectly?
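
For comparison, a chain of flip-flops used as a localized reset can be sketched in an always block as follows. This is an illustration under assumptions, not a transcription of the white paper's figure: the module name, port names, and the STAGES parameter are hypothetical, and the chain is assumed to assert asynchronously and release synchronously. Structurally it is the same idea as the 64-bit rstpipe in question one, only shorter and with an extra asynchronous preset.

```verilog
// Sketch of a localized reset: asserts immediately, releases in step with clk.
module local_reset #(
    parameter STAGES = 4              // assumed chain length, must be at least 2
) (
    input  wire clk,                  // clock of the domain being reset
    input  wire async_rst,            // hypothetical asynchronous reset request, active high
    output wire local_rst             // active-high reset for this clock domain
);
    // Initial value of all ones, so the chain also starts in reset after configuration
    reg [STAGES-1:0] chain = {STAGES{1'b1}};

    always @(posedge clk or posedge async_rst) begin
        if (async_rst)
            chain <= {STAGES{1'b1}};              // preset every flop at once
        else
            chain <= {chain[STAGES-2:0], 1'b0};   // shift in zeros to release
    end

    // The last flop in the chain drives the local reset
    assign local_rst = chain[STAGES-1];
endmodule
```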

 

Next, I'm not directly invoking flip flops.  I'm writing SystemVerilog code in always blocks.  I first learned Verilog by looking at some admittedly bad code.  Generally speaking, it looked like this:

always @(posedge SYS_CLK) begin
    if (rst) begin
        regone <= 1;
        regtwo <= 2;
        regthree <= 3;
    end else begin
        /* Application logic with regone, regtwo, regthree */
    end
end

I have multiple always blocks in each of my 100 modules.  At first, a single "rst" reg at the Top.sv fed through to all of them.  Obviously this is a global reset.  So how do I get from where I am to the concept of local reset?

 

I did build a sequence tree to lower my fanout, but I don't think this is what WP272 means.  That is, at several levels in the modules I added "always @(posedge SYS_CLK) rst2 <= rst3;"  I counted my levels of clock delay and made everything use a rst0, it all originating from a rst3.  A mess.  It also means after initial configuration, there's a time delay from rst3 applying to rst0 applying, and this time delay is putting some external hardware (yet to be connected) at risk.  This is what has brought me to asking the question again.

 

So, once more,  how do I get from where I am to the concept of local reset?  

 

(((

I can't help but postulate a little, having phrased the question.  I could take all my registers in the "if(rst)" clause, remove this clause altogether and leave just an "if(!rst)" clause, and then add initial values to all the reg definitions, such as "reg regone = 1; reg regtwo = 2; reg regthree = 3".  That makes the reset of the vars like WP272 page 5 99.99% of cases.  But then there's still the rst itself that needs to reach everywhere. [bookmark one]

 

In WP272 I read "Establish the critical parts of the design that have to be released synchronously with an associated clock domain. A localized high-performance reset network can then be inserted to control only those flip-flops that require a localized reset."  Ok, so I can do that and already have to some extent.  I still don't understand how to get from that sentence to actual implementation.

 

In reality, most of my modules are interconnected by FIFOs.  So this may be the saving grace.  I'm thinking, the FIFOs must all be instantaneously reset coming out of config.  Doing local versions of question one above would accomplish that.  Once in reset, their "empty", "full", and "prog_full" signals that I use should all be asserted.  They can come OUT of reset at totally different clock cycles.  No problem.  The existing logic for those three signals will prevent mishaps. [bookmark two]

 

So if I do what I say in paragraphs above for [bookmark one] and  [bookmark two], then all my resets really ARE local and all my modules should play well with one another.

 

BUT THEN I need a soft reset as well.  I guess the soft reset needs to go active globally.  This leaves me with my existing tree of "always @(posedge SYS_CLK) rst2 <= rst3".  They will arrive at different times, but I'm thinking the FIFO controls will keep things playing nice.  But then as well I need the "if(rst)" clause back.  DARN IT.

 

So now I'm stuck back where I started.  With configuration time reset only, I did OK.  But if I need (want) a soft reset as well from pressing an external button like the KC705 has on it, then I'm once again confused.

)))
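
The two ideas in the aside above (initial values for the configuration-time state, plus an "if" clause for a soft reset) are not mutually exclusive. The sketch below combines them in one always block. The module name, the soft_rst port, the 8-bit widths, the sum output, and the placeholder application logic are all assumptions for illustration; only the regone/regtwo/regthree names and their reset values come from the post.

```verilog
// Sketch: registers get their power-up state from initial values,
// and a synchronous soft reset restores the same state on request.
module init_value_example (
    input  wire       SYS_CLK,
    input  wire       soft_rst,   // hypothetical soft reset, active high, synchronous to SYS_CLK
    output wire [7:0] sum         // hypothetical output so the registers are not optimized away
);
    // Initial values are loaded at configuration, so no reset pulse is needed at power-up
    reg [7:0] regone   = 8'd1;
    reg [7:0] regtwo   = 8'd2;
    reg [7:0] regthree = 8'd3;

    always @(posedge SYS_CLK) begin
        if (soft_rst) begin
            // Soft reset returns the registers to their configuration-time values
            regone   <= 8'd1;
            regtwo   <= 8'd2;
            regthree <= 8'd3;
        end else begin
            // Placeholder application logic
            regone   <= regone + 8'd1;
            regtwo   <= regtwo + regone;
            regthree <= regthree + regtwo;
        end
    end

    assign sum = regone + regtwo + regthree;
endmodule
```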

 

Thanks,

Helmut

 

30 Replies
Scholar jmcclusk
Scholar
3,232 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

Hi @helmutforren !  Let me share a little of my experience with resets.  They are a rather necessary evil, absolutely required in ASIC design, and somewhat optional in FPGA design.  I have been tending to leave them out in my newer modules these days, because they just clutter things up.   

 

But!  having resets sometimes is required, especially when the software guys want a soft reset controlled by a register bit, so they can restart a hung module, or just start over after mucking up the register settings for a block.   So then the question becomes, synchronous or asynchronous reset?   Xilinx has for many years preached the religion of synchronous resets, but I've found out through hard experience that sometimes asynchronous resets are easier to route in crowded designs.  I explain this phenomenon by the fact that asynchronous resets can only use the SR input on a logic slice, while a synchronous reset is typically just folded into the logic cone of all the FF's connected to it.   This can result in deeper logic cones, more LUT's between registers, and a slightly more challenging netlist to place and route.    In theory, we can direct a synchronous reset to use the SR input on the slice using the DIRECT_RESET attribute, but I haven't tried this yet (I wish I had now!).   I expect that using the attribute will place both synchronous and asynchronous resets on equal footing,  place and route wise.

 

Verilog example of DIRECT_RESET:

(* direct_reset = "yes" *) wire rst3;

 

The issue of asynchronous vs synchronous reset is so serious that I've devoted significant time at more than one job to write a ruby script that converts VHDL code to support both modes (controlled by a generic, so I can flip it at will).    It should go without saying that all asynchronous resets MUST be sourced by a driver in the same clock domain as the receiver, otherwise Bad Things Will Happen (tm).  

 

One further thing that is significant to both local and global resets.   you SHOULD NOT SPLIT THE RESET NET IN THE RTL.   It used to be all the rage, and fashionable to do this, because the place and route tools were simply incapable of handling high fanout nets that needed to be partitioned with duplicated drivers.  I see old code all the time with this.   But it's no longer needed.  Vivado is quite capable of splitting a high fanout net, and duplicating multiple drivers (always a register) in the phys_opt_design stage.     It does a great job, far better and faster than a human, based on the approximate timing derived from the design placement.  phys_opt_design is even capable of deciding *which* high fanout nets need to be optimized, based on estimation of timing slack.  You can also tell it which ones to split, using the -force_replication_on_nets option.  The only help that Vivado really needs is a pipeline of at least 2 (or even 3) registers on the reset, so that it can duplicate the last register (or 2 registers) to create a nice geometric distribution network.  

 

One last thing.   I do worship at the Church of the Synchronous Reset.   

 

Hopefully,  @avrumw will give us a sermon on the True Faith.

Don't forget to close a thread when possible by accepting a post as a solution.
Scholar helmutforren
Scholar
3,219 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

@jmcclusk thanks very much for taking this on as well.  You've written a lot for me to think about.  I want to analyze it closely, paragraph by paragraph.  I want to get a quick response to you, so I'll ask right now about only the last paragraph.  I'll try to ask about the others later.

 

You wrote:

One further thing that is significant to both local and global resets.   you SHOULD NOT SPLIT THE RESET NET IN THE RTL.   It used to be all the rage, and fashionable to do this, because the place and route tools were simply incapable of handling high fanout nets that needed to be partitioned with duplicated drivers.  I see old code all the time with this.   But it's no longer needed.  Vivado is quite capable of splitting a high fanout net, and duplicating multiple drivers (always a register) in the phys_opt_design stage.     It does a great job, far better and faster than a human, based on the approximate timing derived from the design placement.  phys_opt_design is even capable of deciding *which* high fanout nets need to be optimized, based on estimation of timing slack.  You can also tell it which ones to split, using the -force_replication_on_nets option.  The only help that Vivado really needs is a pipeline of at least 2 (or even 3) registers on the reset, so that it can duplicate the last register (or 2 registers) to create a nice geometric distribution network. 

 

Please clarify "SHOULD NOT SPLIT THE RESET NET IN THE RTL".  Do you mean the way I did "always @(posedge SYS_CLK) rst2 <= rst3", or do you mean something else?  I **might** have done that because I was getting timing errors, and doing so removed some of the timing errors. Bottom line, I need to understand what "split" means.  Then, your last sentence sounds like what I was doing.   I'm just having difficulty getting from what you write to examples in my mind of what the [System]Verilog code looks like.

Scholar jmcclusk
Scholar
3,213 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

I seized upon the sentence "I did build a sequence tree to lower my fanout,",  so I assumed that you were splitting your reset signal by inserting registers at lower levels of the hierarchy.   In the old days, this usually helped, but now it just hinders phys_opt_design from doing its job.

Scholar helmutforren
Scholar
3,207 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

Yes, @jmcclusk, that's exactly what I did.  So, I take it that "split" means what I did, and you say I shouldn't need to anymore.  I may have done it in an attempt to "get local".  But it wasn't a good attempt.

 

What about QUESTION ONE?  Why do I not get the behavior I expect?  rst goes high for about 2 clocks rather than 64.

 

Stepping back one paragraph in your earlier answer:

The issue of asynchronous vs synchronous reset is so serious that I've devoted significant time at more than one job to write a ruby script that converts VHDL code to support both modes (controlled by a generic, so I can flip it at will).    It should go without saying that all asynchronous resets MUST be sourced by a driver in the same clock domain as the receiver, otherwise Bad Things Will Happen (tm). 

I realize that putting a signal into an SR flipflop would be asynchronous, and putting a signal into the clock input of a D flipflop would be synchronous.  But I'm not coding flipflops.  I'm coding always blocks with logic statements inside them.  Because the always block is @(posedge SYS_CLK), I consider it synchronous.  I don't know how I could make it asynchronous.  I could use "always @*" and it would be asynchronous, but I couldn't set a reg in that always block, and then ALSO set the reg in a synchronous always block for my application code.  See my second code block in the original post.  In addition, I pass a rst variable to other Xilinx IP, such as flipflops.  I could go look in the doc, for sure, to see if they're synch or asynch.  But I still have no clue if I want asynch instead of the synch I appear to have now.
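
For reference, in always-block code the difference between the two styles comes down to the sensitivity list: a reset tested inside a clock-only block is synchronous, while a reset that also appears in the sensitivity list is asynchronous. The sketch below shows both side by side. The module name, the d/q ports, and the 8-bit width are hypothetical; on 7 series devices these two forms typically map to FDRE (synchronous reset) and FDCE (asynchronous clear) flip-flops respectively.

```verilog
// Sketch: the same register coded with a synchronous and an asynchronous reset.
module reset_styles (
    input  wire       SYS_CLK,
    input  wire       rst,        // active-high reset
    input  wire [7:0] d,          // hypothetical data input
    output reg  [7:0] q_sync,
    output reg  [7:0] q_async
);
    // Synchronous reset: rst is only looked at on the rising clock edge
    always @(posedge SYS_CLK) begin
        if (rst) q_sync <= 8'd0;
        else     q_sync <= d;
    end

    // Asynchronous reset: rst is in the sensitivity list, so it acts
    // as soon as it rises, without waiting for a clock edge
    always @(posedge SYS_CLK or posedge rst) begin
        if (rst) q_async <= 8'd0;
        else     q_async <= d;
    end
endmodule
```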

 

Stepping back another paragraph:

But!  having resets sometimes is required, especially when the software guys want a soft reset controlled by a register bit, so they can restart a hung module, or just start over after mucking up the register settings for a block.   So then the question becomes, synchronous or asynchronous reset?   Xilinx has for many years preached the religion of synchronous resets, but I've found out through hard experience that sometimes asynchronous resets are easier to route in crowded designs.  I explain this phenomenon by the fact that asynchronous resets can only use the SR input on a logic slice, while a synchronous reset is typically just folded into the logic cone of all the FF's connected to it.   This can result in deeper logic cones, more LUT's between registers, and a slightly more challenging netlist to place and route.    In theory, we can direct a synchronous reset to use the SR input on the slice using the DIRECT_RESET attribute, but I haven't tried this yet (I wish I had now!).   I expect that using the attribute will place both synchronous and asynchronous resets on equal footing,  place and route wise.

 

Verilog example of DIRECT_RESET:

(* direct_reset = "yes" *) wire rst3;

 

So I latch on to "asynchronous resets are easier to route in crowded designs".  My design is crowded.  But my always code seems it must go the synchronous route.  So I gather I might EXPERIMENT with this (* direct_reset = "yes" *).  However, I think context is missing.  The wire is "rst3".  If I use "if(rst3)" in my always block, is that the context needed?  Would synth or impl or routing figure out that this is what you're talking about in your paragraph?  I haven't worried to date about the exact flip flop level implementation of what I code, for the most part.  Not totally, but for the most part.

 

Bottom line, I still have no idea how to implement/integrate any of your advice into my source code!  Darn!  I don't know if I'm too dense, or too tired, or what.  I have been doing regular programming for 40 years, have a PhD in EE, electronics design for 40 years too, been doing FPGA for 2 years now.  But I think visually/graphically and not verbally.  So when I read your writing, I generate visual images of code, which in my mind is not a sequence of text but a hierarchy of blocks.  So maybe it's just the way my mind works.  I'm having trouble getting from your words to blocks of code.

 

 

Scholar jmcclusk
Scholar
3,199 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

Perhaps you should give this a try first:   Remove the lower level registers on your reset net, and make sure that your top level reset driver has 2 or 3 registers in sequence.   Make sure your declaration for the reset wire has the DIRECT_RESET attribute so that Vivado will recognize it as a reset network.    
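
As a rough sketch of what that could look like in the top-level source: one short register pipeline drives a single reset net, and no further registers are added lower in the hierarchy. The module and port names are hypothetical, the raw reset is assumed to be already synchronous to SYS_CLK, and the placement of the attribute follows the wire-declaration example given earlier in the thread. The shreg_extract attribute is an extra assumption, added so that synthesis keeps the pipeline as discrete flip-flops rather than a shift-register LUT.

```verilog
// Sketch: a single reset source with a 3-register pipeline at the top level.
module reset_top_sketch (
    input  wire SYS_CLK,
    input  wire rst_src,   // hypothetical raw reset, assumed synchronous to SYS_CLK
    output wire rst_out    // the one reset net handed to all lower-level modules
);
    // Initial value of all ones holds the design in reset right after configuration.
    // shreg_extract = "no" is an assumption: it asks synthesis not to pack the
    // pipeline into an SRL, leaving real registers available for replication.
    (* shreg_extract = "no" *) reg [2:0] rst_pipe = 3'b111;

    always @(posedge SYS_CLK)
        rst_pipe <= {rst_pipe[1:0], rst_src};

    // Attribute placement copied from the example earlier in the thread
    (* direct_reset = "yes" *) wire rst3;
    assign rst3    = rst_pipe[2];
    assign rst_out = rst3;
endmodule
```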

 

Also, make sure that phys_opt_design is enabled in your Vivado compilation flow.  Choose -directive Explore or AggressiveExplore

Then launch synthesis and place & route.  When the routing is done, load the routed design in vivado and select your reset net to see how it's routed.  If all is good, you should see the reset net fanning out to multiple registers, and then fanning out from each register as a local reset in that geographic region.

 

Looking at your code for the rst signal that is 2 clocks instead of 64, clearly something weird is happening with the synthesis.  Have you looked at how this appears in the schematic?   I would expect it would generate a pair of SRL32 cells, with the rstbit0 connected between them.

Scholar helmutforren
Scholar
3,132 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

Thanks @jmcclusk.  I'll try those things as soon as I get my "current 325T" project to build without Vivado crash, and also catch up with schedule items I missed because of the delay.  This means from several days up to two weeks from now.  (ref: https://forums.xilinx.com/t5/Synthesis/EXCEPTION-ACCESS-VIOLATION-in-Vivado-2017-1/td-p/824497 )

 

Note about question one:  The clock is 200MHz, not 200ns.  LOL.  That makes 64 times the 5ns period equal 320ns.  Then for the lower bits, if there's some delay time between config done and clock beginning through the design, then that accounts for its delay.  Overall, I think it's actually a non-problem.  Well, only a problem in my head!  (I need a longer reset anyway due to a power supply dip.  I think I'll change the shift to a down count subject to "if" the rst bit (MSB) is still set.  I'll then be able to get my several ms desired reset.)
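
A down counter of that kind can be sized for a reset of several milliseconds by making the width a parameter. The sketch below is an assumption-laden illustration: the module name and the WIDTH value of 21 are hypothetical. With the counter starting at all ones and stopping once the MSB clears, the reset lasts 2^(WIDTH-1) clocks, which for WIDTH = 21 at 200 MHz is 1,048,576 x 5 ns, or roughly 5.2 ms.

```verilog
// Sketch: configuration-time reset stretched by a down counter.
module config_reset_counter #(
    parameter WIDTH = 21   // assumed: 2^(WIDTH-1) clocks of reset, about 5.2 ms at 200 MHz
) (
    input  wire SYS_CLK,
    output wire rst        // active high from configuration until the count expires
);
    // All ones at configuration, so the MSB (and therefore rst) starts high
    reg [WIDTH-1:0] rstcounter = {WIDTH{1'b1}};

    assign rst = rstcounter[WIDTH-1];

    always @(posedge SYS_CLK) begin
        // Count down only while the MSB is still set, then hold
        if (rst)
            rstcounter <= rstcounter - 1'b1;
    end
endmodule
```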

Scholar helmutforren
Scholar
3,042 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

 

@jmcclusk, during 40+ min builds of my "current 325T" project as I move it forward for specific needs this week, I intend to follow your reset advice on a different project, my "current 160T" project.  

 

With that in mind, I need a little "hand holding", please.

 

 

Specifically, I want to "When the routing is done, load the routed design in vivado and select your reset net to see how it's routed.  If all is good, you should see the reset net fanning out to multiple registers, and then fanning out from each register as a local reset in that geographic region."  I want to do that with the build as it is now, and then after that "Also, make sure that phys_opt_design is enabled in your Vivado compilation flow.  Choose -directive Explore or AggressiveExplore Then launch synthesis and place & route."  (The quotes are from your post.)

 

However, I actually DO NOT KNOW HOW to select my reset net to see how it's routed. 

 

=================================================================

The text below was written AFTER I got rid of Xilinx IP module "proc_sys_reset" and instead changed this version of code to use a register initialized at configuration.

=================================================================

 

My rst code looks like this:

 

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    wire rst = rstcounter[7];       // Use high bit as rst signal 
    always @(posedge SYS_CLK) begin
        if (rst) begin
            // If rst bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rst bit has gone low, stop counting
        end
    end

From my recent hardware debugging efforts with my work associate, this should do what I want.  It should also be *sufficiently* consistent with your suggestion, @jmcclusk, of having a 2 or 3 register sequence.

 

QUESTION: Exactly HOW do I do what you said, "load the routed design in vivado and select your reset net to see how it's routed"?  I know how to "Open Implemented Design" to get both a "Device" tab that looks like the chip and a "Netlist" tab.  I went to that netlist tab and found "Top", within that "Nets", and within that my signal "rst".  When I click on it, it highlights a couple dozen white ARROWS on the Device tab.  This gives me the impression that "rst" only goes to those places.  IS THIS THE ROUTING TO WHICH YOU REFER?

 

Note that it does NOT go to everywhere that I know I use "rst".  However, I see this signal right below that's called "rst_BUFG".  This did NOT come from source code I wrote.  I suspect it might be automagically generated as you have suggested.  When I click on it, a much larger nest of arrows gets highlighted.  I am *guessing* that Vivado routed my "rst" through a BUFG to distribute "rst_BUFG" throughout the chip on a clock capable route.  IS THIS WHAT YOU MEANT?

 

Note that I do not see "routing" like I am accustomed to seeing on PCB designs.  Maybe I'm not supposed to.  But this was my expectation, so I'm not sure if I'm seeing what I'm supposed to see.  I don't see a list of routing directions, either, like I see when I look at a failing timing report.

 

Note that when I click on "rst" and look at Net Properties, then Connectivity, then Load Net Delays, they appear to vary from 308ps to 3514ps.  These are all LESS THAN my 5ns clock period.  I think this is GOOD.  When I do the same for "rst_BUFG", they vary from 1432ps to 1389ps.  Now, I DID NOT CONFIRM that "rst" actually drives "rst_BUFG", and I didn't look for the net delay from "rst" that might actually drive "rst_BUFG".  But if I simply add up the worst cases, I get 3514+1389 = 4903ps.  That's still barely less than my 5ns clock.  I think this is also PERHAPS GOOD.  I don't know about the BUFG propagation delay internal to itself.

 

Anyway, I do NOT think the above is the way you intended me to look at it.  It's not "clean" or "obvious".  Did you intend a different or further way?  (See NOTEs below.)

 

NOTE: I find Implementation (right click) / Implementation Settings / Implementation / Options / Description / Opt Design (opt_design).  Then I find "-directive".  It is currently "Default".  I have NOT yet set it to something else.  I will and will rerun.  However, it has "Explore", "ExploreArea", and "ExploreSequentialArea" options.  I don't find a "AggressiveExplore" option.  Perhaps this will give me something seemingly more like you are saying.  EDIT: Adding -directive "Explore" as described didn't change what I see in "Device" or "Netlist" tabs.  I'm still assuming there must be a routing list somewhere.  Note that I looked this time and my module beneath Top.sv, that uses "rst", when inspected in the "Netlist" tab, does *not* have a signal "rst" but *does* have a signal "rst_BUFG".  This reinforces idea that implementation has inserted the BUFG in order to route a buffered "rst" to my lower level modules.  Still haven't tried "DIRECT_RESET".

 

NOTE: I have not yet set "DIRECT_RESET".  I'll go do this, too, now...

 

Below I'm pasting what I see for both "rst" and "rst_BUFG".  I added red arrows to point things out.  For "rst_BUFG", I searched for that text to make sure it wasn't *I* who created it.  The only places found were in reports.

 

[Image: rst net.jpg]

[Image: rst_BUFG net.jpg]

 

 

 

=================================================================

The text below is OLD and you might IGNORE it.  It was written BEFORE any changes.  I was using Xilinx IP module "proc_sys_reset" to create the "rst" signal, and then sending that "rst" signal everywhere.

=================================================================

 

Specifically, I "Open Implemented Design" and get a new "Device" tab that is a graphic of the device, with dull blue areas marked as used areas.  I get a "Netlist" tab as well.  Under my "Top" is "Nets" and under that is "rst".  When I click on reset, it draws many, many white arrows on the Device, giving me the impression that this "rst" net goes to all the places at the ends of the arrows, having been driven from the single root.  I get the impression that it's a single huge fanout.  When I zoom into the root of the arrows, it is the Q pin of a single block with SR,D,CK,CE inputs (mouse over says Reference: FDRE, Type: Flop & Latch, BEL: DFF, Site: SLICE X4Y195; note this is a Kintex 7 160T). 

 

I DO NOT KNOW if this is what you meant to do.  It does not seem to be giving me info about HOW the signal is routed.  Net Properties says "Type: LOCAL_CLOCK (local clock)" and "Flat cell pin count: 3901". Net Properties Connectivity net delays vary from 274ps up to 9904ps, with most in the range from 5000ps to 6000ps.  That's 5ns to 6ns.  My clock is 5ns.  So delay is close to or greater than one clock.

Scholar jmcclusk
Scholar
3,033 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

Wow @helmutforren!  you've been cranking hard on the reset!   First of all, you need to learn how to do two things with Vivado.

 

1.  F4 is your good friend.   It summons up a schematic of whatever you have selected.  Want to know if there is a clock buffer driving your reset net?    Hit F4.    Do you suspect that Vivado goofed up on synthesizing your FSM?    Hit F4.    Trying to understand the logic driving an IO buffer?    Hit F4.   Want more fun?  select a net or a cell in the schematic, then flip to the device tab.   There is the net!  (or cell), nicely highlighted.

 

2.  Vivado won't show you the detailed routing by default.  It's horrendous on big designs, so the default device display will only show ratnests (white arrows).  These are usually adequate to get a sense of where things are connected.  The button to toggle detailed routing display looks like two horizontal rows of pins, connected by a green line.   I only use this when I want to verify extremely low level routing details, like the inputs to a LUT or FF at the slice level.

 

It looks like Vivado did insert a BUFG as a high fanout driver for your reset network.   This is all fine, as long as your device die isn't too huge, or your clock frequency isn't too high.  This can be suppressed, in a number of ways.  You can limit the number of inferred BUFG's in synthesis to 0  (in the synthesis settings "-bufg = 0"), which is typically what I do.  I want to know exactly how many clock buffers are in the design, and why they are used, so I instantiate all of them.  

 

Bear in mind that net-splitting and driver duplication are timing driven.   If your timing margins are easy... nothing will happen to a high fanout net.   

 

 

 

Scholar helmutforren
Scholar
3,029 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

@jmcclusk,

 

Regarding F4, I see it's also menu option Tools / Schematic.  BTW, Tools / Show Hierarchy looks interesting.

 

While I may not want to wait for detail (unless I'm done for the night!), I don't see the detailed routing button that looks like two horizontal rows of pins connected by a green line.  Do you know the menu equivalent?  (In all my past software GUI's, I always have both a menu method for every tool bar icon or shortcut key, in case the user can't remember.)

 

Ack RE synthesis setting -bufg=0.  If I understand you, you mean you do it MANUALLY.  However, didn't you suggest I let Vivado do it automatically for me?  Please clarify what you suggest that *I* might do in THIS regard.

 

When I implemented with -directive Explore, I can select Netlist / Top / "rst" and press F4.  I see the schematic.  Yeah!  I see it going to a few places, including "rst_BUFG_inst".  I think that proves that "rst" really does drive "rst_BUFG", as assumed earlier.  The "rst" also goes to my huge second level module, when I know that inside that module gets "rst_BUFG" and not "rst".  Investigating... The Top schematic snippet had "rst" go to the "Q" input of second level module.  So I double click on that module and find Q.  That Q goes to...  Wait, on second attempt I double clicked pin rather than module in schematic.  That expanded only that pin path in the module, rather than the whole module.  Now I see both Top level and second level, for signal "rst" and its alias "Q" inside second level.  In fact, it's showing *multiple* levels deep expanded inside second level, but only for "Q" and subsequent aliases.  Nice.  I see the full fanout there.  So indeed "rst" is going a couple dozen places, just like the white arrows I saw.  That's NOT, however, the bulk of the reset signal, which is handled by "rst_BUFG".  I'm sure if I get that schematic, I'll see it ALSO going into second level module...  Yep!  Double click on that pin for second level module (the inner half of pin, not the outer half), and I get full expansion of all the places "rst_BUFG" goes at multiple levels.

 

INDEED: I see that the above is what you meant by looking at how it gets routed.  Thanks.

 

NEXT: I ran a subsequent build using "(* direct_reset = "yes" *)" as illustrated in the code below:

    (* direct_reset = "yes" *) reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    (* direct_reset = "yes" *) wire rst = rstcounter[7];       // Use high bit as rst signal 
    always @(posedge SYS_CLK) begin
        if (rst) begin
            // If rst bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rst bit has gone low, stop counting
        end
    end

After doing that, I can NO LONGER FIND "rst" in the Netlist???  I realize it's an alias of rstcounter[7].  I see TWO versions of that in the Netlist, "rstcounter[7]_i_1_n_1" and "rstcounter[7]_i_2_n_1". Doing F4 shows "rstcounter[7]_i_1_n_1" is an LUT3 .O pin going to an FDRE D pin, FDRE named "rstcounter_reg[7]".  Ah ha???  Double click the Q pin of that "rstcounter_reg[7]" and I think I see a reset network.  When I highlight that network, it back-highlights net "rstcounter[7]" in the Netlist tab, that being detail underneath net "rstcounter (8)".  (Doing F4 on "rstcounter[7]_i_2_n_1" looks less interesting, part of down counter I guess.)

 

So, it appears that with "(* direct_reset = "yes" *)", my original "rst" net that I saw after the prior implementation got replaced with a new net named rstcounter[7].  That net goes to a bunch of places.  Almost all seem to go to similar pins:   FDRE(.R()), FDPE(.PRE()), FDSE(.S()) as well as a few LUT6's.

 

OOPS...  Net Properties / Connectivity / Net Delays vary from 249 to 6998.  This sounds like trouble to me.  I think I preferred the prior build WITHOUT the direct_reset.  Now that I better understand, I see in that prior build, Netlist / Net Properties / Connectivity / Net Delays, that there is a "rst_BUFG_inst".  I could have seen that earlier to confirm "rst" drives it.  Anyway, the net delay for it is 868ps.  Meanwhile, the longest net delay for "rst_BUFG" is 1848ps.  So those two net delays add up to 2716ps.  I still don't know the buffer delay or rise time.  Can I find that somewhere?  Do I need to?  Or just trust the timing report.  (Note, there are timing errors ELSEWHERE that I will deal with separately.)  Max "rst" net delay itself is 3514ps.  Oh, maybe I also need "rst" rise time as well...  maybe.  (Actually fall time for releasing reset!)

 

TIMING ERRORS:  Oh, in fact, on prior implementation without "(* direct_reset = "yes" *)" , I see multiple timing errors involving "rstcounter_reg[7]/C".  So Vivado did not magically make this work out for me by adding a buffer.  Now, however, I think I can click on this error and see those buffer times, maybe...?   Yes indeed.  Won't trouble you with analysis there.  HOWEVER, I think this might be where I was when I ***MANUALLY*** broke up that distribution.  Might I still need to do that?  Or might the direct_reset fix this...  nope...  timing errors similar on it.  I think I must stick with manual breakup of reset network????  YOUR ADVICE, @jmcclusk

 

 

 

Scholar helmutforren
Scholar
2,926 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

I've looked at just one of those rst / rst_BUFG paths that was too slow.

 

In the image pasted further below, markers 1, 2, and 3 below show how this path is 0.188ns too slow.  That was the worst (slowest) one involving rstcounter_reg[7]/C.  That's not a big number to get around.  Marker 4 shows the falling edge delay of the "rst" and "rst_BUFG" signals, as well as the fanout delays.  Some manual break-up of these would get past them.

 

@jmcclusk this introduces a question.  Previously, I made a tree with multiple nodes at each level, generally consisting of a single rst3, it driving multiple rst2, each driving multiple rst1, each driving multiple rst0.  Then multiple rst0 in multiple modules were used to reset the logic.  The design ensured that all of the rst0's were the same "age", the same number of clock cycles after the original rst3.  (3 clocks, in fact.)  It passed timing.  I still have that code.  In fact, my "current 325T" project is still like that.  Should I just keep it that way, replacing the top end proc_sys_reset() module with my rst_counter initialized at config time?  This would solve my original problem where proc_sys_reset() didn't assert until a while after config, letting some peripheral hardware get mis-configured, while also keeping my meets-timing solution of manually breaking up the rst tree.

 

rst net example slow path.jpg

Scholar jmcclusk
Scholar
2,925 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

Ok..  first of all,  you  don't want to apply the direct_reset attribute on your counter..   that's not going anywhere..  next,  you want to buffer the reset signal with a couple of D registers before distributing it.   These registers should NOT have a reset function, but should be just plain old D type flip flops.   Then on the net that goes everywhere to the reset loads,  apply the direct_reset attribute.    You should be able to see this in the place and route now.    You should not need to use a BUFG to distribute the reset network, but there is a nifty trick available if you decide to use it.   I'll explain below.   If you've done this right, the routed design should show that the final register that drives the reset network has been duplicated N times, where N can be from 2 or 3 up to the high teens, depending on the timing requirements. 

 

BUFG reset tricks:   Using a split net with duplicated drivers works pretty well, but consumes a fair amount of registers and routing resources..    Another method of reset distribution is using a BUFGCE element.   A clock buffer with a clock enable.   This is how you connect it, and it only works if your clock is sourced from an MMCM or other clock source that can source a half rate clock.  Suppose your clock is running at 200 MHz, sourced from an MMCM or PLL.   You configure another output of the MMCM to produce 100 MHz, and then connect your BUFGCE input to that clock.    Then you connect the clock enable of the BUFGCE to a pulse generator on the 100 MHz clock domain that produces a single pulse (your reset), thus letting a single cycle of the 100 MHz clock slip through the clock buffer.    It sounds complicated, but the reason for doing this is that the 100 MHz clock is perfectly aligned with the 200 MHz clock, and since the clock trees pretty much have identical delays,  you can distribute the reset pulse to a huge area of the chip with almost constant delay, that coincidently, matches the clock delay at 200 MHz.   Thus for even very high speed clocks, timing problems on the reset don't really exist.  This can help with timing problems in other nets, since you don't need the fast routing paths for the reset.

 

 

Don't forget to close a thread when possible by accepting a post as a solution.
Scholar helmutforren
Scholar
2,918 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

Thanks, @jmcclusk, I'll let your "Ok.. first of all" post settle into my mind over night.

 

Did you see my more recent post with image and 4 red markers?

Scholar jmcclusk
Scholar
2,916 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

To answer your question about your manually split reset network.   If it works, that's good, but it's highly unlikely that your reset subnets are optimally placed, and it's even possible that they will affect the placement of the logic so it can satisfy the timing needed to meet the setup times for the reset inputs.   It's my philosophy that you get the most performance out of the design when you make the router's job really easy, with highly pipelined logic, and easy to meet setup times on things like reset.

Scholar helmutforren
Scholar
2,905 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

Hmmm...  My current rst is high immediately upon completion of configuration.  Call that time 0.  If I put a series of D flip flops without resets after that, then the output of the last D flip flop will be indeterminate for a couple of clock cycles before becoming 1.  That will mess up my peripheral hardware that needs the reset asserted beginning at time 0.  Prior to time 0, the FPGA output pin is high impedance, which also looks like a 1 to the peripheral hardware.  So the peripheral hardware sees a 1 immediately upon power application, and this 1 persists until after my FPGA reset releases.

 

Could I just use my prior code posted, setting direct_reset on "rst" but not "rstcounter"?

Scholar jmcclusk
Scholar
2,903 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

The direct_reset attribute won't work on a split tree.   It will be blocked by the registers at the lower level.    But it's not a big deal to set initial values on a register so they are set to 1 on powerup.    See page 231 of UG901.   
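As a rough illustration of the initial-value point, here is a minimal sketch of a three-stage reset pipeline whose registers all come out of configuration holding 1, so the pipelined reset is already asserted at the end of configuration instead of three clocks later. The module name `rst_pipe_init` and its port names are made up for this example and are not from the thread; it also assumes the clock is free running when configuration completes.

```verilog
// Hypothetical sketch: reset pipeline with configuration-time initial values.
// Module and port names are illustrative only.
module rst_pipe_init (
    input  wire clk,      // assumed free running at the end of configuration
    input  wire rst_in,   // root reset (e.g. the high bit of a reset counter)
    output wire rst_out   // pipelined reset for the loads
);
    // Initial values load at configuration, so rst_out is 1 before the first clock edge.
    reg rst2 = 1'b1;
    reg rst1 = 1'b1;
    reg rst0 = 1'b1;

    // Plain D flip-flops with no reset of their own.
    always @(posedge clk) begin
        rst2 <= rst_in;
        rst1 <= rst2;
        rst0 <= rst1;
    end

    assign rst_out = rst0;
endmodule
```

The release of `rst_out` still trails the release of `rst_in` by three clocks; only the assert at the end of configuration is brought forward.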

Scholar markcurry
Scholar
2,901 Views
Registered: ‎09-16-2009

Re: Confused about reset timing

@jmcclusk

 

Thanks for the references to the "direct_reset" attribute.  I wasn't even aware of its existence.  Reading the documentation in UG901, however, I wonder just when it's a good idea to use...

 

My general experience is that attributes like these that force the tool's hand may not always get the results one desires.  Without the attribute, you're letting the tool be flexible with directly tying to the reset pin, or allowing the tool to move the generation into the preceding data path.  It's an optimization problem - let the tool do its job, right?  (I've no data to back this up, as I've not used the attribute yet.)  I'm inclined to not go adding these to all my reset generators until I see a need for it.  But it's interesting to know that the tool is available - so again thanks for the pointer.

 

@helmutforren

 

I've not digested all the details of your problems here, but I will say this - be wary of the advice in WP272.  In my opinion much of the advice in that document borders on recklessness, and is not substantiated by hard data at all.  The whole reset local vs reset global was pretty much garbage when the document was released in ISE days, and is next to useless now in Vivado.

Even in ISE, boundary optimization and full hierarchy flattening was default.  Which basically caused the synthesizer to rip up one's carefully constructed "local reset" trees, and re-optimize them back to one source. 

 

In Vivado it's even better - with phys_opt - fanout problems are handled with placement-aware algorithms.  The tool's just going to do a much better job in building a tree than you can, in almost all cases.

 

In ISE, one could probably carefully craft a hand-built tree - with careful use of the tools to prevent them from re-optimizing your hand-crafted tree.  (But notice WP272 says nothing of how to do this.)  In Vivado, it's even harder to prevent the tool doing these optimizations - and mostly pointless, as the tools ARE good at doing this job.

 

And don't even get me started on the utterly made up numbers of "99.99%" you don't need a reset.  That's the truly reckless part of that document.  The loss in the "global" vs "local" reset issue is just lost engineering and tool time.  Being wrong with using/not using a reset?  Those are the types of troubles that show up in the field 1-2 years after a product has launched, and causing MAJOR heartache...

 

Regards,

 

Mark

Scholar markcurry
Scholar
2,898 Views
Registered: ‎09-16-2009

Re: Confused about reset timing


@jmcclusk wrote:

 But it's not a big deal to set initial values on a register so they are set to 1 on powerup.    See page 231 of UG901.   


Be wary here.  Relying on initial values is another piece of Xilinx advice given much too freely, without consideration.

 

It's not initialized to 1 at power up.  It'll be a 1 after FPGA configuration.  But do you know the rest of your design state after FPGA configuration - what about your clock?  If the clock is sourced from a PLL (either on board, or on FPGA via an MMCM), the PLL is likely NOT locked yet - meaning the clock is indeterminate.  Your "config time" init of a 1 is not going to last.  Every Xilinx model will have that FF go back to the X shortly later (because of the indeterminate clock).

 

Another source of trouble - your circuit may require a reset operation - but not a FPGA configuration.  A common example - any PCIE design must be sensitive to PCIE_PERSTN - it's required by the PCIE standard to be active reset until the PCIE_CLK is stable.

 

Clocks recovered from clock/data recovery circuits (i.e. GTx serdes) are going to be unknown at the end of configuration too.

 

Mixing up "FPGA Configuration" with "Reset" can get you into tricky trouble in many inconvenient ways.

 

Most reset synchronizer circuits will pass the ACTIVE edge of reset through to all logic asynchronously, and then ensure the INACTIVE edge of reset is properly synchronized to the same clock (that the destination flop uses).

 

Helmut, I don't think you should be having trouble here with a Kintex 7 160T at 250 MHz (which is what I think I see for your clock rates).  We're doing similar designs, and are Reset HEAVY users (we reset just about everything after properly synchronizing the inactive edge of reset).  We're hands off, and just let the tools do their job, and haven't seen readily noticeable issues with our results in Vivado.

 

Regards,

 

Mark

 

 

Scholar helmutforren
Scholar
2,878 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

@jmcclusk has been very helpful in helping me to learn more about things I could do.  @markcurry I do also sincerely appreciate your input.  So I'll try to pose the newest questions in terms respectful of both of your advice.

 

1 - INFO) I have several dozen state machines, each coded in SystemVerilog.  For each, my code is mostly of the two-always block type, where state machine variables are initialized in the first always block, in the "if (rst) begin ... end" clause.  The "else begin ... end" clause contains the state machine variable advance from next value to current value.  I want all of these state machines to go to a known value at configuration as well as in response to an external reset button.  I am further contemplating adding a soft "reset" command to my serial port command line interface (PicoBlaze driven quite successfully, by the way, @chapman; sincere thanks for the PicoBlaze platform).

 

2 - INFO) Originally, I had simply an active high "rst" variable that went everywhere.  There were no fancy attributes or anything else.  I was and continue to use Vivado 2017.1.  There were numerous timing violations, however, associated with the propagation of this "rst".  @markcurry, the tool was ***NOT*** automagically fixing this. 

 

3 - QUESTION) Why might that be?  Was I missing some preference to make it work?  Had I unknowingly set some preference to make it not work?

 

4 - INFO)  Meanwhile, I had read many comments about local resets, including WP272.  I didn't really understand what was meant by "local", or how local resets could be invoked from a single event -- such as my hard reset button or soft reset command --and I still don't understand.  Nevertheless, reading those and having the timing violations, I added the distributed reset tree that I described, starting with rst3 and going through rst2 and rst1 to get to rst0.  This FIXED my timing violations.

 

5 - INFO) However, I had been testing to date by configuring my KC705 325T 200MHz dev board via jtag.  When I programmed flash and tried configuring it on power up from flash, the reset didn't work properly.  My root "rst3" signal was NOT properly generated.  Upon inspection, I realized that at the time it was tied strictly to the hard reset button.  That circuit had no RC like the microprocessors I've used for literal decades.  So reset wasn't happening on power up config from flash and no manual button push.  Therefore, not yet knowing about initial values via SystemVerilog code like "reg rst3 = 1;", I searched around for any reliable reset source.  I found the Xilinx IP called proc_sys_reset().  I figured that they must be doing something inside to ensure proper reset after power on configuration from flash.  It worked.

 

6 - INFO) Later, my associate is working with our custom 160T target board.  It connects to peripheral hardware.  My control signals were going active for a moment after power up.  I found that this was because my rst3 signal was NOT YET active high at time 0 when config completes.  It only became active a moment later.  During this not-yet-reset period, who knows the value of things, and the peripheral hardware was accidentally activated.  Only then did I learn about the initial value setting via "reg rst = 1".  Now, this 160T project was forked off from my 325T project **prior** to my adding the reset tree and the 3...0 suffixes to the "rst" register.  Therefore, as soon as I simply added the initial value to rst (actually via SystemVerilog command "initial ... rst = 1;", through which I first learned of the ability), my control signals began behaving better.  A little more startup cleanup associated with fabric wrapping my PicoBlaze command line interface, and the control signals were perfect.  It was all about making sure that rst was truly active at time 0.  So I had learned that lesson.  In my associate's 160T project, I got rid of the proc_sys_reset() and instead used a version of "reg rst = 1".  

 

7 - PROBLEM) Now, however, I knew I had a problem.  That 160T project still had timing violations that we were ignoring for the moment, while testing the power-up behavior of the peripheral control signals.  I knew that some of those were due to the global propagation of "rst", that was NOT getting fixed successfully by Vivado.  It was indeed inserting a single BUFG, but that still didn't solve the timing violation, as I outlined one or two posts before now, on this thread.  Meanwhile, my 325T project had the distributed tree, where I knew for sure that all of my rst0 variables would go active at least 3 clock cycles late, which could be even longer depending on clock behavior (see KC705 reference design for 200MHz crystal clock).  My 325T project passed timing, but wouldn't do the reset correctly.  The 160T project did the reset correctly, but won't pass timing.

 

8 - QUESTION) So now, finally, how do I resolve the above?  How do I both ensure rst is active throughout my project and throughout the chip beginning at exactly time 0 (completion of config), yet also meet timing for the release of rst?

 

9 - PROPOSAL) I had in fact already thought of one method, and @markcurry hints at it by **SEPARATELY** describing the ACTIVE and INACTIVE edge of rst.  Combine that with the thought of local resets out of config.  Below is my proposed code.  I think it does everything needed.  I realize the rst3...rst0 tree fights with the optimizer.  I will only do that portion if the optimizer continues to fail in getting these parts to meet timing.  

    // ONE GLOBAL rst.  
    // This will stay high for 127 clocks, getting past some slow peripheral power supply rise times.
    // The hope here is that Vivado will route "rst" throughout the chip in a smart, timing passing manner (even though it didn't originally).
    // If it doesn't, then I add a manual tree like I did before, starting with rst3 here and eventually getting to rst0 at each local module.
    // MEANWHILE, for the hard button or soft command reset, we can set rstcounter once again

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    wire rst = rstcounter[7];       // Use high bit as rst signal 
    always @(posedge SYS_CLK) begin
	// For all three reset cases:  Configuration, Hard Button, Soft Command.
        if (rst) begin
            // If rst bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rst bit has gone low, stop counting
        end

	// For two reset cases:  Hard Button, Soft Command.
	if (CPU_RESET || SOFT_RESET) rstcounter <= 8'hFF;
    end

    . . .

    // OPTIONAL DISTRIBUTED RESET TREE.  These instructions are piecemeal distributed in the code.  
    // I realize that this fights with the optimizer, but if the optimizer can't meet timing, I see no other choice.

    reg rst2, rst1, rst0;
    always @(posedge SYS_CLK) rst2 <= rst3;
    always @(posedge SYS_CLK) rst1 <= rst2;
    always @(posedge SYS_CLK) rst0 <= rst1;

    . . .

    // MULTIPLE DISTRIBUTED local_rst PER EACH OF MY MODULES, with one or more state machines in each module
    // The idea here is that, even if I must add a delaying tree to the global "rst" and I receive a delayed "rst0" instead of "rst", my "local_rst" here will still be active at time 0.
    // 		That is, local_rst will be active at time 0 and will self-hold itself for 7 clocks.  Those 7 clocks exceed the 3 clock time delay for the global reset active edge to get through the tree.
    // Therefore, local_rst will go active at time 0 and remain active when rst0 arrives, then rst0 will hold local_rst active until rst0 goes inactive.
    // Note that local_rstcounter is much, much shorter than the global rstcounter, so this code assumes local_rstcounter has finished its job before rst0 goes low.
    // 		Otherwise, local_rstcounter may have more countdowns to do, and local_rst will release more than one clock after rst0.  
    //		This might be OK, nevertheless, if all modules have the same code and behave the same, thus remaining synchronized in the release of reset.
    //		So put this code in a separate little module, invoked by every other module that needs it.
    // MEANWHILE, if there's a later hard button or soft command reset, rst0 will go high and the last command in the always block below will set local_rst high again, 
    //		with one clock cycle delay to both active and inactive edges.  Note that the one clock cycle delay of the inactive edge is the same as for the configuration reset case.
    //		Furthermore, during configuration reset, rst0 will ALSO be high, and that last statement will get invoked as well.  No harm.  The bit was already set anyway due to the inital value as well as prior if/else block.

    reg [3:0] local_rstcounter = 4'hF;	// Start reset counter at 15.  Will count down to 7 then stop, so local_rst duration is 8 clocks.
    wire local_rst = local_rstcounter[3];       // Use high bit as rst signal 
    always @(posedge SYS_CLK) begin
        // For all three reset cases:  Configuration, Hard Button, Soft Command.
        if (local_rst) begin  // If local_rst bit is still set, then CONSIDER counting down
            if ((local_rstcounter == 4'h8) && rst0) begin  // The next countdown would clear local_rstcounter[3], taking local_rst inactive.  But rst0 is still high, so do NOT decrement
                // Don't decrement because rst0 is still active high
            end else begin
                // Go ahead and count down, either because we're not at 4'h8 yet, or because rst0 is no longer held
                local_rstcounter <= local_rstcounter - 1;
            end
        end else begin
            // Otherwise, if rst bit has gone low, stop counting
        end

        // For two reset cases:  Hard Button, Soft Command.
        // local_rst is a wire, so set the counter bit it aliases rather than assigning local_rst itself.
        if (rst0) local_rstcounter[3] <= 1'b1;
    end
Scholar jmcclusk
Scholar
2,872 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

your solution will probably work fine, @helmutforren, but I think it will be worth your time to try the physical synthesis available with "phys_opt_design".  Once you get that flow working, it will give you superior results, timing wise. 

 

Yesterday, I was thinking about the difficulties in distributing a reset using a clock buffer, after looking closely at the clock buffers in the Ultrascale+ HDL library.   I've come to the conclusion that Xilinx needs to slightly modify the driver logic in the clock buffer to really do a nice job for reset distribution.  Sadly, it's not on the Marketing Requirements Document (MRD) when they invent a new FPGA family.

Scholar markcurry
Scholar
2,661 Views
Registered: ‎09-16-2009

Re: Confused about reset timing

Some quick notes - in random order.

 

Again, I do NOT recommend generating / maintaining the local_rst at all.  Just use your global reset, AND run phys_opt_design.

 

    // OPTIONAL DISTRIBUTED RESET TREE.  These instructions are piecemeal distributed in the code.  
    // I realize that this fights with the optimizer, but if the optimizer can't meet timing, I see no other choice.

    reg rst2, rst1, rst0;
    always @(posedge SYS_CLK) rst2 <= rst3;
    always @(posedge SYS_CLK) rst1 <= rst2;
    always @(posedge SYS_CLK) rst0 <= rst1;

This is GOOD in all cases, and is NOT fighting the optimizer - it's helping.  It allows the optimizer to push the pipelined versions of the reset around the physical FPGA as needed.  The local reset you have lower is what's fighting the optimizer.

 

One note - however - talk about clock domains.  Your global rst is generated with "SYS_CLK" - is SYS_CLK free running from an oscillator (or similar?).  Does the global rst ONLY go to registers on the SYS_CLK domain?

 

From another angle - here's what we do - it's much simpler, and just works.  We have a global "clocks and resets" modules, it contains, among other things, logic to generate the system reset, and synchronize that system reset to each clock domain.  At the leaf level modules, we do NOTHING special - just use the reset (synchronously) on the same clock domain.

 

Within our "clocks and resets", we have a similar circuit to stretch the reset that you have.  But then we put that "system stretched reset" through synchronizers:

module sync_reset 
#( parameter STAGES = 2 )
(
  input wire clk_i,
  input wire reset_i,
  output wire reset_o
);

reg [ STAGES - 1 : 0 ] reset_i_D;
always @( posedge clk_i or posedge reset_i)
  if (reset_i)
    reset_i_D <= {STAGES{1'b1}};  // Assert on reset in
  else
    reset_i_D <= { reset_i_D, 1'b0 };

assign reset_o = reset_i_D[ STAGES - 1 ];

endmodule

Here, the "stretched reset" is the input.  The output reset has its inactive edge synchronized to its respective clock.  Yes, it's using an asynchronous reset flop.  That's ok - it's not as efficient in FPGAs, but it's only a small amount of logic - and we want the ACTIVE edge of reset passed through asynchronously. 

 

We instantiate the sync_reset module "N" times - for each clock domain.  The output of the sync_reset then goes to just about ALL logic on that clock domain (without any other special handling)
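To make the "N times" concrete, here is a hedged sketch of what such a "clocks and resets" wrapper might look like for two clock domains. It relies on the `sync_reset` module posted above; the wrapper name `clocks_and_resets_sketch`, the second clock `rx_clk`, and all port names are assumptions for illustration, not the actual module used in the design described.

```verilog
// Hypothetical wrapper: one sync_reset instance per clock domain.
// Requires the sync_reset module shown earlier in this post.
module clocks_and_resets_sketch (
    input  wire sys_clk,        // first clock domain
    input  wire rx_clk,         // second clock domain (illustrative)
    input  wire stretched_rst,  // the "system stretched reset", active high
    output wire sys_rst,        // inactive edge synchronized to sys_clk
    output wire rx_rst          // inactive edge synchronized to rx_clk
);
    // Both instances assert asynchronously as soon as stretched_rst goes high,
    // and each releases a couple of clocks of its own domain after it goes low.
    sync_reset #( .STAGES(2) ) u_sync_sys (
        .clk_i   (sys_clk),
        .reset_i (stretched_rst),
        .reset_o (sys_rst)
    );

    sync_reset #( .STAGES(2) ) u_sync_rx (
        .clk_i   (rx_clk),
        .reset_i (stretched_rst),
        .reset_o (rx_rst)
    );
endmodule
```

Leaf modules on `sys_clk` would then use `sys_rst` synchronously, and those on `rx_clk` would use `rx_rst`, with no further special handling.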

 

There's nothing else.  We run phys_opt_design, and haven't had any issues with logic running up to 250 MHz on Kintex, Ultrascale, and Ultrascale+. 

 

If we did run into fanout/timing problems - i'd look into @jmcclusk suggestions of using a BUFG for reset distribution.

 

Regards,

 

Mark

 

Scholar helmutforren
Scholar
2,655 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

Continued thanks to both @jmcclusk and @markcurry.

 

For BOTH of you, you mention "phys_opt_design".  @jmcclusk says "Once you get that flow working".  @markcurry says "run phys_opt_design".  Well, I guess I need to make sure exactly what you two mean by that.  From my perspective, running Vivado 2017.1 on Windows 10, I see Flow Navigator / IMPLEMENTATION (right click) / Implementation Settings... / Project Settings / Implementation / Options / Description / Opt Design (opt_design).  Ah ha.  For completeness I scrolled down.  I also see "Post-Place Phys Opt Design (phys_opt_design)".  I see "is_enabled" with a checkbox that is NOT checked.  Please confirm that you are in essence asking me to check that box.  WHEW!  I think I finally figured this one out.  Since it appears to have NOT been on, then my prior complaint about optimization not making my reset pass timing seems now moot.  I will restart that build in a moment with this checkbox on.  (Also, I am a born American English speaker, but an engineer, not an English Major.  Verbiage sometimes confuses me when it's not detailed and exact.  @jmcclusk said "Once you get that flow working".  When I checked the box, thus selecting that line, the hint at bottom said "Optionally run this step as part of the flow."  Now use of the word "flow" makes better sense.  It is an EXTRA STEP in the flow, not just a configuration of a step already in the flow.  Meanwhile, @markcurry said "run".  Same deal.  Now use of the word "run" makes sense.  It did NOT make sense to me prior to seeing this hint.  Again, note that I thought the "opt_design" I found before was the same as "phys_opt_design".  But now I realize it isn't.  I mentioned "opt_design" earlier, but neither of you noticed my error, LOL.)

 

Now on to @markcurry 's post preceding this one here...

 

@markcurry, if I do *NOT* insert my rst3...rst0 breaks in the tree, then my global reset should reach all places beginning from time 0.  Indeed, this makes the local reset logic unnecessary.  The local logic was only necessary when the tree stages caused a delay in the rising edge.  (Hmmm... I could have initialized rst2...rst0 with 1 as an alternative solution rather than the local stuff.  But hopefully that's moot now.)

 

The VAST MAJORITY of my reset goes to the same SYS_CLK domain.  I do have a few other domains, however.  I was going to address those later.  I already have methods to go through dual FF's with a metastable constrained signal between.  (LOL, like somebody else like me would understand that last sentence, not already knowing this solution.  I'm guilty of it, too!)  I was probably going to feed rst through there as well.  Those do NOT have the external hardware constraint that they have to be active at time 0, so that dual FF solution would probably be fine.  But if I wanted them to be active at time 0, I would probably do something similar to my recently proposed code for local_rst.
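For readers who don't already know the dual-FF solution mentioned above, a minimal sketch follows. It is not the code from my project: the module name `sync_2ff` and its ports are invented for this example, and the `ASYNC_REG` attribute is the Vivado property I understand is normally placed on synchronizer registers. Any timing exception on the incoming path would go in the XDC and is not shown.

```verilog
// Hypothetical two-flip-flop synchronizer for a single-bit signal
// crossing into the clk_dst domain. Names are illustrative only.
module sync_2ff (
    input  wire clk_dst,   // destination clock domain
    input  wire d_async,   // signal from another clock domain
    output wire q_sync     // synchronized copy, two clk_dst cycles later
);
    // ASYNC_REG asks the tools to keep the two flops together and
    // treat them as a synchronizer chain.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_ff = 2'b00;

    always @(posedge clk_dst)
        sync_ff <= {sync_ff[0], d_async};  // stage 0 may go metastable; stage 1 is what the logic uses

    assign q_sync = sync_ff[1];
endmodule
```

Feeding rst through this delays both edges by two destination clocks, which is why it only suits the domains that do not need reset asserted at time 0.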

 

I need to prepare for a meeting.  After the meeting, I will focus on @markcurry 's module sync_reset().  I'll post an additional reply after this, since I don't have time to add it on to this reply.

 

A build with "phys_opt_design" enabled in the implementation settings is running now.  (Hey, I like that underlined language for being more clear about it.)  This also has direct_reset removed.  Code below:

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    wire rst = rstcounter[7];       // Continuous alias of the high bit (a reg initializer would only be evaluated once)
    always @(posedge SYS_CLK) begin
        if (rst) begin
            // If rstcounter[7] bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rstcounter[7] bit has gone low, stop counting
        end
    end
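As a sanity check on the "128 clocks" arithmetic in the comment, a small simulation-only testbench sketch is shown here. It assumes rst is a continuous alias of rstcounter[7] (a wire), picks an arbitrary 5 ns clock period, and uses the made-up names `tb_rstcounter` and `high_cycles`; it is not part of the build.

```verilog
`timescale 1ns / 1ps
// Hypothetical testbench: counts how many rising edges see rst high.
module tb_rstcounter;
    reg SYS_CLK = 1'b0;
    always #2.5 SYS_CLK = ~SYS_CLK;   // arbitrary period chosen for simulation

    // Same counter logic as above, with rst as a wire on the high bit
    reg  [7:0] rstcounter = 8'hFF;
    wire       rst = rstcounter[7];
    always @(posedge SYS_CLK)
        if (rst) rstcounter <= rstcounter - 1;

    // Count the rising edges at which rst is still sampled high
    integer high_cycles = 0;
    always @(posedge SYS_CLK)
        if (rst) high_cycles = high_cycles + 1;

    initial begin
        wait (rst == 1'b0);           // counter has just stepped from 8'h80 to 8'h7F
        @(posedge SYS_CLK);
        $display("rst was sampled high on %0d rising edges", high_cycles);
        $finish;
    end
endmodule
```

The counter passes through the values 8'hFF down to 8'h80 with the high bit set, so the report should be 128 edges.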

 

 

------------------------------------------------------------------------------------------

 

 

 

 

For completeness, note that prior to this post, I had just completed a test build with this code below.  It tried to follow @jmcclusk 's advice.  I put the direct_reset on STRICTLY the "rst" register.  I tried to put two FF's before rst.  Indeed, F4 showed me the schematic for rstmid and it was between two FDRE's.  Also, for "rst" the F4 showed me the schematic, and double clicking on an inner sub-module pin expanded everything.  it *ALL* appears to be going to the R pin of all the flip flops on all levels, just like @jmcclusk said.

 

This means that the code below implemented, I believe, @jmcclusk 's advice properly and accomplished [his/her] goal.  HOWEVER, based on @markcurry 's advice, I'll try without direct_reset instead, using how I now understand phys_opt_design.

    // NOTE I AM *NOT* GOING TO STICK WITH THIS CODE.  READ THE POST TEXT ABOVE THIS CODE BLOCK!

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    reg rstmid;
    (* direct_reset = "yes" *) reg rst;
    always @(posedge SYS_CLK) begin
        if (rstcounter[7]) begin
            // If rstcounter[7] bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rstcounter[7] bit has gone low, stop counting
        end
        rst <= rstmid;
        rstmid <= rstcounter[7];
    end

 

Scholar helmutforren
Scholar
2,647 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

I still need to focus on @markcurry 's module sync_reset(). 

 

In the time passed, however, my first run with implementation setting "phys_opt_design" enabled has completed.  Interesting results.  The source code for generating "rst" is below.  See further below for more text from me.

 

 

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    wire rst = rstcounter[7];       // Continuous alias of the high bit (a reg initializer would only be evaluated once)
    always @(posedge SYS_CLK) begin
        if (rst) begin
            // If rst (aka rstcounter[7]) bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rstcounter[7] bit has gone low, stop counting
        end
    end

 

When I open the implementation and look at the Netlist, "rst" no longer appears in the Top module.  The closest I can get is "rstcounter[7]_i_2_n_1".  F4 shows a simple schematic with that signal feeding an LUT2 named "rstcounter[6]_i_1" (part of the countdown logic I guess) and "rstcounter[7]_i_1" (which is an LUT3 receiving signal at I1 and having an unconnected O pin).  When I double click on the O pin to see more circuitry, and zoom out, i see MAGIC!  This signal properties say its name is "rstcounter[7]_i_1_n_1".  Yep, that was in the list above the "rstcounter[7]_i_2_n_1" I started from.  The MAGIC is that it now feeds 7 different FDRE's with names "rstcounter_reg[7]" followed by "rstcounter_reg[7]_replica" through "rstcounter_reg[7]_replica_5".  Indeed, this must be the phys_opt_design magic.  Fabulous.

 

HOWEVER.  I look at the timing errors.  There are many involving "rstcounter_reg[7]_replica_*".  It appears that this did NOT automagically make sure timing passed.  Darn! (Please don't let the large attention-getting font upset you.  I'm not "yelling" or angry.  I'm just making sure a busy person notices it rather than missing it, buried in my very long writing.)

 

In other words, this won't work as is.  So I'm looking at the timing detail below.  Hmmm...  ok.  Under "Source Clock Path", the "rstcounter_reg[7]_replica_2/C" must be the CLOCK pin of the FDRE I saw for replica 2.  Understood.  This means that under "Data Path", the "rstcounter_reg[7]_replica_2/Q" is the Q pin of that FDRE, and before fanout it's seen at 4.396ns.  This then appears to go through a BUFG in order to fan out 3864 times.  It then gets to a LUT1 that drives "rx_mmcm_lckd_i_1/O".  I realize I blacked out that module's name.  But that module does not have such a signal name.  That signal name is in 3 *other* modules, not the one named.  Whatever.  Chalk it up to a little bug in Vivado or a misunderstanding on my part.  Through a couple of assign renames, rx_mmcm_lckd does indeed exist inside of an "if (rst) begin ... rx_mmcm_lckd <= 0; ... end else begin ... rx_mmcm_lckd <= 1; ... end" block.  Note there's not a real MMCM here, so this code is simply setting the locked flag as soon as we exit reset.  Anyway, there is indeed an intended path from rst to rx_mmcm_lckd.

 

Next signal in the Data Path is "local_data_reg".  That signal *IS* in the named module that I blacked out.  This must be bit 0 of a structure "local_data".  (That is, I have "structurename local_data;" declared.)  So the point here is that the path must have been optimized to pass straight through rx_mmcm_lckd, and now we're at this local_data.  Well, it doesn't exist in the "rst" clause -- perhaps a bug in my code -- but it does exist in the ELSE clause of the "rst".  So there's the tie-in.

 

Bottom line, "rst" indeed goes to "local_data", and the path is TOO SLOW.  The optimizer did NOT fix the timing problem.

 

If the phys_opt_design can't fix the timing problem, don't I have to do it myself?

 

[Attached image: rst replica example slow path.jpg]

 

 

Scholar markcurry
Scholar
2,645 Views
Registered: ‎09-16-2009

Re: Confused about reset timing

Why is there a BUFG in the data path within your timing report?

 

There should only be BUFGs on the src / dst flip flop clocks.

 

Are you trying to use a BUFG to generate your reset tree?

 

Regards,

 

Mark

Scholar markcurry
Scholar
2,640 Views
Registered: ‎09-16-2009

Re: Confused about reset timing

Also, remove whatever logic you have at the output of the global reset.

The global reset you generate should go to a FF reset pin, and nowhere else.

 

In your report, the reset is shown as going through a BUFG (don't do that) THEN a LUT1 (perhaps to invert it?), before arriving at the destination FF.

 

Remove both of these, and you should have better results.
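
For illustration, here is a minimal sketch of a register that uses the reset only as a synchronous, active-high reset term, which synthesis can normally map straight onto the reset pin of the flip-flop with no LUT in between. The module and signal names (reset_sink_example, data_i, data_q) are made up for this example and are not taken from the design discussed in this thread.

```verilog
// Hypothetical example module; all names are illustrative only.
module reset_sink_example (
  input  wire       clk,
  input  wire       rst,      // active-high, assumed already synchronous to clk
  input  wire [7:0] data_i,
  output reg  [7:0] data_q
);

  // rst is used on its own, as the first branch of the if, and assigns a
  // constant. Written this way the tool can connect rst to the flip-flop's
  // synchronous reset pin instead of folding it into data-path logic.
  always @(posedge clk) begin
    if (rst)
      data_q <= 8'h00;
    else
      data_q <= data_i;
  end

endmodule
```

Combining the reset with other conditions, or inverting it on the way to the register, is the kind of thing that can put a LUT in front of the reset pin.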

 

Regards,

 

Mark

Scholar jmcclusk
Scholar
2,639 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

And this is why I don't let Vivado infer BUFG's when it feels like it.    I always go into synthesis properties and set -bufg to ZERO.

 

One other point, here.   It's messy to replicate the most significant bit of a counter.   It's much better to buffer the reset with a couple of clean D registers.  These can be duplicated easily.
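
As a rough sketch of that idea: keep the counter private and pass its top bit through two plain registers before anything else uses it. The max_fanout attribute is a Vivado synthesis attribute; the limit of 200 is an arbitrary number picked for the example, not a recommendation, and the module and signal names are hypothetical.

```verilog
// Hypothetical reset generator; names and the fanout limit are assumptions.
module reset_gen_example (
  input  wire clk,
  output wire rst_o
);

  // Power-up counter: starts at 255, counts down while bit 7 is set,
  // then stops at 127.
  reg [7:0] rstcounter = 8'hFF;
  wire      counting   = rstcounter[7];

  always @(posedge clk)
    if (counting)
      rstcounter <= rstcounter - 1'b1;

  // Two clean D registers between the counter and the reset loads.
  // Only the last one drives the design, so only it needs replicating.
  reg rst_mid = 1'b1;
  (* max_fanout = 200 *) reg rst_out = 1'b1;

  always @(posedge clk) begin
    rst_mid <= counting;
    rst_out <= rst_mid;
  end

  assign rst_o = rst_out;

endmodule
```

Because rst_out has nothing but a register in front of it, any replicas the tools create are simple copies of one flip-flop rather than copies of part of the counter.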

Don't forget to close a thread when possible by accepting a post as a solution.
Scholar helmutforren
Scholar
2,630 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

@markcurry, I did not explicitly place the BUFG in the "Data Path".  That must have been inserted by Vivado.  Thanks to @jmcclusk I know to use F4.  I see an FDRE named "rstcounter_reg[7]_replica_2" with its "Q" output going directly and only to a BUFG named "p_0_in_BUFG_inst".  The O from that BUFG goes to a bunch of usage places.  Weird.  It ALSO loops back to the CE of the rstcounter_reg[7]_replica_2.  This feels like part of my counter.  Maybe I have to buffer the reset away from the counter as @jmcclusk suggests, if not simply remove it.  I think I'll buffer away, akin to @jmcclusk's recommendation, and rebuild.  I believe the code below will do this.  I ensure rst is high (asserted) at time=0 by initializing it and rstmid1 and rstmid2 to 1.  I separate from the counter by using the else clause of counter bit 7 to set rstmid1 to 0.  Signal rstmid1 might still be tied to counter logic since it's in the else clause.  So the additional rstmid2 and rst should give clean cascading FFs.  THEN maybe we'll see if the BUFG gets inserted after the FFs by Vivado, or if it was there because of the counter loopback.

    reg [7:0] rstcounter = 8'hFF;   // Start reset counter at 255.  Will count down to 127 then stop, so rst duration is 128 clocks (that's 2^(N-1) for N=8 bits )
    reg rstmid1 = 1;
    reg rstmid2 = 1;
    reg rst = 1;
    always @(posedge SYS_CLK) begin
        if (rstcounter[7]) begin
            // If rstcounter[7] bit is still set, count down
            rstcounter <= rstcounter - 1;
        end else begin
            // Otherwise, if rstcounter[7] bit has gone low, stop counting
            rstmid1 <= 0;
        end
        rstmid2 <= rstmid1;
        rst <= rstmid2;
    end

@markcurry, you write "Also, remove whatever logic you have at the output of the global reset.

The global reset you generate should go to a FF reset pin, and nowhere else."  Is that advice moot given my paragraph above, or does it still stand?  I think maybe my paragraph and code above accomplish what you suggest.

 

@markcurry, you ask me to remove the BUFG and a LUT.  I didn't put the BUFG in on purpose.  Discussed earlier.  The LUT is indeed from an inversion.  I can remove the inversion.  Done.

 

NEVERTHELESS, was Vivado going to try different strategies to make timing?  What if no matter what, it still can't make timing?  Doesn't that leave me having to manually fix things?  I'm starting to feel @jmcclusk's pain and thinking about setting the synthesis property -bufg to zero.  (BTW, I understood your sentence perfectly this time.  I know where to look for that flag.  I see it.  Hmmm...  it's currently 12.  Might I actually need to INCREASE it so that Vivado can solve my problems?  I'll try that.  Setting to 24 now.)

 

new build running...  still need to focus on @markcurry example code...

 

Scholar jmcclusk
Scholar
2,624 Views
Registered: ‎02-24-2014

Re: Confused about reset timing

@helmutforren wrote: "Hmmm...  it's currently 12.  Might I actually need to INCREASE it so that Vivado can solve my problems?  I'll try that.  Setting to 24 now."

 

NO!!!     Less is More!!  LOL...  I'm laughing at what Vivado will do to your design now.    Remember, Vivado is simultaneously smarter than any human about certain things (like placement, and routing)... and dumber than a bag of hammers about others.  

 

You will see.   Set -bufg to zero, and Vivado won't do stupid things like inserting a BUFG on your reset network.   However, doing this may require you to instantiate BUFG cells explicitly in your code for your actual clocks.   I always do this anyway.
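
For reference, an explicit clock buffer is only a few lines. This sketch assumes a 7-series part, where BUFG is a library primitive with ports I and O; the wrapper module and the net names are invented for the example.

```verilog
// Hypothetical wrapper showing a hand-instantiated global clock buffer.
// BUFG is a Xilinx library primitive, so this fragment needs the vendor
// primitive library to elaborate; it is not a standalone simulation model.
module clk_buf_example (
  input  wire sys_clk_pin,   // clock as it arrives from the input pin
  output wire sys_clk        // buffered clock for the rest of the design
);

  BUFG sys_clk_bufg (
    .I (sys_clk_pin),
    .O (sys_clk)
  );

endmodule
```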

Scholar markcurry
Scholar
2,621 Views
Registered: ‎09-16-2009

Re: Confused about reset timing

I didn't think that Vivado would ever infer a BUFG on anything other than a clock.  That behavior is confusing me.

 

I'm quite confident getting rid of the BUFG (and the inverter) will solve your problems. 

 

That's the thing with using BUFGs for resets - BUFGs have huge latencies - which was killing your single cycle timing results.  When people solve fanout problems by using a BUFG, then they need to account for the high latency - usually with a multi-cycle path.

 

Regards,

 

Mark

Scholar helmutforren
Scholar
2,602 Views
Registered: ‎06-23-2014

Re: Confused about reset timing

It's interesting. Your advice, @jmcclusk and @markcurry, is both converging and diverging at the same time.  I'll explain in a moment.

 

For now, know that I'm going to try both directions regarding synthesis setting for -bufg.

 

About the divergence... 

@markcurry doesn't speak of -bufg and therefore implies leaving it non-zero, perhaps the default of 12.  You further suggest that getting rid of the inverter will solve my problems.  Well, it may indeed on the specific example path.  However, Vivado in general has already shown that it is unable to automatically make all the reset timing work.  If it could, it would have reorganized how it broke up the path so that this particular timing issue wouldn't have happened.  For example, many of the reset loads are driven directly by the original rst, while others are driven via the BUFG.  It could have moved this worst one from the BUFG sub-tree to the root sub-tree.  Yes, I realize it may have done that many times already and had to give up on diminishing returns.  But if it's going to have to give up, then it does NOT have a strategy that will always work.  My one inverter and thus BUFG in this case is just one case.  There will be hundreds of other cases and strategy failures.

 

I'm building without the BUFG now in the middle of the night.  I think I forgot to kick it off before stopping work.  We'll see what pops to the top of the timing violation list next.  I know there were multiple reset related ones last time, so it will be reset related.

 

About the convergence...

Meanwhile, @jmcclusk, if I don't let Vivado insert BUFGs in my reset path, then how is my reset path ever going to pass timing?  You haven't said explicitly (recently).  Please suggest something explicitly.  I am guessing you'll say (or repeat; sorry if I don't recall our complete exchange here, I've gone around in circles so many times) that I need to manually break up the reset tree.  But perhaps you've spoken against that in the past.  So how, please?

 

At the same time, @markcurry, you mention that BUFGs have huge latencies and that this is usually solved with a multi-cycle path.  Well, I believe my rst3...rst0 breakup of the reset tree is exactly that, a multi-cycle path.

 

If @jmcclusk is indeed suggesting to manually break up the reset tree, and @markcurry has implied it as well just now, then you have converged on your advice.   I already have the tree broken up, and it has most recently passed timing.  Out of all of this, all I would need to modify on my "current 325T" project is to replace that proc_sys_reset() origin of reset with some flavor of the initialized counter code I've posted here, then also make sure my rst2...rst0 registers are initialized to 1.  (They weren't initialized to anything before and left a post-config ambiguity.  I tried to fix that once with local_rst.  Then I realized I could fix it with initializers.)  The result would be a future strategy of always doing it this way.  (Darn, I still need to study @markcurry's example code.)
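
A sketch of how that strategy might look, assuming a single clock domain and three branches. Everything here (reset_tree_example, rst_root, rst_branch, the branch count) is illustrative. The dont_touch attribute is a Vivado attribute, used on the assumption that synthesis may otherwise merge the identical branch registers back into one.

```verilog
// Hypothetical broken-up reset tree; names and branch count are assumptions.
module reset_tree_example #(
  parameter BRANCHES = 3
) (
  input  wire                clk,
  output wire [BRANCHES-1:0] rst_branch_o   // one reset per group of modules
);

  // Power-up counter: the initializer starts it at 255 after configuration.
  // It counts down to 127 and stops, so bit 7 is high for 128 clocks.
  reg [7:0] rstcounter = 8'hFF;

  always @(posedge clk)
    if (rstcounter[7])
      rstcounter <= rstcounter - 1'b1;

  // Root stage: isolates the tree from the counter logic.
  reg rst_root = 1'b1;

  always @(posedge clk)
    rst_root <= rstcounter[7];

  // One extra register per branch, all initialized to 1 so every branch
  // is asserted right after configuration with no ambiguity.
  (* dont_touch = "true" *) reg [BRANCHES-1:0] rst_branch = {BRANCHES{1'b1}};

  always @(posedge clk)
    rst_branch <= {BRANCHES{rst_root}};

  assign rst_branch_o = rst_branch;

endmodule
```

Each bit of rst_branch_o would go to a different group of modules, so no single register has to drive every reset load in the design.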

 

@markcurry's sync_reset() code

OK.  I studied the code finally (pasted a ways below).  It's actually conceptually identical to what I've already posted a few times, the only difference being the asynchronous response to rising reset_i via including "posedge reset_i" in the always condition.  Therefore, it doesn't add much new.  In my application, I would need reset_i_D initialized to all 1's so that it would assert reset at time 0 after configuration.  Even an async application of reset_i that was 1 at time 0 would allow some time delay before this sync_reset() code produced a known (active high) value.

 

When you say "We instantiate the sync_reset module "N" times - for each clock domain", I think you mean ONCE for each clock domain.  How many different clock domains do you typically use?  Realize that if you use enough clock domains, then your instantiation of sync_reset module N times is in essence a manual break-up of the reset tree.  This puts you and me in the same conceptual design place -- break up that reset tree for the purpose of meeting timing -- even if you aren't fully realizing this fact.  If this is not the case, then I explicitly ask how can one get around timing problems with large projects in one clock domain, anyway? 
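
To make the two points above concrete, here is a sketch of a sync_reset variant with a power-up initializer, instantiated once per clock domain. The initializer, the shift written as "<< 1", the module names, and the two-domain top level are assumptions of this sketch, not part of the original sync_reset() code.

```verilog
// Variant of sync_reset with a power-up initializer (an assumption of this
// sketch): reset_o is already asserted at time 0 after configuration.
module sync_reset_init #(
  parameter STAGES = 2
) (
  input  wire clk_i,
  input  wire reset_i,
  output wire reset_o
);

  reg [STAGES-1:0] reset_i_D = {STAGES{1'b1}};

  // Asynchronous assert, synchronous release after STAGES clocks.
  always @(posedge clk_i or posedge reset_i)
    if (reset_i)
      reset_i_D <= {STAGES{1'b1}};
    else
      reset_i_D <= reset_i_D << 1;   // shift a 0 in at the bottom

  assign reset_o = reset_i_D[STAGES-1];

endmodule

// Hypothetical top level: one instance per clock domain, so each domain
// gets its own reset tree, released synchronously to its own clock.
module two_domain_resets_example (
  input  wire clk_a,
  input  wire clk_b,
  input  wire master_reset,
  output wire rst_a,
  output wire rst_b
);

  sync_reset_init #(.STAGES(2)) u_sync_a (
    .clk_i   (clk_a),
    .reset_i (master_reset),
    .reset_o (rst_a)
  );

  sync_reset_init #(.STAGES(2)) u_sync_b (
    .clk_i   (clk_b),
    .reset_i (master_reset),
    .reset_o (rst_b)
  );

endmodule
```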

 

Yes, I do think about @jmcclusk mentioning explicitly adding BUFGs to a clock, and I do think about intentionally breaking my project up into separate clock domains anyway.  Most of my modules connect via FIFOs.  I'm already using dual-clock FIFOs to bridge a couple of clock domain boundaries.  Just do more of that.  This could, in fact, give a bunch of relief to timing (although the single-clock FIFOs already help with that) as well as relief to routing and placement.

 

 

Anyway, bottom line, I don't think your code example @markcurry actually helps.  There are implications or assumptions associated with it as I've described.  It is the answer or response to those that will be of help. 

 

Hey, I gave you a hard time in this section, @markcurry.  No offense intended.  It's all about the engineering and trying to communicate concepts.

 

Finally, to make this post easier for others to read, I'm going to repeat @markcurry's code right here:

module sync_reset 
#( parameter STAGES = 2 )
(
  input wire clk_i,
  input wire reset_i,
  output wire reset_o
);

reg [ STAGES - 1 : 0 ] reset_i_D;
always @( posedge clk_i or posedge reset_i)
  if (reset_i)
    reset_i_D <= {STAGES{1'b1}};  // Assert on reset in
  else
    reset_i_D <= { reset_i_D, 1'b0 };

assign reset_o = reset_i_D[ STAGES - 1 ];

endmodule