cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Explorer
Explorer
474 Views
Registered: ‎05-15-2014

State machine bug

Jump to solution

I just ran into some interesting things.  I'm trying to debug a state machine that is controlling some DSP blocks.  It is synchronous and sequential and should be fairly straightforward.  I created a minimalist version for testing.  Even the minimalist version gets stuck in certain states. Before diving in a couple of observations are in order.  I build the states based on gray code, thinking that it might minimize logic and transition problems.  It appears that doing so is a total waste of time, because the synthesis will reconstruct it as it sees fit.  At one point it developed its own code, somewhat line gray code, but apparently optimized to the particular state changes in question.  In another case, it converted it to 1 hot form (one FF per state).  In both cases, I actually had output ports to the state machine for debugging purposes--it simply threw in some LUTs to translate from its internal coding to what I had specified--strictly for use with my output port.

Now for the bug.  First the minimized source code, which is wrapped in an always block with a clock edge (and nothing else) in the sensitivity list:

case (fstate)
s0: begin
fstate <= s1;
end
s1: begin
fstate <= s2;
end
s2: begin
fstate <= s3;
end
s3: begin
fstate <= s4;
end
s4: begin
fstate <= s5;
end
s5: begin
fstate <= s6;
end
s6: begin
fstate <= s7;
end
s7: begin
fstate <= s8;
end
s8: begin
fstate <= s9;
end
s9: begin
fstate <= s0;
end
default: begin
fstate <= s0;
end

(unfortunatle this editor stripped out all the indentation tabs)

The implementation was one hot.  The implementation schematic consists of 9 FFs in a circular chain.  FF[0] is initialized to 1 and the others are initialized to zero.  Here is where I believe that the synthesis does not properly reflect the design.  The default clause is designed to catch cases where it is not in one of the ten defined states.  For a 1 hot state machine that would be any time there isn't exactly one FF high (and ALL the others low).  The most obvious case being if the 1 that is circulating through the ten FFs were to get dropped.  All would be zero and it would be stuck in an invalid state.  The default clause should mandate that a check be made for this and force a 1 into FF[0].

Beyond that defect, the logic, for some reason doesn't run.  I don't know if it is because of a startup meta-stability or what, but it remains stuck in state 0.  The decoding back from one hot to the gray code output does not properly address invalid states, so the meaning of being stuck in state 0 is ambiguous.  All it is is an n input OR gate such that if any of the states that would contribute to that bit are on, the bit is on.  If more than one bit is on, the output would still be on.  OTOH of no bits are on, no outputs are on, which reads out as state 0, when it isn't actually in state 0.  One could minimize this by not having a state valued as 0, and if multiple bits are on, presumably the output would be a different value, possibly an out of range one, but there is no guarantee.

So I'm left with three problems, one of which would be a non-issue if not for the other two:

    *  One hot transformation does not properly address the default clause of the case statement, which I regard as a blatent synthesis bug.  It is true that it shouldn't get in that state, but good defensive programming (even for hardware development) requires testing and recovery mechanisms which are not present.

    * My code above does not work as expected.  I see no reason for it.

    * (the non issue) Since the decode logic does not accurately reflect the source code for illegal conditions, I am unable to determine why the above code does not result in synthesis and implementation that work.  My guess is that the startup conditions somehow allowed the 1 bit that should be circulating to be lost.  However, since no bits on represents zero output, I have no way of verifying that.

 

0 Kudos
1 Solution

Accepted Solutions
Highlighted
Xilinx Employee
Xilinx Employee
432 Views
Registered: ‎06-14-2018

Re: State machine bug

Jump to solution

Hi @whelm 

This is way FSM optimisations are done. Default is optimized to save area, power. This also gives liberty to tool to re-encode FSM in most optimized way.

 If FSM is rencoded in one_hot LSB flop should be FDSE, so INIT value is 1 for that flop. This implies FSM in known state. Could you please re-check that ?

Also could you post sequential bloc of FSM.

 

In any case if you want logic that takes FSM from unknown state to known state there are few ways to do it. You may use fsm_safe_state attribute.

“auto_safe_state”: Uses Hamming-3 encoding for auto-correction for one bit/flip.


• “reset_state”: Forces the state machine into the reset state using Hamming-2
encoding detection for one bit/flip.


• “power_on_state”: Forces the state machine into the power-on state using
Hamming-2 encoding detection for one bit/flip.


• “default_state”: Forces the state machine into the default state specified in RTL:
the state that is specified in “default” branch of the case statement in Verilog or the
state specified in the others branch of the case statement in VHDL. For this to work, a
default or “others” state must be in the RTL.

Thanks,

Ajay

View solution in original post

0 Kudos
8 Replies
Highlighted
Xilinx Employee
Xilinx Employee
433 Views
Registered: ‎06-14-2018

Re: State machine bug

Jump to solution

Hi @whelm 

This is way FSM optimisations are done. Default is optimized to save area, power. This also gives liberty to tool to re-encode FSM in most optimized way.

 If FSM is rencoded in one_hot LSB flop should be FDSE, so INIT value is 1 for that flop. This implies FSM in known state. Could you please re-check that ?

Also could you post sequential bloc of FSM.

 

In any case if you want logic that takes FSM from unknown state to known state there are few ways to do it. You may use fsm_safe_state attribute.

“auto_safe_state”: Uses Hamming-3 encoding for auto-correction for one bit/flip.


• “reset_state”: Forces the state machine into the reset state using Hamming-2
encoding detection for one bit/flip.


• “power_on_state”: Forces the state machine into the power-on state using
Hamming-2 encoding detection for one bit/flip.


• “default_state”: Forces the state machine into the default state specified in RTL:
the state that is specified in “default” branch of the case statement in Verilog or the
state specified in the others branch of the case statement in VHDL. For this to work, a
default or “others” state must be in the RTL.

Thanks,

Ajay

View solution in original post

0 Kudos
Highlighted
Teacher
Teacher
411 Views
Registered: ‎07-09-2009

Re: State machine bug

Jump to solution
The way to approach logic design is to let the tools do what they can.

State machines are a typical example of where you can not do better than the tools.

My advise would be to code the machine in the easiest way for you to read, not worry about one hot, hamming distance etc.

http://arco.esi.uclm.es/public/doc/tutoriales/Xilinx/old_training/HDL%20Coding%20Techniques/basic-hdl-coding-techniques-part2.ppt

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
0 Kudos
Highlighted
Explorer
Explorer
365 Views
Registered: ‎05-15-2014

Re: State machine bug

Jump to solution

That all makes sense.  The only thing I would challenge is that if the case statement has a default clause, that should infer fsm_safe_state.  That is generally the reason people use default cases.  Basically the synthesis totally ignored the default case.  So why put it in?  

0 Kudos
Highlighted
Xilinx Employee
Xilinx Employee
351 Views
Registered: ‎08-13-2007

Re: State machine bug

Jump to solution

I remember being surprised as well 15+ years ago that Synplify didn't cover the "when others" statement in a case statement unless I used a syn_encoding attribute of "safe"... I got into the habit of adding this when I needed the default/return logic actually included for recovery reasons (glitching clock, etc.)

More info here:

http://digsys.upc.es/csd/P06/designing_safe_vhdl.pdf

 

Vivado synthesis has a FSM_SAFE_STATE attribute as well:

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug901-vivado-synthesis.pdf

see page 52

 

Cheers,

bt

 

Highlighted
Scholar
Scholar
301 Views
Registered: ‎09-16-2009

Re: State machine bug

Jump to solution

Also, be aware of a Vivado 2018.* FSM bug:

https://forums.xilinx.com/t5/Synthesis/Vivado-2018-x-and-Verilog-case-statements-looks-like-an-errata/m-p/970365/highlight/true#M30979 

As I allude to over in that thread, I think the whole "SAFE" FSM stuff is mostly fluff.  Sure it allows a state machine to (potentially) recover from some low probabilty event (clock glitch, power drop, radiation, whatever).  But making sure just those very few flip-flops in a FSM (as identified by the tool) are more fault tolerant, when the vast majority of the rest of the FFs in the design don't have such fault tolerance?  We're talking < 1% : 99% ratios here.  I guess it's an easy an cheap thing to do in the FSM optimization, so mostly free, but do I worry (heck even think) about it?  Not really.

And Vivado's been (even without disabling the FSM tool as in the above thread) rather hit-and-miss with even identifying my state machine.  These days it's mostly a don't care for me if the tool identifies the FSM or not, whether it changes the encoding or not.  As long as it meets timing and formally matches my intent (as described by the RTL) I don't care what it does...

Reiterating @drjohnsmith advice - code the FSM for clarity, and don't worry about the rest.

My 2 cents.

Regards,

Mark

Highlighted
Explorer
Explorer
286 Views
Registered: ‎05-15-2014

Re: State machine bug

Jump to solution

Your point is well taken.  In my case I'm under fail-safe guidelines and a default case is mandatory, so I have to make sure it actually does something useful.  The one thing that sets it apart from other FFs getting flipped is that in most cases those FFs will exhibit a transient malfunction until they are reloaded, but in this case it will not recover until power cycled (or at least reset or new bit file loaded in).

How this came up was that the design simulated perfectly, both behavioral and implemented timing (and passed all timing constraints), but would not run.  I traced it down to a state machine stuck in an undefined state (in one case a on hot state machine with all FFs low, which isn't readily detectable if there is a state 0).  Adding fsm_safe_state and a reset to the design eliminated the problem.

My hypothesis is that the PL clock in question powers up at 1 GHz and is programmed to the correct frequence (~200 MHz) by the code running in the PS.  So when the PL starts, a lot of timing violations are occurring.  The others are don't cares, because it isn't collecting or using data at that point, but this one was consistently resulting in failure.

 

0 Kudos
Scholar
Scholar
278 Views
Registered: ‎09-16-2009

Re: State machine bug

Jump to solution

@whelm wrote:

My hypothesis is that the PL clock in question powers up at 1 GHz and is programmed to the correct frequence (~200 MHz) by the code running in the PS.  So when the PL starts, a lot of timing violations are occurring.  The others are don't cares, because it isn't collecting or using data at that point, but this one was consistently resulting in failure.


You probably already know this but it's worth mentioning for anyone reading along.  PLL output clocks should have a real reset - based off the PLL lock indicator - feeding ALL downstream logic of that generated clock.  Synchronize the inactive edge of said reset with the respective PLL output clock.  One's just asking for trouble in not reseting such logic, and depending on GSR or other "Configuration time" resets only.  

Regards,

Mark

Highlighted
Explorer
Explorer
272 Views
Registered: ‎05-15-2014

Re: State machine bug

Jump to solution

Good point.  In this case I'm not using the PLL/DLL.  The clock source is an external oscillator driving an ADC, which in turn supplies the PL clock.  Unfortunately it is factory programmed to start at 1 GHz, which is way beyond the ADCs capability as well as beyond what the PL can manage, but until the PS can execute the code to configure it over I2C, it is there.  I need to make sure any logic that could be disrupted by it gets reset or otherwise in a valid state once it is set properly.  And in case anyone is wondering, there are four separate clock frequencies (all near 200 MHz) that are needed, which is why this particular arrangement was used.  And as for the internal PLLs, they are orders of magnitude too noisy for a high speed 16 bit ADC.  They are fine for most digital purposes, but I'm looking at 90 fs jitter and phase noise of -129 dBc/Hz at 1 kHz offset (increasing to -162 at 20 MHz).  That's typically only available from a quartz oscillator.  This is the only programmable device I know of that can do that.

0 Kudos