cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Highlighted
Participant
Participant
1,944 Views
Registered: ‎09-26-2016

About one-hot FSM encoding

Hi,

 

I would like to ask a question regarding FSM encoding, in case if I set "-fsm_extraction" setting in Vivado to "one hot". Let say that I define FSM states as an enum:

 

 

enum
{
Idle,
State_1,
State_2,
State_3
} State;

And my sequential always block where next state is assigned looks something like this:

 

 

always_ff @(posedge CLK or negedge RST)
{
  if(!RST)
    State <= Idle;
  else
    case(State)
      Idle:    State <= In1 ? State_1 : Idle;
      State_1: State <= In2 ? State_2 : State_1;
      State_2: State <= State_3;
      State_3: State <= Idle;
    endcase
}

Does synthesizer actually creates enum variables in one-hot manner and does it actually compare just a single bit for the case statement?

 

 

Or do I have to define enum and than write case statement as shown below: Even if "-fsm_extraction" is set to "one-hot".

 

enum
{
Idle = 4'b0001,
State_1 = 4'b0010,
State_2 = 4'b0100,
State_3 = 4'b1000
} State;
case(1'b1)
State[0]: State <= In1 ? State_1 : Idle; State[1]: State <= In2 ? State_2 : State_1; State[2]: State <= State_3; State[3]: State <= Idle; endcase

I am interested especially in FSM speed.

 

Thank you.

 

 

 

 

0 Kudos
7 Replies
Highlighted
Mentor
Mentor
1,927 Views
Registered: ‎02-24-2014

If you want higher speed,  leave the enumeration as a symbolic list, and don't assign one-hot state values to the states.  I also recommend that you use sequential encoding, as this almost always produces higher speeds for reasonably complicated state machines.   

 

(* fsm_encoding = “sequential” *) reg [7:0] my_state;

Don't forget to close a thread when possible by accepting a post as a solution.
Highlighted
Participant
Participant
1,892 Views
Registered: ‎09-26-2016

Hi @jmcclusk:

 

Thank you for your reply. "Sequential" is meant here as a normal binary counter, or anything else?

 

So, if understand, you suggest to define states as a parameters (or localparams) as:

localparam Idle    = 2'd0;
localparam State_1 = 2'd1;
localparam State_2 = 2'd2;
localparam State_3 = 2'd3;

(* fsm_encoding = “sequential” *) logic [1:0] State = Idle;

 Ok, this example is the most trivial as it can possibly be, but is this the correct way?

 

Enumerators are quite handy, because if you add another state into the FSM, you don't have to bother with state values.

 

Thank you.

0 Kudos
Highlighted
Mentor
Mentor
1,859 Views
Registered: ‎02-24-2014

Your example looks fine.  In SystemVerilog, no values are needed for the states, they can be purely symbolic.

Don't forget to close a thread when possible by accepting a post as a solution.
0 Kudos
Highlighted
Observer
Observer
259 Views
Registered: ‎03-21-2011

https://www.xilinx.com/support/documentation/sw_manuals/xilinx10/isehelp/pp_db_xst_hdl_synthesis_options.htm

Not sure where the idea that one-hot is slower than sequential, even xilinx's own documentation disagrees

One-Hot

Ensures that an individual state register is dedicated to one state. Only one flip-flop is active, or hot, at any one time. One-hot encoding is very appropriate with most FPGA targets where a large number of flip-flops are available. It is also a good alternative when trying to optimize speed or to reduce power dissipation.

"

0 Kudos
Highlighted
Guide
Guide
250 Views
Registered: ‎01-23-2009

Not sure where the idea that one-hot is slower than sequential, even xilinx's own documentation disagrees

(I presume you mean "highly encoded" rather than sequential - there are multiple "highly encoded" state machine formats)...

So, it's not that "one-hot is (necessarily) slower than encoded" - its more the opposite. It is (or was) commonly "understood" that one-hot was faster than highly encoded. However, over the years, there is a fair amount of reason to believe that this isn't always the case - even though statements like the one you quoted pop up in lots of Xilinx manuals...

I will try and give some kind of feeling as to why...

The reason given for one-hot being faster than encoded is that it is "easier" to decode the current state; to decode "am I in state_1?" you only need to look at the state_1 bit. So for any operation that is a combination of "this input while in state_1", this becomes a simple AND gate. In ASIC technology, this is great - a single AND gate is faster than more complex logic.

But FPGAs are not built with simple gates; they are built with LUTs - the newer technologies use 6 input LUTs. Thus doing decoding is equally fast if the number of inputs required to decode the code is one bit (as it would be with one hot) or 6 bits (as it would be with a highly encoded state machine with up to 64 states [not that I am recommending state machines with 64 states]).

As we take this further, lets say you want to check for "I am in any of state_1 to state_7". In a one-hot state machine you need to combine 7 one-hot state bits (OR them together). This can't be done in one LUT - it needs at least two in series. Conversely, if the state machine is highly encoded, then this decode is combinatiorial logic based on a much smaller number of state bits; even with a 64 state state machine, this can be done in one LUT. Therefore this is faster!

So, all we are saying is that it isn't always true that one-hot is faster than highly encoded (I personally believe that highly encoded tends to be faster most of the time).

The good news is that we don't necessarily need to know the answer to this question.... That's what the synthesis tool is for. Since the synthesis tool can do state machine extraction and state machine encoding conversions, it can decide what the fastest encoding for your state machine is. If it determines that one-hot is faster, then it will use one-hot. If not, it will use one of the highly encoded ones.

Avrum

Highlighted
Observer
Observer
201 Views
Registered: ‎03-21-2011

Yeah, with bigger LUTs the threshold at which one-hot vs binary/gray encoded matters goes up.  If a state machine has 11 states in 4 bits, say, with 4-LUT you're guaranteed practically to have a 2+-deep LUT chain minimum.  With 6-LUTs, state transitions can take two conditions and still fit into a single LUT (well more in parallel but we're talking depth/speed here).  With 1-hot each transition can actually have more conditionals (or you can have A=>Z if B, C => Z, if B, D=>Z if E as 3 parallel transitions) in a single LUT.

But definitely it depends on the topology and the density of your state machine graph and # of conditionals.  And if you really need single-LUT depth between states of a complex state machine (running at what 400MHz then?) it's definitely a special design.

And yeah, it's probably been 10 years since I had to hint at ISE to get it do make something "better" than what it auto-picks for you.  Let the tool pick.

0 Kudos
Highlighted
Adventurer
Adventurer
154 Views
Registered: ‎04-08-2017

@avrumw,

The "highly encoded" FSMs very often exhibit one serious drawback - the encoded state register has high fanout. Every register, every output modified based on the certain FSM state has to be reached by ALL the bits of the state reg. Of course there are cases when some bits of the encoded state reg can be optimized out (i.e. when you do something identical in "0000" and "0001" states, the LSB can be thrown away and only the MSBs - "000" - needs be checked), but the principle remains.

So, you have a situation, when all the modified regs and outputs are "pulling the blanket over themselves", literally tearing the encoded state reg apart. In this case the placer and router have to make high efforts to accomplish timings needed, especially when you work on 400MHz ++.

With one-hot you often have no such a problem. Due to it's nature, such FSMs are easily partitioned. A certain state FF controls only its related part of logic and hence can be placed somewhere around this logic, giving the placer and router an opportunity to accomplish faster implementation.

It's not so important how many LUTs are used, but much more important how long routings are, as they have much higher delays then LUTs.

 

To be fair, I'd like to note that Vivado often handles the one-hot inefficiently. Here is an example.
Hence, the "highly encoded" formats often can give better results.

0 Kudos