FPGA HDL State Machine Design: Patterns That Work in Real Hardware
State machines are the backbone of almost every FPGA design. They control data flow, manage protocols, sequence operations, and handle error recovery. A badly designed state machine creates timing failures, unreachable states, and bugs that only show up at 3 AM during board bring-up. A well-designed one runs cleanly, scales easily, and survives every toolchain update you throw at it.
This is not a textbook. It is a practical guide to the state machine patterns that actually work when you target real FPGA fabric.
Why Most State Machine Tutorials Get It Wrong
Most HDL textbooks teach the same three encodings: binary, one-hot, and Gray. Then they tell you to pick one and move on. That advice is not much use for FPGA work.
The reality is that FPGAs are built from lookup tables and flip-flops, not from binary counters. The encoding you choose changes not just the flip-flop count, but the combinational logic complexity, the routing congestion, and the maximum clock frequency. A state machine that works fine in simulation can fail timing in hardware if you picked the wrong encoding for the target device.
The other problem: most tutorials show state machines with clean, synchronous transitions. Real designs have asynchronous inputs, glitchy reset signals, and multiple clock domains. The tutorial examples do not cover any of that, so when you try to apply them to actual hardware, things break in ways the tutorial never mentioned.
One-Hot Encoding: The Default Choice for FPGA State Machines
For any state machine with roughly 8 to 20 states, one-hot encoding is almost always the right call on an FPGA. Each state gets its own flip-flop, and exactly one bit of the state register is high at any time.
Why One-Hot Wins on FPGA Fabric
The next-state logic for a one-hot machine is usually very simple. Each flip-flop’s next value depends only on the few states that can transition into it plus the relevant inputs, so the logic reduces to a few AND-OR terms. There are no wide multiplexers and no decode trees, and for machines with modest fan-in the logic for each state bit often fits in one or two LUT levels.
Compare that to binary encoding, where every next-state bit is a function of all the state bits plus the inputs, and every state-dependent output has to decode the full binary value first. That decode logic eats LUT levels and routing resources, and it gets worse as the state count grows.
The flip-flop count is higher with one-hot, but FPGAs have flip-flops to spare. What they do not have is unlimited logic depth. One-hot trades flip-flops for speed, and that trade-off almost always favors speed.
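As a rough illustration of why the logic stays shallow, the sketch below writes a three-state one-hot machine as one equation per flip-flop. The module name, the FLUSH state, and the busy output are hypothetical and exist only for this example.

```verilog
// Hypothetical three-state one-hot FSM: IDLE -> RUN -> FLUSH -> IDLE.
// Names and transitions are illustrative, not taken from a real design.
module onehot_sketch (
    input  wire clk,
    input  wire rst,    // active high, assumed released synchronously to clk
    input  wire start,
    input  wire done,
    output wire busy
);
    // Bit positions inside the one-hot state register.
    localparam IDLE = 0, RUN = 1, FLUSH = 2;

    reg [2:0] state;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            state <= 3'b001;  // only the IDLE bit is set
        end else begin
            // One short AND-OR equation per flip-flop.
            state[IDLE]  <= (state[IDLE] & ~start) | state[FLUSH];
            state[RUN]   <= (state[IDLE] &  start) | (state[RUN] & ~done);
            state[FLUSH] <=  state[RUN]  &  done;
        end
    end

    // A state-only output is an OR of state bits; no decode is needed.
    assign busy = state[RUN] | state[FLUSH];
endmodule
```

In practice you would normally write an ordinary case statement and let the synthesis tool choose or apply the one-hot encoding. The explicit form here is only meant to show what the resulting logic looks like.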
When to Avoid One-Hot
One-hot breaks down when you have more than 30 or 40 states. The flip-flop count becomes significant, and the routing fanout on the state register starts to hurt timing. At that point, binary or Gray encoding becomes competitive. For very large state machines (100+ states), consider splitting the machine into smaller sub-machines rather than forcing a single massive encoding.
Also, if your target is an ASIC rather than an FPGA, one-hot is almost never the right choice because flip-flop area is expensive in ASICs. This article is about FPGAs, so that does not apply here.
Two-Always vs Three-Always: The Debate That Does Not Matter
You will see state machines written with two always blocks (one combinational for next-state logic, one sequential for state register) and with three always blocks (adding a separate output logic block). Both work. Both synthesize to the same hardware if written correctly.
The Two-Always Pattern Is Simpler and Safer
The two-always pattern keeps everything in one place:
    // Sequential block: the state register, with asynchronous reset.
    always @(posedge clk or posedge rst) begin
        if (rst)
            state <= IDLE;
        else
            state <= next_state;
    end

    // Combinational block: next-state logic.
    always @(*) begin
        next_state = state;  // default: hold the current state
        case (state)
            IDLE: if (start) next_state = RUN;
            RUN:  if (done)  next_state = IDLE;
            default: next_state = IDLE;  // recover from any unexpected value
        endcase
    end
Outputs that depend only on the state come from the combinational block. Outputs that depend on state and inputs also come from the combinational block. Everything is explicit, and there is no ambiguity about what drives what.
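A fuller sketch of the same pattern with outputs included might look like the following. The module name and the busy and ack outputs are assumptions added for illustration; busy depends on the state only, while ack depends on the state and an input.

```verilog
// Two-always FSM with outputs driven from the combinational block.
// busy and ack are hypothetical outputs added for this example.
module two_always_sketch (
    input  wire clk,
    input  wire rst,
    input  wire start,
    input  wire done,
    output reg  busy,   // depends on state only
    output reg  ack     // depends on state and the done input
);
    localparam [1:0] IDLE = 2'd0, RUN = 2'd1;

    reg [1:0] state, next_state;

    // Sequential block: state register only.
    always @(posedge clk or posedge rst) begin
        if (rst) state <= IDLE;
        else     state <= next_state;
    end

    // Combinational block: next state and all outputs.
    always @(*) begin
        // Defaults first, so every signal is assigned on every path
        // and no latch is inferred.
        next_state = state;
        busy       = 1'b0;
        ack        = 1'b0;
        case (state)
            IDLE: if (start) next_state = RUN;
            RUN: begin
                busy = 1'b1;
                if (done) begin
                    ack        = 1'b1;
                    next_state = IDLE;
                end
            end
            default: next_state = IDLE;  // unused encodings return to IDLE
        endcase
    end
endmodule
```

Assigning a default value to every output at the top of the block is what keeps this style safe: a branch that forgets an output gets the default instead of an inferred latch.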
The three-always pattern separates output logic into its own block. This can make the code look cleaner when you have many outputs, but it adds a third block that has to be kept in step with the state list, and it creates an extra opportunity for simulation-synthesis mismatch (an incomplete sensitivity list, or an unassigned output that infers a latch) if you are not careful.
Stick with two always blocks unless you have a specific reason not to. Simplicity wins.
Handling Reset in State Machines: The Part That Causes Silent Failures
Reset is where state machines go wrong in ways that are incredibly hard to debug. The problem is not the reset itself. It is what happens when the reset deasserts.
Synchronous Reset Release Is Non-Negotiable
The reset should clear the state register asynchronously (so the machine goes to a known state immediately on power-up, even before the clock is running), but the deassertion should be synchronous. This means reset is released with known timing relative to the clock, so every state flip-flop leaves reset on the same clock edge.
    always @(posedge clk or posedge rst) begin
        if (rst)
            state <= IDLE;
        else
            state <= next_state;
    end
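This pattern gives you asynchronous assertion (the or posedge rst part): the machine enters IDLE immediately when reset goes high, even with no clock running. It does not by itself give you synchronous deassertion. That depends on how rst is generated. The release edge of rst must be timed to the clock, which is normally done by passing the external reset through a reset synchronizer (a short chain of flip-flops that is asynchronously set by the raw reset and shifts in zeros on the clock) and feeding the synchronizer output to every state register in that clock domain.
If reset is released asynchronously, the release can land too close to a clock edge and violate the recovery and removal times of the state flip-flops. Some flip-flops can then leave reset on that edge and others one cycle later, so a one-hot register can end up with zero or two bits set, which is an undefined state. This creates bugs that appear once in a thousand power cycles and are very hard to reproduce in the lab.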
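Whether the release is synchronous depends on how rst itself is generated. One common way to get it is a reset synchronizer, sketched below. The module and signal names are hypothetical, and the two-stage depth is an assumption; some designs use more stages.

```verilog
// Reset synchronizer sketch: asynchronous assert, synchronous release.
// arst_in is the raw reset (button, power-on circuit, PLL lock, etc.).
module reset_sync_sketch (
    input  wire clk,
    input  wire arst_in,   // raw asynchronous reset, active high
    output wire rst        // asserts immediately, releases on a clk edge
);
    reg [1:0] sync;

    always @(posedge clk or posedge arst_in) begin
        if (arst_in)
            sync <= 2'b11;            // assert without waiting for a clock
        else
            sync <= {sync[0], 1'b0};  // shift in zeros after release
    end

    // rst goes low on the second clk edge after arst_in is released.
    assign rst = sync[1];
endmodule
```

The rst output then drives the `posedge rst` term of every state register in that clock domain. Each clock domain needs its own copy of the synchronizer.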
Defining a Safe Default State
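Every state machine needs a default assignment in its combinational block, so that the machine is forced into a known state if the state register ever holds an undefined value:
    always @(*) begin
        next_state = IDLE;  // default: any unhandled value returns to IDLE
        case (state)
            IDLE: if (start) next_state = RUN;
            // The default is IDLE, not "hold", so RUN must hold itself explicitly.
            RUN:  next_state = done ? IDLE : RUN;
            // no default branch here because the assignment above covers it
        endcase
    end
Setting the default at the top of the block covers more than the default keyword inside the case statement does. It catches state values that have no case branch, and it also catches any branch that forgets to assign next_state, so no latch is inferred. If you ever add a new state and forget to update the case, the machine returns to IDLE instead of doing something undefined. The cost is that every state that should hold must say so explicitly, as RUN does here. Also check your synthesis tool's safe state machine setting, because tools may optimize away recovery logic for state values they consider unreachable.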
Mealy vs Moore: It Actually Matters for FPGA Timing
In a Moore machine, outputs depend only on the current state. In a Mealy machine, outputs depend on the current state and the inputs. The textbook answer is that Moore machines are safer because outputs do not glitch when inputs change. That is true in theory. In FPGA practice, the difference is more nuanced.
Moore Machines Give You Cleaner Outputs
With a Moore machine, outputs are a function of the state register alone. They change only after a clock edge, in step with the state, so there are no combinational paths from inputs to outputs and timing closure is cleaner. With one-hot encoding, many outputs are simply a state bit or an OR of a few state bits. If an output must be completely glitch-free, drive it straight from a flip-flop.
For most FPGA designs, Moore is the safer default. The response to an input arrives one clock cycle later than in a Mealy machine (the input must first change the state before the output can follow), but that one cycle is almost never a problem, and the timing benefits are real.
Mealy Machines Save a Cycle When You Need It
A Mealy machine can react to inputs in the same cycle they arrive, without waiting for the next clock edge. This is useful in high-throughput datapaths where every cycle counts. The trade-off is that outputs now have a combinational path from inputs, which means they can glitch, and the timing analyzer has to check those paths.
If you use Mealy and do not need the same-cycle response, register the outputs. Put a flip-flop on every output signal. The output then changes on the same clock edge as the state transition it belongs to, so you give up the same-cycle reaction but get glitch-free outputs straight from a flip-flop. It also makes timing closure much easier because the tool no longer has to analyze combinational paths from primary inputs to primary outputs.
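The difference between a raw Mealy output and its registered copy can be sketched as follows. The module, the req/done handshake, and both output names are hypothetical, chosen only to put the two versions side by side.

```verilog
// Mealy output shown twice: combinational (same cycle, can glitch)
// and registered (next clock edge, driven straight from a flip-flop).
module mealy_registered_sketch (
    input  wire clk,
    input  wire rst,
    input  wire req,
    input  wire done,
    output wire accept_comb,  // raw Mealy output
    output reg  accept_q      // registered copy of the same function
);
    localparam IDLE = 1'b0, BUSY = 1'b1;

    reg state;

    // Mealy: a function of the state AND the current input.
    assign accept_comb = (state == IDLE) & req;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            state    <= IDLE;
            accept_q <= 1'b0;
        end else begin
            // Updated on the same edge as the IDLE -> BUSY transition.
            accept_q <= accept_comb;
            case (state)
                IDLE: if (req)  state <= BUSY;
                BUSY: if (done) state <= IDLE;
            endcase
        end
    end
endmodule
```

Here accept_comb is high in the same cycle as req, while accept_q goes high on the following clock edge, together with the move to BUSY. Downstream logic that uses accept_q sees no combinational path back to req.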
Coding Patterns for Common State Machine Types
Sequence Detectors
A sequence detector watches for a specific pattern on an input stream. For a Moore-style detector, the state count equals the pattern length plus one. For a “1011” detector, you need five states: idle, saw-1, saw-10, saw-101, and saw-1011.
Use one-hot encoding. The next-state logic is a simple shift: each state advances to the next when the input bit matches, and falls back to an earlier state when it does not. The fallback logic is where most people make mistakes. Draw the state diagram first. Then code it. Do not try to write the next-state logic directly from the pattern.
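A minimal sketch of the “1011” detector, assuming overlapping matches are allowed and one input bit arrives per clock. The module, port, and state names are hypothetical. The fallback transitions come from the longest suffix of the bits seen so far that is still a prefix of the pattern.

```verilog
// One-hot Moore detector for the pattern 1011, overlapping matches allowed.
module seq1011_sketch (
    input  wire clk,
    input  wire rst,
    input  wire din,     // one input bit per clock
    output wire found    // high for one cycle after each 1011
);
    localparam [4:0] S_IDLE = 5'b00001,
                     S_1    = 5'b00010,
                     S_10   = 5'b00100,
                     S_101  = 5'b01000,
                     S_1011 = 5'b10000;

    reg [4:0] state, next_state;

    always @(posedge clk or posedge rst) begin
        if (rst) state <= S_IDLE;
        else     state <= next_state;
    end

    always @(*) begin
        case (state)
            S_IDLE:  next_state = din ? S_1    : S_IDLE;
            S_1:     next_state = din ? S_1    : S_10;    // "11" still ends in "1"
            S_10:    next_state = din ? S_101  : S_IDLE;  // "100" matches nothing
            S_101:   next_state = din ? S_1011 : S_10;    // "1010" ends in "10"
            S_1011:  next_state = din ? S_1    : S_10;    // reuse the trailing bits
            default: next_state = S_IDLE;
        endcase
    end

    // Moore output: just the S_1011 state bit.
    assign found = state[4];
endmodule
```

For the input stream 1011011, found pulses twice: once after the first four bits, and again after the last bit, because the final 1 of the first match also starts the second.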
Protocol Controllers
Protocol state machines (SPI, I2C, UART, custom handshakes) tend to have many states and complex transition conditions. Split them into two layers: a main controller that sequences through protocol phases, and sub-machines that handle each phase.
The main controller uses a small state machine (5 to 10 states) that transitions based on phase-done signals from the sub-machines. Each sub-machine handles the detailed bit-level or byte-level protocol. This hierarchical approach keeps each state machine small enough to verify and debug independently.
Arbitration and Priority Machines
When multiple modules compete for a shared resource, the arbiter is a state machine that grants access based on priority or round-robin rules. These machines must be deadlock-free by design. Every state must have a defined transition. No state should be able to trap the machine in a loop where no request gets serviced.
Code the arbiter as a Moore machine with registered grant outputs. The grants should not change until the next clock edge after the request, which prevents glitches on the shared bus.
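As one possible shape for such an arbiter, here is a two-requester round-robin sketch with registered grants. It assumes each grant covers a single-cycle transfer and that a requester keeps req high until it is granted. The module and signal names are hypothetical.

```verilog
// Two-requester round-robin arbiter with registered grant outputs.
// Assumption: one grant equals one single-cycle transfer.
module rr_arbiter2_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire [1:0] req,
    output reg  [1:0] grant   // registered, at most one bit set
);
    reg last;  // index of the requester granted most recently

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            grant <= 2'b00;
            last  <= 1'b1;  // so requester 0 wins the first tie
        end else begin
            case (req)
                2'b01: begin grant <= 2'b01; last <= 1'b0; end
                2'b10: begin grant <= 2'b10; last <= 1'b1; end
                2'b11: begin
                    // Both asking: grant whoever did not go last.
                    if (last) begin grant <= 2'b01; last <= 1'b0; end
                    else      begin grant <= 2'b10; last <= 1'b1; end
                end
                default: grant <= 2'b00;  // no requests, no grant
            endcase
        end
    end
endmodule
```

Every value of req has a defined result, and when both requesters are active the grant alternates, so neither one can be starved. A multi-cycle transfer would need an extra busy or release input, which this sketch leaves out.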
Debugging State Machines: The Practical Approach
When a state machine misbehaves on hardware, the first thing to check is whether it ever leaves the reset state. Probe the state register with an ILA or logic analyzer. If the machine is stuck in IDLE, the reset is not deasserting cleanly, or the start condition is never being met.
The second thing to check is whether the machine ever enters an undefined state. If your default case is missing or wrong, the state register can hold a value that no case branch handles, and the next-state logic will produce garbage. The machine will appear to “hang” or behave randomly.
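The third thing: check the clock domain. If the state machine runs on one clock and the inputs come from another, you need a synchronizer on every input signal. Without one, a flip-flop can go metastable, or different state flip-flops can sample the same input edge differently, and the machine jumps to an unexpected or illegal state. Also check pulse widths: a pulse shorter than one period of the state machine's clock can fall between two clock edges and be missed entirely, so stretch it or use a handshake.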
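For a single-bit level signal, the usual fix is a two-flip-flop synchronizer in the destination clock domain. A minimal sketch, with hypothetical module and signal names:

```verilog
// Two-flip-flop synchronizer for one single-bit level signal.
// Not suitable as-is for multi-bit buses or for pulses shorter than
// one period of clk.
module sync2_sketch (
    input  wire clk,        // destination clock (the state machine's clock)
    input  wire async_in,   // signal from another clock domain
    output wire sync_out    // safe to use in the clk domain
);
    reg meta, stable;

    always @(posedge clk) begin
        meta   <= async_in;  // first stage may go metastable
        stable <= meta;      // second stage gives it a full cycle to settle
    end

    assign sync_out = stable;
endmodule
```

The state machine should only ever look at sync_out, never at async_in directly. Most vendor tools also offer an attribute or constraint to keep the two flip-flops placed close together; check your tool's documentation for the exact name.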
Add a signal name prefix to every state machine in your design. Use fsm_ for the state register, fsm_next_ for the next-state logic, and fsm_out_ for the outputs. When you probe the design, you can filter by prefix and see exactly what the state machine is doing without wading through hundreds of unrelated signals.