Hardware description language for sequential logic design of field programmable gate arrays

FPGA HDL Sequential Logic Design: Patterns That Survive Synthesis and Timing Closure

Sequential logic is where FPGAs differ from software. In software, you write instructions and the processor executes them one by one. In an FPGA, sequential logic is physical: flip-flops capture data on clock edges, and everything else is just combinational wiring between those flip-flops. Get the sequential logic wrong, and nothing works. Get it right, and your design meets timing at frequencies you did not expect.

This article covers the sequential logic patterns that actually work in real FPGA designs, not the textbook examples that fall apart the moment you push them through a synthesis tool.


The Flip-Flop Is Your Building Block: Understand It First

Every piece of sequential logic in an FPGA starts with a flip-flop. The HDL abstraction hides the physics, but you need to understand what is happening underneath.

A flip-flop captures the value at its data input on a clock edge and holds that value until the next clock edge. That is it. There is no magic. The HDL always @(posedge clk) block is just a way of telling the synthesis tool: put a flip-flop here, and this is what drives its input.

The problem is that most developers treat flip-flops like variables in software. They are not. A flip-flop has setup time, hold time, and clock-to-output delay, and it should only be clocked by signals that come from the dedicated clock network. If you use a data signal as a clock, the tool will usually route it over general-purpose fabric with large, poorly controlled skew, and you can end up with something that works at 25 degrees Celsius and fails at 85.

Non-Blocking Assignment Is Not Optional

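The <= operator is non-blocking assignment in Verilog. VHDL uses the same <= symbol for signal assignment, which has the same deferred-update behavior (:= is VHDL variable assignment, which updates immediately). It means: evaluate all right-hand sides first, then update all left-hand sides at the same time. This matches the physical behavior of flip-flops, which all capture their inputs simultaneously on the clock edge.

If you use blocking assignment (=) inside a clocked always block, the simulator executes statements in order. Statement two sees the updated value from statement one, so a chain you meant as two registers collapses into one. Worse, when a second clocked block reads a signal that the first block assigns with =, the value it sees depends on the order in which the simulator happens to run the two blocks. Hardware has no such ordering, so simulation and synthesized logic can disagree, and that kind of mismatch is very hard to debug.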

This is not a style preference. It is a correctness requirement. Every single clocked always block in your design should use non-blocking assignment. No exceptions.
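
To make the difference concrete, here is a minimal sketch of the same two-stage register chain written both ways. The module and signal names (nb_vs_blocking, stage1, stage2, bad1, bad2) are made up for illustration and do not come from any particular design.

```verilog
// Illustrative sketch only: module and signal names are hypothetical.
module nb_vs_blocking (
    input  wire clk,
    input  wire d,
    output reg  stage2,   // d delayed by two clock edges
    output reg  bad2      // d delayed by only one clock edge
);
    reg stage1;
    reg bad1;

    // Non-blocking: both right-hand sides are sampled before either
    // register updates, so stage2 receives the OLD value of stage1.
    always @(posedge clk) begin
        stage1 <= d;
        stage2 <= stage1;
    end

    // Blocking: bad1 is updated first, then bad2 copies the NEW value.
    // The intended second register stage disappears.
    always @(posedge clk) begin
        bad1 = d;
        bad2 = bad1;
    end
endmodule
```

In a testbench, stage2 lags d by two clock edges while bad2 lags by one, which is the quickest way to see why the blocking version is not the circuit you meant to describe.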


Register Types: What Gets Clocked and What Does Not

Not every signal in a sequential block is a register. Some signals are just wires that happen to be assigned inside an always block. Understanding the difference saves you from wasting flip-flops and from creating timing problems you did not intend.

Registered Outputs vs Combinational Outputs

A registered output is assigned inside a clocked always block with non-blocking assignment. The synthesis tool infers a flip-flop for it. The output changes on the clock edge, synchronized with everything else in that clock domain.

A combinational output is assigned inside an always @(*) block with blocking assignment. No flip-flop is inferred. The output changes immediately when any input changes, subject to propagation delay through the combinational logic.

The most common mistake: assigning a signal in both a combinational block and a clocked block. The tool will create two drivers for the same net, or it will override one with the other, and the result will not be what you expected. Pick one. Either the signal is registered or it is combinational. Do not try to be both.
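
A small sketch of the two kinds of output side by side, each driven from exactly one block. The module name reg_vs_comb and its ports are assumptions chosen for the example.

```verilog
// Illustrative sketch only: names are hypothetical.
module reg_vs_comb (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [8:0] sum_comb,  // combinational: follows a and b
    output reg  [8:0] sum_reg    // registered: changes only on a clk edge
);
    // Combinational block: blocking assignment, no flip-flop inferred.
    always @(*) begin
        sum_comb = a + b;
    end

    // Clocked block: non-blocking assignment, one flip-flop per bit.
    always @(posedge clk) begin
        sum_reg <= sum_comb;
    end
endmodule
```

Both outputs are declared reg because they are assigned inside always blocks, but only sum_reg becomes a register in hardware.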

Enable Signals vs Gated Clocks

You need to conditionally update a register. The correct way is a clock enable:

always @(posedge clk) begin
    if (enable)
        count <= count + 1;
end

The synthesis tool maps this to the flip-flop’s built-in clock enable pin. No extra logic. No timing penalty. The register updates only when enable is high, but the clock still toggles every cycle.

The wrong way is a gated clock:

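assign gated_clk = clk & enable;   // logic in the clock path

always @(posedge gated_clk) begin
    count <= count + 1;
end

This puts a LUT in the clock path. The gated clock can glitch when enable changes while clk is high, and it reaches the flip-flops over general routing with skew that the dedicated clock network would have avoided. Both problems destroy timing closure. Never gate clocks in RTL. Use clock enables. Always.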


Pipelining: The Single Most Important Technique for Timing Closure

If your design does not meet timing, the first thing to try is pipelining. The second thing to try is also pipelining. Pipelining is how you take a long combinational path and break it into shorter segments separated by flip-flops.

How Pipelining Actually Works

Suppose you have a calculation that takes 12 LUT levels to complete. At your target clock frequency, the signal cannot propagate through 12 levels in one clock cycle. The timing analyzer reports a violation.

The fix: insert a register after level 6. Now the first 6 levels run in cycle one, and the second 6 levels run in cycle two. The latency increases by one clock cycle, but each stage has roughly half the logic delay, so the maximum frequency can approach double, less the register and routing overhead.

This is the fundamental trade-off in FPGA design: latency versus throughput. Almost always, throughput wins. A design that runs at 200 MHz with one cycle of latency beats a design that runs at 80 MHz with zero cycle latency, because the 200 MHz design processes 2.5 times as much data per second.

Where to Pipeline

Pipeline at natural boundaries in your data path. After an arithmetic operation. Before a wide multiplexer. At the input and output of every memory block. These are the places where the combinational logic tends to accumulate, and breaking them up gives you the most timing benefit per register.
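
As a sketch of pipelining at an arithmetic boundary, the hypothetical module below splits a multiply-add into two stages with a register after the multiplier. The name mul_add_pipe, the port widths, and the choice of operation are assumptions for illustration.

```verilog
// Illustrative sketch only: names and widths are hypothetical.
module mul_add_pipe (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire [31:0] c,
    output reg  [32:0] result   // reflects inputs from two clock edges earlier
);
    reg [31:0] prod_q;  // pipeline register after the multiplier
    reg [31:0] c_q;     // c delayed so it stays aligned with prod_q

    always @(posedge clk) begin
        // Stage 1: multiply only.
        prod_q <= a * b;
        c_q    <= c;
        // Stage 2: add only, using last cycle's product.
        result <= prod_q + c_q;
    end
endmodule
```

Note that c is delayed alongside the product. Every signal that meets the pipelined data downstream has to pick up the same number of register stages, or the operands no longer belong to the same sample.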

Do not pipeline randomly. Adding registers everywhere costs flip-flops and latency without fixing the actual bottlenecks. Run timing analysis first, find the critical paths, and pipeline only those paths. The tool will tell you exactly where the violations are. Listen to it.


Reset Strategies for Sequential Logic: More Complex Than You Think

Reset seems simple. Pull a signal high, clear everything, release the signal, and go. In practice, reset is one of the trickiest parts of sequential logic design because the release timing matters as much as the assertion timing.

Asynchronous Assert, Synchronous Deassert

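The standard pattern for FPGA resets is asynchronous assertion with synchronous deassertion. At the flip-flop, it looks like an ordinary asynchronous reset:

always @(posedge clk or posedge rst) begin
    if (rst)
        q <= 0;       // clears as soon as rst rises
    else
        q <= d;       // normal operation on each clk edge
end

The or posedge rst in the sensitivity list means the flip-flop clears immediately when rst goes high, regardless of the clock. This is asynchronous assertion. The flip-flop does not make the release synchronous on its own: rst must fall just after a clk edge, well clear of the next one, and that is the job of the circuit that generates rst. Drive rst from a reset synchronizer clocked by clk, so that every register in the domain comes out of reset on a clock edge, not at a random time.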

Why does this matter? If reset deasserts asynchronously, the flip-flop can transition from reset to active at any time relative to the clock. If it happens near a clock edge, you get metastability. If it happens far from a clock edge, the downstream logic sees a transition at an unpredictable time, which can cause glitches or setup violations.

Synchronous deassertion forces the release to align with the clock, which eliminates all of these problems.

Reset Synchronizers for Multiple Clock Domains

If your reset signal originates in one clock domain and needs to control flip-flops in another, you must synchronize it. A two-flop synchronizer is the minimum:

always @(posedge clk_b) begin
    rst_sync_0 <= rst_async;
    rst_sync_1 <= rst_sync_0;
end

Use rst_sync_1 as the reset for all flip-flops in clk_b domain. The first flop catches metastability. The second flop gives it a full cycle to resolve. Without this synchronizer, the reset release can be seen at different times by different flip-flops, which means some registers come out of reset one cycle earlier than others. That skew causes functional failures that are nearly impossible to reproduce.
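
The synchronizer above passes both edges of the reset through the clock, so it only asserts while clk_b is running. A common variant keeps the asynchronous assertion described under "Asynchronous Assert, Synchronous Deassert" and synchronizes only the release. The sketch below is one way to write it; the module name reset_sync and its port names are assumptions for illustration.

```verilog
// Illustrative sketch only: names are hypothetical.
module reset_sync (
    input  wire clk,
    input  wire rst_async,  // active-high, may change at any time
    output wire rst_out     // asserts immediately, releases on a clk edge
);
    reg [1:0] sync;

    always @(posedge clk or posedge rst_async) begin
        if (rst_async)
            sync <= 2'b11;              // assert without waiting for clk
        else
            sync <= {sync[0], 1'b0};    // shift in zeros after release
    end

    assign rst_out = sync[1];
endmodule
```

After rst_async falls, rst_out stays high until the second clk edge, so the first flop has a full cycle to settle if the release landed close to an edge. Use one instance per clock domain and feed rst_out to the asynchronous reset inputs in that domain.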


Shift Registers and Delay Lines: Sequential Logic With Structure

Shift registers are everywhere in FPGA designs: serializers, deserializers, delay lines, FIR filter taps. They are simple in concept but easy to get wrong in code.

Coding a Shift Register the Right Way

The compact approach uses a loop. Start the index at 1 so the loop never reads shift_reg[-1]:

always @(posedge clk) begin
    for (i = 1; i < 8; i = i + 1)
        shift_reg[i] <= shift_reg[i-1];
    shift_reg[0] <= data_in;
end

Synthesis unrolls the loop during elaboration. Because every assignment is non-blocking, each stage samples the old value of the previous stage, so the result is a plain chain of eight flip-flops with no logic between them.

The explicit form describes the same hardware:

always @(posedge clk) begin
    shift_reg[7] <= shift_reg[6];
    shift_reg[6] <= shift_reg[5];
    shift_reg[5] <= shift_reg[4];
    shift_reg[4] <= shift_reg[3];
    shift_reg[3] <= shift_reg[2];
    shift_reg[2] <= shift_reg[1];
    shift_reg[1] <= shift_reg[0];
    shift_reg[0] <= data_in;
end

Verbose? Yes, and functionally identical to the loop. Use it when spelling out every stage makes review easier, and use the loop when the depth is a parameter. What decides whether the tool maps the chain to the FPGA’s dedicated shift register hardware (SRL or cascade chains) is the logic around it. LUT-based shift register primitives typically have no reset and expose only a limited number of taps, so a reset on shift_reg, or logic that reads every stage, pushes the tool back to discrete flip-flops. When the mapping succeeds, the result uses fewer flip-flops and less routing than a discrete chain.

For deep shift registers (32 bits or more), check the synthesis report to confirm the inference, or instantiate the FPGA’s built-in shift register primitives directly rather than hoping the tool infers them.
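
When the depth needs to be a parameter, a concatenation shifts the whole vector in one statement. The sketch below assumes a hypothetical module name delay_line and a single-bit data path. Whether a given tool maps it to dedicated shift register hardware depends on the device and the synthesis settings, so treat the synthesis report as the authority.

```verilog
// Illustrative sketch only: names are hypothetical.
module delay_line #(
    parameter DEPTH = 8      // delay in clock cycles, must be >= 2
) (
    input  wire clk,
    input  wire data_in,
    output wire data_out     // data_in delayed by DEPTH clock edges
);
    reg [DEPTH-1:0] shift_reg;

    // No reset and a single output tap: the form that dedicated
    // shift register hardware can usually absorb.
    always @(posedge clk) begin
        shift_reg <= {shift_reg[DEPTH-2:0], data_in};
    end

    assign data_out = shift_reg[DEPTH-1];
endmodule
```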


Counters: The Simplest Sequential Logic That Still Has Traps

A counter is just a register that increments on every clock cycle. It sounds trivial. It is not.

Free-Running vs Gated Counters

A free-running counter increments every cycle:

always @(posedge clk) begin
    if (rst)
        count <= 0;
    else
        count <= count + 1;
end

This is fine. The synthesis tool maps it to a chain of flip-flops with carry logic, and it runs at high speed.

A gated counter only increments when an enable signal is high:

always @(posedge clk) begin
    if (rst)
        count <= 0;
    else if (enable)
        count <= count + 1;
end

This also works fine. The enable becomes a clock enable on the flip-flops. No timing penalty.

The trap: if you write the enable as part of the increment condition using a ternary operator inside the addition, the tool may not recognize it as a clock enable and will build extra combinational logic. Keep the enable in the if statement, not inside the arithmetic expression.
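
The two styles side by side, as a sketch with hypothetical names (enable_styles, count_a, count_b). Both counters behave identically in simulation; the difference is only in how readily a synthesis tool recognizes the enable, which varies by tool and version.

```verilog
// Illustrative sketch only: names are hypothetical.
module enable_styles (
    input  wire       clk,
    input  wire       rst,
    input  wire       enable,
    output reg  [7:0] count_a,
    output reg  [7:0] count_b
);
    // Preferred: the enable is a condition on the assignment, which
    // matches the flip-flop's clock enable directly.
    always @(posedge clk) begin
        if (rst)
            count_a <= 8'd0;
        else if (enable)
            count_a <= count_a + 8'd1;
    end

    // Same behavior, but the enable is folded into the adder operand,
    // so the register is written on every cycle.
    always @(posedge clk) begin
        if (rst)
            count_b <= 8'd0;
        else
            count_b <= count_b + (enable ? 8'd1 : 8'd0);
    end
endmodule
```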

Wrap-Around and Saturation

A counter that overflows and wraps around to zero is the default behavior. This is what you want for most applications: timers, address generators, clock dividers.

A saturating counter stops at its maximum value instead of wrapping:

always @(posedge clk) begin
    if (rst)
        count <= 0;
    else if (enable) begin
        if (count < MAX_VAL)
            count <= count + 1;
    end
end

The comparison count < MAX_VAL is a wide comparator whose result controls every bit of the counter. For wide counters (16 bits or more), that comparator and its fan-out can reduce your maximum clock frequency. If you need saturation, pipeline the comparison: register the comparison result in one cycle, use it to gate the increment in the next cycle. Because the registered flag arrives one cycle late, compare against MAX_VAL - 1 so the counter still stops exactly at MAX_VAL.
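
One way to pipeline the saturation check is sketched below. The module name sat_counter, the at_max flag, and the parameter defaults are assumptions for illustration. The flag is computed one step ahead, so it is set on the same edge that count reaches MAX_VAL.

```verilog
// Illustrative sketch only: names and defaults are hypothetical.
module sat_counter #(
    parameter WIDTH   = 16,
    parameter MAX_VAL = 1000   // must be >= 1 and fit in WIDTH bits
) (
    input  wire             clk,
    input  wire             rst,
    input  wire             enable,
    output reg  [WIDTH-1:0] count
);
    reg at_max;  // registered flag: count has reached MAX_VAL

    always @(posedge clk) begin
        if (rst) begin
            count  <= {WIDTH{1'b0}};
            at_max <= 1'b0;
        end else if (enable && !at_max) begin
            count  <= count + 1'b1;
            // Look one step ahead: this increment lands on MAX_VAL
            // exactly when the current value is MAX_VAL - 1.
            at_max <= (count == MAX_VAL - 1);
        end
    end
endmodule
```

The counter's enable now comes from a single flip-flop instead of from the output of a wide comparator. The comparator still exists, but it feeds only the one-bit at_max register.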


Multi-Cycle Paths and False Paths: Telling the Timing Tool What to Ignore

Not every path in your design needs to meet single-cycle timing. Some paths are intentionally slow. The timing analyzer does not know this unless you tell it.

Marking Multi-Cycle Paths Correctly

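A multi-cycle path is a path that is allowed two or more clock cycles to settle because its destination register captures only every Nth cycle. Common examples: the result of a multi-cycle multiplier, a register that loads only on a slow enable, or the output of a state machine that updates on a slow enable. A block RAM read with one cycle of latency is not a multi-cycle path. It is an ordinary registered stage.

Tell the tool with a constraint:

set_multicycle_path -setup 2 -from [get_pins state_reg/Q] -to [get_pins next_state_logic/D]
set_multicycle_path -hold 1 -from [get_pins state_reg/Q] -to [get_pins next_state_logic/D]

The first line tells the timing analyzer that the path has two cycles to meet setup timing. The second moves the hold check back to the launch edge. Without it, SDC-based tools shift the hold check along with the setup check and may pad the path with extra delay. Without the setup constraint, the tool will try to meet single-cycle timing on a path that physically cannot do it, and you will get a violation that is not a real problem.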

False Paths Are Different

A false path is a path that the timing analyzer sees but that never matters in real operation. Typical examples are a static configuration register that is written only while the datapath is idle, and the path into the first flop of a synchronizer that receives a signal from an unrelated clock domain. Mark it with set_false_path. The timing analyzer will ignore it entirely.

Do not abuse false paths. If you mark everything as a false path, the timing report becomes meaningless, and you will ship a board that does not work. Only mark paths that are provably unreachable.


Clock Domain Crossing in Sequential Logic: The Hardest Problem in FPGA Design

When a signal crosses from one clock domain to another, it is no longer sequential logic in the traditional sense. It becomes a synchronization problem, and the solutions are all sequential logic patterns.

The Two-Flop Synchronizer for Single-Bit Signals

Every single-bit control signal that crosses clock domains must pass through a two-flop synchronizer:

always @(posedge clk_b) begin
    sync_0 <= signal_from_clk_a;
    sync_1 <= sync_0;
end

The first flop can go metastable. The second flop gives it a full clock period to settle before the rest of the design sees it. The probability of metastability propagating past the second flop is so low that you can run a design for years without seeing it. But it is not zero, and if it happens, your design fails in a way that is impossible to reproduce.

Asynchronous FIFOs for Multi-Bit Data

When you need to transfer data (not just a control signal) across clock domains, use an asynchronous FIFO. The FIFO has dual-port memory: one port writes on clk_a, the other port reads on clk_b. The write and read pointers are gray-coded and synchronized across domains, which eliminates the risk of pointer corruption.
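
Gray coding works here because consecutive values differ in exactly one bit, so a pointer sampled mid-transition is either the old value or the new one, never a mixture. The sketch below shows a write pointer that keeps a binary copy for addressing and a registered Gray copy for the synchronizer in the other domain. The module and port names are assumptions for illustration, and the extra pointer bit follows the common convention for telling full from empty.

```verilog
// Illustrative sketch only: names are hypothetical.
module gray_wr_ptr #(
    parameter ADDR_WIDTH = 4
) (
    input  wire                clk,
    input  wire                rst,
    input  wire                inc,       // advance the pointer by one
    output reg  [ADDR_WIDTH:0] ptr_bin,   // binary pointer for local addressing
    output reg  [ADDR_WIDTH:0] ptr_gray   // registered Gray pointer to synchronize
);
    wire [ADDR_WIDTH:0] next_bin  = ptr_bin + 1'b1;
    // Binary to Gray: XOR each bit with the bit above it.
    wire [ADDR_WIDTH:0] next_gray = next_bin ^ (next_bin >> 1);

    always @(posedge clk) begin
        if (rst) begin
            ptr_bin  <= {(ADDR_WIDTH+1){1'b0}};
            ptr_gray <= {(ADDR_WIDTH+1){1'b0}};
        end else if (inc) begin
            ptr_bin  <= next_bin;
            ptr_gray <= next_gray;
        end
    end
endmodule
```

The Gray pointer is taken from a register, not from the XOR logic directly, so the signal that crosses domains cannot glitch between values.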

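Do not improvise CDC logic for a data bus by synchronizing each bit separately, because the bits can arrive on different cycles. A request/acknowledge handshake is a sound alternative when each handshake signal goes through its own two-flop synchronizer and the data is held stable until it is acknowledged, but it moves only one word every several cycles. For streaming multi-bit data, the asynchronous FIFO is the standard robust solution.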


Common Sequential Logic Bugs That Survive Simulation

Simulation does not catch all sequential logic bugs. Some only appear in hardware because simulation does not model real-world electrical behavior.

Incomplete sensitivity lists in combinational blocks that drive sequential logic. If your combinational block misses a signal, simulation will hold the old value, but hardware will react to the new value immediately. The mismatch causes functional failures that simulation never shows.

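Uninitialized registers. In simulation, registers start as X (unknown). In hardware, they start as 0 or 1. On most SRAM-based FPGAs the value is loaded from the bitstream and defaults to 0 unless the code declares an initial value, while other targets leave it to the power-up state of the flip-flop. If your design depends on a register being 0 at power-up and it comes up as 1, the design fails silently. Always provide a reset, or an initial value that your tool is documented to honor, so that every register starts from a known value.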

Race conditions between always blocks. If two always blocks drive the same signal, the simulator executes them in an undefined order. The hardware does not care about order: both drivers fight for control of the net, and the result is undefined. Synthesis tools will either reject the design or build arbitrary logic. Never drive the same signal from two always blocks.

Clock domain crossings without synchronization. This is the silent killer. The design works in simulation because the simulator does not model metastability. In hardware, the unsynchronized signal causes metastable events that propagate through the design and create random failures. Add synchronizers to every CDC signal. Every single one.
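
The first bug in this list is the easiest to show in a few lines. In the sketch below, with hypothetical names, y_bad and y_good synthesize to the same multiplexer, but in simulation y_bad ignores changes on sel until a or b changes.

```verilog
// Illustrative sketch only: names are hypothetical.
module sens_list (
    input  wire a,
    input  wire b,
    input  wire sel,
    output reg  y_bad,
    output reg  y_good
);
    // sel is missing from the list: the simulator re-evaluates this
    // block only when a or b changes.
    always @(a or b) begin
        y_bad = sel ? a : b;
    end

    // @(*) includes every signal the block reads, so simulation and
    // hardware agree.
    always @(*) begin
        y_good = sel ? a : b;
    end
endmodule
```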
