Finite Automata, Part 3

I. JH recap of steps

Start with alphabet Σ—a set of symbols.
Generate sigma star Σ^*.
Consider a language L ⊆ Σ^*.
Use regex r to describe language L.
Use Thompson’s Construction to convert r to an NFA.
Use 𝜖-closures and subset construction to convert the NFA to a DFA.
Use transition tables to make a straightforward implmentation of the DFA in C or other computer language.

II. 𝜖-closure function: E() applied to free moves only

Epsilon Closure applied to all the 𝜖-transitions only.
See Section 6.1 in JH notes in Notebook #2, 2/20/2026.
See Section 6.1.4 for mathematical formalisms when:
- E() applied to a single q ∈ Q.
- E() applied to an entire set, like E(S) = T.
See JH notes Section 6.2 from 2/21 - 2/23/2026.
Using seed values obtained from T during initial 𝜖-closure process, we generate S₀, S₁, S₂, …, S_n by repeatedly invoking E() on each successive S_i .
We repeat the process until we reach the “fixed point property”. This is when E( E(S) ) = E(S) and T stops growing.
- See also JH notes 6.2.1.

IV. General thoughts from Aho

from p. 152, Section 3.7: From Regular Expressions to Automata
Recall, both DFAs and NFAs can express any regex r. However, DFAs are simpler to implement in software. So usually, the sequence is to construct a regex to recognize a desired string pattern. Then, use Thompson’s Construction (Basis and Induction rules) to convert that regex to an NFA. Then, we use subset construction to convert the NFA to a DFA.
Aho Section 3.7 shows how to convert NFAs to DFAs.

A. Aho 3.7.1 Conversion of NFA to DFA (Aka subset construction)

The general idea behind the subset construction…

V. The Big Picture

Think of the transition from Regex to DFA as a refinement process. One starts with a messy human idea and finishes with a high-speed machine. There are five steps:

Regex: An abstract pattern to describe some set of strings (e.g., (f|r)*frrr).
Thompson’s Construction: Turns the regex into an NFA. It uses 𝜖-transitions (jumps that require no input) to glue small machines together. It’s easy to build but “lazy”—the computer doesn’t know which path to take without guessing.
𝜖-closure: Since a DFA cannot have multiple possible states for one input, we group “sets” of NFA states into single DFA “super-states.”
Subset Construction (The “Power Set” Algorithm): Subset Construction converts an NFA into a DFA.
DFA: The final product. Fast, predictable, and easily implemented in code with a simple switch statement or lookup table.

VII. What is an 𝜖-closure?

In an NFA, an 𝜖-transition allows the machine to move from one state to another without consuming any characters from the input string.

The 𝜖-closure of a state (let’s call it state S) is the set containing S itself plus every other state we can reach just by following 𝜖-transitions.

The Intuition: If we are standing at state S, and we haven’t read a character yet, where could we be? We could be at S, or we could have “slid” across any available 𝜖-planks to other states. All those possible locations combined make up the 𝜖-closure.

Why it matters for Subset Construction

When converting to a DFA, we need to know the effective starting point.

In an NFA, we start at state q₀.
But if q₀ has 𝜖-transitions to q₁ and q₂, we are effectively starting in all three states simultaneously.
The DFA’s start state is actually the 𝜖-closure of the NFA’s start state.

VIII. How it works in the Algorithm

When we are performing the Subset Construction, we use the closure in two main spots:

Step	Action
Initialization	Create the DFA start state by taking the 𝜖-closure of the NFA start state.
Transition	For a set of states, see where they go on input `char`. Then, take the 𝜖-closure of that resulting set to find the next DFA state.

Mathematically, for a set of NFA states $T$ and an input symbol a: next_DFA_state = 𝜖-closure( move(T, a) )

IX. Summary

Computers hate guessing. An NFA is non-deterministic because it can be in multiple states at once. By using 𝜖-closures to group those possibilities into a single DFA state, we ensure the machine always knows exactly where it is. It trades memory (more states) for speed (no backtracking).