I. JH recap of steps
- Start with alphabet Σ—a set of symbols.
- Generate sigma star Σ*.
- Consider a language L ⊆ Σ*.
- Use regex r to describe language L.
- Use Thompson’s Construction to convert r to an NFA.
- Use 𝜖-closures and subset construction to convert the NFA to a DFA.
- Use transition tables to make a straightforward implmentation of the DFA in C or other computer language.
II. 𝜖-closure function: E() applied to free moves only
- Epsilon Closure applied to all the 𝜖-transitions only.
- See Section 6.1 in JH notes in Notebook #2, 2/20/2026.
- See Section 6.1.4 for mathematical formalisms when:
- E() applied to a single q ∈ Q.
- E() applied to an entire set, like E(S) = T.
III. 𝜖-closure function: E() in subset construction
- See JH notes Section 6.2 from 2/21 - 2/23/2026.
- Using seed values obtained from T during initial 𝜖-closure process, we generate S0, S1, S2, …, Sn by repeatedly invoking E() on each successive Si .
- We repeat the process until we reach the “fixed point property”. This is when E( E(S) ) = E(S) and T stops growing.
- See also JH notes 6.2.1.
IX. General thoughts from Aho
- from p. 152, Section 3.7: From Regular Expressions to Automata
- Recall, both DFAs and NFAs can express any regex r. However, DFAs are simpler to implement in software. So usually, the sequence is to construct a regex to recognize a desired string pattern. Then, use Thompson’s Construction (Basis and Induction rules) to convert that regex to an NFA. Then, we use subset construction to convert the NFA to a DFA.
- Aho Section 3.7 shows how to convert NFAs to DFAs.
A. Aho 3.7.1 Conversion of NFA to DFA (Aka subset construction)
- The general idea behind the subset construction
XI. The Big Picture
Think of the transition from Regex to DFA as a refinement process. One starts with a messy human idea and finishes with a high-speed machine. There are five steps:
- Regex: An abstract pattern to describe some set of strings (e.g.,
(f|r)*frrr). - Thompson’s Construction: Turns the regex into an NFA. It uses 𝜖-transitions (jumps that require no input) to glue small machines together. It’s easy to build but “lazy”—the computer doesn’t know which path to take without guessing.
- 𝜖-closure: Since a DFA cannot have multiple possible states for one input, we group “sets” of NFA states into single DFA “super-states.”
- Subset Construction (The “Power Set” Algorithm): This is where we are now. Subset Construction converts an NFA into a DFA.
- DFA: The final product. Fast, predictable, and easily implemented in code with a simple
switchstatement or lookup table.
XII. What is an 𝜖-closure?
In an NFA, an 𝜖-transition allows the machine to move from one state to another without consuming any characters from the input string.
The 𝜖-closure of a state (let’s call it state S) is the set containing S itself plus every other state we can reach just by following 𝜖-transitions.
The Intuition: If we are standing at state S, and we haven’t read a character yet, where could we be? We could be at S, or we could have “slid” across any available 𝜖-planks to other states. All those possible locations combined make up the 𝜖-closure.
Why it matters for Subset Construction
When converting to a DFA, we need to know the effective starting point.
- In an NFA, we start at state q0.
- But if q0 has 𝜖-transitions to q1 and q2, we are effectively starting in all three states simultaneously.
- The DFA’s start state is actually the 𝜖-closure of the NFA’s start state.
XIII. How it works in the Algorithm
When we are performing the Subset Construction, we use the closure in two main spots:
| Step | Action |
|---|---|
| Initialization | Create the DFA start state by taking the 𝜖-closure of the NFA start state. |
| Transition | For a set of states, see where they go on input char. Then, take the 𝜖-closure of that resulting set to find the next DFA state. |
Mathematically, for a set of NFA states $T$ and an input symbol a: next_DFA_state = 𝜖-closure( move(T, a) )
XIV. Summary
Computers hate guessing. An NFA is non-deterministic because it can be in multiple states at once. By using 𝜖-closures to group those possibilities into a single DFA state, we ensure the machine always knows exactly where it is. It trades memory (more states) for speed (no backtracking).
--->