I. The Recognition Predicate
- Let’s move from Regex (Level 4) to a Recognition Machine.
- Definition: A Finite Automaton M recognizes a language L if, for every string w ∈ Σ*, the machine accepts w if and only if w ∈ L.
- Nondeterminism (NFA): Unlike a deterministic finite automaton (DFA), a nondeterministic finite automaton (NFA) allows for multiple possible transitions from the same input.
- Furthermore, NFAs allow for 𝜖-transitions (aka “free moves”)
- Acceptance Rule: A string is accepted if there exists at least one computation path from the start state to the accepting state.
II. Thompson’s Construction (Inductive Building)
- Following the “Basis” and “Induction” pattern from Aho 2006, we build NFAs as modules with a single entry and a single exit.
A. The Basis (Atomic Machines)
- Symbol a: Two states with a transition labeled a.
- The Empty String 𝜖: Two states with a transition labeled 𝜖 (aka “the multiplicative identity”).
- The Empty Set ∅: Two states with no accepting path between them (aka “the annihilator”).
B. The Induction (Structural Glue)
- Union (r|s): Create a new start state that branches via 𝜖 to the start of M(r) and M(s)*. This implements L ∪ M; this is a union at the level of languages (aka JH Level 3).
- Concatenation (rs): Link the exit of M(r) directly to the entry of M(s). This implements L ⋅ M; this is concatenation at the level of languages (aka JH Level 3).
- Kleene Star (r*): Add an 𝜖-loop from the exit M(r) back to its entry; and an 𝜖-bypass from the new start to the new exit to allow for the “length 0” string.
III. The Transition Table Representation
- In the final C code implementation, we do not draw arrows. Instead we use a transition table.
A. Structure of the Transition Table
- Rows represent the States of the NFA.
- Columns represent the alphabet Σ plus the special empty-string 𝜖 column.
- Cell (s, i): Contains the set of the states of the machine can move to from state s on input i.
X. Appendix on notation
Ignore registers below
“ap = Upper case sigma aka Alphabet Σ “sp = Sigma Star (Uppercase Sigma with Asterisk) Σ* “ep = lowercase epsilon aka the empty string 𝜖 “bp = subset of; ⊆ “up = Union symbol ∪ “zp = empty set symbol ∅ “dp = dot product symbol ⋅ “np = NOT equal sign ≠
“lp = L “mp = element of; ∈
macros
@e = turn exactly one character bold