I. The Recognition Predicate
- Let’s move from Regex (Level 4) to a Recognition Machine.
- Definition: A Finite Automaton M recognizes a language L if, for every string w ∈ Σ*, the machine accepts w if and only if w ∈ L.
- Nondeterminism (NFA): Unlike a deterministic finite automaton (DFA), a nondeterministic finite automaton (NFA) allows for multiple possible transitions from the same input.
- DFAs: For each state and each symbol a ∈ Σ, there is exactly one outgoing transition labeled a (possibly to a sink state; these arrows to sink states are often omitted in diagrams).
- NFAs: May have 0, 1, or many next states for a given (q,a). Also, NFAs allow for 𝜖-transitions (aka “free moves”).
- Acceptance Rule: A string is accepted if there exists at least one computation path that consumes all input and ends in an accepting state.
II. Thompson’s Construction (Inductive Building)
- Following the “Basis” and “Induction” pattern from Aho 2006, we build NFAs as modules with a single entry and a single exit.
A. The Basis (Atomic Machines)
- Symbol a: Two states with a transition labeled a.
- The Empty String 𝜖: Two states with a transition labeled 𝜖 (aka “the multiplicative identity”).
- The Empty Set ∅: Two states with no accepting path between them (aka “the annihilator”).
B. The Induction (Structural Glue)
- Union: M( r | s ) recognizes L(r) ∪ L(s).
- Concatenation: M( rs ) recognizes L(r) ⋅ L(s).
- Star: M(r*) recognizes L(r)*.
III. The Transition Table Representation
- In the final C code implementation, we do not draw arrows. Instead we use a transition table.
A. Structure of the Transition Table
- Rows represent the States of the NFA.
- Columns represent the alphabet Σ plus the special empty-string 𝜖 column.
- Cell (s, i): Contains the set of states the machine can move to from state s on input i.
IV. Notes on Thain Chapter 3, p. 15-27
A. Fundamentals of Finite Automata (FA)
- Basic notes on 3.4 Finite Automata, p. 15
- A finite automaton (aka FA) is an abstract machine, which can also be represented by a graph of nodes and edges.
- Each node represents a state; it is drawn as a circle.
- There are special states called accepting states, represented as a double circle.
- Each edge is a directional arrow labeled with one or more symbols drawn from alphabet Σ.
B. Important observations about FA (Thain 3.4)
- Every regex r can be represented by a finite automaton FA and vice versa.
- If an FA is in an accepting state after all input is consumed, then we say that the FA accepts the input.
- Inversely, if the FA is not in an accepting state after all input is provided, then we say that the FA rejects the input string. p. 16
C. Thain 3.4.2 NFA
- Nondeterministic Finite Automata (NFAs) are more challenging because of their ambiguity.
- Example of a NFA analyzing input string
singversus input stringsinging. p. 17
V. Handwritten notes
- See JH Computer Science Notebook #2.
A. Table on JH Notation Color Conventions
- See 2/09/2026 for table in Section 3.4.
B. Section 4.0: DFA and associated tables
- See notes from 2/11 to 2/15/2026 for Section 4.0.
- Section 4.1 DFA that recognizes
forandrofas input regexes.- Figure 4.1a is a simple linear graph that only recognizes
for. - Figure 4.1b is a critical DFA with a single branch. It recognizes both
forandrfo. Used in many of the notes on following pages.
- Figure 4.1a is a simple linear graph that only recognizes
-
- Section 4.2 Step tables showing changes in status of current state of DFA from Fig 4.1b.
- 4.2.1 Table shows change in states when input regex =
for - 4.2.2 Table shows change in states when input regex =
rof - 4.2.3 Table shows rejection when input regex =
fro
-
- Section 4.3 Formal 5-tuple definition of machine M (aka a finite automaton). See also next blog post.
- Section 4.4 Complete version of transition table that models Figure 4.1b.
- This includes all the qsink states that are usually excluded from diagrams and transition tables for conciseness.
-
- Section 4.5 Rewritten version of table 4.2.3 so we can see what a step table looks like when it includes qsink while it rejects a regex
fro.
- Section 4.5 Rewritten version of table 4.2.3 so we can see what a step table looks like when it includes qsink while it rejects a regex
C. Section 5.0: Thompson’s Construction for two different NFAs
- Section 5.1
- Figure 4.1b, the NFA that recognizes
forandrofonly.
- Figure 4.1b, the NFA that recognizes
- Section 5.2
- The NFA that recognizes the regex ‘(f|r)*frrr’
- Complex Thompson Construction with 16 total states, illustrated in 5.2.4
- Built in three different modules:
- Section 5.2.1 Inner union
f|r. Illustrated by drawing in 5.2.1.5. - Section 5.2.2 Kleene star around
f|r. Illustrated by drawing in 5.2.2.3. - Section 5.2.3 Concatenation of trailing
frrrsuffix. Illustrated by monster drawing in 5.2.4.
- Section 5.2.1 Inner union
X. Appendix
A. Table 1: Types
| Object | Type | Example |
|---|---|---|
| Σ | Alphabet (set of symbols) | { a, b } |
| a | Symbol / atom | a |
| Σ* | Set of strings; “universe” of all strings generated from alphabet Σ | { 𝜖, a, b, aa, ab, … } |
| w | String | abba |
| 𝜖 | Empty string; member of every Sigma Star Σ* | string of length 0 |
| ∅ | Empty language | {} or ∅: A language L which is an empty set with no members (aka no strings) |
B. Table 2: JH Notation Style Guide
| Concept | Notation |
|---|---|
| Alphabet | Σ |
| String | w ∈ Σ* |
| Sigma Star | Σ* |
| Regex | r, s |
| Language | L(r), L(M) |
| Automaton (aka Machine) | M(r), M(s), M |
| Acceptance | M accepts w |
| Recognition | L(M) = { w ∈ Σ* : M accepts w } |
C. Other notes
- Link to original Ken Thompson’s original 1968 paper *Regular Expression Search Algorithm.
D. Older study plan
- A. Thain: 3.4; 3.5. Key focus: look at Thain’s explanation of how an NFA “simulates” multiple states at once.
- B. Aho: 3.6; 3.7 Thompson Construction, from regex to automata; 3.7.1 pay attention to the 𝜖-closure algorithm here.
- C. Recommended strategy for trace
- Thain 3.5 for the vibe of the algorithm
- Aho 3.7.1 for specific rules of 𝜖-closure.
- D. continue writing up Thain 3.4.2; the example of
singvssinging. p. 17-18