Computation and Formal Language Theory #
Computability-theoretic substrate for Gold-style learning theory. Contains:
- Automata (DFA, NFA, PDA, Turing machines) with Mathlib bridges
- Formal languages (regular, context-free, recursively enumerable)
- Computability classes (partial computable, RE sets, Δ₀₂, limiting recursion)
- Information-theoretic primitives (Kolmogorov complexity, MDL, MML, SRM)
- Execution traces for program synthesis
Alphabets and Formal Languages #
Automata #
These are wrappers providing the learning-theoretic interface. Mathlib has DFA/NFA; we bridge to those where possible.
The language accepted by a DFA.
A regular language is a language accepted by some DFA.
Equations
- IsRegularLang Sym L = ∃ (Q : Type) (x : Fintype Q) (x_1 : DecidableEq Q) (M : DFA' Sym Q), M.accepts = L
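As a quick illustration of the definition, the full language (accepting every string) is regular, witnessed by a one-state DFA whose sole state accepts. A minimal sketch, assuming `DFA'` mirrors Mathlib's `DFA` with `step`, `start`, `accept` fields and that `accepts` unfolds by `simp`:

```lean
-- Sketch: the full language over any alphabet is regular, via the
-- one-state DFA that accepts everything. The `DFA'` field names
-- (`step`, `start`, `accept`) are assumed to mirror Mathlib's `DFA`.
example (Sym : Type) : IsRegularLang Sym (fun _ => True) :=
  ⟨Unit, inferInstance, inferInstance,
   { step := fun _ _ => (), start := (), accept := fun _ => True },
   by ext w; simp [DFA'.accepts]⟩
```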
One-step CFG derivation: replace one nonterminal by a production.
Multi-step CFG derivation: reflexive-transitive closure of one-step.
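The closure construction described above can be sketched with Mathlib's `Relation.ReflTransGen`; the surrounding names here are illustrative, only the closure operator is the point:

```lean
-- Sketch: multi-step derivation as the reflexive-transitive closure of
-- the one-step derivation relation, via Mathlib's `Relation.ReflTransGen`.
def MultiStepSketch {Sym : Type} (stepRel : List Sym → List Sym → Prop) :
    List Sym → List Sym → Prop :=
  Relation.ReflTransGen stepRel
```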
A context-free language is a language generated by some CFG.
Equations
- IsContextFree Sym L = ∃ (G : CFG Sym), G.Language = L
Computability Infrastructure #
Critical for Gold-style learning: EX-learnability connects to the arithmetic hierarchy.
A Turing machine (abstract). We use this as an opaque type representing effective computability. Bridges to Mathlib's computability library.
- index : ℕ
Opaque representation; we work with the computability predicate, not the machine.
Gödel numbering: an effective encoding of objects as natural numbers.
Equations
- GodelNumbering α = (α ≃ ℕ)
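Since `GodelNumbering α` unfolds to the equivalence type `α ≃ ℕ`, the identity equivalence is itself a Gödel numbering of ℕ:

```lean
-- ℕ numbers itself: the identity equivalence is a Gödel numbering.
example : GodelNumbering ℕ := Equiv.refl ℕ
```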
Partial computable function. Bridges to Mathlib's Computable / Partrec.
A function f : α →. β is partial computable iff there exists a Nat.Partrec function g
on ℕ that commutes with the encodings: (f a).map encode = g (encode a).
Note: uses Encodable (not Primcodable); the Partrec witness operates on codes.
Equations
- PartialComputable α β = { f : α →. β // ∃ (g : ℕ →. ℕ), Nat.Partrec g ∧ ∀ (a : α), Part.map Encodable.encode (f a) = g (Encodable.encode a) }
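For intuition, the identity on ℕ (viewed as a total partial function) satisfies the definition, with the identity on codes as the `Nat.Partrec` witness. A sketch, assuming `encode` on ℕ is the identity and that the Mathlib lemma names below (`Nat.Partrec.of_primrec`, `Nat.Primrec.id`) are available:

```lean
-- Sketch: the identity on ℕ is partial computable. The witness g is the
-- identity as a partial function; `Nat.Partrec.of_primrec` lifts the
-- primitive-recursive identity (lemma names assumed from Mathlib).
example : PartialComputable ℕ ℕ :=
  ⟨fun n => Part.some n,
   fun n => Part.some n,
   Nat.Partrec.of_primrec Nat.Primrec.id,
   fun a => by simp⟩
```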
Limiting recursion: functions that converge in the limit. f is limiting-recursive if there exists a total computable g such that f(x) = lim_{s→∞} g(x, s). This is the computational substrate of EX-learning.
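The condition in the docstring can be written out concretely for ℕ-valued functions; the names in this sketch are illustrative, not the library's:

```lean
-- Sketch: f is limiting-recursive if a total computable approximation
-- g x s stabilizes to f x as the stage s grows.
def LimitingRecursiveSketch (f : ℕ → ℕ) : Prop :=
  ∃ g : ℕ → ℕ → ℕ,
    (Computable fun p : ℕ × ℕ => g p.1 p.2) ∧
    ∀ x, ∃ s₀, ∀ s, s₀ ≤ s → g x s = f x
```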
Δ₀₂ class: sets whose characteristic function is limiting-recursive. EX-learnable concept classes have Δ₀₂ membership.
Information-Theoretic Primitives #
Kolmogorov complexity: the length of the shortest program that produces x. Defined via Nat.Partrec.Code (Mathlib's effective program codes). K(x) = inf { encode(p) : p.eval 0 = some (encode x) }. If no program produces x, sInf ∅ = 0 by convention (vacuous).
Equations
- KolmogorovComplexity α x = sInf {x_1 : ℕ | ∃ (p : Nat.Partrec.Code) (_ : p.eval 0 = Part.some (Encodable.encode x)), Encodable.encode p = x_1}
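One immediate consequence of the `sInf` definition: any particular program producing x is an upper bound on K(x). A sketch via Mathlib's `Nat.sInf_le` (the binder order on `KolmogorovComplexity` is assumed from the equation above):

```lean
-- Sketch: K(x) ≤ encode p for any program p whose output is encode x.
example {α : Type} [Encodable α] (x : α) (p : Nat.Partrec.Code)
    (hp : p.eval 0 = Part.some (Encodable.encode x)) :
    KolmogorovComplexity α x ≤ Encodable.encode p :=
  Nat.sInf_le ⟨p, hp, rfl⟩
```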
Description length: length of encoding under a specific description language. Unlike Kolmogorov complexity, this is relative to a chosen coding scheme.
Equations
- DescriptionLength α = (α → ℕ)
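Since `DescriptionLength α` is just `α → ℕ`, any concrete length function qualifies; for example, binary length on ℕ:

```lean
-- A concrete description language on ℕ: the binary length of n
-- (`Nat.size` in Mathlib), plus one bit of framing.
example : DescriptionLength ℕ := fun n => Nat.size n + 1
```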
Minimum Description Length principle: choose the hypothesis that minimizes description_length(H) + description_length(data | H). This is a coding-scheme-relative analogue of Kolmogorov-complexity minimization, and corresponds to Bayesian MAP estimation with description lengths playing the role of negative log-probabilities.
Description language for hypotheses
Description language for data given hypothesis
MDL selection: minimize total description length. Uses Classical.epsilon to select an h ∈ H minimizing hypothesisLength h + dataLength h data. Inhabited provides the Nonempty witness for epsilon.
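The selection rule described above, written out as a standalone sketch (the field names `hypothesisLength`/`dataLength` are taken from the docstrings; `Classical.epsilon` requires the `Nonempty` instance that `Inhabited` supplies):

```lean
-- Sketch of MDL selection: pick, via Classical.epsilon, a hypothesis
-- minimizing hypothesis length plus data-given-hypothesis length.
noncomputable def mdlSelectSketch {H D : Type} [Inhabited H]
    (hypothesisLength : H → ℕ) (dataLength : H → D → ℕ) (data : D) : H :=
  Classical.epsilon fun h =>
    ∀ h', hypothesisLength h + dataLength h data ≤
          hypothesisLength h' + dataLength h' data
```

The same `Classical.epsilon` pattern underlies the MML and SRM selectors below.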
Minimum Message Length: Bayesian analog of MDL. MML seeks the hypothesis that minimizes expected message length under a prior and likelihood model.
MML selection: minimize total message length (hypothesisLength + dataFit). Inhabited provides the Nonempty witness for epsilon.
Algorithmic probability: probability of x under the universal distribution. P(x) = Σ_{p : Code, p(0)=encode(x)} 2^{-encode(p)}, with the code number encode(p) standing in for the program length |p|. Related to Kolmogorov complexity by the coding theorem: -log P(x) ≈ K(x). Uses Classical.propDecidable for the halting condition (undecidable in general).
Equations
- AlgorithmicProbability α x = ∑' (p : Nat.Partrec.Code), if p.eval 0 = Part.some (Encodable.encode x) then 2 ^ (-↑(Encodable.encode p)) else 0
Structural Risk Minimization: choose a hypothesis from the complexity level that minimizes a bound on generalization error. Connects VC dimension to inductive bias.
Nested hypothesis classes of increasing complexity
Monotonicity: larger index = richer class
Empirical risk: fraction of training errors
Complexity penalty: regularizer per level
SRM selection: minimize empiricalRisk(h, data) + penalty(n) over (n, h ∈ levels n). Uses Classical.epsilon for the argmin; Inhabited provides the witness.
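The SRM rule can be sketched in the same style as MDL selection, minimizing over pairs of a level index and a hypothesis in that level. Field names and the ℚ-valued risk are assumptions based on the docstrings above:

```lean
-- Sketch of SRM selection: choose (n, h) with h ∈ levels n minimizing
-- empirical risk plus the per-level complexity penalty.
noncomputable def srmSelectSketch {H D : Type} [Inhabited H]
    (levels : ℕ → Set H) (empiricalRisk : H → D → ℚ) (penalty : ℕ → ℚ)
    (data : D) : ℕ × H :=
  Classical.epsilon fun p =>
    p.2 ∈ levels p.1 ∧
    ∀ q : ℕ × H, q.2 ∈ levels q.1 →
      empiricalRisk p.2 data + penalty p.1 ≤
      empiricalRisk q.2 data + penalty q.1
```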
Execution Traces (for process-level concepts) #
An execution trace records the sequence of hypotheses a learner produces.
Equations
- ExecutionTrace X Y = (ℕ → Concept X Y)
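A trivial example: the constant trace of a learner that never changes its conjecture.

```lean
-- The constant trace: the same concept is output at every step.
example {X Y : Type} (c : Concept X Y) : ExecutionTrace X Y := fun _ => c
```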
A partial trace that may not be defined at all time steps.
Equations
- PartialTrace X Y = (ℕ → Option (Concept X Y))