Formal Learning Theory Kernel

Lean 4 formalization of the fundamental theorem of statistical learning, with paradigm separations and a measurability refinement.

21,522 lines · 354 theorems · 53 files · mathlib4 fde0cc5

A typed premise with 42 nodes scoped a human-guided, AI-driven proof search across three learning paradigms (PAC, Online, and Gold-style). Every theorem in the kernel is fully proved.

The infrastructure the types demanded produced mathematics the premise did not predict: a Borel-analytic separation correcting Krapp-Wirth 2024, a compression scheme via approximate minimax (multiplicative weights update), and a monadic structure on measurable batch learners closed under Boolean combination, majority vote, and piecewise interpolation.
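For readers unfamiliar with the multiplicative weights update named above, the following is a minimal Lean 4 sketch of one unnormalized update step. The namespace FLTSketch and the names mwuStep and mwuStep_pos are hypothetical and chosen for illustration; they are not taken from the kernel, whose actual definitions may differ.

```lean
import Mathlib

namespace FLTSketch

/-- One unnormalized multiplicative-weights step (hypothetical name):
each weight `w i` is scaled by `exp (-η * loss i)`, so for `η > 0`
a larger loss yields a smaller weight. -/
noncomputable def mwuStep {ι : Type*} (η : ℝ) (w loss : ι → ℝ) : ι → ℝ :=
  fun i => w i * Real.exp (-η * loss i)

/-- The step keeps strictly positive weights strictly positive. -/
theorem mwuStep_pos {ι : Type*} (η : ℝ) (w loss : ι → ℝ)
    (hw : ∀ i, 0 < w i) (i : ι) : 0 < mwuStep η w loss i := by
  unfold mwuStep
  exact mul_pos (hw i) (Real.exp_pos _)

end FLTSketch
```

Normalization to a probability distribution is left out on purpose: dividing by the total weight is a separate step and does not affect the positivity argument.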

Main results

Five-way equivalence for PAC learning, with the NullMeasurableSet refinement of the standard Borel hypothesis
Littlestone dimension characterization for online learning
Gold's theorem and mind-change characterization for text-based learning
Three-paradigm separation: PAC ⇏ Online, Gold ⇏ PAC, Online ⇒ PAC (unconditional)
Borel-analytic separation theorem, witnessed by a singleton class over an analytic non-Borel set
Compression via approximate minimax (multiplicative weights update)
Measurable batch learner monad with version-space instance and closure algebra
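As background for the PAC results above, the textbook notion of shattering that underlies VC dimension can be sketched in Lean 4 as follows. This is an illustrative sketch under stated assumptions: the namespace FLTSketch and the names ConceptClass, Shatters, HasFiniteVCDim, and shatters_empty are hypothetical and are not the kernel's declarations, and the measurability side conditions the kernel tracks are omitted.

```lean
import Mathlib

namespace FLTSketch

/-- Hypothetical name: a concept class is a set of Boolean hypotheses on `X`. -/
abbrev ConceptClass (X : Type*) := Set (X → Bool)

/-- `C` shatters `S` when every Boolean labelling of `S` agrees on `S`
with some hypothesis in `C`. -/
def Shatters {X : Type*} (C : ConceptClass X) (S : Finset X) : Prop :=
  ∀ f : X → Bool, ∃ h ∈ C, ∀ x ∈ S, h x = f x

/-- Finite VC dimension, stated as a uniform bound on the size of shattered sets. -/
def HasFiniteVCDim {X : Type*} (C : ConceptClass X) : Prop :=
  ∃ d : ℕ, ∀ S : Finset X, Shatters C S → S.card ≤ d

/-- A nonempty class shatters the empty set: any hypothesis works,
since there are no points on which to agree. -/
theorem shatters_empty {X : Type*} {C : ConceptClass X} (hC : Set.Nonempty C) :
    Shatters C ∅ := by
  intro f
  obtain ⟨h, hh⟩ := hC
  exact ⟨h, hh, fun x hx => by simp at hx⟩

end FLTSketch
```

Stating finiteness as a bound on shattered sets avoids choosing between a natural-number and an extended-natural-number valued dimension, which is a design decision the sketch does not try to settle.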

Navigate

API documentation: Per-module doc-gen4 reference with mathlib cross-links and per-declaration docstrings
Project README: Full contributions table, related work, paradigm overview, proof engineering notes
Source repository: GitHub, Apache 2.0
Mathematical blueprint: Repo-specific copy of the relevant chapters from the companion textbook, annotated with Lean 4 hyperlinks. Also available as PDF at /blueprint.pdf. For the full textbook, see Zetetic-Dhruv/formal-learning-theory-book on GitHub.
Proof engineering: In preparation. Audience: formalization specialists. Covers the typed proof operad (TPG_FLT) and the measurable inner event metaprogram.

Related work

The only prior attempt at formalizing PAC learning theory with VC dimension is Google's formal-ml, a Lean 3 development left incomplete with one sorry. Zhang et al.'s lean-stat-learning-theory is complementary: it formalizes concentration inequalities, covering numbers, and Dudley's entropy integral in Lean 4. This kernel formalizes what that project does not: the characterization theorems, paradigm separations, and measurability theory.