Lean 4 formalization of the fundamental theorem of statistical learning, with paradigm separations and a measurability refinement.
21,522 lines · 354 theorems · 53 files ·
mathlib4 fde0cc5
A typed premise with 42 nodes scoped a human-guided, AI-driven proof search across three learning paradigms (PAC, Online, and Gold-style). Every theorem in the kernel is fully proved.
The infrastructure the types demanded produced mathematics the premise did not predict: a Borel-analytic separation correcting Krapp-Wirth 2024, a compression scheme via approximate minimax (multiplicative weights update), and a monadic structure on measurable batch learners closed under Boolean combination, majority vote, and piecewise interpolation.
The only prior attempt at formalizing PAC learning theory with VC dimension is Google's formal-ml, incomplete in Lean 3 with one sorry. Zhang et al.'s lean-stat-learning-theory is complementary: it formalizes concentration inequalities, covering numbers, and Dudley's entropy integral in Lean 4. This kernel formalizes the characterization theorems, paradigm separations, and measurability theory that theirs does not.