Formal Learning Theory Kernel: Blueprint v1

3.7 What This Chapter Established

The central achievement is the Fundamental Theorem (Theorem 3.17), which says that the following are all the same property of a binary hypothesis class \(\mathcal{H}\):

Condition

First established

Proof location

Finite VC dimension

Vapnik–Chervonenkis (1971)

Section 3.2

Uniform convergence

Vapnik–Chervonenkis (1971)

Theorem 3.12

ERM is a PAC learner

Blumer et al. (1989)

Proposition 3.14

PAC learnable

Valiant (1984)

Theorem 3.7

Agnostic PAC learnable

Haussler (1992)

Section 3.4

\(\Pi (m) {\lt} 2^m\) for some \(m\)

Sauer–Shelah (1972)

Section 3.3.1

Finite compression scheme

Moran–Yehudayoff (2016)

Section 3.3.2

Four structural lessons emerge:

  1. VC dimension is the right measure. Not because it was first, but because it sits at the center of a seven-way equivalence.

  2. The \(\varepsilon ^2\) price is real. The agnostic setting costs \(1/\varepsilon ^2\) where the realizable setting costs \(1/\varepsilon \). This is not a proof artifact, it is a lower bound.

  3. ERM is canonical but not unique. The Fundamental Theorem says ERM works whenever anything works, but other algorithms (compression-based, Bayesian) also achieve PAC guarantees under the same conditions.

  4. Information \(\neq \) computation. The Fundamental Theorem characterizes when learning is possible; the Kearns–Valiant barrier shows this does not determine when learning is efficient. The computational landscape is the subject of the textbook’s Chapter 16 (Computational Hardness).

Timeline. Vapnik and Chervonenkis introduced the VC dimension and proved the uniform convergence direction in 1971. Valiant formalized PAC learning in 1984, without the VC connection. Blumer, Ehrenfeucht, Haussler, and Warmuth closed the loop in 1989, proving that finite VC dimension is both necessary and sufficient for PAC learning. The compression equivalence was conjectured by Littlestone and Warmuth (1986) and proved by Moran and Yehudayoff (2016), a 30-year gap that reflects the depth of the combinatorial argument. The tight realizable sample complexity \(\Theta (d/\varepsilon )\) (removing logarithmic factors) was established by Hanneke  [ .