3.3 The Fundamental Theorem of Statistical Learning
The VC characterization (Theorem 3.7) is the core equivalence. But the full picture is richer: PAC learnability is equivalent not to one condition but to a web of eight, plus a ninth one-way implication that marks where the equivalences end. The Fundamental Theorem packages them all.
Let \(\mathcal{H}\) be a hypothesis class of binary functions over a domain \(X\). The first eight of the following conditions are equivalent; the ninth is a one-way implication:
1. \(\mathcal{H}\) is PAC learnable.
2. \(\mathcal{H}\) is agnostic PAC learnable.
3. \(\operatorname{ERM}_{\mathcal{H}}\) is a PAC learner for \(\mathcal{H}\).
4. \(\mathcal{H}\) has the uniform convergence property.
5. \(\mathrm{VCdim}(\mathcal{H}) < \infty\).
6. \(\Pi_{\mathcal{H}}(m) < 2^m\) for some \(m\).
7. \(\mathcal{H}\) has a finite compression scheme.
8. Any finite subclass of \(\mathcal{H}\) over a finite domain is PAC learnable (with bounds depending on \(|\mathcal{H}|\)), and this property “lifts” to \(\mathcal{H}\).
9. \(\mathcal{H}\) has finite Littlestone dimension \(\Longrightarrow\) \(\mathcal{H}\) is PAC learnable. (An implication only; the converse fails. See Chapter 6.)
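For small finite classes, the VC and growth-function conditions can be checked by brute force. The sketch below (our own illustration, not the textbook's code; it assumes a class represented as a list of Python callables over a finite domain) computes the set of induced labelings and the VC dimension, using the threshold class from the remark below as the running example:

```python
from itertools import combinations

def behaviors(hypotheses, points):
    """Distinct labelings the class induces on `points`; its size is the
    growth function Pi_H(m) evaluated on this particular sample."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def shatters(hypotheses, points):
    # A sample is shattered iff all 2^m labelings are realized on it.
    return len(behaviors(hypotheses, points)) == 2 ** len(points)

def vc_dim(hypotheses, domain):
    """Largest k such that some k-subset of `domain` is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, list(s)) for s in combinations(domain, k)):
            d = k
        else:
            break  # every subset of a shattered set is shattered, so stop here
    return d

# Thresholds on a small grid: h_t(x) = 1 iff x >= t.
domain = [1, 2, 3, 4]
H = [lambda x, t=t: int(x >= t) for t in [0.5, 1.5, 2.5, 3.5, 4.5]]

print(vc_dim(H, domain))          # 1
print(len(behaviors(H, domain)))  # 5 = m + 1 < 2^m for m = 4
```

On four points the class realizes only \(5 = m + 1\) of the \(16\) possible labelings, witnessing the polynomial growth that Sauer's lemma guarantees once the growth function dips below \(2^m\).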
Condition 9 is an implication, not an equivalence: finite Littlestone dimension implies finite VC dimension (see the textbook’s Chapter 10, Combinatorial Dimensions), but the converse fails: thresholds on \(\mathbb{R}\) have \(\mathrm{VCdim} = 1\) yet \(\mathrm{Ldim} = \infty\) (??). We include it to mark the boundary of the equivalence web: this is where the PAC–online separation begins.
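The \(\mathrm{Ldim} = \infty\) claim for thresholds can be made concrete with the standard binary-search construction (a sketch in our notation; take \(h_t(x) = \mathbb{1}[x \geq t]\) with \(t \in (0,1)\)). For every depth \(d\), index the nodes of a complete binary tree by bit strings and place dyadic points

\[
x_{\emptyset} = \tfrac{1}{2}, \qquad x_{b_1 \cdots b_k} = x_{b_1 \cdots b_{k-1}} - (2 b_k - 1)\, 2^{-(k+1)},
\]

so that after the answers \(b_1, \dots, b_k\) the thresholds consistent with them form a nonempty interval of length \(2^{-k}\), and the next query point is its midpoint. Hence every root-to-leaf label sequence is realized by some \(h_t\), the tree is shattered, and \(\mathrm{Ldim}\) exceeds every \(d\), even though no two points are shattered in the VC sense.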