3.3 The Fundamental Theorem of Statistical Learning
The VC characterization (Theorem 3.7) is the core equivalence. But the full picture is richer: PAC learnability is equivalent not to one condition but to a web of eight, plus a ninth one-way implication that marks where the equivalences end. The Fundamental Theorem packages them all.
Let \(\mathcal{H}\) be a hypothesis class of binary functions over a domain \(X\). The first eight of the following conditions are equivalent; the ninth is a one-way implication:
1. \(\mathcal{H}\) is PAC learnable.
2. \(\mathcal{H}\) is agnostic PAC learnable.
3. \(\operatorname{ERM}_{\mathcal{H}}\) is a PAC learner for \(\mathcal{H}\).
4. \(\mathcal{H}\) has the uniform convergence property.
5. \(\mathrm{VCdim}(\mathcal{H}) < \infty\).
6. \(\Pi_{\mathcal{H}}(m) < 2^m\) for some \(m\).
7. \(\mathcal{H}\) has a finite compression scheme.
8. Any finite subclass of \(\mathcal{H}\) over a finite domain is PAC learnable (with bounds depending on \(|\mathcal{H}|\)), and this property “lifts” to \(\mathcal{H}\).
9. \(\mathcal{H}\) has finite Littlestone dimension \(\Longrightarrow\) \(\mathcal{H}\) is PAC learnable. (An implication only; the converse fails. See Chapter 6.)
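For small finite classes, the VC and growth-function conditions can be checked by brute force. The sketch below (our own illustration, not the textbook's code; it assumes a class represented as a list of Python callables over a finite domain) computes the set of induced labelings and the VC dimension, using the threshold class from the remark below as the running example:

```python
from itertools import combinations

def behaviors(hypotheses, points):
    """Distinct labelings the class induces on `points`; its size is the
    growth function Pi_H(m) evaluated on this particular sample."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def shatters(hypotheses, points):
    # A sample is shattered iff all 2^m labelings are realized on it.
    return len(behaviors(hypotheses, points)) == 2 ** len(points)

def vc_dim(hypotheses, domain):
    """Largest k such that some k-subset of `domain` is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, list(s)) for s in combinations(domain, k)):
            d = k
        else:
            break  # every subset of a shattered set is shattered, so stop here
    return d

# Thresholds on a small grid: h_t(x) = 1 iff x >= t.
domain = [1, 2, 3, 4]
H = [lambda x, t=t: int(x >= t) for t in [0.5, 1.5, 2.5, 3.5, 4.5]]

print(vc_dim(H, domain))          # 1
print(len(behaviors(H, domain)))  # 5 = m + 1 < 2^m for m = 4
```

On four points the class realizes only \(5 = m + 1\) of the \(16\) possible labelings, witnessing the polynomial growth that Sauer's lemma guarantees once the growth function dips below \(2^m\).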
Condition 9 is an implication, not an equivalence: finite Littlestone dimension implies finite VC dimension (see the textbook’s Chapter 10, Combinatorial Dimensions), but the converse fails: thresholds on \(\mathbb{R}\) have \(\mathrm{VCdim} = 1\) yet \(\mathrm{Ldim} = \infty\) (??). We include it to mark the boundary of the equivalence web: this is where the PAC–online separation begins.
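The \(\mathrm{Ldim} = \infty\) claim for thresholds can be made concrete with the standard binary-search construction (a sketch in our notation; take \(h_t(x) = \mathbb{1}[x \geq t]\) with \(t \in (0,1)\)). For every depth \(d\), index the nodes of a complete binary tree by bit strings and place dyadic points

\[
x_{\emptyset} = \tfrac{1}{2}, \qquad x_{b_1 \cdots b_k} = x_{b_1 \cdots b_{k-1}} - (2 b_k - 1)\, 2^{-(k+1)},
\]

so that after the answers \(b_1, \dots, b_k\) the thresholds consistent with them form a nonempty interval of length \(2^{-k}\), and the next query point is its midpoint. Hence every root-to-leaf label sequence is realized by some \(h_t\), the tree is shattered, and \(\mathrm{Ldim}\) exceeds every \(d\), even though no two points are shattered in the VC sense.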