Formal Learning Theory Kernel: Blueprint v1

3.3 The Fundamental Theorem of Statistical Learning

The VC characterization (Theorem 3.7) is the core equivalence. But the full picture is richer: PAC learnability is equivalent to nine conditions, not just three. The Fundamental Theorem packages them all.

Theorem 3.17 The Fundamental Theorem of Statistical Learning [

Let \(\mathcal{H}\) be a hypothesis class of binary functions over a domain \(X\). The following are equivalent:

  1. \(\mathcal{H}\) is PAC learnable.

  2. \(\mathcal{H}\) is agnostic PAC learnable.

  3. \(\operatorname{ERM}_\mathcal {H}\) is a PAC learner for \(\mathcal{H}\).

  4. \(\mathcal{H}\) has the uniform convergence property.

  5. \(\mathrm{VCdim}(\mathcal{H}) {\lt} \infty \).

  6. \(\Pi _\mathcal {H}(m) {\lt} 2^m\) for some \(m\).

  7. \(\mathcal{H}\) has a finite compression scheme.

  8. Any finite subclass of \(\mathcal{H}\) over a finite domain is PAC learnable (with bounds depending on \(|\mathcal{H}|\)), and this property “lifts” to \(\mathcal{H}\).

  9. \(\mathcal{H}\) has finite Littlestone dimension \(\Longrightarrow \) \(\mathcal{H}\) is PAC learnable. (The converse fails: this implication is strict. See Chapter 6.)

Remark 3.18 On the numbering

Condition 9 is an implication, not an equivalence: finite Littlestone dimension implies finite VC dimension (see the textbook’s Chapter 10, Combinatorial Dimensions), but the converse fails, thresholds on \(\mathbb {R}\) have \(\mathrm{VCdim}= 1\) but \(\mathrm{Ldim}= \infty \) (??). We include it to mark the boundary of the equivalence web: this is where the PAC–online separation begins.

\begin{tikzpicture} [
    node/.style={draw, thick, rounded corners, minimum width=2.4cm,
      minimum height=0.8cm, font=\small, align=center, fill=defblue!8},
    edge/.style={-{Stealth[length=2mm]}, thick},
    bidir/.style={{Stealth[length=2mm]}-{Stealth[length=2mm]}, thick},
    note/.style={font=\tiny, text=notegray, align=center, fill=white,
      inner sep=1pt},
    node distance=1.8cm and 2.2cm
]
% Core nodes
\node[node, fill=thmred!12] (vc) {$\VC(\mathcal{H}) < \infty$\\\ref{F:finvc}};
\node[node, right=2.5cm of vc] (uc) {Uniform\\convergence\\\ref{F:uc}};
\node[node, right=2.5cm of uc] (erm) {ERM is PAC\\\ref{F:erm}};

\node[node, below=2cm of vc] (growth) {$\growth(m) < 2^m$\\\ref{F:fingrowth}};
\node[node, below=2cm of uc, fill=thmred!12] (pac) {PAC\\learnable\\\ref{F:pac}};
\node[node, below=2cm of erm] (agpac) {Agnostic\\PAC\\\ref{F:agpac}};

\node[node, below=2cm of growth] (comp) {Finite\\compression\\\ref{F:compression}};
\node[node, below=2cm of pac] (finite) {Finiteness\\property\\\ref{F:finiteness}};
\node[node, right=2.5cm of finite, draw=edgeorange, thick, fill=edgeorange!8]
  (ldim) {$\Ldim < \infty$\\\ref{F:online}\\(strict)};

% Equivalence edges (bidirectional)
\draw[bidir, defblue] (vc) -- node[note, above] {Sauer--Shelah\\$+$ converse} (uc);
\draw[bidir, defblue] (uc) -- node[note, above] {Prop.~\ref{prop:erm-pac}} (erm);
\draw[bidir, defblue] (vc) -- node[note, left] {Sauer} (growth);
\draw[bidir, defblue] (uc) -- node[note, right] {} (pac);
\draw[bidir, defblue] (pac) -- node[note, above] {} (erm);
\draw[bidir, defblue] (pac) -- node[note, right] {} (agpac);
\draw[bidir, defblue] (growth) -- node[note, left] {} (comp);
\draw[bidir, defblue] (pac) -- node[note, right] {} (finite);

% The strict implication
\draw[edge, edgeorange, dashed] (ldim) -- node[note, below=3pt, text=edgeorange]
  {strict: $\Ldim < \infty \Rightarrow \VC < \infty$\\but not conversely} (finite);

% Main diagonal
\draw[bidir, thmred, very thick] (vc) -- node[note, right, text=thmred]
  {\textbf{Core}} (pac);
\end{tikzpicture}
Figure 3.1 The equivalence web of the Fundamental Theorem. All solid bidirectional arrows are full equivalences. The dashed arrow from \(\mathrm{Ldim}{\lt} \infty \) is strict: finite Littlestone dimension implies PAC learnability, but not conversely. The thick red diagonal marks the VC characterization, the core equivalence proved in Section 3.2.