Formal Learning Theory Kernel: Blueprint v1

5.6 Three Paradigms, Incomparable

We now have three learning paradigms: PAC (Chapter 3), online (Chapter 4), and Gold. Each defines “learnable” differently. The question is: what is the relationship?

The answer is that they are very nearly pairwise incomparable. Gold-style identification is separated from PAC learning in both directions and from online learning in both directions, and PAC learnability does not imply online learnability; each failed implication has an explicit witness class. The single implication that survives is that online learnability implies PAC learnability, because the Littlestone dimension upper-bounds the VC dimension. This landscape, which admits no linear ordering of the three paradigms, is the most important structural fact about learning theory.

Theorem (Three-Paradigm Separation). The following four separations hold:

  1. \(\mathbf{Ex}\)-learnable but not PAC-learnable. The class \(\mathcal{L}_{\mathrm{fin}}\) of all finite subsets of \(\mathbb{N}\) is \(\mathbf{Ex}\)-identifiable from text (Example 5.3). But the associated concept class has infinite VC dimension (any finite set of points can be shattered), so it is not PAC-learnable.

    The witness exploits the fundamental difference in how the two paradigms treat “all finite subsets.” \(\mathbf{Ex}\)-identification succeeds because on any fixed finite set, the learner eventually sees all elements. PAC fails because the learner must handle all finite sets simultaneously with bounded sample complexity, and the VC dimension is infinite.

  2. PAC-learnable but not \(\mathbf{Ex}\)-identifiable. The class of threshold functions \(\mathcal{C} = \{ x \mapsto \mathbf{1}[x \geq \theta ] : \theta \in \mathbb{R}\} \) over \(\mathbb{R}\) is PAC-learnable (\(\mathrm{VCdim}= 1\)). But the corresponding languages \(L_\theta = \{ q \in \mathbb{Q} : q \geq \theta \} \), for \(\theta \in \mathbb{R}\), are not \(\mathbf{Ex}\)-identifiable from text: a text for \(L_\theta \) with irrational \(\theta \) reveals the threshold only in the limit, and more decisively, the class is uncountable while the conjectures of any \(\mathbf{Ex}\)-learner range over a countable set of grammars, so the learner can converge correctly on at most countably many of the \(L_\theta \).

  3. Online-learnable but not \(\mathbf{Ex}\)-identifiable. The class of singletons \(\mathcal{C} = \{ \{ x\} : x \in X\} \) has Littlestone dimension 1 (online-learnable with at most 1 mistake). The singletons together with the empty set are, however, \(\mathbf{Ex}\)-identifiable: conjecture \(\emptyset \) until a positive example arrives, then conjecture the singleton on that example. A witness for the separation is obtained by flipping the labels: the co-singleton class \(\{ \mathbb{N}\} \cup \{ \mathbb{N} \setminus \{ n\} : n \in \mathbb{N}\} \) also has Littlestone dimension 1 (predict membership everywhere; a single mistake pins down the removed point), but it is not \(\mathbf{Ex}\)-identifiable from text. By Angluin’s tell-tale criterion, \(\mathbb{N}\) has no finite tell-tale relative to this class: any finite \(T \subseteq \mathbb{N}\) is contained in \(\mathbb{N} \setminus \{ n\} \) for every \(n \notin T\), so the learner can never distinguish “every number will eventually appear” from “exactly one number is missing and has not yet arrived.”

  4. \(\mathbf{Ex}\)-identifiable but not online-learnable. Any class that is \(\mathbf{Ex}\)-identifiable and has infinite Littlestone dimension works. Indeed, \(\mathcal{L}_{\mathrm{fin}}\) from part 1 already qualifies: infinite VC dimension forces infinite Littlestone dimension (since \(\mathrm{VCdim} \leq \mathrm{Ldim}\)), so no online learner for it has a finite mistake bound. The class of all pattern languages (Angluin, 1980) is another example, being \(\mathbf{Ex}\)-identifiable from text but of infinite Littlestone dimension over suitable domains.
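To make part 1 concrete, here is a minimal Python sketch (helper names such as `ex_learner_finite` are ours, not from the text): the \(\mathbf{Ex}\)-learner for \(\mathcal{L}_{\mathrm{fin}}\) simply conjectures the set of elements seen so far, and the same script checks that an arbitrary finite point set is shattered.

```python
import itertools

def ex_learner_finite(prefix):
    """Ex-learner for the finite subsets of N: the conjecture is the
    set of elements seen so far (None plays the pause symbol '#')."""
    return frozenset(x for x in prefix if x is not None)

# On any text for L = {2, 5, 9}, the conjecture stabilises at L as
# soon as every element of L has appeared.
L = frozenset({2, 5, 9})
text = [2, None, 5, 2, 9, 5, None, 2, 9]
conjectures = [ex_learner_finite(text[:t + 1]) for t in range(len(text))]
assert conjectures[-1] == L
assert all(c == L for c in conjectures[4:])  # locked once all of L is seen

# Infinite VC dimension: every labelling of any finite point set S is
# realised by the finite language consisting of its 1-labelled points.
S = [1, 3, 7]
for labels in itertools.product([0, 1], repeat=len(S)):
    concept = frozenset(x for x, b in zip(S, labels) if b)
    assert tuple(int(x in concept) for x in S) == labels
```

The two checks isolate the two facts the separation trades on: convergence on each fixed finite set, but no uniform sample bound across all of them.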
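The PAC side of part 2 is plain empirical risk minimisation: with \(\mathrm{VCdim} = 1\), order \((1/\varepsilon)\log(1/\delta)\) samples suffice, and the ERM hypothesis can place its threshold at the least positively labelled point. A hedged sketch (the target \(\theta\), sample size, and function names are illustrative):

```python
import random

def erm_threshold(sample):
    """ERM for thresholds 1[x >= theta]: the smallest consistent
    hypothesis places its threshold at the least positive example
    (or at +infinity if no positive example was drawn)."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else float("inf")

random.seed(0)
theta = 0.3                                    # hidden target threshold
sample = [(x, int(x >= theta)) for x in (random.random() for _ in range(200))]
theta_hat = erm_threshold(sample)

assert theta <= theta_hat          # ERM never undershoots the target
assert theta_hat - theta < 0.1     # and overshoots little w.h.p. at n = 200
```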
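The one-mistake online learner for singletons in part 3 can be sketched directly: predict label 0 everywhere until the first mistake, which reveals the target’s element (the helper names are ours):

```python
def make_singleton_learner():
    """Mistake-bound learner for the singletons {{x} : x in X}:
    predict 0 everywhere; the first mistake reveals the target."""
    element = None
    def predict(x):
        return 1 if x == element else 0
    def update(x, true_label):
        nonlocal element
        if true_label == 1:
            element = x       # the target can only be {x}
    return predict, update

# Simulate against the hidden target {4}: exactly one mistake.
predict, update = make_singleton_learner()
target = lambda x: 1 if x == 4 else 0
mistakes = 0
for x in [9, 4, 2, 4, 4, 6]:
    y = target(x)
    mistakes += (predict(x) != y)
    update(x, y)
assert mistakes == 1
```

The label-flipped learner for the co-singleton class is symmetric: predict 1 everywhere, and a single mistake pins down the removed point.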
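For part 4, infinite Littlestone dimension has a concrete adversarial reading; \(\mathcal{L}_{\mathrm{fin}}\) itself is such a witness, since \(\mathrm{VCdim} \leq \mathrm{Ldim}\). Against the finite subsets of \(\mathbb{N}\), an adversary can answer every prediction with the opposite label on a fresh point while staying consistent with some finite set, so no online learner achieves a finite mistake bound. A sketch:

```python
def adversary(learner_predict, rounds):
    """Force one mistake per round on the class of finite subsets of N:
    feed a fresh point each round and reveal the opposite of whatever
    the learner predicts.  The revealed labels are always realised by
    the finite set of 1-labelled points, so the adversary stays in-class."""
    history, mistakes = [], 0
    for x in range(rounds):                 # x is a fresh point each round
        p = learner_predict(x, history)
        y = 1 - p                           # opposite label, still consistent
        mistakes += 1                       # so every round is a mistake
        history.append((x, y))
    consistent_target = frozenset(x for x, y in history if y == 1)
    return mistakes, consistent_target

# Even a learner with full memory of the history loses every round;
# this one always predicts 0, so the adversary labels every point 1.
mistakes, target = adversary(lambda x, h: 0, rounds=10)
assert mistakes == 10
assert target == frozenset(range(10))       # a finite language realises the run
```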

\begin{tikzpicture} [
    territory/.style={draw, thick, rounded corners=8pt, minimum width=3.5cm, minimum height=2.5cm, font=\small\bfseries, align=center},
    witness/.style={font=\scriptsize, text=edgeorange, fill=white, inner sep=2pt, align=center},
    arrow/.style={-{Stealth[length=2mm]}, thick, thmred, dashed}
]

% Three paradigms as territories
\node[territory, fill=defblue!8] (pac) at (0,0) {PAC\\{\scriptsize $\varepsilon$-$\delta$, distributions}\\{\scriptsize VC dimension}};
\node[territory, fill=goldcolor!8] (gold) at (5,0) {Gold\\{\scriptsize convergence, no $\varepsilon$}\\{\scriptsize ordinal complexity}};
\node[territory, fill=onlineblue!8] (online) at (2.5,3.5) {Online\\{\scriptsize adversarial, regret}\\{\scriptsize Littlestone dim}};

% Separation arrows (does_not_imply)
\draw[arrow] (pac) -- node[witness, below] {thresholds:\\PAC yes, $\Ex$ no} (gold);
\draw[arrow] (gold) -- node[witness, above] {all finite sets:\\$\Ex$ yes, PAC no} (pac);

\draw[arrow] (online) -- node[witness, left, xshift=-2pt] {co-singletons:\\Online yes, $\Ex$ no} (gold);
\draw[arrow] (gold) -- node[witness, right, xshift=2pt] {patterns:\\$\Ex$ yes, Online no} (online);

\draw[arrow] (pac) -- node[witness, left, xshift=-3pt] {thresholds:\\PAC yes,\\Online no ($\Ldim=\infty$)} (online);
\draw[-{Stealth[length=2mm]}, thick] (online) -- node[witness, right, xshift=3pt] {$\Ldim \geq \VC$:\\Online $\Rightarrow$ PAC} (pac);

\end{tikzpicture}
Figure 5.2 The three-paradigm separation. Gold-style identification is separated from PAC and from online learning in both directions, and PAC does not imply online; each dashed arrow is a \texttt{does\_not\_imply} edge in the graph, labelled with its witness class. The solid arrow records the one implication that holds: online learnability implies PAC learnability, since the Littlestone dimension upper-bounds the VC dimension.
Remark 5.16 What the separations mean

The three-paradigm separation is not a deficiency of the definitions. It reflects a genuine mathematical fact: different notions of “learning from data” capture different aspects of the learning problem, and no single notion is universal. PAC learning measures statistical generalization under distributional assumptions. Online learning measures worst-case sequential prediction. Gold-style identification measures eventual convergence without any accuracy or efficiency guarantee. These are three different mathematical questions, and they have three different answers.