5.6 Three Paradigms, Incomparable
We now have three learning paradigms: PAC (Chapter 3), online (Chapter 4), and Gold. Each defines “learnable” differently. The question is: what is the relationship?
The answer is that Gold-style identification is incomparable with each of the other two: in both pairings there are classes learnable under one criterion but not the other, in both directions. (The two statistical paradigms themselves are ordered rather than incomparable: finite Littlestone dimension implies finite VC dimension, so every online-learnable class is PAC-learnable, but not conversely.) This is the most important structural fact about the landscape of learning theory.
Theorem (Three-Paradigm Separation). The following four separations hold:
\(\mathbf{Ex}\)-learnable but not PAC-learnable. The class \(\mathcal{L}_{\mathrm{fin}}\) of all finite subsets of \(\mathbb{N}\) is \(\mathbf{Ex}\)-identifiable from text (Example 5.3). But the associated concept class has infinite VC dimension, since every finite set of points is shattered by its own subsets, so it is not PAC-learnable.
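The learner witnessing the first half can be made concrete. A minimal sketch (modeling a text as a sequence of elements with `None` standing for pauses, an assumption of this illustration, not the book's formalism): conjecture exactly the set of elements seen so far.

```python
# Illustrative Ex-learner for L_fin: a text is modeled here as a sequence
# of elements of the target set, with None marking pauses (an assumption
# of this sketch). The conjecture is the set of elements seen so far.
def ex_learner_finite(text_prefix):
    """Conjecture the set of all elements observed in the prefix."""
    return {x for x in text_prefix if x is not None}

# A text for the finite language {2, 5, 9}: once every element has
# appeared, the conjecture is correct and never changes again.
text = [2, None, 5, 2, 9, 5, 2, 9]  # prefix of an infinite text
conjectures = [ex_learner_finite(text[:i + 1]) for i in range(len(text))]
assert conjectures[-1] == {2, 5, 9}
assert conjectures[4:] == [{2, 5, 9}] * 4  # converged from step 5 onward
```

Convergence here is exactly \(\mathbf{Ex}\)-identification: after finitely many steps the conjecture is correct and is never revised.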
The witness exploits the fundamental difference in how the two paradigms treat “all finite subsets.” \(\mathbf{Ex}\)-identification succeeds because, on any text for a fixed finite target, the learner eventually sees every element and can lock onto the correct conjecture. PAC fails because a single learner must handle every finite target simultaneously with uniformly bounded sample complexity, which infinite VC dimension rules out.
PAC-learnable but not \(\mathbf{Ex}\)-identifiable. The class of threshold functions \(\mathcal{C} = \{ x \mapsto \mathbf{1}[x \geq \theta ] : \theta \in \mathbb{R}\} \), taken over the countable domain \(\mathbb{Q}\) so that texts exist, is PAC-learnable (\(\mathrm{VCdim}= 1\)). But the family of threshold languages \(L_\theta = \{ q \in \mathbb{Q} : q \geq \theta \} \) is not \(\mathbf{Ex}\)-identifiable from text: because the rationals are dense, distinct thresholds yield distinct languages, so the family is uncountable, while a learner draws its conjectures from a fixed countable space of grammars and can therefore identify at most countably many languages. Intuitively, a text for \(L_\theta \) reveals the threshold only in the limit and never pins it down at any finite stage.
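The PAC side of this separation is elementary. A minimal ERM sketch (the names and the uniform-on-\([0,1]\) example distribution are assumptions of this illustration): output the smallest positively labeled point as the threshold estimate.

```python
import math
import random

# Illustrative ERM learner for thresholds 1[x >= theta]: the smallest
# positively labeled point is a consistent threshold estimate.
def erm_threshold(sample):
    """sample: list of (x, label) pairs with label = 1[x >= theta]."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else math.inf  # no positives: predict all-0

random.seed(0)
theta = 0.37
sample = [(x, int(x >= theta)) for x in (random.random() for _ in range(500))]
theta_hat = erm_threshold(sample)
# The estimate overshoots theta, and with 500 uniform examples the gap is
# small with overwhelming probability (the error mass is the interval
# between theta and theta_hat).
assert theta <= theta_hat < theta + 0.1
```

A bounded sample suffices for every \(\theta\) simultaneously because \(\mathrm{VCdim} = 1\); this is precisely the uniformity that the infinite-VC witness of the first separation lacks.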
Online-learnable but not \(\mathbf{Ex}\)-identifiable. The class of singletons \(\mathcal{C} = \{ \{ x\} : x \in X\} \) has Littlestone dimension 1 (online-learnable with at most 1 mistake), and so does its label complement over \(\mathbb{N}\): the class \(\mathcal{C}' = \{ \mathbb{N}\} \cup \{ \mathbb{N} \setminus \{ x\} : x \in \mathbb{N}\} \) of co-singletons together with \(\mathbb{N}\) itself. The latter is not \(\mathbf{Ex}\)-identifiable from text, for reasons analogous to Gold’s theorem: every finite sample from \(\mathbb{N}\) also lies inside some co-singleton \(\mathbb{N} \setminus \{ x\} \), so the learner can never distinguish “the element \(x\) is excluded forever” from “the element \(x\) has not yet arrived,” and no finite amount of positive data ever justifies converging to the conjecture \(\mathbb{N}\).
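Littlestone dimension 1 translates into an online learner that errs at most once. A minimal sketch for the singleton class (the protocol encoding and names are assumptions of this illustration): predict “not a member” on every point until the first mistake, which reveals the unique positive point.

```python
# Illustrative 1-mistake online learner for the singleton class {{x}}:
# predict "not a member" until a mistake reveals the target point.
def run_online(target_point, queries):
    """Simulate the mistake-bound protocol; labels are 1 iff x == target_point."""
    known_target = None   # learner's state: the revealed positive point, if any
    mistakes = 0
    for x in queries:
        prediction = 1 if x == known_target else 0  # pessimistic: non-member
        truth = 1 if x == target_point else 0
        if prediction != truth:
            mistakes += 1
            known_target = x  # the only consistent singleton is {x}
        # after the first mistake the learner is exactly right forever
    return mistakes

assert run_online(4, [3, 1, 4, 1, 5, 4]) == 1  # one mistake, then perfect
assert run_online(7, [3, 1, 4]) == 0           # target never queried: no mistakes
```

The co-singleton class admits the mirror-image strategy (predict “member” until the first mistake) with the same bound of one mistake.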
\(\mathbf{Ex}\)-identifiable but not online-learnable. No new witness is needed: the class \(\mathcal{L}_{\mathrm{fin}}\) from the first separation already works, since Littlestone dimension is bounded below by VC dimension and both are infinite for \(\mathcal{L}_{\mathrm{fin}}\). A richer example is the class of all pattern languages (Angluin, 1980), which is \(\mathbf{Ex}\)-identifiable from text and likewise has infinite Littlestone dimension over suitable domains.
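Infinite Littlestone dimension means an adversary can force any number of mistakes. For the class of all finite subsets of \(\mathbb{N}\) this is especially transparent, and the construction below (an illustration with hypothetical names, not the book's proof) shows why: on fresh points, answer the opposite of every prediction; the positively labeled points always form a finite set, so the answers stay consistent with some concept in the class.

```python
# Illustrative adversary against ANY online learner for the class of all
# finite subsets of N: flip each prediction on a fresh point. The points
# labeled 1 form a finite set, so the final labeling is realized by a
# concept in the class, and every round is a mistake.
def force_mistakes(learner_predict, rounds):
    """learner_predict(history, x) -> 0/1; history is [(point, label), ...]."""
    history = []
    for x in range(rounds):          # fresh points 0, 1, 2, ...
        pred = learner_predict(history, x)
        label = 1 - pred             # adversary answers the opposite
        history.append((x, label))
    target = {x for x, y in history if y == 1}  # a finite set realizing all labels
    return len(history), target     # every round was a mistake w.r.t. target

# Example learner that always predicts "not a member": forced wrong each round.
mistakes, target = force_mistakes(lambda hist, x: 0, 10)
assert mistakes == 10 and target == set(range(10))
```

No mistake bound can hold, so \(\mathcal{L}_{\mathrm{fin}}\) is not online-learnable even though it is \(\mathbf{Ex}\)-identifiable.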
The three-paradigm separation is not a deficiency of the definitions. It reflects a genuine mathematical fact: different notions of “learning from data” capture different aspects of the learning problem, and no single notion is universal. PAC learning measures statistical generalization under distributional assumptions. Online learning measures worst-case sequential prediction. Gold-style identification measures eventual convergence without any accuracy or efficiency guarantee. These are three different mathematical questions, and they have three different answers.