5 Identification in the Limit
In 1967, seventeen years before Valiant introduced PAC learning, E. Mark Gold posed a question that remains the oldest open research programme in computational learning theory: which classes of recursive functions can a machine identify from examples?
Gold’s framework differs from PAC and online learning not merely in its definitions but in its proof technique. PAC theory rests on concentration inequalities, the probabilistic convergence of empirical risk to true risk. Online learning rests on combinatorial game trees, the Littlestone dimension measures the depth of a binary tree that the adversary can force. Gold-style identification rests on diagonalization, the recursion-theoretic technique of defeating every candidate learner by constructing an adversarial input that forces failure. The proof method shapes the entire chapter.
Three features distinguish this paradigm from everything that precedes it in this book:
Infinite horizon. The learner receives an infinite stream of data and must eventually converge. There is no sample complexity bound, no \(\varepsilon \), no \(\delta \). The question is whether the learner converges, not how fast.
Ordinal-valued complexity. When we ask “how many times does the learner change its mind?” the answer is not an integer but a transfinite ordinal. This is the first appearance of ordinals in learning theory.
A lattice of success criteria. PAC learning has one definition (modulo agnostic, realizable, improper variants). Gold-style identification has a rich hierarchy: \(\mathbf{FIN}{\lt} \mathbf{Ex}{\lt} \mathbf{BC}\), with anomalous, monotonic, and vacillatory learning branching off. The hierarchy itself is mathematical content.
Gold’s 1967 paper [ was preceded by his 1965 paper on limiting recursion [ , which introduced the idea that a computation could “converge in the limit” even if it never halts in the classical sense. The 1967 paper applied this idea to language learning. It was the first mathematical theory of learning from examples, predating Valiant’s PAC model by 17 years, Vapnik and Chervonenkis’s foundational work on uniform convergence by 4 years, and Littlestone’s online model by 21 years. The diagonalization technique Gold used to prove his impossibility theorem later became standard in computational complexity theory, but Gold’s use predates most of those applications.