4.7 What This Chapter Established
The online learning model replaces PAC’s probabilistic framework with a game between learner and adversary. The key structural results are:
The Littlestone characterization. The optimal mistake bound for online learning equals the Littlestone dimension \(\mathrm{Ldim}(\mathcal{H})\), the maximum depth of a complete mistake tree shattered by \(\mathcal{H}\). This is a tree-combinatorial characterization, in contrast to the set-shattering characterization that the VC dimension provides for PAC learning.
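The mistake-tree recursion behind this characterization can be made concrete for a finite class on a finite domain. The following is a minimal (exponential-time) sketch, not an algorithm from this chapter; the encoding of hypotheses as label tuples and the name `ldim` are illustrative choices.

```python
def ldim(H):
    """Littlestone dimension of a finite class by the standard recursion:
    Ldim(H) = max, over instances x on which H is not constant, of
              1 + min(Ldim(H_{x,0}), Ldim(H_{x,1})),
    and 0 when H has at most one hypothesis. Each h in H is a tuple of
    {0,1} labels over a fixed finite domain."""
    if len(H) <= 1:
        return 0
    n = len(next(iter(H)))
    best = 0
    for x in range(n):
        H0 = frozenset(h for h in H if h[x] == 0)  # version space after label 0
        H1 = frozenset(h for h in H if h[x] == 1)  # version space after label 1
        if H0 and H1:  # x splits H, so it can root a mistake tree
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

# Thresholds h_t(x) = 1 iff x >= t, on the 3-point domain {0, 1, 2}
thresholds = frozenset(tuple(int(x >= t) for x in range(3)) for t in range(4))
```

Querying the middle point splits the four thresholds into two halves, each of Littlestone dimension 1, so `ldim(thresholds)` evaluates to 2.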
The SOA algorithm. The upper bound is constructive: the Standard Optimal Algorithm makes at most \(\mathrm{Ldim}(\mathcal{H})\) mistakes by predicting, on each instance, the label whose induced version space has the larger Littlestone dimension. Each mistake provably reduces the Littlestone dimension of the version space by at least one.
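The SOA prediction rule can be sketched in the same toy encoding (hypotheses as label tuples; the helper names and the tie-breaking toward label 1 are arbitrary choices, not dictated by the chapter):

```python
def ldim(H):
    """Littlestone dimension by the standard recursion (finite toy encoding)."""
    if len(H) <= 1:
        return 0
    n = len(next(iter(H)))
    best = 0
    for x in range(n):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def soa_predict(V, x):
    """SOA: predict the label b whose version space V_b = {h in V : h(x) = b}
    has the larger Littlestone dimension (ties broken toward 1, arbitrarily)."""
    V0 = frozenset(h for h in V if h[x] == 0)
    V1 = frozenset(h for h in V if h[x] == 1)
    return 1 if ldim(V1) >= ldim(V0) else 0

def run_soa(H, target, xs):
    """Run SOA on a realizable sequence labeled by `target`; return mistakes."""
    V, mistakes = frozenset(H), 0
    for x in xs:
        y = target[x]
        if soa_predict(V, x) != y:
            mistakes += 1  # each mistake drops Ldim(V) by at least one
        V = frozenset(h for h in V if h[x] == y)  # keep consistent hypotheses
    return mistakes

# Example: eight thresholds on seven points, a class of Littlestone dimension 3
H7 = frozenset(tuple(int(x >= t) for x in range(7)) for t in range(8))
```

Running `run_soa(H7, target, xs)` with any threshold as `target` and any instance sequence `xs` never exceeds `ldim(H7)` mistakes, illustrating the mistake bound.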
The adversary strategy. The lower bound is also constructive: the adversary traverses a complete mistake tree from root to leaf, labeling each instance to contradict the learner’s prediction, so every round of the traversal forces a mistake.
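The traversal can be sketched directly from the Littlestone recursion: at each round the adversary picks the instance that roots an optimal mistake tree for the current version space, then answers opposite to the learner. This is a toy sketch under the same tuple encoding, with an assumed `ldim` helper.

```python
def ldim(H):
    """Littlestone dimension by the standard recursion (finite toy encoding)."""
    if len(H) <= 1:
        return 0
    n = len(next(iter(H)))
    best = 0
    for x in range(n):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def adversary_vs(learner, H):
    """Force mistakes on any (deterministic) learner. Each round: pick the
    instance x maximizing 1 + min of the two restricted Ldims (the root of
    an optimal mistake tree), label it opposite to the learner's prediction.
    The surviving version space loses at most one level of Littlestone
    dimension per round, so at least Ldim(H) mistakes are forced."""
    V, mistakes = frozenset(H), 0
    n = len(next(iter(V)))
    while ldim(V) > 0:
        splittable = [x for x in range(n)
                      if any(h[x] == 0 for h in V) and any(h[x] == 1 for h in V)]
        x = max(splittable,
                key=lambda x: min(ldim(frozenset(h for h in V if h[x] == 0)),
                                  ldim(frozenset(h for h in V if h[x] == 1))))
        y = 1 - learner(x)                        # contradict the learner
        mistakes += 1
        V = frozenset(h for h in V if h[x] == y)  # surviving version space
    return mistakes
```

Against the four thresholds on three points (Littlestone dimension 2), any fixed prediction rule passed in as `learner` suffers at least two mistakes.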
The PAC–Online gap. \(\mathrm{VCdim}(\mathcal{H}) \leq \mathrm{Ldim}(\mathcal{H})\) always, but the gap can be infinite (thresholds on an infinite ordered domain: \(\mathrm{VCdim} = 1\), \(\mathrm{Ldim} = \infty\)). Online learnability implies PAC learnability, but not conversely.
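The infinite gap has a simple operational reading: over \(\{0,\dots,n\}\), an adversary can binary-search the threshold and contradict any deterministic learner for about \(\log_2(n+1)\) rounds, and \(n\) is unbounded. A minimal sketch, assuming the encoding \(h_t(x) = \mathbf{1}[x \geq t]\) (the function name is illustrative):

```python
def force_mistakes(learner, n):
    """Binary-search adversary for thresholds h_t(x) = 1 iff x >= t
    with t in {0, ..., n}. Maintains the interval [lo, hi] of thresholds
    consistent with all answers so far, queries its midpoint, and answers
    opposite to the learner; every answer keeps at least one threshold
    consistent, so every round is a forced mistake."""
    lo, hi, mistakes = 0, n, 0
    while lo < hi:                  # more than one consistent threshold left
        x = (lo + hi) // 2
        y = 1 - learner(x)          # contradict the prediction
        mistakes += 1
        if y == 1:
            hi = x                  # label 1 at x means t <= x
        else:
            lo = x + 1              # label 0 at x means t > x
    return mistakes
```

For `n = 1023` the consistent interval halves each round, so any deterministic learner suffers 10 mistakes; growing `n` makes the forced count arbitrarily large even though \(\mathrm{VCdim} = 1\).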
The regret framework. Regret bounds, measuring performance relative to the best fixed hypothesis, extend online learning beyond the realizable setting. The Littlestone dimension governs optimal regret rates even in the agnostic case.
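Spelled out, with \(\hat{y}_t\) the learner’s prediction and \((x_t, y_t)\) the round-\(t\) instance and label (generic notation, not necessarily the chapter’s), the regret after \(T\) rounds under the zero–one loss is

\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \mathbf{1}[\hat{y}_t \neq y_t] \;-\; \min_{h \in \mathcal{H}} \sum_{t=1}^{T} \mathbf{1}[h(x_t) \neq y_t].
\]

In the agnostic case the optimal expected regret scales as \(\tilde{\Theta}\bigl(\sqrt{\mathrm{Ldim}(\mathcal{H})\, T}\bigr)\), up to logarithmic factors, which is the sense in which the Littlestone dimension governs the rate.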
The combinatorial and algorithmic flavor of this chapter is not accidental. Online learning theory is fundamentally about trees and algorithms, whereas PAC learning theory is about sets and probabilities. Both paradigms measure the complexity of the same object (a hypothesis class \(\mathcal{H}\)), but through different instruments, and the instruments are not interchangeable.