4 Online Learning and the Littlestone Dimension
In PAC learning, nature is indifferent: training data arrives as a random sample from a fixed distribution, and the learner’s task is to generalize from this benign source. In online learning, nature is replaced by an adversary. There is no distribution. There is no training phase followed by a test phase. Instead, learning unfolds as a game: at each round, the adversary presents an instance, the learner predicts a label, the adversary reveals the true label, and the game continues. The learner’s goal is to bound the total number of mistakes, no matter what the adversary does.
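The round-by-round protocol can be made concrete in a few lines of code. The following is an illustrative sketch, not part of the chapter's formal setup: a toy finite class of threshold functions, a learner that predicts by majority vote over the hypotheses still consistent with the observed labels (the classical Halving strategy), and a greedy adversary that answers opposite to the learner's prediction whenever some consistent hypothesis permits it.

```python
import math

def mistake_bound_game(domain, hypotheses):
    """Play the online game: adversary presents x, learner predicts,
    adversary reveals the true label, both update."""
    version_space = list(hypotheses)  # hypotheses consistent so far
    mistakes = 0
    for x in domain:
        # Learner (Halving): predict the majority label on x.
        votes = sum(h(x) for h in version_space)
        y_hat = 1 if 2 * votes > len(version_space) else 0
        # Adversary: flip the label if some consistent hypothesis allows it.
        y = 1 - y_hat
        if not any(h(x) == y for h in version_space):
            y = y_hat
        if y != y_hat:
            mistakes += 1
        # The true label is revealed; inconsistent hypotheses are pruned.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Toy class: thresholds on {0, ..., 7}, with h_t(x) = 1 iff x >= t.
domain = range(8)
thresholds = [lambda x, t=t: int(x >= t) for t in range(9)]
m = mistake_bound_game(domain, thresholds)
# Each mistake at least halves the version space, so m <= log2(|H|) < 4.
```

Note the total absence of probability: the adversary's answers are constrained only by consistency with the class, and the quantity of interest is a worst-case mistake count, not an expected error.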
This change, from distribution to adversary, from batch to sequential, from probabilistic to combinatorial, transforms the mathematics entirely. PAC learning is characterized by a number (the VC dimension) through a probabilistic argument (concentration inequalities). Online learning is characterized by a tree (the mistake tree) through a combinatorial argument (an explicit algorithm). The key theorem of this chapter, that the optimal mistake bound equals the Littlestone dimension, is proved not by bounding tail probabilities but by constructing an algorithm that plays the game optimally.
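To preview the combinatorial object at the heart of the chapter, here is a hedged sketch of the Littlestone dimension computed by its recursive characterization: branch on an instance x, split the class into the hypotheses labeling x with 0 and with 1, and when both subclasses are nonempty the adversary can force a mistake at x and continue the game in whichever subclass survives. Representing hypotheses as tuples of labels over a finite domain is an illustrative choice, not the formal definition given later.

```python
def ldim(hypotheses, domain):
    """Littlestone dimension of a finite class over a finite domain.

    `hypotheses` is a set of tuples; h[x] is the label h assigns to x."""
    if len(hypotheses) <= 1:
        return 0 if hypotheses else -1  # one hypothesis never forces a mistake
    best = 0
    for x in domain:
        h0 = {h for h in hypotheses if h[x] == 0}
        h1 = {h for h in hypotheses if h[x] == 1}
        if h0 and h1:
            # A mistake is forced at x; the game continues in the subclass
            # consistent with the label the adversary announces.
            best = max(best, 1 + min(ldim(h0, domain), ldim(h1, domain)))
    return best

# Thresholds on {0, 1, 2, 3}: h_t(x) = 1 iff x >= t, for t = 0, ..., 4.
domain = range(4)
H = {tuple(int(x >= t) for x in domain) for t in range(5)}
print(ldim(H, domain))  # → 2
```

The `max` over instances and `min` over labels is exactly the game: the adversary chooses the instance, the learner's prediction determines which subtree of the mistake tree is entered, and the depth of the deepest such tree is the dimension. For thresholds the VC dimension is 1, yet the Littlestone dimension grows with the domain, a first hint that the adversarial game is strictly harder than PAC learning.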