4.1 The Online Learning Game
Online learning is best understood as a protocol between two players: a learner and an adversary. The game is played over a domain \(X\), a label set \(Y = \{ 0,1\} \), and a hypothesis class \(\mathcal{H} \subseteq \{ 0,1\} ^X\). The adversary is constrained to be realizable: there must exist some \(h^* \in \mathcal{H}\) consistent with all of the adversary’s labels. Beyond this, the adversary is unconstrained; in particular, it may be adaptive, choosing \(x_t\) based on the learner’s previous predictions.
Lean: OnlineLearnable
The online learning game proceeds in rounds \(t = 1, 2, 3, \ldots \) :
1. The adversary selects an instance \(x_t \in X\).
2. The learner observes \(x_t\) and predicts a label \(\hat{y}_t \in \{ 0,1\} \).
3. The adversary reveals the true label \(y_t \in \{ 0,1\} \).
4. If \(\hat{y}_t \neq y_t\), the learner incurs a mistake.
The game continues for as many rounds as the adversary chooses. The adversary must ensure that there exists \(h^* \in \mathcal{H}\) with \(h^*(x_t) = y_t\) for all \(t\) (realizability).
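The protocol above can be simulated directly. The following sketch is illustrative, not part of the formal development: the names (`play_game`, `MajorityOfVersionSpace`) are ours, the adversary is realized by fixing a target \(h^*\) and an instance sequence in advance, and the learner shown is the classical Halving strategy, which predicts the majority vote of all hypotheses still consistent with the labels seen so far.

```python
def play_game(learner, instances, h_star):
    """Run the online game on a fixed instance sequence, with labels
    supplied by h_star (so the adversary is realizable by construction).
    Returns the number of mistakes the learner incurs."""
    mistakes = 0
    for x in instances:
        y_hat = learner.predict(x)   # learner commits to a prediction first
        y = h_star(x)                # adversary then reveals the true label
        if y_hat != y:
            mistakes += 1
        learner.update(x, y)         # learner may use the feedback
    return mistakes

class MajorityOfVersionSpace:
    """The Halving strategy: predict the majority vote of the version
    space, i.e. of all hypotheses consistent with the history so far."""
    def __init__(self, hypotheses):
        self.version_space = list(hypotheses)

    def predict(self, x):
        votes = sum(h(x) for h in self.version_space)
        return 1 if 2 * votes >= len(self.version_space) else 0

    def update(self, x, y):
        self.version_space = [h for h in self.version_space if h(x) == y]

# Toy class: X = {0, 1, 2}, H = the three singleton indicators plus the
# all-zeros function, and the target labels only x = 1 positively.
X = [0, 1, 2]
H = [lambda x, a=a: int(x == a) for a in X] + [lambda x: 0]
h_star = H[1]
m = play_game(MajorityOfVersionSpace(H), X, h_star)
print(m)  # 1 mistake on this sequence
```

Each mistake of the Halving learner removes at least half of the version space, which is the standard route to a mistake bound of \(\log_2 |\mathcal{H}|\) for finite classes.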
Three features distinguish this game from the PAC framework of Chapter 3:
No distribution. In PAC learning, performance is measured with respect to an unknown but fixed distribution \(D\). In online learning, there is no distribution at all. The adversary’s choices need not be drawn from any probability measure.
The adversary is adaptive. The adversary sees the learner’s predictions and can choose future instances to exploit the learner’s weaknesses. This is strictly harder than random sampling: an adaptive adversary can do everything that a random distribution can, and more.
Performance is worst-case over sequences. The mistake bound must hold for every sequence of instances and labels the adversary might produce, subject to realizability. There is no averaging.
Lean: OnlineLearnable
A learner \(A\) is mistake-bounded with bound \(M\) on \(\mathcal{H}\) if, for every adversary strategy consistent with some \(h^* \in \mathcal{H}\), the total number of mistakes made by \(A\) is at most \(M\). The optimal mistake bound of \(\mathcal{H}\) is \(\mathrm{Opt}(\mathcal{H}) = \min _A \max _{\mathrm{adversary}} \# \{ \text{mistakes of } A\} \).
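The definition might be formalized along the following lines. This is a speculative sketch, not the `OnlineLearnable` development referenced above: all names are ours, \(\mathcal{H}\) is modeled as a predicate on hypotheses, and the learner is taken to be deterministic. For a deterministic learner, quantifying over fixed realizable sequences already captures the adaptive adversary, since the adversary can simulate the learner's predictions in advance.

```lean
-- A deterministic learner maps the history of labeled examples and the
-- current instance to a prediction.
def Learner (X : Type) := List (X × Bool) → X → Bool

-- Mistakes of learner A on a labeled sequence, threading the history.
def mistakesFrom {X : Type} (A : Learner X) (hist : List (X × Bool)) :
    List (X × Bool) → Nat
  | [] => 0
  | (x, y) :: rest =>
      (if A hist x ≠ y then 1 else 0) + mistakesFrom A (hist ++ [(x, y)]) rest

-- Realizability: some h* in H labels the whole sequence.
def Realizable {X : Type} (H : (X → Bool) → Prop) (s : List (X × Bool)) : Prop :=
  ∃ h, H h ∧ ∀ p ∈ s, h p.1 = p.2

-- A is mistake-bounded with bound M on H: at most M mistakes on every
-- realizable sequence, however long.
def MistakeBounded {X : Type} (H : (X → Bool) → Prop)
    (A : Learner X) (M : Nat) : Prop :=
  ∀ s : List (X × Bool), Realizable H s → mistakesFrom A [] s ≤ M
```

\(\mathrm{Opt}(\mathcal{H})\) is then the least \(M\) for which some learner is mistake-bounded with bound \(M\).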
The central question of online learning theory is: what determines \(\mathrm{Opt}(\mathcal{H})\)? The answer is a combinatorial object called the Littlestone dimension.