4.1 The Online Learning Game
Online learning is best understood as a protocol between two players: a learner and an adversary. The game is played over a domain \(X\), a label set \(Y = \{ 0,1\} \), and a hypothesis class \(\mathcal{H} \subseteq \{ 0,1\} ^X\). The adversary is constrained to be realizable: there must exist some \(h^* \in \mathcal{H}\) consistent with all of the adversary’s labels. Beyond this, the adversary is unconstrained; in particular, it may be adaptive, choosing \(x_t\) based on the learner’s previous predictions.
Lean: OnlineLearnable
The online learning game proceeds in rounds \(t = 1, 2, 3, \ldots \) :
1. The adversary selects an instance \(x_t \in X\).
2. The learner observes \(x_t\) and predicts a label \(\hat{y}_t \in \{ 0,1\} \).
3. The adversary reveals the true label \(y_t \in \{ 0,1\} \).
4. If \(\hat{y}_t \neq y_t\), the learner incurs a mistake.
The game continues for as many rounds as the adversary chooses. The adversary must ensure that there exists \(h^* \in \mathcal{H}\) with \(h^*(x_t) = y_t\) for all \(t\) (realizability).
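The protocol above can be simulated directly. The following sketch is illustrative, not part of the formal development: the names (`play_game`, `MajorityOfVersionSpace`) are ours, the adversary is realized by fixing a target \(h^*\) and an instance sequence in advance, and the learner shown is the classical Halving strategy, which predicts the majority vote of all hypotheses still consistent with the labels seen so far.

```python
def play_game(learner, instances, h_star):
    """Run the online game on a fixed instance sequence, with labels
    supplied by h_star (so the adversary is realizable by construction).
    Returns the number of mistakes the learner incurs."""
    mistakes = 0
    for x in instances:
        y_hat = learner.predict(x)   # learner commits to a prediction first
        y = h_star(x)                # adversary then reveals the true label
        if y_hat != y:
            mistakes += 1
        learner.update(x, y)         # learner may use the feedback
    return mistakes

class MajorityOfVersionSpace:
    """The Halving strategy: predict the majority vote of the version
    space, i.e. of all hypotheses consistent with the history so far."""
    def __init__(self, hypotheses):
        self.version_space = list(hypotheses)

    def predict(self, x):
        votes = sum(h(x) for h in self.version_space)
        return 1 if 2 * votes >= len(self.version_space) else 0

    def update(self, x, y):
        self.version_space = [h for h in self.version_space if h(x) == y]

# Toy class: X = {0, 1, 2}, H = the three singleton indicators plus the
# all-zeros function, and the target labels only x = 1 positively.
X = [0, 1, 2]
H = [lambda x, a=a: int(x == a) for a in X] + [lambda x: 0]
h_star = H[1]
m = play_game(MajorityOfVersionSpace(H), X, h_star)
print(m)  # 1 mistake on this sequence
```

Each mistake of the Halving learner removes at least half of the version space, which is the standard route to a mistake bound of \(\log_2 |\mathcal{H}|\) for finite classes.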
Three features distinguish this game from the PAC framework of Chapter 3:
No distribution. In PAC learning, performance is measured with respect to an unknown but fixed distribution \(D\). In online learning, there is no distribution at all. The adversary’s choices need not be drawn from any probability measure.
The adversary is adaptive. The adversary sees the learner’s predictions and can choose future instances to exploit the learner’s weaknesses. This is strictly harder than random sampling: an adaptive adversary can do everything that a random distribution can, and more.
Performance is worst-case over sequences. The mistake bound must hold for every sequence of instances and labels the adversary might produce, subject to realizability. There is no averaging.
Lean: OnlineLearnable
A learner \(A\) is mistake-bounded with bound \(M\) on \(\mathcal{H}\) if, for every adversary strategy consistent with some \(h^* \in \mathcal{H}\), the total number of mistakes made by \(A\) is at most \(M\). The optimal mistake bound of \(\mathcal{H}\) is \(\mathrm{Opt}(\mathcal{H}) = \min _A \max _{\mathrm{adversary}} \# \{ \text{mistakes of } A\} \).
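The definition might be formalized along the following lines. This is a speculative sketch, not the `OnlineLearnable` development referenced above: all names are ours, \(\mathcal{H}\) is modeled as a predicate on hypotheses, and the learner is taken to be deterministic. For a deterministic learner, quantifying over fixed realizable sequences already captures the adaptive adversary, since the adversary can simulate the learner's predictions in advance.

```lean
-- A deterministic learner maps the history of labeled examples and the
-- current instance to a prediction.
def Learner (X : Type) := List (X × Bool) → X → Bool

-- Mistakes of learner A on a labeled sequence, threading the history.
def mistakesFrom {X : Type} (A : Learner X) (hist : List (X × Bool)) :
    List (X × Bool) → Nat
  | [] => 0
  | (x, y) :: rest =>
      (if A hist x ≠ y then 1 else 0) + mistakesFrom A (hist ++ [(x, y)]) rest

-- Realizability: some h* in H labels the whole sequence.
def Realizable {X : Type} (H : (X → Bool) → Prop) (s : List (X × Bool)) : Prop :=
  ∃ h, H h ∧ ∀ p ∈ s, h p.1 = p.2

-- A is mistake-bounded with bound M on H: at most M mistakes on every
-- realizable sequence, however long.
def MistakeBounded {X : Type} (H : (X → Bool) → Prop)
    (A : Learner X) (M : Nat) : Prop :=
  ∀ s : List (X × Bool), Realizable H s → mistakesFrom A [] s ≤ M
```

\(\mathrm{Opt}(\mathcal{H})\) is then the least \(M\) for which some learner is mistake-bounded with bound \(M\).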
The central question of online learning theory is: what determines \(\mathrm{Opt}(\mathcal{H})\)? The answer is a combinatorial object called the Littlestone dimension.