Formal Learning Theory Kernel: Blueprint v1

4.5 Regret Bounds and the Multiplicative Weights Framework

The mistake-bound framework requires that the adversary be realizable: some \(h^* \in \mathcal{H}\) must be consistent with all labels. This is a strong assumption. What happens when we drop it?

In the regret framework, the adversary is unrestricted. The learner is compared not to the truth but to the best fixed hypothesis in hindsight.

Definition 4.13 Regret

Given a sequence \((x_1, y_1), \ldots , (x_T, y_T)\) and the learner’s predictions \(\hat{y}_1, \ldots , \hat{y}_T\), the regret of the learner relative to \(\mathcal{H}\) is

\[ \mathrm{Regret}_T = \sum _{t=1}^T \mathbf{1}[\hat{y}_t \neq y_t] - \min _{h \in \mathcal{H}} \sum _{t=1}^T \mathbf{1}[h(x_t) \neq y_t]. \]

This definition contains a conceptual surprise: the regret can be small even when the learner makes many mistakes. What matters is not the absolute number of errors, but how the learner’s error count compares to the best fixed hypothesis in \(\mathcal{H}\). If every hypothesis in \(\mathcal{H}\) also makes many mistakes (because the adversary is not realizable), then a learner with many mistakes but low regret is performing well.
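To make the comparison concrete, here is a toy computation of Definition 4.13 (with hypothetical data) against a class of two constant hypotheses. The learner makes three mistakes, yet its regret is only one, because no fixed hypothesis does much better on this sequence.

```python
# Toy check of Definition 4.13 (hypothetical data): a learner's
# predictions versus the two constant hypotheses on a binary sequence.
labels = [1, 0, 1, 1, 0, 1, 1, 1]      # adversary's labels y_t
preds  = [0, 0, 1, 0, 0, 1, 1, 0]      # learner's predictions yhat_t

learner_mistakes = sum(p != y for p, y in zip(preds, labels))

# H = {always-0, always-1}: mistake count of each fixed hypothesis
mistakes_always_0 = sum(y != 0 for y in labels)    # mispredicts every 1
mistakes_always_1 = sum(y != 1 for y in labels)    # mispredicts every 0
best_hyp = min(mistakes_always_0, mistakes_always_1)

regret = learner_mistakes - best_hyp
```

Here `learner_mistakes` is 3 while the best fixed hypothesis (always-1) makes 2 mistakes, so the regret is 1 even though the adversary's sequence is not realizable by either hypothesis.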

Remark 4.14 Mistake bound vs. regret bound

In the realizable case, the best hypothesis \(h^*\) makes zero mistakes, so \(\mathrm{Regret}_T\) equals the total mistake count. The regret framework is therefore a strict generalization of the mistake-bound framework. The regret formulation becomes essential when \(\mathcal{H}\) is infinite or when realizability fails, precisely the settings where the Littlestone dimension may be infinite and the mistake-bound framework provides no guarantee.

The Weighted Majority algorithm of Littlestone and Warmuth achieves the following regret bound for finite \(\mathcal{H}\).

Theorem 4.15 Weighted Majority / Hedge

For a finite hypothesis class \(\mathcal{H}\) with \(|\mathcal{H}| = N\), the Weighted Majority algorithm achieves, for any sequence of \(T\) rounds:

\[ \mathrm{Regret}_T \leq 2\sqrt{T \ln N}. \]

In particular, the per-round regret \(\mathrm{Regret}_T / T \to 0\) as \(T \to \infty \).

The algorithm maintains a weight \(w_t(h)\) for each \(h \in \mathcal{H}\), initially \(w_1(h) = 1\). At each round: predict the weighted majority vote; after observing \(y_t\), multiply the weight of each hypothesis that erred by a factor \((1 - \eta )\) for a learning rate \(\eta \in (0, 1)\). Setting \(\eta = \sqrt{\ln N / T}\) (or using the doubling trick when \(T\) is unknown) yields the bound above.
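The update rule above can be sketched in a few lines. The following is a minimal implementation of deterministic Weighted Majority for a known horizon \(T\); the three experts in the demo (two constants and a parity predictor) are hypothetical and chosen only so that one expert is perfect on the stream.

```python
import math

def weighted_majority(hypotheses, stream, T):
    """Weighted Majority sketch: hypotheses is a list of predictors
    x -> {0, 1}; stream yields (x_t, y_t) pairs over T rounds."""
    eta = math.sqrt(math.log(len(hypotheses)) / T)  # tuned learning rate
    w = [1.0] * len(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = [h(x) for h in hypotheses]
        # Weighted majority vote: predict 1 iff the weight on 1
        # is at least half the total weight.
        mass_one = sum(wi for wi, v in zip(w, votes) if v == 1)
        pred = 1 if mass_one >= sum(w) / 2 else 0
        mistakes += (pred != y)
        # Multiplicatively penalize every hypothesis that erred.
        w = [wi * (1 - eta) if v != y else wi
             for wi, v in zip(w, votes)]
    return mistakes, w

# Hypothetical demo: three experts, the parity expert is perfect.
experts = [lambda x: 0, lambda x: 1, lambda x: x % 2]
stream = [(t, t % 2) for t in range(8)]
mistakes, weights = weighted_majority(experts, stream, T=8)
```

On this realizable stream the algorithm makes no mistakes and the perfect expert's weight is never reduced, illustrating how the multiplicative penalty concentrates weight on hypotheses that track the labels.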

This approach generalizes to the multiplicative weights framework (also called Hedge), which extends beyond binary prediction to decision-making over finitely many “experts.” The framework is one of the most widely applicable algorithmic paradigms in theoretical computer science, with applications to zero-sum games, boosting, and linear programming.

Remark 4.16 Connection to Littlestone dimension

Ben-David, Pál, and Shalev-Shwartz showed that for infinite hypothesis classes, the Littlestone dimension also governs the optimal regret rate. Specifically, the minimax regret over \(T\) rounds is \(\Theta (\sqrt{\mathrm{Ldim}(\mathcal{H}) \cdot T})\) in the agnostic (non-realizable) setting. This connects the combinatorial tree structure of the Littlestone dimension to the sequential prediction framework even beyond realizability.