4.2 Mistake Trees and the Littlestone Dimension
The VC dimension counts the size of the largest set that \(\mathcal{H}\) can shatter. The Littlestone dimension counts the depth of the largest tree that \(\mathcal{H}\) can shatter. This shift, from sets to trees, is what captures the adversary’s adaptive power.
Lean: LTree.isShattered
A mistake tree for \(\mathcal{H}\) over \(X\) is a complete binary tree \(T\) in which:
- Each internal node \(v\) is labeled with an instance \(x_v \in X\).
- The left child of \(v\) corresponds to the label \(y = 0\) and the right child to \(y = 1\).
- For every root-to-leaf path \((v_1, y_1), (v_2, y_2), \ldots , (v_d, y_d)\), there exists a hypothesis \(h \in \mathcal{H}\) consistent with all labels along the path: \(h(x_{v_i}) = y_i\) for all \(i\).
The depth of the tree is the number of internal nodes on any root-to-leaf path (all paths have the same length since the tree is complete).
The key idea is that a mistake tree encodes an adversary strategy. Starting at the root, the adversary presents \(x_{v_1}\). Whatever the learner predicts, the adversary can label \(x_{v_1}\) to make the prediction wrong and descend to the corresponding child. Realizability is maintained because every root-to-leaf path is consistent with some \(h \in \mathcal{H}\). Thus a mistake tree of depth \(d\) represents an adversary strategy that forces any learner to make at least \(d\) mistakes.
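This strategy can be sketched in a few lines of Python (an illustrative sketch; the `Node` representation and function names are hypothetical, not from the text). The adversary walks the tree, flips whatever the learner predicts, and descends to the matching child:

```python
# Illustrative sketch: a mistake tree played as an adversary strategy.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    x: int                            # instance x_v at this internal node
    left: Optional["Node"] = None     # child followed when the label is y = 0
    right: Optional["Node"] = None    # child followed when the label is y = 1

def force_mistakes(tree: Optional[Node], learner: Callable[[int], int]) -> int:
    """Descend the tree, always answering opposite to the learner's prediction.
    Every round is a mistake, so a depth-d tree forces d mistakes."""
    mistakes = 0
    node = tree
    while node is not None:
        y_hat = learner(node.x)       # learner predicts on x_v
        y = 1 - y_hat                 # adversary flips the prediction
        mistakes += 1
        node = node.right if y == 1 else node.left
    return mistakes

# Depth-2 mistake tree for thresholds h(x) = 1[x <= theta] on {1, 2, 3},
# theta in {0, 1, 2, 3}: every root-to-leaf path is consistent with some theta.
tree = Node(2, left=Node(1), right=Node(3))
assert force_mistakes(tree, lambda x: 0) == 2
assert force_mistakes(tree, lambda x: 1) == 2
```

Realizability of each path in this example is easy to check: for instance, the path \((2,1),(3,1)\) is consistent with \(\theta = 3\), and the path \((2,0),(1,1)\) with \(\theta = 1\).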
Lean: LittlestoneDim
The Littlestone dimension of a hypothesis class \(\mathcal{H}\), denoted \(\mathrm{Ldim}(\mathcal{H})\), is the maximum depth of a complete mistake tree for \(\mathcal{H}\). If arbitrarily deep mistake trees exist, \(\mathrm{Ldim}(\mathcal{H}) = \infty \).
Finite classes. If \(|\mathcal{H}| < \infty \), then \(\mathrm{Ldim}(\mathcal{H}) \leq \lfloor \log _2 |\mathcal{H}| \rfloor \), since a complete binary tree of depth \(d\) has \(2^d\) leaves and each leaf requires a distinct consistent hypothesis. This bound is tight for classes that are “richly structured enough” (e.g., all functions on a domain of size \(\log _2 |\mathcal{H}|\)).
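For a finite class over a finite domain, the Littlestone dimension can be computed by brute force from its recursive characterization: \(\mathrm{Ldim}(\mathcal{H}) \geq d+1\) iff some point \(x\) splits \(\mathcal{H}\) into two nonempty parts each of Littlestone dimension at least \(d\). The sketch below (helper name `ldim` is hypothetical, not from the text) checks the log bound and its tightness:

```python
# Illustrative brute-force computation of the Littlestone dimension.
from itertools import product
from math import floor, log2

def ldim(H):
    """Hypotheses are label tuples over a fixed finite domain.
    Ldim(H) = max over coordinates i of 1 + min(Ldim(H_i0), Ldim(H_i1)),
    where H_ib are the hypotheses labeling coordinate i with b."""
    H = list(H)
    if len(H) <= 1:
        return 0
    best = 0
    for i in range(len(H[0])):
        H0 = [h for h in H if h[i] == 0]
        H1 = [h for h in H if h[i] == 1]
        if H0 and H1:                 # both labels realizable at this point
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

# All 2^3 functions on a 3-point domain: the floor-log bound is tight.
H = list(product([0, 1], repeat=3))
assert ldim(H) == 3 == floor(log2(len(H)))

# An arbitrary subclass still obeys Ldim <= floor(log2 |H'|).
assert ldim(H[:5]) <= floor(log2(5))
```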
Thresholds on \(\{ 1, \ldots , n\} \). The class \(\{ h_{\leq \theta } : \theta \in \{ 0, 1, \ldots , n\} \} \), where \(h_{\leq \theta }(x) = \mathbf{1}[x \leq \theta ]\), has \(\mathrm{Ldim}= \lfloor \log _2(n+1) \rfloor \): the adversary performs binary search on the threshold, and the finite-class bound above shows this is best possible. The mistake tree in Figure 4.2 illustrates the case \(n = 8\).
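The binary-search adversary can be written out directly (an illustrative sketch; the function name and interface are hypothetical). It maintains the set of thresholds \(\{lo, \ldots , hi\}\) still consistent with all answers given so far, which guarantees realizability:

```python
# Illustrative sketch: binary-search adversary for thresholds on {1,...,n}.
from math import floor, log2

def binary_search_adversary(n, learner):
    """Force at least floor(log2(n+1)) mistakes against any online learner.
    Invariant: every theta in {lo, ..., hi} is consistent with all answers."""
    lo, hi = 0, n                     # theta initially ranges over {0, ..., n}
    mistakes = 0
    while hi - lo >= 1:               # at least two consistent thresholds left
        x = (lo + hi + 1) // 2        # query a midpoint of the uncertainty set
        y = 1 - learner(x)            # answer opposite to the prediction
        mistakes += 1
        if y == 1:                    # h(x) = 1 means x <= theta
            lo = x
        else:                         # h(x) = 0 means theta < x
            hi = x - 1
    return mistakes

# Any learner suffers at least floor(log2(n+1)) mistakes.
for n in (1, 7, 8, 100):
    assert binary_search_adversary(n, lambda x: 0) >= floor(log2(n + 1))
```

Each query splits the \(m\) remaining thresholds into parts of sizes \(\lfloor m/2 \rfloor \) and \(\lceil m/2 \rceil \), so even a learner who always steers into the smaller part survives only \(\lfloor \log _2(n+1) \rfloor \) rounds without the interval collapsing to a single threshold.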
Thresholds on \(\mathbb {R}\). Now \(\mathrm{Ldim}= \infty \). The key point: the domain is dense, so the adversary can always find a point between any two previous thresholds. At each round, the adversary presents a point that bisects the current interval of uncertainty, forcing a mistake no matter what the learner predicts. This produces mistake trees of unbounded depth.
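The same strategy on a dense domain never runs out of query points, so the adversary can hit any target depth \(d\). A minimal sketch (hypothetical function name; thresholds taken on \([0,1]\) for concreteness):

```python
# Illustrative sketch: bisection adversary for thresholds on the reals.
def real_threshold_adversary(d, learner):
    """Force exactly d mistakes for ANY d: density of the domain guarantees
    a fresh point strictly between lo and hi at every round."""
    lo, hi = 0.0, 1.0                 # some consistent theta lies in [lo, hi)
    mistakes = 0
    for _ in range(d):
        x = (lo + hi) / 2             # bisect the interval of uncertainty
        y = 1 - learner(x)            # flip the learner's prediction
        mistakes += 1
        if y == 1:                    # h(x) = 1 means x <= theta
            lo = x
        else:                         # h(x) = 0 means theta < x
            hi = x
    return mistakes

assert real_threshold_adversary(30, lambda x: 0) == 30
```

Since \(d\) is arbitrary, this yields mistake trees of every depth, i.e., \(\mathrm{Ldim}= \infty \).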
Intervals on \(\mathbb {R}\). The class \(\{ x \mapsto \mathbf{1}[a \leq x \leq b] : a, b \in \mathbb {R}\} \) also has \(\mathrm{Ldim}= \infty \). The adversary can adaptively narrow down both endpoints.
Linear classifiers in \(\mathbb {R}^d\). Halfspaces \(\{ x \mapsto \mathbf{1}[\langle w, x \rangle + b \geq 0] : w \in \mathbb {R}^d, b \in \mathbb {R}\} \) have \(\mathrm{VCdim}= d + 1\) but \(\mathrm{Ldim}= \infty \): restricting the class to a single line in \(\mathbb {R}^d\) recovers the thresholds on \(\mathbb {R}\), so the same bisection adversary applies.
Shattering a set \(\{ x_1, \ldots , x_d\} \) means \(\mathcal{H}\) can produce all \(2^d\) labelings of these fixed points. Shattering a tree of depth \(d\) means \(\mathcal{H}\) can produce a consistent labeling along every root-to-leaf path, but the points themselves may depend on previous labels. The tree structure captures adaptivity: the adversary’s choice at depth \(k\) depends on the learner’s responses at depths \(1, \ldots , k-1\). A shattered set is a special case of a mistake tree (one in which every node at the same depth is labeled with the same instance), so \(\mathrm{VCdim}(\mathcal{H}) \leq \mathrm{Ldim}(\mathcal{H})\) always holds. The converse fails spectacularly: thresholds on \(\mathbb {R}\) have \(\mathrm{VCdim}= 1\) but \(\mathrm{Ldim}= \infty \).
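The gap can be observed concretely even on a finite domain, where both dimensions are computable by brute force (illustrative helper names, not from the text): thresholds on \(\{ 1, \ldots , n\} \) have \(\mathrm{VCdim}= 1\) for every \(n\), while \(\mathrm{Ldim}= \lfloor \log _2(n+1) \rfloor \) grows without bound.

```python
# Illustrative brute-force comparison of VC and Littlestone dimensions.
from itertools import combinations

def vcdim(H):
    """Largest k such that some k-subset of coordinates is shattered."""
    n = len(H[0])
    best = 0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if len({tuple(h[i] for i in S) for h in H}) == 2 ** k:
                best = k
                break
    return best

def ldim(H):
    """Ldim(H) = max_i 1 + min(Ldim(H_i0), Ldim(H_i1)) over splitting coords."""
    H = list(H)
    if len(H) <= 1:
        return 0
    best = 0
    for i in range(len(H[0])):
        H0 = [h for h in H if h[i] == 0]
        H1 = [h for h in H if h[i] == 1]
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

# Thresholds h(x) = 1[x <= theta] on {1,...,7}, theta in {0,...,7}.
n = 7
H = [tuple(1 if x <= t else 0 for x in range(1, n + 1)) for t in range(n + 1)]
assert vcdim(H) == 1        # no pair {x_i < x_j} admits the labeling (0, 1)
assert ldim(H) == 3         # floor(log2(n+1)): binary search on theta
```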