6.3 Strict Strength Hierarchy
The four strictly_stronger edges assert that one concept implies another while the reverse implication fails: the stronger concept is a proper generalization. Each edge therefore requires two proofs: one for the forward implication, and one, the witness, for the strictness of the gap.
Ex-learning \(\supsetneq \) FIN-learning.
Forward implication. Every FIN-learnable class is trivially Ex-learnable: a learner that outputs its final hypothesis after finitely many data points and never changes it again is, a fortiori, a learner that converges in the limit.
Witness for strictness. FIN-learning requires zero mind changes: the learner must output its final, correct hypothesis at some point and never retract it. Ex-learning allows unbounded mind changes before convergence. The gap is witnessed by any class requiring unbounded mind changes for identification.
Consider the class of pattern languages with growing structural complexity. A Gold-style learner can identify any member in the limit (Ex-learn it) by successively refining its conjecture as new data arrives, but no learner can commit to a correct hypothesis after seeing only finitely many initial elements without ever revising. The revision process is essential, and FIN-learning forbids it.
In the Gold identification hierarchy, this separation is the first step: \(\mathbf{FIN}\subsetneq \mathbf{Ex}\), and the gap is measured by the mind-change ordinal. FIN corresponds to \(0\) mind changes; Ex allows \({\lt} \omega \) mind changes (any finite number, not bounded in advance).
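The mind-change distinction can be made concrete on a toy class. The sketch below uses the initial segments \(L_k = \{0, 1, \ldots, k\}\), a standard illustration rather than the pattern-language class above, with a hypothetical `ex_learner` that conjectures the largest element seen so far: it Ex-learns every \(L_k\) in the limit, but each new maximum forces a mind change, which FIN-learning forbids.

```python
# Toy class: L_k = {0, 1, ..., k}. A Gold-style learner conjectures the
# smallest consistent language, i.e. guesses k = max element seen so far.
# It converges in the limit (Ex-learns) but must revise whenever a larger
# element appears, so no finite prefix of the data justifies an
# irrevocable final answer (no FIN-learning).

def ex_learner(stream):
    """Yield the learner's conjecture (a value of k) after each datum."""
    guess = None
    for x in stream:
        if guess is None or x > guess:
            guess = x  # mind change: enlarge the conjectured segment
        yield guess

# Present L_5 = {0,...,5} as the text 0, 1, 2, 3, 4, 5, 5, 5, ...
text = [0, 1, 2, 3, 4, 5, 5, 5]
conjectures = list(ex_learner(text))
mind_changes = sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

print(conjectures)   # stabilizes at 5 and never changes again
print(mind_changes)  # one mind change per new maximum
```

On this presentation the learner changes its mind five times before converging; since \(k\) is unknown in advance, no bound on mind changes can be fixed beforehand, which is exactly the \({\lt}\,\omega\) regime of Ex-learning.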
Online learning \(\supsetneq \) PAC learning.
Forward implication. If \(\mathcal{H}\) has finite Littlestone dimension \(\mathrm{Ldim}(\mathcal{H}) = d\), then in particular \(\mathrm{VCdim}(\mathcal{H}) \leq d\): a set of size \(k\) shattered in the VC sense yields a Littlestone tree of depth \(k\) in which every node at level \(i\) queries the \(i\)-th point of the set. By the fundamental theorem of statistical learning, \(\mathcal{H}\) is PAC learnable.
Witness for strictness. Thresholds on finite domains provide the simplest witness. Fix \(X = \{ 1, 2, \ldots , n\} \) and let \(\mathcal{H}_n = \{ h_\theta : \theta \in \{ 0, 1, \ldots , n\} \} \) where \(h_\theta (x) = \mathbf{1}[x {\gt} \theta ]\). Then \(\mathrm{VCdim}(\mathcal{H}_n) = 1\) for every \(n\), but \(\mathrm{Ldim}(\mathcal{H}_n) = \lfloor \log _2(n+1) \rfloor \), which grows without bound as \(n \to \infty \).
More dramatically, thresholds on \(\mathbb {R}\) (the witness from Separation 1) have \(\mathrm{VCdim}= 1\) but \(\mathrm{Ldim}= \infty \): PAC learnable but not online learnable with any finite mistake bound. The containment is strict at every level of the hierarchy.
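Both dimensions of the finite threshold class can be verified mechanically. The brute-force routines below (illustrative code, not from the text; hypotheses are represented as tuples of labels over \(X = \{1,\ldots,n\}\)) recover \(\mathrm{VCdim}(\mathcal{H}_n) = 1\) and \(\mathrm{Ldim}(\mathcal{H}_n) = \lfloor \log_2(n+1) \rfloor\) directly from the definitions.

```python
from itertools import combinations
from math import floor, log2

def ldim(H):
    """Littlestone dimension by brute-force recursion.

    A Littlestone tree of depth d rooted at point i requires both label
    restrictions H^{i,0}, H^{i,1} to carry trees of depth d - 1.
    """
    if len(H) <= 1:
        return 0 if H else -1
    n = len(next(iter(H)))
    best = 0
    for i in range(n):
        H0 = frozenset(h for h in H if h[i] == 0)
        H1 = frozenset(h for h in H if h[i] == 1)
        if H0 and H1:  # point i can root a Littlestone tree
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def vcdim(H):
    """VC dimension by brute force: largest S realizing all 2^|S| patterns."""
    n = len(next(iter(H)))
    d = 0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            patterns = {tuple(h[i] for i in S) for h in H}
            if len(patterns) == 2 ** k:
                d = k
    return d

def thresholds(n):
    """H_n = {h_theta : theta in {0,...,n}} on X = {1,...,n}, h_theta(x) = 1[x > theta]."""
    return frozenset(tuple(1 if x > t else 0 for x in range(1, n + 1))
                     for t in range(n + 1))

for n in (3, 7, 10):
    H = thresholds(n)
    print(n, vcdim(H), ldim(H), floor(log2(n + 1)))
```

The VC dimension stays at 1 because threshold labelings are monotone in \(x\), while the Littlestone dimension tracks the depth of binary search over the \(n+1\) thresholds.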
Universal learning \(\supsetneq \) PAC learning.
Forward implication. Every PAC learnable class (finite VC dimension) is universally learnable, at a linear rate or faster. Universal learning requires only that learning succeed under every distribution individually, at a rate that may depend on the distribution, whereas PAC learning requires a single learner with a sample-complexity bound that holds uniformly over all distributions. The PAC guarantee is the stronger, uniform special case.
Witness for strictness. Bousquet et al. established a trichotomy for universal learning rates: every concept class is learnable at an exponential rate, at a linear rate, or only at arbitrarily slow rates. The trichotomy theorem shows that classes with \(\mathrm{VCdim}(\mathcal{H}) = \infty \) but no infinite Littlestone tree achieve exponential universal learning rates: they are universally learnable, at the fastest possible rate, yet not PAC learnable, since PAC learnability requires finite VC dimension.
The critical structural difference is the quantifier order. PAC learning demands: \(\exists \) learner \(\forall \varepsilon , \delta \) \(\exists \) a sample size, uniform over all distributions, at which the learner succeeds. Universal learning demands: \(\exists \) learner \(\forall \) distributions \(\exists \) a rate, which may depend on the distribution, at which that same learner succeeds. By removing the uniformity requirement over distributions, universal learning captures a strictly larger class of learning problems.
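The quantifier contrast can be written out schematically (realizable setting; the symbols \(A\), \(S_n\), \(\mathrm{er}_P\), \(C_P\), and \(R_P\) are notation introduced here for illustration, not taken from the text above):

```latex
% PAC: one sample bound n(eps, delta) must work uniformly over all P
\exists A \;\forall \varepsilon,\delta>0 \;\exists n(\varepsilon,\delta) \;\forall P:\quad
\Pr_{S_n \sim P}\!\bigl[\mathrm{er}_P(A(S_n)) > \varepsilon\bigr] \le \delta
\quad\text{for all } n \ge n(\varepsilon,\delta).

% Universal: the learner A is still fixed, but the rate R_P may depend on P
\exists A \;\forall P \;\exists R_P(n) \to 0:\quad
\mathbb{E}_{S_n \sim P}\!\bigl[\mathrm{er}_P(A(S_n))\bigr] \le C_P\, R_P(n)
\quad\text{for all } n.
```

Only the placement of the distribution quantifier changes; in both definitions a single learner must be chosen first.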
DS dimension \(\supsetneq \) Natarajan dimension.
Forward implication. The DS dimension refines the Natarajan dimension: \(d_{\mathrm{N}}(\mathcal{H}) \leq \mathrm{DSdim}(\mathcal{H})\) for every hypothesis class \(\mathcal{H}\). Both measure the combinatorial complexity of multiclass concept classes, but the DS dimension accounts for the full oriented hypergraph (pseudo-cube) structure, not just shattering by pairs of labels.
Witness for strictness. Brukhim et al. constructed a concept class from a hyperbolic pseudo-manifold with \(d_{\mathrm{N}}= 1\) but \(\mathrm{DSdim}= \infty \). The construction builds a concept class over an infinite label space in which every pair of points admits only one non-trivial two-coloring (giving \(d_{\mathrm{N}}= 1\)), yet the global combinatorial structure, encoded in the oriented faces of the pseudo-manifold, is rich enough to drive the DS dimension to infinity.
This is the witness that simultaneously demonstrates the separation \(d_{\mathrm{N}}\not\Rightarrow \text{PAC learnable}\) from Section 6.2: the same class has \(d_{\mathrm{N}}= 1\) but is not learnable, because learnability is characterized by the DS dimension, not the Natarajan dimension.
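The Natarajan side of the comparison is easy to check mechanically on finite toy classes. The brute-force routine below (illustrative code over tuples of labels; it cannot reproduce the pseudo-manifold witness, which needs an infinite label space) computes \(d_{\mathrm{N}}\) directly from the shattering definition: a set \(S\) is Natarajan-shattered if two witness functions \(f, g\) disagree on every point of \(S\) and every \(f/g\) mixture over \(S\) is realized in the class.

```python
from itertools import combinations, product

def natarajan_dim(H):
    """Natarajan dimension by brute force.

    H: set of hypotheses, each a tuple of labels over the domain
    positions 0..n-1 (labels drawn from a finite alphabet).
    """
    n = len(next(iter(H)))
    labels = sorted({h[i] for h in H for i in range(n)})
    d = 0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            # search for witness functions f, g disagreeing on all of S
            for f in product(labels, repeat=k):
                for g in product(labels, repeat=k):
                    if any(a == b for a, b in zip(f, g)):
                        continue
                    # every f/g mixture over S must be realized by some h
                    shattered = True
                    for mask in product((0, 1), repeat=k):
                        target = tuple(f[i] if mask[i] else g[i]
                                       for i in range(k))
                        if not any(tuple(h[x] for x in S) == target
                                   for h in H):
                            shattered = False
                            break
                    if shattered:
                        d = max(d, k)
    return d

# Two toy classes over 2 points with labels {0, 1, 2}:
diag = frozenset({(0, 0), (1, 1), (2, 2)})          # d_N = 1
all_fns = frozenset(product(range(3), repeat=2))    # d_N = 2
print(natarajan_dim(diag))     # 1
print(natarajan_dim(all_fns))  # 2
```

The diagonal class cannot Natarajan-shatter two points (the mixture constraints force the witness functions to coincide), illustrating how small \(d_{\mathrm{N}}\) can be even for classes over large label alphabets.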