Formal Learning Theory Kernel: Blueprint v1

3.3.2 The Compression Direction (Sketch)

Definition 3.19 Compression Scheme

A compression scheme of size \(k\) for \(\mathcal{H}\) is a pair \((\kappa , \rho )\) where:

  • \(\kappa \) (the compressor) maps any sample \(S\) consistent with some \(h \in \mathcal{H}\) to a subsequence \(\kappa (S) \subseteq S\) of size \(\leq k\);

  • \(\rho \) (the reconstructor) maps any subsequence of size \(\leq k\) to a hypothesis in \(Y^X\);

  • \(\rho (\kappa (S))\) is consistent with the full sample \(S\).
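As a concrete illustration (not part of the source kernel), intervals on the real line admit a compression scheme of size \(k = 2\): keep the two extreme positively labeled points, and reconstruct the smallest interval containing them. Since intervals have VC dimension 2, this matches the conjectured linear bound \(k = d\). A minimal Python sketch, with the hypothetical names `compress` and `reconstruct` standing in for \(\kappa\) and \(\rho\):

```python
# Illustrative size-2 compression scheme for interval hypotheses
# H = { x -> 1 if a <= x <= b else 0 } on the real line.

def compress(sample):
    """kappa: keep at most 2 points — the extreme positives,
    or a single negative point if the sample has no positives."""
    pos = [x for x, y in sample if y == 1]
    if not pos:
        return [sample[0]]
    return [(min(pos), 1), (max(pos), 1)]

def reconstruct(subseq):
    """rho: rebuild an interval hypothesis from the kept points."""
    pos = [x for x, y in subseq if y == 1]
    if not pos:
        return lambda x: 0               # the empty interval
    a, b = min(pos), max(pos)
    return lambda x: 1 if a <= x <= b else 0
```

On any sample realizable by an interval, `reconstruct(compress(S))` reproduces every label of \(S\), as the consistency clause of Definition 3.19 requires.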

The direction \(\ref{F:compression} \Rightarrow \ref{F:pac}\) is classical and relatively straightforward: a compression scheme of size \(k\) yields PAC learning with sample complexity \(O\! \left(\frac{k\log (k/\varepsilon ) + \log (1/\delta )}{\varepsilon }\right)\), since the compressed subsequence encodes all the information the learner needs, and there are at most \(\binom {m}{k}\) possible compressed sets.
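To make the union bound behind this sample complexity concrete, one can numerically solve for the smallest \(m\) with \(\binom{m}{k}(1-\varepsilon)^{m-k} \leq \delta\): each of the at most \(\binom{m}{k}\) candidate compressed sets whose reconstruction has error \(> \varepsilon\) survives the remaining \(m - k\) examples with probability at most \((1-\varepsilon)^{m-k}\). A sketch under this simplified form of the bound (the function name is hypothetical, and constants such as label patterns on the compressed points are ignored):

```python
from math import comb

def compression_sample_bound(k, eps, delta):
    """Smallest m with C(m, k) * (1 - eps)^(m - k) <= delta:
    a union bound over the at most C(m, k) compression sets.
    Simplified illustration; not the kernel's exact constant."""
    m = k + 1
    while comb(m, k) * (1 - eps) ** (m - k) > delta:
        m += 1
    return m
```

For instance, `compression_sample_bound(2, 0.1, 0.05)` returns a sample size on the order of \(\frac{k \log(k/\varepsilon) + \log(1/\delta)}{\varepsilon}\), consistent with the stated rate.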

The converse direction \(\ref{F:finvc} \Rightarrow \ref{F:compression}\) is the deep one. A weaker exponential bound was proved by Moran and Yehudayoff (2016); the conjectured linear bound remains open.

The sample compression conjecture. Littlestone and Warmuth conjectured in 1986 that every class of VC dimension \(d\) has a compression scheme of size at most \(d\) (the linear bound). Moran and Yehudayoff (2016) proved the weaker result that every class of VC dimension \(d\) has a compression scheme of size at most \(2^{O(d)}\) (an exponential bound). The original conjecture of a linear bound \(O(d)\) remains one of the field’s central open problems. This kernel formalizes the Moran-Yehudayoff exponential bound via an approximate minimax (MWU) construction, not the conjectured linear bound.

Proof sketch (Moran–Yehudayoff)

The argument constructs a compression scheme from a maximum class (a class achieving the Sauer–Shelah bound with equality). Every class of VC dimension \(d\) can be embedded, for sample complexity purposes, into a maximum class of VC dimension \(d\) via a projection argument. For maximum classes, the one-inclusion graph has special structure: its edges can be oriented so that every vertex has in-degree at most \(d\). This orientation defines a compression scheme: given a sample \(S\), output the at most \(d\) in-neighbors of the unique vertex in the one-inclusion graph determined by \(S\). The reconstruction uses the structure of the maximum class to recover a consistent hypothesis from these \(d\) points.
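The orientation step can be illustrated on a toy maximum class. The sketch below (with hypothetical helper names; the greedy degeneracy-style orientation is a stand-in, not the paper's construction) builds the one-inclusion graph of the threshold class over three points, a maximum class of VC dimension \(d = 1\), and orients it with in-degree at most \(d\):

```python
from itertools import combinations

def one_inclusion_graph(H):
    """Vertices: hypotheses as 0/1 tuples; edges: pairs at Hamming distance 1."""
    return [(g, h) for g, h in combinations(H, 2)
            if sum(a != b for a, b in zip(g, h)) == 1]

def orient_low_indegree(H, edges, d):
    """Greedy degeneracy orientation: repeatedly detach a vertex whose
    remaining degree is <= d and point its remaining edges into it,
    so every vertex ends with in-degree <= d."""
    remaining = set(edges)
    active = set(H)
    orientation = {}
    while active:
        v = next(v for v in active
                 if sum(v in e for e in remaining) <= d)
        for e in list(remaining):
            if v in e:
                u = e[0] if e[1] == v else e[1]
                orientation[e] = (u, v)   # oriented u -> v
                remaining.remove(e)
        active.remove(v)
    return orientation

# Thresholds over three points: {000, 100, 110, 111}, VC dimension 1,
# maximum since |H| = C(3,0) + C(3,1) = 4 meets the Sauer-Shelah bound.
H = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
orientation = orient_low_indegree(H, one_inclusion_graph(H), d=1)
```

The one-inclusion graph here is a path on four vertices, and the greedy pass orients it so each vertex receives at most one edge, matching \(d = 1\).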

Extending from maximum classes to arbitrary classes of finite VC dimension requires a more delicate argument involving fractional covers of the one-inclusion hypergraph, which blows up the size to \(2^{O(d)}\).