Moran-Yehudayoff Compression Theorem #
Finite VC dimension ↔ compression scheme with finite side information.
Architecture #
The forward direction (VCDim < ⊤ → compression) uses the Moran-Yehudayoff construction:
- A proper finite-support learner (from VC + Sauer-Shelah + probabilistic method)
- A hypothesis envelope (finite image of the learner on bounded subsamples)
- An approximate minimax strategy on the agreement game
- Sparse approximation via VC ε-approximation on agreement tests
- Majority vote reconstruction with incidence side information
The reverse direction (compression → VCDim < ⊤) is by pigeonhole on the bounded set of (kernel, info) pairs.
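At the type level, both directions quantify over a compression scheme packaging a kernel extractor, finite side information, and a reconstructor. The following Lean sketch is illustrative only: the field names and signatures are assumptions, not the repository's actual CompressionSchemeWithInfo API.

```lean
import Mathlib

-- Illustrative shape of a compression scheme with side information.
-- Field names here are hypothetical stand-ins, not the real API.
structure SchemeSketch (X : Type*) (C : Set (X → Bool)) where
  Info : Type                                    -- side-information type
  infoFinite : Finite Info                       -- ...required to be finite
  bound : ℕ                                      -- kernel size bound
  compress : (m : ℕ) → (Fin m → X × Bool) → Finset (X × Bool) × Info
  reconstruct : Finset (X × Bool) × Info → X → Bool
```

The forward direction produces such a scheme from VCDim < ⊤; the reverse direction counts the possible (kernel, info) outputs to bound the VC dimension.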
No Measure Theory #
The forward theorem is purely combinatorial: it uses FinitePMF, Finset, and finite games, with no MeasureTheory.Measure, IsProbabilityMeasure, Measure.dirac, or MeasurableSpace hypotheses.
Helper definitions #
Extract the domain points from a labeled sample.
Equations
- pointSupport S = Finset.image (fun (i : Fin m) => (S i).1) Finset.univ
Instances For
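On a concrete sample the image collapses repeated domain points. A toy computation, assuming X := ℕ and a hypothetical sample S (Mathlib's vector notation is used for brevity):

```lean
import Mathlib

-- Hypothetical three-element sample with a repeated domain point 0.
def S : Fin 3 → ℕ × Bool := ![(0, true), (1, false), (0, false)]

-- Unfolding pointSupport's defining equation: the image keeps each
-- domain point once, so the support has two elements, not three.
example : Finset.image (fun i => (S i).1) (Finset.univ : Finset (Fin 3)) = {0, 1} := by
  decide
```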
supportError is nonneg.
supportError is at most 1.
Structure: Proper Finite-Support Learner #
A proper finite-support learner for a concept class C. This structure captures the existence of a bounded-support ERM with error at most 1/3 on any C-realizable finite distribution. Note that good_on_support quantifies over Finset X rather than Fin k → X.
- sampleBound : ℕ
- good_on_support (c : X → Bool) : c ∈ C → ∀ (Y : Finset X) (q : FinitePMF ↥Y), ∃ Z ⊆ Y, Z.card ≤ self.sampleBound ∧ supportError Y q (self.learn (labeledSampleOfFinset c Z)) c ≤ 1 / 3
Instances For
Finite VC dimension implies existence of a proper finite-support learner. The construction uses ERM + finite_support_vc_approx on the disagreement family.
Hypothesis Envelope #
The hypothesis envelope: the finite set of all possible learner outputs on bounded subsamples of Y, labeled by concept c.
Equations
- hypothesisEnvelope L c Y = Finset.image (fun (Z : Finset X) => L.learn (labeledSampleOfFinset c Z)) (boundedSubsamples Y L.sampleBound)
Instances For
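Because the envelope is defined as a Finset.image over boundedSubsamples, its cardinality is automatically bounded by the number of bounded subsamples. The core inequality is Mathlib's Finset.card_image_le, shown here in its generic form:

```lean
import Mathlib

-- Generic fact behind the envelope's finiteness bound:
-- an image never has more elements than its index set.
example {α β : Type} [DecidableEq β] (f : α → β) (s : Finset α) :
    (s.image f).card ≤ s.card :=
  Finset.card_image_le
```

Instantiated with f := fun Z => L.learn (labeledSampleOfFinset c Z) and s := boundedSubsamples Y L.sampleBound, this gives the finiteness the minimax step needs.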
Every hypothesis in the envelope is in C.
Agreement Tests #
The family of agreement tests over all points in Y.
Equations
- agreeTests c Y HY = Finset.image (fun (x : X) => agreeTest c x HY) Y
Instances For
Roundtrip helpers for the compression proof #
Encode a witness set W as the set of kernel positions of the pairs (x, c x).
The bound kernel.card ≤ K is fed into the encoding through the if branch, so the
result has the same shape as the current compressCore code.
Equations
- One or more equations did not get rendered due to their size.
Instances For
If every (x, c x) with x ∈ W lies in kernel, and kernel.card ≤ K, then
decoding the encoded witness positions gives back exactly W.
On the encoded witness support, the decoded label function agrees with the true
label function c, provided every pair in the kernel has the correct second coordinate.
If two label functions agree on all points of Z, then the labeled samples they
induce on Z.equivFin are equal.
Generic roundtrip theorem used to discharge the hround sorry.
If:
- encodeWitnessInfo is used in compressCore,
- decodeWitnessXCoords and decodeWitnessLabel are used in blockHyp, and
- the kernel contains the witness pairs with the correct labels,
then the decoded block hypothesis is exactly the representative hypothesis.
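A simplified model of the encode/decode roundtrip, with the kernel abstracted to a plain Finset: encoding a witness set W by membership in the kernel K and decoding by filtering recovers W exactly, provided W ⊆ K. This is a toy stand-in for encodeWitnessInfo/decodeWitnessXCoords, not their actual signatures.

```lean
import Mathlib

-- Toy roundtrip: filtering the kernel K down to the encoded witness
-- set W recovers W, as long as every witness actually lies in K.
example {α : Type} [DecidableEq α] (K W : Finset α) (hWK : W ⊆ K) :
    K.filter (· ∈ W) = W := by
  ext x
  simp only [Finset.mem_filter]
  exact ⟨fun h => h.2, fun h => ⟨hWK h, h⟩⟩
```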
Moran-Yehudayoff forward construction — universe-fixed closure helpers #
Fix the hidden Info universe parameter of CompressionSchemeWithInfo to 0.
This resolves the universe elaboration obstruction: Fin T → Finset (Fin K) is
Type 0, while CompressionSchemeWithInfo X Bool C with X : Type u infers
Info : Type u. Pinning to .{u, 0, 0} allows Type 0 Info directly.
Equations
- CompressionSchemeWithInfo0 X Y C = CompressionSchemeWithInfo X Y C
Instances For
Concrete side information for the MY construction: each of the T recovered
blocks is represented by the set of kernel positions it uses.
Equations
- IncidenceInfo T K = (Fin T → Finset (Fin K))
Instances For
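The whole point of IncidenceInfo is that Fin T → Finset (Fin K) is a finite type, so it qualifies as bounded side information; its cardinality is (2 ^ K) ^ T. A sketch of both facts (the simp lemma names are standard Mathlib, but the proof script is unverified against this repository's Mathlib version):

```lean
import Mathlib

-- IncidenceInfo T K := Fin T → Finset (Fin K) is a Fintype...
example (T K : ℕ) : Fintype (Fin T → Finset (Fin K)) := inferInstance

-- ...of cardinality (2 ^ K) ^ T: each of the T blocks independently
-- picks a subset of the K kernel positions.
example (T K : ℕ) :
    Fintype.card (Fin T → Finset (Fin K)) = (2 ^ K) ^ T := by
  simp [Fintype.card_fun, Fintype.card_finset, Fintype.card_fin]
```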
Forward direction: VCDim < ⊤ → compression with info #
The forward direction of the Moran-Yehudayoff theorem: finite VC dimension implies existence of a compression scheme with finite side information.
The construction:
- Build a proper finite-support learner L from VC + Sauer-Shelah
- For sample S: extract c, Y = pointSupport S, HY = hypothesis envelope
- Apply approximate minimax on the agreement game → distribution p on HY
- Apply VC ε-approximation on agreement tests → T representative hypotheses
- Kernel = union of witness subsets for T hypotheses
- Side info = incidence: which hypothesis's witness contains each kernel point
- Reconstruct by majority vote over T hypotheses
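The final majority-vote step can be modeled as: output true at x exactly when at least half of the T block hypotheses predict true. A toy version over a list of hypotheses (illustrative only, not the repository's reconstruction function):

```lean
import Mathlib

-- Toy majority vote: true iff at least half the hypotheses say true at x.
def majorityVote {X : Type} (hs : List (X → Bool)) (x : X) : Bool :=
  decide (hs.length ≤ 2 * hs.countP (fun h => h x))

-- Two of three hypotheses vote true, so the majority is true.
example : majorityVote [fun _ => true, fun _ => true, fun _ => false] (0 : ℕ) = true := by
  rfl
```

The ε-approximation step guarantees that on every kernel point a strict majority of the T representatives agrees with c, so this vote reconstructs the correct labels.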
Reverse direction: compression with info → VCDim < ⊤ #
Pigeonhole core: if two C-realizable samples over the same points with different labelings produce the same (kernel, info) pair, correctness forces the labelings to agree.
Compression with side info implies finite VC dimension. Proof by pigeonhole: compress is injective on C-realizable labelings (by correctness), but compressed outputs form a bounded set.
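The counting step behind the pigeonhole: a shattered set of size d yields 2 ^ d distinct C-realizable labelings, each mapped injectively (by correctness) into the finite set of (kernel, info) pairs; hence 2 ^ d is at most the size of that set, which bounds d. The core inequality is Mathlib's injectivity-cardinality lemma:

```lean
import Mathlib

-- Core of the pigeonhole: an injection between finite types
-- forces the cardinality inequality 2 ^ d ≤ #(kernel, info) pairs.
example {α β : Type} [Fintype α] [Fintype β] (f : α → β)
    (hf : Function.Injective f) : Fintype.card α ≤ Fintype.card β :=
  Fintype.card_le_of_injective f hf
```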