Structured Complexity Measures #
Compression schemes, algorithmic stability, multiclass dimensions, real-valued dimensions, teaching/eluder dimensions, SQ dimension, KL complexity, margin theory, covering numbers.
Compression Schemes #
A compression scheme of size k for concept class C (Littlestone-Warmuth 1986). M-DefinitionRepair (Γ₅₉ → Γ₇₃): parameterized by concept class C.
`compress` now returns `Finset (X × Y)` (actual compressed data points), not `Finset (Fin m)` (indices). This ensures `reconstruct` depends ONLY on the compressed subset, enabling the pigeonhole argument. The `correct` field is guarded by C-realizability: the reconstructed hypothesis agrees with every sample point WHEN the sample is C-realizable. This matches Littlestone-Warmuth and Moran-Yehudayoff: compression schemes only need to be correct on samples consistent with some concept in C. Γ₇₃ RESOLVED: the previous `correct` quantified over ALL samples (including inconsistent ones), making `CompressionScheme X Bool` uninhabited for nonempty X. It is now guarded by realizability, enabling genuinely non-vacuous proofs.
Compression: extract ≤ size labeled examples from the sample
Reconstruction: produce hypothesis from compressed subset alone
- size : ℕ
Size of the compressed representation
Compressed set is small
Compressed set is a subset of the sample
- correct {m : ℕ} (S : Fin m → X × Y) : (∃ c ∈ C, ∀ (i : Fin m), c (S i).1 = (S i).2) → ∀ (i : Fin m), self.reconstruct (self.compress S) (S i).1 = (S i).2
Correctness: reconstructed hypothesis agrees with every sample point, when the sample is C-realizable (∃ c ∈ C agreeing with all labels)
Instances For
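The fields listed above can be read together as one structure. The following is an illustrative sketch only, not the library's exact definition; the field names `compress_card_le` and `compress_subset` are hypothetical placeholders for the unnamed bound and subset conditions:

```lean
-- Illustrative sketch of the compression-scheme structure described above.
-- Field names for the bound and subset conditions are hypothetical.
structure CompressionSchemeSketch (X Y : Type*) (C : Set (X → Y)) where
  /-- Size of the compressed representation -/
  size : ℕ
  /-- Compression: extract ≤ size labeled examples from the sample -/
  compress : {m : ℕ} → (Fin m → X × Y) → Finset (X × Y)
  /-- Reconstruction: produce a hypothesis from the compressed subset alone -/
  reconstruct : Finset (X × Y) → X → Y
  /-- Compressed set is small -/
  compress_card_le : ∀ {m : ℕ} (S : Fin m → X × Y), (compress S).card ≤ size
  /-- Compressed set is a subset of the sample -/
  compress_subset : ∀ {m : ℕ} (S : Fin m → X × Y), ∀ p ∈ compress S, ∃ i, S i = p
  /-- Correctness: agreement on every point of a C-realizable sample -/
  correct : ∀ {m : ℕ} (S : Fin m → X × Y),
    (∃ c ∈ C, ∀ i, c (S i).1 = (S i).2) →
    ∀ i, reconstruct (compress S) (S i).1 = (S i).2
```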
Algorithmic Stability #
Algorithmic stability: removing one example changes the loss by at most β.
- learner : BatchLearner X Y
The learner whose stability we measure
- beta : ℝ
Stability parameter
- loss : LossFunction Y
- stable {m : ℕ} (S : Fin (m + 1) → X × Y) (i : Fin (m + 1)) (z : X × Y) : |self.loss (self.learner.learn S z.1) z.2 - self.loss (self.learner.learn (fun (j : Fin m) => S (i.succAbove j)) z.1) z.2| ≤ self.beta
Stability: for any sample S and any index i, the loss of L(S) vs L(S\i) on any fresh point z differs by at most β. S\i denotes the sample with the i-th element removed.
Instances For
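The deleted sample S\i appears in the `stable` field as the composite `fun j => S (i.succAbove j)`. A minimal sketch of that operation as a standalone helper (the name `removeAt` is hypothetical):

```lean
-- Hypothetical helper: drop the i-th example from a sample of length m + 1.
-- `i.succAbove j` skips over index i, so the result enumerates S without S i.
def removeAt {X Y : Type*} {m : ℕ} (S : Fin (m + 1) → X × Y)
    (i : Fin (m + 1)) : Fin m → X × Y :=
  fun j => S (i.succAbove j)
```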
Multiclass and Real-Valued Extensions #
Natarajan dimension: multiclass generalization of VC dimension. Largest d such that ∃ S with |S| = d and two witness functions f₀ f₁ : S → Y disagreeing everywhere on S, such that for every binary selection h, some concept in C picks f₀ or f₁ at each point according to h.
Equations
- One or more equations did not get rendered due to their size.
Instances For
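One way to read the shattering condition in the prose above is as the following predicate. This is a hedged sketch, assuming concepts are plain functions `X → Y`; the name `NatarajanShatters` is hypothetical:

```lean
-- Hypothetical sketch: S is Natarajan-shattered when two everywhere-disagreeing
-- witnesses f₀, f₁ exist and every binary selection h is realized by some c ∈ C.
def NatarajanShatters {X Y : Type*} (C : Set (X → Y)) (S : Finset X) : Prop :=
  ∃ f₀ f₁ : X → Y, (∀ x ∈ S, f₀ x ≠ f₁ x) ∧
    ∀ h : X → Bool, ∃ c ∈ C, ∀ x ∈ S,
      c x = if h x then f₁ x else f₀ x
```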
DS dimension: multiclass shattering where EVERY labeling is realizable. Strictly stronger than Natarajan dimension.
Equations
Instances For
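The "EVERY labeling is realizable" condition can be sketched as follows (hypothetical name, assuming concepts are functions `X → Y`); comparing it with a Natarajan witness pair makes the "strictly stronger" claim visible, since here all of `Y^S` must be realized, not just the two-witness selections:

```lean
-- Hypothetical sketch: S is DS-shattered (in this library's sense) when
-- every Y-labeling of S is realized by some concept in C.
def DSShatters {X Y : Type*} (C : Set (X → Y)) (S : Finset X) : Prop :=
  ∀ f : X → Y, ∃ c ∈ C, ∀ x ∈ S, c x = f x
```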
DS dimension ≤ Natarajan dimension: DS-shattering (every labeling realizable) is strictly STRONGER than Natarajan-shattering (2-coloring witness), so fewer sets satisfy the DS condition, hence DSDim ≤ NatarajanDim. Requires |Y| ≥ 2 (Nontrivial Y): when |Y| = 1, Natarajan witnesses f₀ ≠ f₁ cannot exist. M-DefinitionRepair (Γ₈): original statement had wrong direction.
Pseudodimension: real-valued generalization of VC dimension. Largest d such that ∃ S with |S| = d and thresholds t, such that for every binary labeling, some concept crosses the thresholds accordingly.
Equations
Instances For
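A hedged sketch of the pseudo-shattering condition, assuming real-valued concepts `X → ℝ` (the name and the exact threshold inequality are illustrative):

```lean
-- Hypothetical sketch: S is pseudo-shattered when some threshold function t
-- lets every binary labeling h be realized by threshold crossings.
def PseudoShatters {X : Type*} (C : Set (X → ℝ)) (S : Finset X) : Prop :=
  ∃ t : X → ℝ, ∀ h : X → Bool, ∃ c ∈ C, ∀ x ∈ S,
    (c x ≥ t x ↔ h x = true)
```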
Fat-shattering dimension at margin γ > 0. Like pseudodimension but with γ-margin on both sides of the threshold.
Equations
Instances For
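The γ-margin variant differs from the pseudo-shattering condition only in demanding clearance of γ on each side of the threshold. A hypothetical sketch:

```lean
-- Hypothetical sketch: like pseudo-shattering, but requiring a γ-margin
-- on both sides of the threshold.
def FatShatters {X : Type*} (C : Set (X → ℝ)) (γ : ℝ) (S : Finset X) : Prop :=
  ∃ t : X → ℝ, ∀ h : X → Bool, ∃ c ∈ C, ∀ x ∈ S,
    (h x = true → c x ≥ t x + γ) ∧ (h x = false → c x ≤ t x - γ)
```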
Other Complexity Measures #
SQ dimension: largest d such that there exist d concepts in C with pairwise small correlations under D. Captures the hardness of learning C using only statistical queries (expected values of functions of the sample). M-DefinitionRepair: added distribution parameter D (originally missing).
Equations
Instances For
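A hedged sketch of the "pairwise small correlations" condition over a finite domain, with concepts viewed as ±1-valued functions and the common 1/d correlation bound; all names and the exact bound are assumptions, not the library's definition:

```lean
-- Hypothetical sketch over a finite domain: d concepts whose pairwise
-- correlations under the distribution D are at most 1/d in absolute value.
def SQShatterable {X : Type*} [Fintype X] (C : Set (X → ℝ))
    (D : X → ℝ) (d : ℕ) : Prop :=
  ∃ f : Fin d → X → ℝ, (∀ i, f i ∈ C) ∧
    ∀ i j, i ≠ j → |∑ x, D x * f i x * f j x| ≤ 1 / (d : ℝ)
```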
Covering number: minimum number of balls of radius ε (under metric d) to cover C.
Equations
Instances For
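The covering condition itself is simple to state; a minimal sketch (hypothetical name), with the covering number then the least cardinality of a set `N` satisfying it:

```lean
-- Hypothetical sketch: N is an ε-cover of C under the metric d when every
-- concept in C lies within distance ε of some center in N.
def IsEpsCover {H : Type*} (d : H → H → ℝ) (ε : ℝ) (C N : Set H) : Prop :=
  ∀ c ∈ C, ∃ h ∈ N, d c h ≤ ε
```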
Teaching dimension: max over concepts of min teaching set size. A teaching set for c ∈ C is a set of labeled examples consistent with c that distinguishes c from all other concepts in C.
Equations
Instances For
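The inner "teaching set" notion can be sketched as a predicate (hypothetical name); the teaching dimension then takes the max over `c ∈ C` of the least `T.card` with this property:

```lean
-- Hypothetical sketch: T teaches c when T is consistent with c and no
-- other concept in C is consistent with all of T.
def IsTeachingSet {X Y : Type*} (C : Set (X → Y)) (c : X → Y)
    (T : Finset (X × Y)) : Prop :=
  (∀ p ∈ T, c p.1 = p.2) ∧
    ∀ c' ∈ C, (∀ p ∈ T, c' p.1 = p.2) → c' = c
```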
Eluder dimension: length of the longest eluder-independent sequence. Each element xᵢ has two concepts in C agreeing on x₁,...,x_{i-1} but disagreeing on xᵢ.
Equations
- One or more equations did not get rendered due to their size.
Instances For
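A hedged sketch of eluder independence for a sequence of points (hypothetical name); the eluder dimension is then the greatest length of such a sequence:

```lean
-- Hypothetical sketch: x is eluder-independent when each xᵢ admits two
-- concepts agreeing on all earlier points but disagreeing at xᵢ.
def EluderIndependent {X Y : Type*} (C : Set (X → Y))
    {n : ℕ} (x : Fin n → X) : Prop :=
  ∀ i : Fin n, ∃ c₁ ∈ C, ∃ c₂ ∈ C,
    (∀ j, j < i → c₁ (x j) = c₂ (x j)) ∧ c₁ (x i) ≠ c₂ (x i)
```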
KL complexity: KL divergence D_KL(posterior ‖ prior). Uses the conventions 0 · log(0/q) = 0 and p · log(p/0) = +∞ for p > 0 (the tsum returns 0 for non-summable families).
Equations
Instances For
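A hedged, simplified sketch over a countable index type with real-valued weights; note this real-valued version cannot represent the +∞ convention, and all names are hypothetical:

```lean
-- Hypothetical real-valued simplification of the KL sum; `tsum` (∑')
-- returns 0 for non-summable families, matching the convention noted above.
noncomputable def klComplexity {Ω : Type*} [Countable Ω]
    (posterior prior : Ω → ℝ) : ℝ :=
  ∑' ω, posterior ω * Real.log (posterior ω / prior ω)
```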
Compression with Side Information (Moran-Yehudayoff 2016) #
The Moran-Yehudayoff theorem proves compression WITH finite side information.
The paper's definition: κ : LC(∞) → LC(k) × I where I is a finite set.
CompressionScheme above is strictly stronger (no side information).
A labeled compression scheme with finite side information. This is the object proved to exist by Moran-Yehudayoff (2016, arXiv:1503.06960).
The current CompressionScheme is strictly stronger: it requires
reconstruction from the compressed Finset alone (no side information).
See Open_NoInfoCompressionStrengthening for that conjecture.
- Info : Type u_1
The side information type
Side information is finite
Compression: extract ≤ kernelSize labeled examples + side information
Reconstruction: produce hypothesis from compressed subset AND side information
- kernelSize : ℕ
Kernel size bound
Compressed set is small
Compressed set is a subset of the sample
- correct {m : ℕ} (S : Fin m → X × Y) : (∃ c ∈ C, ∀ (i : Fin m), c (S i).1 = (S i).2) → ∀ (i : Fin m), self.reconstruct (self.compress S).1 (self.compress S).2 (S i).1 = (S i).2
Correctness: reconstructed hypothesis agrees with every sample point, when the sample is C-realizable
Instances For
Total size of a compression scheme with side information: kernel size + number of side information states. (The paper uses k + log₂(|I|+1); we use the simpler k + |I| which is an upper bound and avoids importing Real.log.)
Equations
- cs.size = cs.kernelSize + Fintype.card cs.Info
Instances For
The current CompressionScheme (no side info) embeds into CompressionSchemeWithInfo
with trivial side information Info := PUnit.
Equations
- One or more equations did not get rendered due to their size.
Instances For
OPEN CONJECTURE: Does finite VC dimension imply compression WITHOUT side information? This is strictly stronger than the Moran-Yehudayoff theorem. The O(d) version is the Littlestone-Warmuth / Floyd-Warmuth conjecture (open since 1986).
Equations
- Open_NoInfoCompressionStrengthening = ∀ (X : Type ?u.8) (C : ConceptClass X Bool), VCDim X C < ⊤ → ∃ (k : ℕ) (cs : CompressionScheme X Bool C), cs.size = k