Symmetrization and Ghost Sample Infrastructure #
Reusable symmetrization/ghost sample machinery for uniform convergence bounds. This file provides the symmetrization argument (SSBD Chapter 4/6, Kakade-Tewari Lecture 19) that converts a one-sided uniform convergence question into a double-sample question, then bounds the double-sample event via exchangeability + growth function.
Main results #
hoeffding_one_sided: one-sided Hoeffding for bounded [0,1] lossessymmetrization_step: P[∃h: TrueErr-EmpErr ≥ ε] ≤ 2·P_{double}[∃h: EmpErr'-EmpErr ≥ ε/2]double_sample_pattern_bound: double-sample bound via exchangeability + growth functionsymmetrization_uc_bound: two-sided UC bound 4·GF(C,2m)·exp(-mε²/8)growth_exp_le_delta: arithmetic: sample complexity makes the UC bound ≤ δ
Infrastructure #
DoubleSampleMeasure: D^m ⊗ D^m as the product of two independent pi measuresMergedSample: Fin (2*m) → X with the Fin.append isomorphismSplitMeasure: uniform measure over (2m choose m) splits for exchangeability argument
Design notes #
All theorems use the STANDARD Approach A (exchangeability + permutation) for T3, NOT the relaxed iid Rademacher approach. This is the structurally correct argument that avoids introducing unnecessary independence assumptions.
Helper Definitions (DoubleSampleMeasure, ValidSplit, etc. in MathLib.Exchangeability) #
T1: One-sided Hoeffding Inequality #
One-sided Hoeffding: for iid Bernoulli(p) draws, the empirical average undershoots the mean by ≥ t with probability ≤ exp(-2mt²).
Proof strategy (3 steps):
MGF bound (Hoeffding's lemma): For X ∈ [0,1] with E[X] = p, E[exp(s(X-p))] ≤ exp(s²/8).
- Adapt from
cosh_le_exp_sq_halfinfrastructure in Rademacher.lean. - Key: convexity of exp on [0,1] gives E[exp(sX)] ≤ p·exp(s) + (1-p)·exp(0), then the s²/8 bound follows from ln(1 + x) ≤ x and Taylor expansion.
have mgf_bound : ∀ (s : ℝ), ∫ x, Real.exp (s * (indicator x - p)) ∂D ≤ Real.exp (s^2 / 8) := by ...- Adapt from
Product independence: E[exp(s·∑(X_i-p))] = ∏ E[exp(s(X_i-p))] ≤ exp(ms²/8).
- Uses
MeasureTheory.Measure.piindependence structure. - Needs:
Measure.piintegral factorization for product of functions. - MEASURABILITY:
fun xs => Real.exp (s * ∑ i, f (xs i))is measurable (composition of measurable functions).
have product_bound : ∀ (s : ℝ), ∫ xs, Real.exp (s * ∑ i, (indicator (xs i) - p)) ∂Measure.pi (fun _ => D) ≤ Real.exp (m * s^2 / 8) := by ...- Uses
Exponential Markov + optimize: P[∑(X_i-p) ≤ -mt] = P[exp(-s·∑(X_i-p)) ≥ exp(smt)] ≤ exp(-smt + ms²/8). Optimize over s: set s = 4t to get ≤ exp(-2mt²).
- Uses Markov's inequality in ENNReal form.
- CAST ISSUE: Markov gives ENNReal bound, need to convert exp(-2mt²) between ENNReal.ofReal and the measure value.
have markov_step : ∀ (s : ℝ) (hs : 0 < s), Measure.pi (fun _ => D) {xs | ∑ i, (indicator (xs i) - p) ≤ -(m : ℝ) * t} ≤ ENNReal.ofReal (Real.exp (-(s * m * t) + m * s^2 / 8)) := by ... have optimize : Real.exp (-(4*t * m * t) + m * (4*t)^2 / 8) = Real.exp (-2 * m * t^2) := by ring_nf
CAST ISSUES to watch:
m : ℕneeds cast toℝin the exponent:(m : ℝ)EmpiricalErrorreturnsℝ,TrueErrorRealreturnsℝ, good — no ENNReal gap- The measure value is
ENNReal, the boundexp(-2mt²)isℝ≥0∞viaENNReal.ofReal
References: SSBD Lemma B.3, Hoeffding (1963)
T2: Symmetrization Step #
Symmetrization: the probability of a large gap TrueErr-EmpErr is at most twice the probability of a large gap EmpErr'-EmpErr on the double sample.
Proof strategy (6 steps):
Witness selection: For S in the bad event, ∃h* ∈ C with TrueErr(h*) - EmpErr_S(h*) ≥ ε.
-- In the bad event set, extract h* by classical choice have h_witness : ∀ xs ∈ bad_event, ∃ h* ∈ C, TrueErrorReal X h* c D - EmpiricalError X Bool h* (sample xs) (zeroOneLoss Bool) ≥ εGhost sample mean: E_{S'}[EmpErr_{S'}(h*)] = TrueErr(h*) ≥ EmpErr_S(h*) + ε.
- Uses:
MeasureTheory.integral_pito compute E[EmpErr] over product measure. - KEY LEMMA: For fixed h, E_{D^m}[EmpiricalError(h,S)] = TrueErrorReal(h,c,D). This is because EmpErr = (1/m)∑ indicator(x_i), and E[indicator(x_i)] = TrueErrorReal.
have expected_emp_err : ∀ h* : Concept X Bool, ∫ xs, EmpiricalError X Bool h* (sample xs) (zeroOneLoss Bool) ∂(Measure.pi (fun _ : Fin m => D)) = TrueErrorReal X h* c D := by ...- Uses:
Hoeffding on ghost sample: P_{S'}[EmpErr_{S'}(h*) < TrueErr(h*) - ε/2] ≤ exp(-mε²/2).
- Apply
hoeffding_one_sidedwith t = ε/2. - The
hm_largehypothesis ensures exp(-mε²/2) < 1/2: 2·ln2 ≤ mε² ⟹ mε²/2 ≥ ln2 ⟹ exp(-mε²/2) ≤ 1/2.
have hoeffding_ghost : ∀ h* ∈ C, Measure.pi (fun _ : Fin m => D) {xs' | EmpiricalError X Bool h* (sample xs') (zeroOneLoss Bool) < TrueErrorReal X h* c D - ε/2} ≤ ENNReal.ofReal (Real.exp (-m * (ε/2)^2 * 2)) := by intro h* _; exact hoeffding_one_sided D h* c m hm (ε/2) (by linarith) (by ...) (by ...)- Apply
Complementary probability: P_{S'}[EmpErr_{S'}(h*) - EmpErr_S(h*) ≥ ε/2] ≥ 1/2.
- From step 2: TrueErr(h*) ≥ EmpErr_S(h*) + ε
- From step 3: P[EmpErr_{S'} ≥ TrueErr - ε/2] ≥ 1/2
- Chain: EmpErr_{S'} ≥ TrueErr - ε/2 ≥ EmpErr_S + ε - ε/2 = EmpErr_S + ε/2
Conditional to unconditional: The witness h* from step 1 also witnesses the double-sample event ∃h∈C: EmpErr'-EmpErr ≥ ε/2. So: P_{S'}[double event | S bad] ≥ 1/2.
have conditional_bound : ∀ xs ∈ bad_event, Measure.pi (fun _ : Fin m => D) {xs' | ∃ h ∈ C, EmpiricalError ... xs' - EmpiricalError ... xs ≥ ε/2} ≥ ENNReal.ofReal (1/2) := by ...Fubini integration: By Measure.prod_apply and Fubini: P_{S,S'}[double event] = ∫S P{S'}[double event | S] ≥ (1/2) · P_S[bad event] ⟹ P_S[bad event] ≤ 2 · P_{S,S'}[double event].
-- Uses: MeasureTheory.Measure.prod_apply or lintegral_prod -- MEASURABILITY: the double-sample event is measurable as a finite union -- of sets of the form {(xs,xs') | EmpErr'(h) - EmpErr(h) ≥ ε/2} for h ∈ C. -- Since C may be infinite, measurability requires care: the sup over h -- must be shown to be measurable. For finite restriction patterns (≤ 2^m -- on Fin m → Bool), this is a finite union.
MEASURABILITY CONCERNS:
{xs | ∃ h ∈ C, ...}is NOT obviously measurable for infinite C. Strategy: decompose via restriction patterns. On any fixed xs, the set of labelings {(h(xs 0), ..., h(xs(m-1))) | h ∈ C} has at most GF(C,m) ≤ 2^m elements. So the ∃h event is a finite union of measurable sets.EmpiricalErroris a finite sum of measurable functions, hence measurable.- The product σ-algebra on (Fin m → X) × (Fin m → X) is generated by cylinder sets, and our events are in this σ-algebra.
References: SSBD Lemma 4.5, Kakade-Tewari Lecture 19 Lemma 1
T3: Double Sample Pattern Bound (Approach A — Standard Exchangeability) #
Per-hypothesis Hoeffding on the double sample: for a FIXED hypothesis h, the probability that EmpErr(h,S') - EmpErr(h,S) ≥ ε/2 under D^m ⊗ D^m is at most exp(-mε²/8).
Proof: The gap = (1/m)∑ᵢ (Zᵢ' - Zᵢ) where Zᵢ = 1[h(xᵢ)≠c(xᵢ)], Zᵢ' = 1[h(x'ᵢ)≠c(x'ᵢ)] are iid Bernoulli(p) with p = TrueError(h,c,D). The differences Wᵢ = Zᵢ' - Zᵢ are independent, bounded in [-1,1], and centered (E[Wᵢ] = 0). By Hoeffding's inequality: P[(1/m)∑Wᵢ ≥ ε/2] ≤ exp(-mε²/8).
This uses the sub-Gaussian machinery from T1, extended to the product space.
Proof sketch:
- Pair D^m ⊗ D^m ≅ (D⊗D)^m via the natural isomorphism (Fin m → X) × (Fin m → X) ≃ᵐ Fin m → X × X
- Define g : X × X → ℝ, g(a,b) = 1[h(b)≠c(b)] - 1[h(a)≠c(a)] Then g ∈ [-1,1], E_{D⊗D}[g] = 0, so HasSubgaussianMGF g 1 (D⊗D)
- The gap = (1/m)∑ᵢ g(xᵢ, x'ᵢ) where pairs are iIndepFun under (D⊗D)^m
- By measure_sum_ge_le_of_iIndepFun: P[∑g ≥ mε/2] ≤ exp(-(mε/2)²/(2m)) = exp(-mε²/8)
Mathlib chain: iIndepFun_pi + HasSubgaussianMGF.of_map + measure_sum_ge_le_of_iIndepFun
The number of distinct restriction patterns of C on any n points is at most GF(C,n). For z : Fin n → X, define patterns(z) = {p : Fin n → Bool | ∃ h ∈ C, ∀ i, p i = (h(z i) ≠ c(z i))}. Then patterns(z).ncard ≤ GrowthFunction X C n by definition of GrowthFunction.
Generic finite exchangeability bound. Given a measure-preserving family of transformations on a probability space, a NullMeasurableSet S, and a pointwise bound on the sum of preimage indicators, conclude ν(S) ≤ B.
A concept class is well-behaved if the ghost gap event is null-measurable. This is the minimal regularity assumption for the symmetrization proof.
Equations
- One or more equations did not get rendered due to their size.
Instances For
On the double sample, the probability that any hypothesis has EmpErr' - EmpErr ≥ ε/2 is bounded by GF(C,2m) · exp(-mε²/8).
Proof strategy (Approach A — standard exchangeability, 5 steps):
EXCHANGEABILITY: Under D^m ⊗ D^m, the 2m draws z₁,...,z_{2m} are iid from D. The joint distribution is invariant under permutations of {1,...,2m}.
Key lemma: P_{D^m⊗D^m}[event(S,S')] = E_z[P_{split}[event | z]] where z = merged sample and the split is uniformly random among all C(2m,m) ways to partition z into two groups of m.
-- Measure.pi permutation invariance have pi_perm_invariant : ∀ (σ : Equiv.Perm (Fin (2*m))), (Measure.pi (fun _ : Fin (2*m) => D)).map (fun z i => z (σ i)) = Measure.pi (fun _ : Fin (2*m) => D) := by ... -- Consequence: the event probability equals the split-averaged probability have exchangeability : DoubleSampleMeasure D m {p | ∃ h ∈ C, gap(p) ≥ ε/2} = ∫ z, SplitMeasure m {vs | ∃ h ∈ C, gap(split z vs) ≥ ε/2} ∂(Measure.pi (fun _ : Fin (2*m) => D)) := by ...CONDITIONING: For fixed merged sample z of 2m points:
- C restricts to at most GF(C,2m) distinct labeling patterns on z (deterministic).
- For each pattern p, define: diff(p, split) = EmpErr_{S'}(p) - EmpErr_S(p) = (1/m) ∑{i∈S'} a_i - (1/m) ∑{i∈S} a_i where a_i = 1[pattern(z_i) ≠ c(z_i)] ∈ {0,1}.
-- Number of distinct patterns have num_patterns : ∀ (z : MergedSample X m), Set.ncard {p : Fin (2*m) → Bool | ∃ h ∈ C, ∀ i, p i = (h (z i) ≠ c (z i))} ≤ GrowthFunction X C (2*m) := by ...PER-PATTERN HOEFFDING ON SPLITS: For fixed z and fixed pattern p: Under uniformly random split (S,S') of z into two groups of m: diff(p, split) = (1/m) ∑{i∈S'} a_i - (1/m) ∑{i∈S} a_i
This is a function of the random partition. By Hoeffding's inequality for sampling without replacement (Serfling 1974): P_split[diff ≥ ε/2] ≤ exp(-mε²/8)
Alternative derivation: Hoeffding without replacement from Hoeffding with replacement (iid signs) via coupling. The without-replacement bound is actually TIGHTER (variance reduction), but the with-replacement bound suffices.
-- Per-pattern concentration have per_pattern_bound : ∀ (z : MergedSample X m) (a : Fin (2*m) → ℝ) (ha : ∀ i, a i ∈ Set.Icc 0 1), SplitMeasure m {vs | (1/m) * ∑ i ∈ second_group vs, a i - (1/m) * ∑ i ∈ first_group vs, a i ≥ ε/2} ≤ ENNReal.ofReal (Real.exp (-(m : ℝ) * (ε/2)^2 / 2)) := by ... -- Note: m*(ε/2)^2/2 = mε²/8UNION BOUND: P_split[∃ pattern: diff ≥ ε/2 | z] ≤ (number of patterns) · max_pattern P_split[diff ≥ ε/2] ≤ GF(C,2m) · exp(-mε²/8)
have union_bound : ∀ (z : MergedSample X m), SplitMeasure m {vs | ∃ h ∈ C, gap(split z vs, h) ≥ ε/2} ≤ ENNReal.ofReal (GrowthFunction X C (2*m) * Real.exp (-(m : ℝ) * ε^2 / 8)) := by ...INTEGRATE: P_{D^m⊗D^m}[event] = E_z[P_split[event|z]] (by step 1) ≤ E_z[GF(C,2m) · exp(-mε²/8)] (by step 4, pointwise) = GF(C,2m) · exp(-mε²/8) (bound is independent of z)
-- The bound is a constant, so integrating gives the same constant -- (using IsProbabilityMeasure for the 2m-fold product)
Infrastructure needed:
Fin.sumFinEquiv : Fin m ⊕ Fin n ≃ Fin (m + n)(available in Mathlib)mergeSamples/splitMergedSample(defined above)SplitMeasureandValidSplit(defined above)Measure.pipermutation invariance (to be proved or imported)- Hoeffding for sampling without replacement
GrowthFunctionon 2m points +sauer_shelah_exp_boundfrom Rademacher.lean
MEASURABILITY CONCERNS:
- The merged sample z ↦ P_split[event|z] must be measurable as a function of z. Since the event is a finite union over patterns, and each pattern's indicator is a measurable function of z (finite evaluation), this follows.
GrowthFunction X C (2*m)is a natural number (deterministic), no measurability issue.
References: SSBD Theorem 6.7, Hoeffding (1963), Serfling (1974)
Upper-tail Hoeffding: for iid Bernoulli(p) draws, the empirical average overshoots the mean by ≥ t with probability ≤ exp(-2mt²).
This is the mirror of hoeffding_one_sided (which bounds the lower tail).
The proof uses the same sub-Gaussian machinery with Z_i = indicator(x_i) - p
(instead of p - indicator(x_i)).
Symmetrization step for the lower tail: P[∃h: EmpErr-TrueErr ≥ ε] ≤ 2·P_{double}[∃h: EmpErr_S-EmpErr_{S'} ≥ ε/2].
Mirror of symmetrization_step for the opposite direction.
Uses hoeffding_one_sided_upper instead of hoeffding_one_sided.
T4: Symmetrization Uniform Convergence Bound (two-sided) #
The symmetrization uniform convergence bound: two-sided version. P[∃h∈C: |TrueErr-EmpErr| ≥ ε] ≤ 4·GF(C,2m)·exp(-mε²/8).
Proof strategy (4 steps):
Decompose absolute value: |TrueErr - EmpErr| ≥ ε ↔ (TrueErr - EmpErr ≥ ε) ∨ (EmpErr - TrueErr ≥ ε)
have abs_decomp : ∀ (a b : ℝ), |a - b| ≥ ε ↔ a - b ≥ ε ∨ b - a ≥ ε := by intro a b; constructor · intro h; by_cases h' : a - b ≥ ε · exact Or.inl h' · exact Or.inr (by linarith [abs_sub_comm a b, le_abs_self (a - b)]) · intro h; cases h with | inl h => exact le_trans (le_of_eq (abs_of_nonneg (by linarith))) (by linarith) | inr h => exact le_trans (le_of_eq (abs_of_nonpos (by linarith) ▸ ...)) ...Upper tail: P[∃h∈C: TrueErr-EmpErr ≥ ε] ≤ 2·GF(C,2m)·exp(-mε²/8)
- Direct application of
symmetrization_step+double_sample_pattern_bound.
- Direct application of
Lower tail: P[∃h∈C: EmpErr-TrueErr ≥ ε] ≤ 2·GF(C,2m)·exp(-mε²/8)
- Apply the symmetric argument: swap roles of S and S' in the double sample.
- Equivalently, apply
symmetrization_stepto the event EmpErr-TrueErr ≥ ε and bound the double-sample event {EmpErr_S - EmpErr_{S'} ≥ ε/2}. - The bound is symmetric because D^m ⊗ D^m is symmetric under swapping factors.
have swap_symmetry : DoubleSampleMeasure D m {p | ∃ h ∈ C, EmpErr(S) - EmpErr(S') ≥ ε/2} = DoubleSampleMeasure D m {p | ∃ h ∈ C, EmpErr(S') - EmpErr(S) ≥ ε/2} := Measure.prod_swap ...Union bound: P[|gap| ≥ ε] ≤ P[gap ≥ ε] + P[gap ≤ -ε] ≤ 2·GF·exp(...) + 2·GF·exp(...) = 4·GF(C,2m)·exp(-mε²/8)
-- Uses: MeasureTheory.measure_union_le for the union of two events -- CAST: 2 * X + 2 * X = 4 * X in ENNReal (need ENNReal.add_mul or similar)
References: SSBD Theorem 6.7, Kakade-Tewari Lecture 19
T5: Arithmetic — Growth Function × Exponential ≤ δ #
Sorry-free UC proof: composing symmetrization + arithmetic #
These theorems close the sorry in uc_bad_event_le_delta (Generalization.lean)
by composing symmetrization_uc_bound with growth_exp_le_delta.
They live here because Symmetrization.lean has access to both components,
whereas Generalization.lean cannot import Symmetrization.lean (circular).
Finite VCDim implies uniform convergence. Proof: VCDim < ∞ → UC.
- Finite X: direct Hoeffding per-hypothesis + finite union bound.
- Infinite X: Sauer-Shelah → symmetrization + growth function → UC.
VCDim < ⊤ → PACLearnable via UC route.