Documentation

FLT_Proofs.Complexity.Symmetrization

Symmetrization and Ghost Sample Infrastructure #

Reusable symmetrization/ghost sample machinery for uniform convergence bounds. This file provides the symmetrization argument (SSBD Chapter 4/6, Kakade-Tewari Lecture 19) that converts a one-sided uniform convergence question into a double-sample question, then bounds the double-sample event via exchangeability + growth function.

Main results #

Infrastructure #

Design notes #

All theorems use the STANDARD Approach A (exchangeability + permutation) for T3, NOT the relaxed iid Rademacher approach. This is the structurally correct argument that avoids introducing unnecessary independence assumptions.

Helper Definitions (DoubleSampleMeasure, ValidSplit, etc. in MathLib.Exchangeability) #

T1: One-sided Hoeffding Inequality #

theorem hoeffding_one_sided {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (h c : Concept X Bool) (m : ) (hm : 0 < m) (t : ) (ht : 0 < t) (_ht1 : t 1) (hmeas : MeasurableSet {x : X | h x c x}) :
(MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin mX | EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool) TrueErrorReal X h c D - t} ENNReal.ofReal (Real.exp (-2 * m * t ^ 2))

One-sided Hoeffding: for iid Bernoulli(p) draws, the empirical average undershoots the mean by ≥ t with probability ≤ exp(-2mt²).

Proof strategy (3 steps):

  1. MGF bound (Hoeffding's lemma): For X ∈ [0,1] with E[X] = p, E[exp(s(X-p))] ≤ exp(s²/8).

    • Adapt from cosh_le_exp_sq_half infrastructure in Rademacher.lean.
    • Key: convexity of exp on [0,1] gives E[exp(sX)] ≤ p·exp(s) + (1-p)·exp(0), then the s²/8 bound follows from ln(1 + x) ≤ x and Taylor expansion.
    have mgf_bound : ∀ (s : ℝ),
      ∫ x, Real.exp (s * (indicator x - p)) ∂D ≤ Real.exp (s^2 / 8) := by ...
    
  2. Product independence: E[exp(s·∑(X_i-p))] = ∏ E[exp(s(X_i-p))] ≤ exp(ms²/8).

    • Uses MeasureTheory.Measure.pi independence structure.
    • Needs: Measure.pi integral factorization for product of functions.
    • MEASURABILITY: fun xs => Real.exp (s * ∑ i, f (xs i)) is measurable (composition of measurable functions).
    have product_bound : ∀ (s : ℝ),
      ∫ xs, Real.exp (s * ∑ i, (indicator (xs i) - p)) ∂Measure.pi (fun _ => D)
      ≤ Real.exp (m * s^2 / 8) := by ...
    
  3. Exponential Markov + optimize: P[∑(X_i-p) ≤ -mt] = P[exp(-s·∑(X_i-p)) ≥ exp(smt)] ≤ exp(-smt + ms²/8). Optimize over s: set s = 4t to get ≤ exp(-2mt²).

    • Uses Markov's inequality in ENNReal form.
    • CAST ISSUE: Markov gives ENNReal bound, need to convert exp(-2mt²) between ENNReal.ofReal and the measure value.
    have markov_step : ∀ (s : ℝ) (hs : 0 < s),
      Measure.pi (fun _ => D) {xs | ∑ i, (indicator (xs i) - p) ≤ -(m : ℝ) * t}
      ≤ ENNReal.ofReal (Real.exp (-(s * m * t) + m * s^2 / 8)) := by ...
    have optimize : Real.exp (-(4*t * m * t) + m * (4*t)^2 / 8)
      = Real.exp (-2 * m * t^2) := by ring_nf
    

CAST ISSUES to watch:

References: SSBD Lemma B.3, Hoeffding (1963)

T2: Symmetrization Step #

theorem symmetrization_step {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (C : ConceptClass X Bool) (c : Concept X Bool) (hmeas_C : hC, Measurable h) (hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) (hm_large : 2 * Real.log 2 m * ε ^ 2) :
(MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin mX | hC, TrueErrorReal X h c D - EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool) ε} 2 * ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D)) {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2}

Symmetrization: the probability of a large gap TrueErr-EmpErr is at most twice the probability of a large gap EmpErr'-EmpErr on the double sample.

Proof strategy (6 steps):

  1. Witness selection: For S in the bad event, ∃h* ∈ C with TrueErr(h*) - EmpErr_S(h*) ≥ ε.

    -- In the bad event set, extract h* by classical choice
    have h_witness : ∀ xs ∈ bad_event, ∃ h* ∈ C,
      TrueErrorReal X h* c D - EmpiricalError X Bool h* (sample xs) (zeroOneLoss Bool) ≥ ε
    
  2. Ghost sample mean: E_{S'}[EmpErr_{S'}(h*)] = TrueErr(h*) ≥ EmpErr_S(h*) + ε.

    • Uses: MeasureTheory.integral_pi to compute E[EmpErr] over product measure.
    • KEY LEMMA: For fixed h, E_{D^m}[EmpiricalError(h,S)] = TrueErrorReal(h,c,D). This is because EmpErr = (1/m)∑ indicator(x_i), and E[indicator(x_i)] = TrueErrorReal.
    have expected_emp_err : ∀ h* : Concept X Bool,
      ∫ xs, EmpiricalError X Bool h* (sample xs) (zeroOneLoss Bool)
        ∂(Measure.pi (fun _ : Fin m => D))
      = TrueErrorReal X h* c D := by ...
    
  3. Hoeffding on ghost sample: P_{S'}[EmpErr_{S'}(h*) < TrueErr(h*) - ε/2] ≤ exp(-mε²/2).

    • Apply hoeffding_one_sided with t = ε/2.
    • The hm_large hypothesis ensures exp(-mε²/2) < 1/2: 2·ln2 ≤ mε² ⟹ mε²/2 ≥ ln2 ⟹ exp(-mε²/2) ≤ 1/2.
    have hoeffding_ghost : ∀ h* ∈ C,
      Measure.pi (fun _ : Fin m => D)
        {xs' | EmpiricalError X Bool h* (sample xs') (zeroOneLoss Bool)
          < TrueErrorReal X h* c D - ε/2}
      ≤ ENNReal.ofReal (Real.exp (-m * (ε/2)^2 * 2)) := by
        intro h* _; exact hoeffding_one_sided D h* c m hm (ε/2) (by linarith) (by ...) (by ...)
    
  4. Complementary probability: P_{S'}[EmpErr_{S'}(h*) - EmpErr_S(h*) ≥ ε/2] ≥ 1/2.

    • From step 2: TrueErr(h*) ≥ EmpErr_S(h*) + ε
    • From step 3: P[EmpErr_{S'} ≥ TrueErr - ε/2] ≥ 1/2
    • Chain: EmpErr_{S'} ≥ TrueErr - ε/2 ≥ EmpErr_S + ε - ε/2 = EmpErr_S + ε/2
  5. Conditional to unconditional: The witness h* from step 1 also witnesses the double-sample event ∃h∈C: EmpErr'-EmpErr ≥ ε/2. So: P_{S'}[double event | S bad] ≥ 1/2.

    have conditional_bound : ∀ xs ∈ bad_event,
      Measure.pi (fun _ : Fin m => D)
        {xs' | ∃ h ∈ C, EmpiricalError ... xs' - EmpiricalError ... xs ≥ ε/2}
      ≥ ENNReal.ofReal (1/2) := by ...
    
  6. Fubini integration: By Measure.prod_apply and Fubini: P_{S,S'}[double event] = ∫S P{S'}[double event | S] ≥ (1/2) · P_S[bad event] ⟹ P_S[bad event] ≤ 2 · P_{S,S'}[double event].

    -- Uses: MeasureTheory.Measure.prod_apply or lintegral_prod
    -- MEASURABILITY: the double-sample event is measurable as a finite union
    -- of sets of the form {(xs,xs') | EmpErr'(h) - EmpErr(h) ≥ ε/2} for h ∈ C.
    -- Since C may be infinite, measurability requires care: the sup over h
    -- must be shown to be measurable. For finite restriction patterns (≤ 2^m
    -- on Fin m → Bool), this is a finite union.
    

MEASURABILITY CONCERNS:

  • {xs | ∃ h ∈ C, ...} is NOT obviously measurable for infinite C. Strategy: decompose via restriction patterns. On any fixed xs, the set of labelings {(h(xs 0), ..., h(xs(m-1))) | h ∈ C} has at most GF(C,m) ≤ 2^m elements. So the ∃h event is a finite union of measurable sets.
  • EmpiricalError is a finite sum of measurable functions, hence measurable.
  • The product σ-algebra on (Fin m → X) × (Fin m → X) is generated by cylinder sets, and our events are in this σ-algebra.

References: SSBD Lemma 4.5, Kakade-Tewari Lecture 19 Lemma 1

T3: Double Sample Pattern Bound (Approach A — Standard Exchangeability) #

theorem per_hypothesis_gap_bound {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (h c : Concept X Bool) (hmeas_h : Measurable h) (hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) :
have μ := MeasureTheory.Measure.pi fun (x : Fin m) => D; (μ.prod μ) {p : (Fin mX) × (Fin mX) | EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ENNReal.ofReal (Real.exp (-(m * ε ^ 2 / 8)))

Per-hypothesis Hoeffding on the double sample: for a FIXED hypothesis h, the probability that EmpErr(h,S') - EmpErr(h,S) ≥ ε/2 under D^m ⊗ D^m is at most exp(-mε²/8).

Proof: The gap = (1/m)∑ᵢ (Zᵢ' - Zᵢ) where Zᵢ = 1[h(xᵢ)≠c(xᵢ)], Zᵢ' = 1[h(x'ᵢ)≠c(x'ᵢ)] are iid Bernoulli(p) with p = TrueError(h,c,D). The differences Wᵢ = Zᵢ' - Zᵢ are independent, bounded in [-1,1], and centered (E[Wᵢ] = 0). By Hoeffding's inequality: P[(1/m)∑Wᵢ ≥ ε/2] ≤ exp(-mε²/8).

This uses the sub-Gaussian machinery from T1, extended to the product space.

Proof sketch:

  1. Pair D^m ⊗ D^m ≅ (D⊗D)^m via the natural isomorphism (Fin m → X) × (Fin m → X) ≃ᵐ Fin m → X × X
  2. Define g : X × X → ℝ, g(a,b) = 1[h(b)≠c(b)] - 1[h(a)≠c(a)] Then g ∈ [-1,1], E_{D⊗D}[g] = 0, so HasSubgaussianMGF g 1 (D⊗D)
  3. The gap = (1/m)∑ᵢ g(xᵢ, x'ᵢ) where pairs are iIndepFun under (D⊗D)^m
  4. By measure_sum_ge_le_of_iIndepFun: P[∑g ≥ mε/2] ≤ exp(-(mε/2)²/(2m)) = exp(-mε²/8)

Mathlib chain: iIndepFun_pi + HasSubgaussianMGF.of_map + measure_sum_ge_le_of_iIndepFun

theorem restriction_pattern_count {X : Type u} [MeasurableSpace X] [Infinite X] (C : ConceptClass X Bool) (c : Concept X Bool) (n : ) (z : Fin nX) :
{p : Fin nBool | hC, ∀ (i : Fin n), p i = decide (h (z i) c (z i))}.ncard GrowthFunction X C n

The number of distinct restriction patterns of C on any n points is at most GF(C,n). For z : Fin n → X, define patterns(z) = {p : Fin n → Bool | ∃ h ∈ C, ∀ i, p i = (h(z i) ≠ c(z i))}. Then patterns(z).ncard ≤ GrowthFunction X C n by definition of GrowthFunction.

theorem finite_exchangeability_bound {Ω : Type u_1} {G : Type u_2} [MeasurableSpace Ω] [Fintype G] [Nonempty G] {ν : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure ν] (T : GΩΩ) (S : Set Ω) (hT : ∀ (g : G), MeasureTheory.MeasurePreserving (T g) ν ν) (hS0 : MeasureTheory.NullMeasurableSet S ν) (B : ENNReal) (hpointwise : ∀ (z : Ω), g : G, (T g ⁻¹' S).indicator 1 z B * (Fintype.card G)) :
ν S B

Generic finite exchangeability bound. Given a measure-preserving family of transformations on a probability space, a NullMeasurableSet S, and a pointwise bound on the sum of preimage indicators, conclude ν(S) ≤ B.

A concept class is well-behaved if the ghost gap event is null-measurable. This is the minimal regularity assumption for the symmetrization proof.

Equations
  • One or more equations did not get rendered due to their size.
Instances For
    theorem exchangeability_chain_bound {X : Type u} [MeasurableSpace X] [Infinite X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (C : ConceptClass X Bool) (c : Concept X Bool) (_hmeas_C : hC, Measurable h) (_hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) (_hε2 : ε 2) (hC : Set.Nonempty C) (hE_nullmeas : MeasureTheory.NullMeasurableSet {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D))) :
    have μ := MeasureTheory.Measure.pi fun (x : Fin m) => D; (μ.prod μ) {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ENNReal.ofReal ((GrowthFunction X C (2 * m)) * Real.exp (-(m * ε ^ 2 / 8)))
    theorem double_sample_pattern_bound {X : Type u} [MeasurableSpace X] [Infinite X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (C : ConceptClass X Bool) (c : Concept X Bool) (hmeas_C : hC, Measurable h) (hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) (hE_nullmeas : MeasureTheory.NullMeasurableSet {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D))) :
    ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D)) {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ENNReal.ofReal ((GrowthFunction X C (2 * m)) * Real.exp (-(m * ε ^ 2 / 8)))

    On the double sample, the probability that any hypothesis has EmpErr' - EmpErr ≥ ε/2 is bounded by GF(C,2m) · exp(-mε²/8).

    Proof strategy (Approach A — standard exchangeability, 5 steps):

    1. EXCHANGEABILITY: Under D^m ⊗ D^m, the 2m draws z₁,...,z_{2m} are iid from D. The joint distribution is invariant under permutations of {1,...,2m}.

      Key lemma: P_{D^m⊗D^m}[event(S,S')] = E_z[P_{split}[event | z]] where z = merged sample and the split is uniformly random among all C(2m,m) ways to partition z into two groups of m.

      -- Measure.pi permutation invariance
      have pi_perm_invariant : ∀ (σ : Equiv.Perm (Fin (2*m))),
        (Measure.pi (fun _ : Fin (2*m) => D)).map (fun z i => z (σ i))
        = Measure.pi (fun _ : Fin (2*m) => D) := by ...
      -- Consequence: the event probability equals the split-averaged probability
      have exchangeability :
        DoubleSampleMeasure D m {p | ∃ h ∈ C, gap(p) ≥ ε/2}
        = ∫ z, SplitMeasure m {vs | ∃ h ∈ C, gap(split z vs) ≥ ε/2}
          ∂(Measure.pi (fun _ : Fin (2*m) => D)) := by ...
      
    2. CONDITIONING: For fixed merged sample z of 2m points:

      • C restricts to at most GF(C,2m) distinct labeling patterns on z (deterministic).
      • For each pattern p, define: diff(p, split) = EmpErr_{S'}(p) - EmpErr_S(p) = (1/m) ∑{i∈S'} a_i - (1/m) ∑{i∈S} a_i where a_i = 1[pattern(z_i) ≠ c(z_i)] ∈ {0,1}.
      -- Number of distinct patterns
      have num_patterns : ∀ (z : MergedSample X m),
        Set.ncard {p : Fin (2*m) → Bool | ∃ h ∈ C, ∀ i, p i = (h (z i) ≠ c (z i))}
        ≤ GrowthFunction X C (2*m) := by ...
      
    3. PER-PATTERN HOEFFDING ON SPLITS: For fixed z and fixed pattern p: Under uniformly random split (S,S') of z into two groups of m: diff(p, split) = (1/m) ∑{i∈S'} a_i - (1/m) ∑{i∈S} a_i

      This is a function of the random partition. By Hoeffding's inequality for sampling without replacement (Serfling 1974): P_split[diff ≥ ε/2] ≤ exp(-mε²/8)

      Alternative derivation: Hoeffding without replacement from Hoeffding with replacement (iid signs) via coupling. The without-replacement bound is actually TIGHTER (variance reduction), but the with-replacement bound suffices.

      -- Per-pattern concentration
      have per_pattern_bound : ∀ (z : MergedSample X m) (a : Fin (2*m) → ℝ)
        (ha : ∀ i, a i ∈ Set.Icc 0 1),
        SplitMeasure m {vs | (1/m) * ∑ i ∈ second_group vs, a i
          - (1/m) * ∑ i ∈ first_group vs, a i ≥ ε/2}
        ≤ ENNReal.ofReal (Real.exp (-(m : ℝ) * (ε/2)^2 / 2)) := by ...
      -- Note: m*(ε/2)^2/2 = mε²/8
      
    4. UNION BOUND: P_split[∃ pattern: diff ≥ ε/2 | z] ≤ (number of patterns) · max_pattern P_split[diff ≥ ε/2] ≤ GF(C,2m) · exp(-mε²/8)

      have union_bound : ∀ (z : MergedSample X m),
        SplitMeasure m {vs | ∃ h ∈ C, gap(split z vs, h) ≥ ε/2}
        ≤ ENNReal.ofReal (GrowthFunction X C (2*m) * Real.exp (-(m : ℝ) * ε^2 / 8))
        := by ...
      
    5. INTEGRATE: P_{D^m⊗D^m}[event] = E_z[P_split[event|z]] (by step 1) ≤ E_z[GF(C,2m) · exp(-mε²/8)] (by step 4, pointwise) = GF(C,2m) · exp(-mε²/8) (bound is independent of z)

      -- The bound is a constant, so integrating gives the same constant
      -- (using IsProbabilityMeasure for the 2m-fold product)
      

    Infrastructure needed:

    MEASURABILITY CONCERNS:

    • The merged sample z ↦ P_split[event|z] must be measurable as a function of z. Since the event is a finite union over patterns, and each pattern's indicator is a measurable function of z (finite evaluation), this follows.
    • GrowthFunction X C (2*m) is a natural number (deterministic), no measurability issue.

    References: SSBD Theorem 6.7, Hoeffding (1963), Serfling (1974)

    theorem hoeffding_one_sided_upper {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (h c : Concept X Bool) (m : ) (hm : 0 < m) (t : ) (ht : 0 < t) (_ht1 : t 1) (hmeas : MeasurableSet {x : X | h x c x}) :
    (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin mX | EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool) TrueErrorReal X h c D + t} ENNReal.ofReal (Real.exp (-2 * m * t ^ 2))

    Upper-tail Hoeffding: for iid Bernoulli(p) draws, the empirical average overshoots the mean by ≥ t with probability ≤ exp(-2mt²).

    This is the mirror of hoeffding_one_sided (which bounds the lower tail). The proof uses the same sub-Gaussian machinery with Z_i = indicator(x_i) - p (instead of p - indicator(x_i)).

    theorem symmetrization_step_lower {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (C : ConceptClass X Bool) (c : Concept X Bool) (hmeas_C : hC, Measurable h) (hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) (hm_large : 2 * Real.log 2 m * ε ^ 2) :
    (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin mX | hC, EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool) - TrueErrorReal X h c D ε} 2 * ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D)) {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) ε / 2}

    Symmetrization step for the lower tail: P[∃h: EmpErr-TrueErr ≥ ε] ≤ 2·P_{double}[∃h: EmpErr_S-EmpErr_{S'} ≥ ε/2].

    Mirror of symmetrization_step for the opposite direction. Uses hoeffding_one_sided_upper instead of hoeffding_one_sided.

    T4: Symmetrization Uniform Convergence Bound (two-sided) #

    theorem symmetrization_uc_bound {X : Type u} [MeasurableSpace X] [Infinite X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (C : ConceptClass X Bool) (c : Concept X Bool) (hmeas_C : hC, Measurable h) (hc_meas : Measurable c) (m : ) (hm : 0 < m) (ε : ) ( : 0 < ε) (hm_large : 2 * Real.log 2 m * ε ^ 2) (hE_nullmeas : MeasureTheory.NullMeasurableSet {p : (Fin mX) × (Fin mX) | hC, EmpiricalError X Bool h (fun (i : Fin m) => (p.2 i, c (p.2 i))) (zeroOneLoss Bool) - EmpiricalError X Bool h (fun (i : Fin m) => (p.1 i, c (p.1 i))) (zeroOneLoss Bool) ε / 2} ((MeasureTheory.Measure.pi fun (x : Fin m) => D).prod (MeasureTheory.Measure.pi fun (x : Fin m) => D))) :
    (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin mX | hC, |TrueErrorReal X h c D - EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool)| ε} ENNReal.ofReal (4 * (GrowthFunction X C (2 * m)) * Real.exp (-(m * ε ^ 2 / 8)))

    The symmetrization uniform convergence bound: two-sided version. P[∃h∈C: |TrueErr-EmpErr| ≥ ε] ≤ 4·GF(C,2m)·exp(-mε²/8).

    Proof strategy (4 steps):

    1. Decompose absolute value: |TrueErr - EmpErr| ≥ ε ↔ (TrueErr - EmpErr ≥ ε) ∨ (EmpErr - TrueErr ≥ ε)

      have abs_decomp : ∀ (a b : ℝ),
        |a - b| ≥ ε ↔ a - b ≥ ε ∨ b - a ≥ ε := by
        intro a b; constructor
        · intro h; by_cases h' : a - b ≥ ε
          · exact Or.inl h'
          · exact Or.inr (by linarith [abs_sub_comm a b, le_abs_self (a - b)])
        · intro h; cases h with
          | inl h => exact le_trans (le_of_eq (abs_of_nonneg (by linarith))) (by linarith)
          | inr h => exact le_trans (le_of_eq (abs_of_nonpos (by linarith) ▸ ...)) ...
      
    2. Upper tail: P[∃h∈C: TrueErr-EmpErr ≥ ε] ≤ 2·GF(C,2m)·exp(-mε²/8)

    3. Lower tail: P[∃h∈C: EmpErr-TrueErr ≥ ε] ≤ 2·GF(C,2m)·exp(-mε²/8)

      • Apply the symmetric argument: swap roles of S and S' in the double sample.
      • Equivalently, apply symmetrization_step to the event EmpErr-TrueErr ≥ ε and bound the double-sample event {EmpErr_S - EmpErr_{S'} ≥ ε/2}.
      • The bound is symmetric because D^m ⊗ D^m is symmetric under swapping factors.
      have swap_symmetry :
        DoubleSampleMeasure D m {p | ∃ h ∈ C, EmpErr(S) - EmpErr(S') ≥ ε/2}
        = DoubleSampleMeasure D m {p | ∃ h ∈ C, EmpErr(S') - EmpErr(S) ≥ ε/2} :=
        Measure.prod_swap ...
      
    4. Union bound: P[|gap| ≥ ε] ≤ P[gap ≥ ε] + P[gap ≤ -ε] ≤ 2·GF·exp(...) + 2·GF·exp(...) = 4·GF(C,2m)·exp(-mε²/8)

      -- Uses: MeasureTheory.measure_union_le for the union of two events
      -- CAST: 2 * X + 2 * X = 4 * X in ENNReal (need ENNReal.add_mul or similar)
      

    References: SSBD Theorem 6.7, Kakade-Tewari Lecture 19

    T5: Arithmetic — Growth Function × Exponential ≤ δ #

    theorem growth_exp_le_delta {X : Type u} [MeasurableSpace X] (C : ConceptClass X Bool) (v : ) (hv : 0 < v) (m : ) (hm : 0 < m) (ε δ : ) ( : 0 < ε) ( : 0 < δ) (hδ1 : δ < 1) (hv_bound : ∀ (n : ), v nGrowthFunction X C n iFinset.range (v + 1), n.choose i) (hm_bound : (16 * Real.exp 1 * (v + 1) / ε ^ 2) ^ (v + 1) / δ m) :
    4 * (GrowthFunction X C (2 * m)) * Real.exp (-(m * ε ^ 2 / 8)) δ 2 * Real.log 2 m * ε ^ 2

    Sorry-free UC proof: composing symmetrization + arithmetic #

    These theorems close the sorry in uc_bad_event_le_delta (Generalization.lean) by composing symmetrization_uc_bound with growth_exp_le_delta. They live here because Symmetrization.lean has access to both components, whereas Generalization.lean cannot import Symmetrization.lean (circular).

    theorem vcdim_finite_imp_uc' (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) (hC : VCDim X C < ) (hmeas_C : hC, Measurable h) (hc_meas : ∀ (c : Concept X Bool), Measurable c) (hWB : WellBehavedVC X C) :

    Finite VCDim implies uniform convergence. Proof: VCDim < ∞ → UC.

    • Finite X: direct Hoeffding per-hypothesis + finite union bound.
    • Infinite X: Sauer-Shelah → symmetrization + growth function → UC.
    theorem vcdim_finite_imp_pac_via_uc' (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) (hC : VCDim X C < ) (hmeas_C : hC, Measurable h) (hc_meas : ∀ (c : Concept X Bool), Measurable c) (hWB : WellBehavedVC X C) :

    VCDim < ⊤ → PACLearnable via UC route.