Generalization Error, Sample/Query/Label Complexity, ERM #
The numerical quantities that PAC learning theory bounds, together with the canonical PAC learner (ERM).
Sample complexity of PAC learning: the minimum number of samples needed to achieve (ε,δ)-PAC learning. m_C(ε,δ) = sInf { m | ∃ L, ∀ probability measures D, ∀ c ∈ C, D^m{ S : error(L(S)) ≤ ε } ≥ 1-δ }.
Equations
- One or more equations did not get rendered due to their size.
Instances For
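The unrendered equation can be sketched in Lean directly from the m_C formula above; `Learner`, `runLearner`, and `sampleComplexitySketch` are assumed placeholder names, not the file's actual identifiers:

```lean
-- Sketch only: `Learner` and `runLearner` are assumed placeholder names.
noncomputable def sampleComplexitySketch {X : Type*} [MeasurableSpace X]
    (C : Set (Concept X Bool)) (ε δ : ℝ) : ℕ∞ :=
  sInf { m : ℕ∞ | ∃ L : Learner X, ∀ (D : MeasureTheory.Measure X),
    MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C, ∀ k : ℕ, m ≤ (k : ℕ∞) →
      -- D^k of the success event {S : error(L(S)) ≤ ε} is at least 1 - δ
      (MeasureTheory.Measure.pi fun _ : Fin k => D)
          { xs | TrueError X (runLearner L (fun i => (xs i, c (xs i)))) c D
                   ≤ ENNReal.ofReal ε }
        ≥ ENNReal.ofReal (1 - δ) }
```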
Query complexity: minimum membership queries for exact learning. Formally: sInf { q | ∃ active learner that identifies c using ≤ q queries }. A4 NOTE: This definition is well-typed but CANNOT be computed without a query-counting oracle wrapper (ABD-R deferred). The sInf formulation is the mathematically correct definition even without the infrastructure.
Equations
Instances For
Label complexity: minimum labels for active PAC learning. Formally: sInf { k | ∃ active learner using ≤ k labels achieving PAC(ε,δ) }. A4 NOTE: The oracle model doesn't track label count. ABD-R: add queryCount field or label-tracking wrapper.
Equations
Instances For
Mistake bound: minimum worst-case mistakes for online learning of C.
Equations
- OptimalMistakeBound X C = ⨅ (M : ℕ), ⨅ (_ : MistakeBounded X Bool C M), ↑M
Instances For
Generalization error (true risk): expected loss under distribution D.
Instances For
Empirical error: average loss on a finite sample.
Equations
Instances For
Equations
- ermLearn X Y H loss hne S = if h : ∃ h₀ ∈ H, ∀ h' ∈ H, EmpiricalError X Y h₀ S loss ≤ EmpiricalError X Y h' S loss then h.choose else hne.some
Instances For
Empirical Risk Minimization (ERM): the canonical PAC learner. Selects h ∈ H minimizing EmpiricalError on the sample when a minimizer exists; falls back to an arbitrary h ∈ H otherwise. M-DefinitionRepair: added (hne : H.Nonempty) to resolve Nonempty witness.
Equations
Instances For
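A minimal well-definedness property one would expect of ermLearn, sketched with an illustrative name; both branches of the `dite` in the equation above land in `H`, and the `sorry` marks the `Exists.choose_spec` step:

```lean
-- Sketch: both branches of the `dite` in `ermLearn` return an element of `H`.
theorem ermLearn_mem (X Y : Type*) {m : ℕ} (H : Set (Concept X Y))
    (loss : Y → Y → ℝ) (hne : H.Nonempty) (S : Fin m → X × Y) :
    ermLearn X Y H loss hne S ∈ H := by
  unfold ermLearn
  split
  · sorry  -- `Exists.choose_spec` gives membership of the chosen minimizer
  · exact hne.some_mem
```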
PAC Proof Infrastructure Layer #
The PAC proof (vc_characterization, vcdim_finite_imp_pac) requires three layers of infrastructure sitting BETWEEN the combinatorial side (VCDim, Shatters, GrowthFunction) and the measure-theoretic side (PACLearnable, Measure.pi):
P₁ (combinatorial): VCDim, Shatters, GrowthFunction, Sauer-Shelah
    ↓ [HC > 0 joint — TrueError bridges these]
BRIDGE: TrueError, EmpiricalMeasureError, IsConsistentWith, UniformConvergence
    ↓ [HC > 0 joint — concentration inequalities]
P₂ (measure-theoretic): PACLearnable, Measure.pi, IsProbabilityMeasure
The hidden channel at the first joint: TrueError is a MEASURE (ENNReal) in PACLearnable but GeneralizationError is an INTEGRAL (ℝ). These are not interchangeable without measurability hypotheses. The bridge between them is genuinely at HC > 0.
K4 was originally "Hoeffding blocks PAC proofs." K4 dissolves: Mathlib has Real.one_sub_le_exp_neg and Real.one_sub_div_pow_le_exp_neg. The ACTUAL obstruction is the missing definitions below.
TrueError: The measure-valued error #
PACLearnable uses D { x | h x ≠ c x } (ENNReal), not ∫ loss(h(x), c(x)) dD (ℝ).
This is the 0-1 loss specialized to the realizable case with set-measure semantics.
HC at ENNReal/ℝ joint: GeneralizationError (ℝ-valued integral) and TrueError (ENNReal-valued measure) coincide ONLY when:
- D is a probability measure
- The loss is 0-1
- {x | h x ≠ c x} is measurable
- The integral equals the measure of the disagreement set
Without all four, the bridge is lossy.
Counterdefinition (COUNTER-1): If proofs need ℝ-valued error for real-analysis lemmas (e.g., ε-δ bounds with subtraction), swap to:
TrueErrorReal h c D := (D { x | h x ≠ c x }).toReal
Swap condition: Coh failure at any theorem using sub_lt or abs_sub on errors.
Counterdefinition (COUNTER-2): If agnostic proofs need distribution over X × Y:
AgnosticTrueError h D_XY := D_XY { p | h p.1 ≠ p.2 }
Swap condition: When proving agnostic PAC → realizable PAC direction.
True error (0-1 loss, realizable case): D-probability of disagreement. This is what PACLearnable's success event measures.
Instances For
True error in ℝ: for use in bounds involving subtraction/absolute value. COUNTER-1 of TrueError. The toReal bridge loses information when the measure is ⊤.
Equations
- TrueErrorReal X h c D = (TrueError X h c D).toReal
Instances For
Bridge: TrueError equals GeneralizationError under 0-1 loss when the disagreement set is measurable. This theorem sits at the HC > 0 joint between ENNReal and ℝ error worlds. KU₁: requires MeasurableSet {x | h x ≠ c x} — which needs [DecidableEq Bool] and measurability of h and c. What are the minimal measurability hypotheses? UK₁: For concept classes where membership is undecidable, this bridge may not have a clean computational witness.
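The bridge described above might be stated as follows; the measurability hypothesis is the candidate minimal answer to KU₁, the argument order of `GeneralizationError` is assumed to mirror `EmpiricalError`, and the proof is left as a sketch:

```lean
-- Statement-only sketch; hypothesis and argument order are assumptions.
theorem trueErrorReal_eq_generalizationError {X : Type*} [MeasurableSpace X]
    (h c : Concept X Bool) (D : MeasureTheory.Measure X)
    [MeasureTheory.IsProbabilityMeasure D]
    (hmeas : MeasurableSet { x | h x ≠ c x }) :
    TrueErrorReal X h c D
      = GeneralizationError X Bool h c D (zeroOneLoss Bool) := by
  -- ∫ zeroOneLoss (h x) (c x) dD = (D {x | h x ≠ c x}).toReal
  sorry
```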
EmpiricalMeasureError: ENNReal-valued empirical error #
EmpiricalError (in ℝ) counts training mistakes as an average. But PACLearnable compares TrueError (ENNReal) against ENNReal.ofReal ε. To connect ERM to PACLearnable, we need the empirical analogue in ENNReal.
HC at this joint: The empirical distribution is a discrete measure (a sum of Dirac deltas). The true distribution is an arbitrary probability measure. Uniform convergence is the claim that these converge uniformly over H. The empirical measure IS a Measure — Mathlib has MeasureTheory.Measure.sum and Finset.sum for constructing it.
Empirical measure: the uniform distribution over a finite sample. D̂_S = (1/m) Σᵢ δ_{xᵢ} where S = (x₁,...,xₘ). This is a probability measure when m > 0. UK₂: Is there a natural categorical structure here? The empirical measure is a functor from (Fin m → X) to Measure X. What does this functoriality buy for the proofs?
Equations
- EmpiricalMeasure X xs = if _hm : m = 0 then 0 else (1 / ↑m) • ∑ i : Fin m, MeasureTheory.Measure.dirac (xs i)
Instances For
Empirical 0-1 error as a measure value: D̂_S{x | h x ≠ c x}. For a finite sample, this equals (# mistakes) / m. Connects EmpiricalError (ℝ) to TrueError (ENNReal) via the empirical measure.
Equations
- EmpiricalMeasureError X h c xs = TrueError X h c (EmpiricalMeasure X xs)
Instances For
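A statement-only sketch of the probability-measure claim for EmpiricalMeasure; the `[NeZero m]` spelling of positivity is an assumption:

```lean
-- Statement-only sketch: total mass is (1/m) · Σᵢ δ_{xᵢ}(univ) = (1/m) · m = 1.
theorem empiricalMeasure_isProbability {X : Type*} [MeasurableSpace X]
    {m : ℕ} [NeZero m] (xs : Fin m → X) :
    MeasureTheory.IsProbabilityMeasure (EmpiricalMeasure X xs) := by
  constructor
  -- each Dirac has mass 1; the sum over `Fin m` has mass m; the smul rescales
  sorry
```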
Bridge: EmpiricalMeasureError equals the counting-based EmpiricalError under 0-1 loss (up to ENNReal ↔ ℝ conversion). KU₄: The division by m creates a rational, not necessarily a real. Does ENNReal.ofReal (k/m) = (k : ENNReal) / (m : ENNReal)?
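KU₄ likely dissolves via existing Mathlib lemmas; a sketch, with lemma names as of recent Mathlib (worth re-checking the exact spellings):

```lean
-- Sketch: lemma names as of recent Mathlib; re-check the exact spellings.
example (k m : ℕ) (hm : 0 < m) :
    ENNReal.ofReal ((k : ℝ) / (m : ℝ)) = (k : ENNReal) / (m : ENNReal) := by
  rw [ENNReal.ofReal_div_of_pos (by exact_mod_cast hm),
      ENNReal.ofReal_natCast, ENNReal.ofReal_natCast]
```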
IsConsistentWith: Consistency of hypotheses with samples #
A hypothesis is consistent with a labeled sample if it correctly classifies every point in the sample. This is the realizability assumption at the sample level.
Inv assessment: Robust across PAC (ERM output), Gold (version space membership), and compression (reconstructed hypothesis). Inv = 0.8.
A hypothesis h is consistent with labeled sample S.
Equations
- IsConsistentWith X Y h S = ∀ (i : Fin m), h (S i).1 = (S i).2
Instances For
Consistency implies zero empirical 0-1 error.
A loss function is faithful if: loss(y,y) = 0 and loss(y₁,y₂) = 0 → y₁ = y₂. This ensures that zero empirical error ↔ consistency. A5-valid enrichment (Γ₃₉): adds structure to loss, doesn't simplify theorem.
Matching predictions have zero loss
Zero loss implies matching predictions
Instances For
EmpiricalError with a faithful loss is zero iff consistent.
Concentration Infrastructure (Bridge to Zhang's SubGaussian/EfronStein) #
The concentration inequalities needed for UC come from three layers:
- McDiarmid (bounded differences) → tail bounds for empirical error of a SINGLE h
- Union bound over GrowthFunction-many effective hypotheses
- Sauer-Shelah to control GrowthFunction by VCDim
This section adds the Lean4 infrastructure that BRIDGES our types to the concentration lemma types. The actual concentration lemmas are proved in:
- Zhang's EfronStein.lean: efronStein (variance decomposition)
- Zhang's SubGaussian.lean: subGaussian_tail_bound (exponential concentration)
- Google formal-ml exp_bound.lean: nnreal_exp_bound2 ((1-x)^k ≤ exp(-xk))
A function f : (Fin m → X) → ℝ has bounded differences if changing any one coordinate changes f by at most c. This is the hypothesis for McDiarmid's inequality. Prior art: Zhang's efronStein gives Var(f) ≤ Σ cᵢ² under this condition.
Equations
- HasBoundedDifferences f c = ∀ (xs : Fin m → X) (i : Fin m) (x' : X), |f xs - f (Function.update xs i x')| ≤ c
Instances For
EmpiricalError of a fixed hypothesis h is a bounded-difference function of the sample, with constant 1/m. Changing one sample point changes the average loss by at most 1/m (since each term is in [0,1] for zeroOneLoss).
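A statement-only sketch of this bounded-differences lemma, with hypothesis names chosen for illustration:

```lean
-- Statement-only sketch: changing coordinate i changes exactly one summand
-- of the average, and each `zeroOneLoss` term lies in [0,1].
theorem empiricalError_hasBoundedDifferences {X : Type*} {m : ℕ}
    (h c : Concept X Bool) :
    HasBoundedDifferences
      (fun xs : Fin m → X =>
        EmpiricalError X Bool h (fun i => (xs i, c (xs i))) (zeroOneLoss Bool))
      (1 / m) := by
  sorry
```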
Complement probability bound: if μ(bad) ≤ ofReal δ with 0 < δ ≤ 1 and μ is a probability measure, then μ(good) ≥ ofReal(1-δ) where good = compl(bad).
One-sided Hoeffding: for a fixed h with TrueError(h,c,D) = p > ε, the probability that h is consistent on all m IID samples is ≤ (1-ε)^m.
This is the Google formal-ml route (pac_finite_bound, nnreal_exp_bound2). Each sample point xᵢ ∈ {x | h(x) = c(x)} with probability 1-p ≤ 1-ε. IID: Pr[all m correct] = (1-p)^m ≤ (1-ε)^m.
Mathlib: Real.one_sub_div_pow_le_exp_neg gives (1-ε)^m ≤ exp(-εm).
This is SIMPLER than McDiarmid (one-sided, no expectation computation).
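The (1-ε)^m ≤ exp(-εm) step can be sketched against the Mathlib lemma named above; its exact signature (hypothesis t ≤ n) is assumed here:

```lean
-- Sketch: the exact signature of `Real.one_sub_div_pow_le_exp_neg` is assumed.
example {ε : ℝ} {m : ℕ} (hm : 0 < m) (hε : ε ≤ 1) :
    (1 - ε) ^ m ≤ Real.exp (-(ε * m)) := by
  have hmR : (0 : ℝ) < m := by exact_mod_cast hm
  have hdiv : ε * m / m = ε := by field_simp
  calc (1 - ε) ^ m = (1 - ε * m / m) ^ m := by rw [hdiv]
    _ ≤ Real.exp (-(ε * m)) :=
        Real.one_sub_div_pow_le_exp_neg (by nlinarith)
```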
Sample-dependent covering lemma: for a FIXED sample xs : Fin m → X, the set of "bad" hypotheses in C (those consistent with c on xs but with true error > ε) can be covered by at most GrowthFunction(C,m) representative hypotheses.
The key insight: for a fixed sample xs, the consistency condition ∀ i, h(xs i) = c(xs i) depends only on h's restriction to {xs i | i}. Two hypotheses with the same restriction produce identical consistency predicates on xs. The number of distinct restrictions of C to any m-point set is at most GrowthFunction(C,m) by definition.
This is the SAMPLE-DEPENDENT version (Γ₆₅-fix): representatives are chosen per-sample, which is the standard approach in PAC learning proofs. The previous sample-independent version was unprovable for infinite concept classes.
VCDim < ⊤ → growth function polynomially bounded by partial binomial sum. Forward direction of fundamental_theorem conjunct 5. Uses Sauer-Shelah: GrowthFunction(m) ≤ ∑_{i≤d} C(m,i) where d = VCDim.
UniformConvergence: The bridge from VCDim to PAC #
Uniform convergence is the key property that makes finite VCDim imply PAC learnability. It says: with high probability over an iid sample, the empirical error of EVERY hypothesis in H is close to its true error.
∀ ε > 0, ∀ δ > 0, ∃ m₀, ∀ m ≥ m₀, D^m { xs | ∀ h ∈ H, |TrueError(h) - EmpError(h)| < ε } ≥ 1 - δ
This is STRONGER than PAC learnability (which only needs the ERM hypothesis to be good). Uniform convergence → PAC learnability (via ERM). The converse fails in general (agnostic PAC ≠ uniform convergence).
HC at this joint: The quantifier structure is critical. PACLearnable has ∃ L, ∀ D, ∀ c ∈ C, while UniformConvergence has ∀ D, ∀ h ∈ H. The universal quantifier over h IN the probability event (inside the measure) makes uniform convergence strictly stronger.
Counterdefinition (COUNTER-3): If we need a non-iid variant:
Replace Measure.pi with a general product measure (martingale convergence).
Swap condition: Online-to-batch conversion proofs or non-iid PAC extensions.
KU₈: The definition below uses TrueError (ENNReal) and requires converting to ℝ for the absolute value. Is there a cleaner formulation in pure ENNReal? UK₃: Uniform convergence over UNCOUNTABLE hypothesis classes requires measurability of the supremum — a deep issue in empirical process theory. What is the Lean4 type-theoretic status of this?
Uniform convergence of empirical error to true error over a hypothesis class. This is the property that makes finite VCDim → PAC learnability work. BP₅ connects here: this is ONE of the five characterizations.
M-DefinitionRepair (Γ₃₅ → Γ₄₁): The m₀ must be INDEPENDENT of D and c. That's what "uniform" means — convergence is uniform over all distributions and all target concepts. The original definition had m₀ depending on D and c, making uc_imp_pac unprovable (PACLearnable's mf must be independent of D, c). Repaired: ∃ m₀ is now BEFORE ∀ D, ∀ c. This STRENGTHENS the definition (A5-valid: adds content, doesn't simplify).
Equations
- One or more equations did not get rendered due to their size.
Instances For
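A sketch of the repaired quantifier order, assembled from the docstring and the bound_works field; treat names and spelling as illustrative rather than the file's literal text:

```lean
-- Sketch: `m₀` is fixed BEFORE `D` and `c`, per the M-DefinitionRepair note.
def UniformConvergenceSketch {X : Type*} [MeasurableSpace X]
    (H : Set (Concept X Bool)) : Prop :=
  ∀ ε > 0, ∀ δ > 0, ∃ m₀ : ℕ, ∀ (D : MeasureTheory.Measure X),
    MeasureTheory.IsProbabilityMeasure D → ∀ (c : Concept X Bool), ∀ m ≥ m₀,
      (MeasureTheory.Measure.pi fun _ : Fin m => D)
          { xs : Fin m → X | ∀ h ∈ H,
              |TrueErrorReal X h c D -
                EmpiricalError X Bool h (fun i => (xs i, c (xs i)))
                  (zeroOneLoss Bool)| < ε }
        ≥ ENNReal.ofReal (1 - δ)
```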
Quantitative uniform convergence: with explicit sample complexity bound. m ≥ (8/ε²)(d·ln(2em/d) + ln(4/δ)) suffices for VC classes of dimension d. KU₉: The exact constant depends on the proof technique (symmetrization, chaining, Dudley entropy integral). Which gives the tightest bound?
Sample complexity function
- bound_works (ε δ : ℝ) : 0 < ε → 0 < δ → ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ (c : Concept X Bool) (m : ℕ), self.sampleBound ε δ ≤ m → (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin m → X | ∀ h ∈ H, |TrueErrorReal X h c D - EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool)| < ε} ≥ ENNReal.ofReal (1 - δ)
The bound works: m ≥ sampleBound ε δ implies uniform convergence at (ε, δ)
Instances For
Uniform convergence implies PAC learnability via ERM. The ERM learner (which exists by ermLearn) achieves PAC learning when uniform convergence holds. This is the second half of vcdim_finite_imp_pac.
Concentration Inequality Bridge #
Mathlib provides the exponential inequality chain:
- Real.one_sub_le_exp_neg: (1 - x) ≤ exp(-x)
- Real.one_sub_div_pow_le_exp_neg: (1 - t/n)^n ≤ exp(-t)
These dissolve the K4 obstruction. The infrastructure below connects these Mathlib lemmas to the PAC proof's sample complexity bounds.
KU₁₃: Hoeffding's inequality for sums of bounded random variables is NOT in Mathlib as of 2026-03, but measure_sum_ge_le_of_iIndepFun provides the core concentration bound for independent random variables. What is the gap between what Mathlib provides and what we need?
UK₆: The PAC proof's concentration step requires a chain:
- (1) Sauer-Shelah gives the growth function bound
- (2) Union bound over growth-function-many effective hypotheses
- (3) Concentration for each fixed hypothesis (Hoeffding or Chebyshev)
- (4) Combine (2) and (3)
The union bound step (2) is trivial. Step (3) is where Mathlib helps. The question is whether step (3) needs Hoeffding specifically or whether Chebyshev plus the Sauer-Shelah polynomial bound suffices.
Sample complexity for PAC learning with VCDim = d. The standard bound: m ≥ (C/ε)(d · log(1/ε) + log(1/δ)) for a universal constant C. KU₁₄: The exact constant C depends on the proof technique. Symmetrization gives C = 8, chaining gives C = 4. UK₇: Is there a proof-theoretic reason to prefer one constant over another?
Instances For
Gold identification does NOT imply PAC learnability. There exist concept classes that are EX-learnable (identifiable in the limit) but not PAC-learnable (no finite sample suffices for (ε,δ) bounds). Example: the class of Boolean concepts with finite support (true on only finitely many points) is EX-learnable by the "assume false outside the observed support" learner, yet it shatters every finite set, so VCDim = ∞ (hence not PAC-learnable). This is the PAC/Gold paradigm separation — HC > 0 at this joint.
Regret: cumulative excess loss of online learner vs best fixed hypothesis.
Equations
- Regret X Y L H seq T loss = L.cumulativeLoss loss (List.map seq (List.range T)) - sInf ((fun (h : Concept X Y) => fixedHypothesisLoss h loss (List.map seq (List.range T))) '' H)
Instances For
NFL Counting Infrastructure for vcdim_infinite_not_pac #
The core counting argument: for any hypothesis h on a finite set, the average number of disagreements with a uniformly random labeling is exactly half the set size. This implies existence of a labeling with many disagreements, which drives the NFL/PAC lower bound proofs.
For any h : α → Bool on a Fintype, the sum over all functions f : α → Bool of #{x | f x ≠ h x} equals |α| * 2^(|α| - 1). This is the key counting identity for the NFL theorem: each point x contributes 2^(|α|-1) to the sum (exactly half the functions disagree with h at x).
Pigeonhole consequence: for any h on a Fintype with card ≥ 2, there exists a function f disagreeing with h on more than |α|/4 points. This is the per-sample NFL counting lemma.
Markov-type bound on the number of labelings with few disagreements. For any h : α → Bool on a Fintype with |α| ≥ 1: 4 · #{f : #{x | f x ≠ h x} ≤ |α|/4} < 3 · 2^|α|. This is the core of the double-averaging argument for NFL/PAC lower bounds. Proof: Markov's inequality on agreements, using disagreement_sum_eq.
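A statement-only sketch of the counting identity described above, using the text's name disagreement_sum_eq with a suffix to mark it as illustrative:

```lean
-- Statement-only sketch: each x contributes 2^(|α|-1) disagreeing functions.
theorem disagreement_sum_eq_sketch {α : Type*} [Fintype α] [DecidableEq α]
    (h : α → Bool) :
    ∑ f : α → Bool, (Finset.univ.filter fun x => f x ≠ h x).card
      = Fintype.card α * 2 ^ (Fintype.card α - 1) := by
  sorry
```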
Uniform Measure Infrastructure #
For NFL and PAC lower bound proofs, we need uniform probability measures on finite sets. Given a Finset S ⊆ X with |S| > 0, the uniform measure on S is (1/|S|) · Σ_{x ∈ S} δ_x.
This is a special case of EmpiricalMeasure where all sample points are distinct. The key property: IsProbabilityMeasure for the uniform measure on a nonempty finite set.
KU₁₉ (from Google formal-ml): Google uses a bespoke probability_space wrapper. We use Mathlib's MeasureTheory.Measure directly. The uniform measure construction needs Measure.count normalized by Fintype.card, or a manual Dirac sum.
Uniform probability measure on a Fintype: (1/|X|) · count. This gives each point probability 1/|X|. Requires |X| > 0 (nonempty).
Equations
- uniformMeasure X _hne = (1 / ↑(Fintype.card X)) • MeasureTheory.Measure.count
Instances For
The uniform measure is a probability measure when X is nonempty and finite.
NFL core: for any learner and any finite domain, there exists a hard distribution and concept. Factors through uniformMeasure construction. This is the core argument used by both nfl_fixed_sample and pac_lower_bound.
PAC lower bound core: sample complexity is at least (d-1)/2. For any PAC learner with VCDim = d, at least ⌈(d-1)/2⌉ samples needed. Proof: construct d shattered points, uniform distribution, counting argument. Note: the tight constant is (d-1)/(2ε) (EHKV 1989); see EHKV.lean.
Pigeonhole core: compress is injective on C-realizable labelings. If two C-realizable samples over the same points with different labelings produce the same compressed set, correctness forces the labelings to agree. Γ₇₃: now requires realizability hypotheses for both samples.
∃ compression scheme → VCDim < ⊤. Pigeonhole: compress is injective on C-realizable labelings (by correctness), but compressed subsets of an n-point sample are bounded. Shatters X C T guarantees ALL labelings are C-realizable, so injectivity holds on all 2^n labelings. Contradiction for large n. Γ₇₃ RESOLVED: CompressionScheme parameterized by C with realizability guard. Shattered sets guarantee C-realizability of every labeling, so the pigeonhole argument is genuinely non-vacuous.
Growth function polynomially bounded → VCDim < ⊤. Reverse direction: if GrowthFunction m ≤ ∑_{i≤d} C(m,i) for all m ≥ d, then VCDim ≤ d (otherwise GrowthFunction = 2^m for m = VCDim > d).
PAC lower bound membership: if m achieves PAC for C with VCDim = d, then m ≥ ⌈(d-1)/(64ε)⌉. This is the core adversarial counting argument factored for PAC.lean assembly. Note: the tight constant is (d-1)/(2ε) (EHKV 1989); see EHKV.lean.
Proof route (double-averaging on shattered set):
- VCDim = d → ∃ shattered S with |S| = d
- D = uniform on S (probability measure, each point has weight 1/d)
- m < ⌈(d-1)/(64ε)⌉ → 2m < d → NFL counting applies
- Double-averaging over 2^d labelings: E_f[E_xs[error]] ≥ (d-m)/(2d) > 1/4
- Reversed Markov: ∃ c₀ ∈ C with Pr[error ≤ 1/8] ≤ 6/7
- For ε ≤ 1/8: Pr[error ≤ ε] ≤ 6/7 = 1 - 1/7, contradicting PAC
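The double-averaging and reversed-Markov arithmetic in the route above, written out (a sketch; err abbreviates the learner's error on the shattered set):

```latex
% Double-averaging (uses 2m < d):
\mathbb{E}_{f}\bigl[\mathbb{E}_{xs}[\mathrm{err}]\bigr]
  \;\ge\; \frac{d-m}{2d} \;>\; \frac{d-d/2}{2d} \;=\; \frac{1}{4}
% Reversed Markov for 0 \le \mathrm{err} \le 1, threshold 1/8:
\Pr\!\left[\mathrm{err} > \tfrac{1}{8}\right]
  \;\ge\; \frac{1/4 - 1/8}{1 - 1/8} \;=\; \frac{1}{7}
  \quad\Longrightarrow\quad
  \Pr\!\left[\mathrm{err} \le \tfrac{1}{8}\right] \;\le\; \frac{6}{7}
```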
Extract block j from a flat array of k*m elements, using finProdFinEquiv.
Equations
- block_extract k m S j i = S (finProdFinEquiv (j, i))
Instances For
Block index sets are disjoint for distinct blocks.
Block extraction is measurable: extracting block j from a pi-type is measurable.
Block extractions are independent under the product measure. Key infrastructure for boosting (D4) and probability amplification.
Shattering lifting: if T is shattered by C and f : ↥T → Bool, then there exists c ∈ C that agrees with f on all of T.
Key counting lemma: for any h : ↥T → Bool on a shattered set T with |T| ≥ 2, there exists c ∈ C with #{x ∈ T | c x ≠ h x} > |T|/4. Lifts exists_many_disagreements through shattering.
NFL per-sample lemma for shattered sets: for ANY fixed sample xs and ANY hypothesis h, there exists c ∈ C agreeing with h on sample points but with high error (> 1/4) on the shattered set T. Uses the counting argument on unseen points via disagreement_sum_eq.
If VCDim = ⊤, then C is not PAC learnable. Proof: for any learner L with sample function mf, pick ε = 1/4, δ = 1/4. Let m = mf(1/4, 1/4). Since VCDim = ⊤, ∃ shattered set S with |S| ≥ 2m. Put D = uniform on S. For random labeling, any m-sample learner has expected error ≥ 1/4 on unseen points. This is the core of pac_imp_vcdim_finite (contrapositive direction).