FLT_Proofs.Complexity.Generalization

Generalization Error, Sample/Query/Label Complexity, ERM #

The numerical quantities that PAC learning theory bounds, together with the canonical PAC learner (ERM).

noncomputable def SampleComplexity (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) :

Sample complexity of PAC learning: the minimum number of samples needed to achieve (ε,δ)-PAC learning. m_C(ε,δ) = sInf{m | ∃ L, ∀ D prob, ∀ c ∈ C, D^m{S : error(L(S)) ≤ ε} ≥ 1-δ}.

Equations
  • One or more equations did not get rendered due to their size.
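Since the equations did not render, here is a hedged sketch of what the sInf formulation in the docstring plausibly elaborates to. The learner type `BatchLearner` and the exact success event are assumptions drawn from the statements later in this file (`nfl_core`, `pac_lower_bound_core`), not the rendered source:

```lean
-- Illustrative reconstruction only; mirrors the success event used in
-- nfl_core and pac_lower_bound_core below. Not the rendered definition.
noncomputable def sampleComplexitySketch (X : Type u) [MeasurableSpace X]
    (C : ConceptClass X Bool) (ε δ : ℝ) : ℕ :=
  sInf { m : ℕ |
    ∃ L : BatchLearner X Bool, ∀ D : MeasureTheory.Measure X,
      MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C,
        (MeasureTheory.Measure.pi fun _ : Fin m => D)
          {xs : Fin m → X |
            D {x | L.learn (fun i => (xs i, c (xs i))) x ≠ c x}
              ≤ ENNReal.ofReal ε}
          ≥ ENNReal.ofReal (1 - δ) }
```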
Instances For
    noncomputable def QueryComplexity (X : Type u) [DecidableEq X] [Fintype X] (C : ConceptClass X Bool) :

    Query complexity: minimum membership queries for exact learning. Formally: sInf { q | ∃ active learner that identifies c using ≤ q queries }. A4 NOTE: This definition is well-typed but CANNOT be computed without a query-counting oracle wrapper (ABD-R deferred). The sInf formulation is the mathematically correct definition even without the infrastructure.

    Equations
    Instances For
      noncomputable def LabelComplexity (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) :

      Label complexity: minimum labels for active PAC learning. Formally: sInf { k | ∃ active learner using ≤ k labels achieving PAC(ε,δ) }. A4 NOTE: The oracle model doesn't track label count. ABD-R: add queryCount field or label-tracking wrapper.

      Equations
      Instances For
        noncomputable def OptimalMistakeBound (X : Type u) (C : ConceptClass X Bool) :

        Mistake bound: minimum worst-case mistakes for online learning of C.

        Equations
        Instances For
          noncomputable def GeneralizationError (X : Type u) (Y : Type v) [MeasurableSpace X] [MeasurableSpace Y] (h : Concept X Y) (D : MeasureTheory.Measure (X × Y)) (loss : LossFunction Y) :
          ℝ

          Generalization error (true risk): expected loss under distribution D.

          Equations
          Instances For
            noncomputable def EmpiricalError (X : Type u) (Y : Type v) (h : Concept X Y) {m : ℕ} (S : Fin m → X × Y) (loss : LossFunction Y) :
            ℝ

            Empirical error: average loss on a finite sample.

            Equations
            Instances For
              noncomputable def ermLearn (X : Type u) (Y : Type v) [DecidableEq Y] (H : HypothesisSpace X Y) (loss : LossFunction Y) (hne : Set.Nonempty H) {m : ℕ} (S : Fin m → X × Y) :
              Equations
              Instances For
                theorem ermLearn_in_H (X : Type u) (Y : Type v) [DecidableEq Y] (H : HypothesisSpace X Y) (loss : LossFunction Y) (hne : Set.Nonempty H) {m : ℕ} (S : Fin m → X × Y) :
                ermLearn X Y H loss hne S ∈ H
                noncomputable def ERM (X : Type u) (Y : Type v) [DecidableEq Y] (H : HypothesisSpace X Y) (loss : LossFunction Y) (hne : Set.Nonempty H) :

                Empirical Risk Minimization (ERM): the canonical PAC learner. Selects h ∈ H minimizing EmpiricalError on the sample when a minimizer exists; falls back to an arbitrary h ∈ H otherwise. M-DefinitionRepair: added (hne : H.Nonempty) to resolve Nonempty witness.

                Equations
                • ERM X Y H loss hne = { hypotheses := H, learn := fun {_m : ℕ} (S : Fin _m → X × Y) => ermLearn X Y H loss hne S, output_in_H := ⋯ }
                Instances For

                  PAC Proof Infrastructure Layer #

                  The PAC proof (vc_characterization, vcdim_finite_imp_pac) requires three layers of infrastructure sitting BETWEEN the combinatorial side (VCDim, Shatters, GrowthFunction) and the measure-theoretic side (PACLearnable, Measure.pi):

                  P₁ (combinatorial): VCDim, Shatters, GrowthFunction, Sauer-Shelah
                      ↓ [HC > 0 joint — TrueError bridges these]
                  BRIDGE: TrueError, EmpiricalMeasureError, IsConsistentWith, UniformConvergence
                      ↓ [HC > 0 joint — concentration inequalities]
                  P₂ (measure-theoretic): PACLearnable, Measure.pi, IsProbabilityMeasure

                  The hidden channel at the first joint: TrueError is a MEASURE (ENNReal) in PACLearnable but GeneralizationError is an INTEGRAL (ℝ). These are not interchangeable without measurability hypotheses. The bridge between them is genuinely at HC > 0.

                  K4 was originally "Hoeffding blocks PAC proofs." K4 dissolves: Mathlib has Real.one_sub_le_exp_neg and Real.one_sub_div_pow_le_exp_neg. The ACTUAL obstruction is the missing definitions below.

                  TrueError: The measure-valued error #

                  PACLearnable uses D { x | h x ≠ c x } (ENNReal), not ∫ loss(h(x), c(x)) dD (ℝ). This is the 0-1 loss specialized to the realizable case with set-measure semantics.

                  HC at ENNReal/ℝ joint: GeneralizationError (ℝ-valued integral) and TrueError (ENNReal-valued measure) coincide ONLY when:

                  1. D is a probability measure
                  2. The loss is 0-1
                  3. {x | h x ≠ c x} is measurable
                  4. The integral equals the measure of the disagreement set

                  Without all four, the bridge is lossy.

                  Counterdefinition (COUNTER-1): If proofs need ℝ-valued error for real-analysis lemmas (e.g., ε-δ bounds with subtraction), swap to TrueErrorReal h c D := (D { x | h x ≠ c x }).toReal. Swap condition: Coh failure at any theorem using sub_lt or abs_sub on errors.

                  Counterdefinition (COUNTER-2): If agnostic proofs need a distribution over X × Y, use AgnosticTrueError h D_XY := D_XY { p | h p.1 ≠ p.2 }. Swap condition: when proving the agnostic PAC → realizable PAC direction.
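                  COUNTER-1 is realized below as TrueErrorReal. COUNTER-2 has no definition in this file yet; a minimal Lean sketch, assuming the ambient Concept and measure types (the name is hypothetical):

```lean
-- Sketch of COUNTER-2: agnostic 0-1 error against a joint distribution
-- on X × Bool, per the swap condition above. Illustrative only.
noncomputable def AgnosticTrueErrorSketch (X : Type u) [MeasurableSpace X]
    (h : Concept X Bool) (DXY : MeasureTheory.Measure (X × Bool)) : ENNReal :=
  DXY {p | h p.1 ≠ p.2}
```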

                  noncomputable def TrueError (X : Type u) [MeasurableSpace X] (h c : Concept X Bool) (D : MeasureTheory.Measure X) :
                  ENNReal

                  True error (0-1 loss, realizable case): D-probability of disagreement. This is what PACLearnable's success event measures.

                  Equations
                  Instances For
                    noncomputable def TrueErrorReal (X : Type u) [MeasurableSpace X] (h c : Concept X Bool) (D : MeasureTheory.Measure X) :
                    ℝ

                    True error in ℝ: for use in bounds involving subtraction/absolute value. COUNTER-1 of TrueError. The toReal bridge loses information when the measure is ⊤.

                    Equations
                    Instances For
                      theorem trueError_eq_genError (X : Type u) [MeasurableSpace X] (h c : Concept X Bool) (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (hmeas : MeasurableSet {x : X | h x ≠ c x}) (hcmeas : Measurable c) (hhmeas : Measurable h) :

                      Bridge: TrueError equals GeneralizationError under 0-1 loss when the disagreement set is measurable. This theorem sits at the HC > 0 joint between ENNReal and ℝ error worlds. KU₁: requires MeasurableSet {x | h x ≠ c x} — which needs [DecidableEq Bool] and measurability of h and c. What are the minimal measurability hypotheses? UK₁: For concept classes where membership is undecidable, this bridge may not have a clean computational witness.

                      EmpiricalMeasureError: ENNReal-valued empirical error #

                      EmpiricalError (in ℝ) counts training mistakes as an average. But PACLearnable compares TrueError (ENNReal) against ENNReal.ofReal ε. To connect ERM to PACLearnable, we need the empirical analogue in ENNReal.

                      HC at this joint: The empirical distribution is a discrete measure (sum of Dirac deltas). The true distribution is an arbitrary probability measure. Uniform convergence is the claim that these converge uniformly over H. The empirical measure IS a Measure — Mathlib has MeasureTheory.Measure.sum and Finset.sum for constructing it.

                      noncomputable def EmpiricalMeasure (X : Type u) [MeasurableSpace X] {m : ℕ} (xs : Fin m → X) :
                      MeasureTheory.Measure X

                      Empirical measure: the uniform distribution over a finite sample. D̂_S = (1/m) Σᵢ δ_{xᵢ} where S = (x₁,...,xₘ). This is a probability measure when m > 0. UK₂: Is there a natural categorical structure here? The empirical measure is a functor from (Fin m → X) to Measure X. What does this functoriality buy for the proofs?
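                      The formula in the docstring can be realized as a scalar multiple of a finite sum of Dirac measures; a sketch (illustrative, since the rendered equations are not shown here):

```lean
-- D̂_S = (1/m) • Σᵢ δ_{xᵢ}; a probability measure when 0 < m.
noncomputable def empiricalMeasureSketch (X : Type u) [MeasurableSpace X]
    {m : ℕ} (xs : Fin m → X) : MeasureTheory.Measure X :=
  (m : ENNReal)⁻¹ • ∑ i : Fin m, MeasureTheory.Measure.dirac (xs i)
```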

                      Equations
                      Instances For
                        noncomputable def EmpiricalMeasureError (X : Type u) [MeasurableSpace X] (h c : Concept X Bool) {m : ℕ} (xs : Fin m → X) :
                        ENNReal

                        Empirical 0-1 error as a measure value: D̂_S{x | h x ≠ c x}. For a finite sample, this equals (# mistakes) / m. Connects EmpiricalError (ℝ) to TrueError (ENNReal) via the empirical measure.

                        Equations
                        Instances For
                          theorem empiricalMeasureError_eq_empiricalError (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (h c : Concept X Bool) {m : ℕ} (hm : 0 < m) (xs : Fin m → X) :
                          (EmpiricalMeasureError X h c xs).toReal = EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool)

                          Bridge: EmpiricalMeasureError equals the counting-based EmpiricalError under 0-1 loss (up to ENNReal ↔ ℝ conversion). KU₄: The division by m produces a rational value, not a natural number. Does ENNReal.ofReal (k / m) = (k : ENNReal) / (m : ENNReal)?
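                          KU₄ appears answerable with existing Mathlib lemmas; a sketch (the lemma names ENNReal.ofReal_div_of_pos and ENNReal.ofReal_natCast are from Mathlib, though names can drift between versions):

```lean
example (k m : ℕ) (hm : 0 < m) :
    ENNReal.ofReal ((k : ℝ) / (m : ℝ)) = (k : ENNReal) / (m : ENNReal) := by
  rw [ENNReal.ofReal_div_of_pos (by exact_mod_cast hm),
      ENNReal.ofReal_natCast, ENNReal.ofReal_natCast]
```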

                          IsConsistentWith: Consistency of hypotheses with samples #

                          A hypothesis is consistent with a labeled sample if it correctly classifies every point in the sample. This is the realizability assumption at the sample level.

                          Inv assessment: Robust across PAC (ERM output), Gold (version space membership), and compression (reconstructed hypothesis). Inv = 0.8.

                          def IsConsistentWith (X : Type u) (Y : Type v) [DecidableEq Y] (h : Concept X Y) {m : ℕ} (S : Fin m → X × Y) :
                          Prop

                          A hypothesis h is consistent with labeled sample S.

                          Equations
                          Instances For
                            theorem consistent_imp_zero_empiricalError (X : Type u) [MeasurableSpace X] (h c : Concept X Bool) {m : ℕ} (hm : 0 < m) (xs : Fin m → X) (hcons : IsConsistentWith X Bool h fun (i : Fin m) => (xs i, c (xs i))) :
                            EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool) = 0

                            Consistency implies zero empirical 0-1 error.

                             structure IsFaithfulLoss {Y : Type v} [DecidableEq Y] (loss : LossFunction Y) :
                             Prop

                            A loss function is faithful if: loss(y,y) = 0 and loss(y₁,y₂) = 0 → y₁ = y₂. This ensures that zero empirical error ↔ consistency. A5-valid enrichment (Γ₃₉): adds structure to loss, doesn't simplify theorem.

                            • loss_self_zero (y : Y) : loss y y = 0

                              Matching predictions have zero loss

                             • loss_zero_imp_eq (y₁ y₂ : Y) : loss y₁ y₂ = 0 → y₁ = y₂

                              Zero loss implies matching predictions

                            Instances For

                              The 0-1 loss is faithful.
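                              Assuming zeroOneLoss Y is the indicator if y₁ = y₂ then 0 else 1, the instance is a case split; a sketch of how the proof might go (not necessarily the proof in the source):

```lean
example (Y : Type v) [DecidableEq Y] : IsFaithfulLoss (zeroOneLoss Y) where
  loss_self_zero y := by simp [zeroOneLoss]
  loss_zero_imp_eq y₁ y₂ h := by
    by_contra hne
    simp [zeroOneLoss, hne] at h
```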

                              theorem empError_zero_iff_consistent {X : Type u} {Y : Type v} [DecidableEq Y] (h : Concept X Y) {m : ℕ} (hm : 0 < m) (S : Fin m → X × Y) (loss : LossFunction Y) (hfaith : IsFaithfulLoss loss) (hloss_nonneg : ∀ (y₁ y₂ : Y), 0 ≤ loss y₁ y₂) :
                              EmpiricalError X Y h S loss = 0 ↔ IsConsistentWith X Y h S

                              EmpiricalError with a faithful loss is zero iff consistent.

                              theorem erm_consistent_realizable (X : Type u) [MeasurableSpace X] [DecidableEq Bool] (H : HypothesisSpace X Bool) (C : ConceptClass X Bool) (loss : LossFunction Bool) (hfaith : IsFaithfulLoss loss) (hloss_nonneg : ∀ (y₁ y₂ : Bool), 0 ≤ loss y₁ y₂) (hne : Set.Nonempty H) (hreal : C ⊆ H) (c : Concept X Bool) (hcC : c ∈ C) {m : ℕ} (S : Fin m → X) :
                              IsConsistentWith X Bool (ermLearn X Bool H loss hne fun (i : Fin m) => (S i, c (S i))) fun (i : Fin m) => (S i, c (S i))

                              Concentration Infrastructure (Bridge to Zhang's SubGaussian/EfronStein) #

                              The concentration inequalities needed for UC come from three layers:

                              1. McDiarmid (bounded differences) → tail bounds for empirical error of a SINGLE h
                              2. Union bound over GrowthFunction-many effective hypotheses
                              3. Sauer-Shelah to control GrowthFunction by VCDim

                              This section adds the Lean4 infrastructure that BRIDGES our types to the concentration lemma types. The actual concentration lemmas are proved in:

                              def HasBoundedDifferences {X : Type u} {m : ℕ} (f : (Fin m → X) → ℝ) (c : ℝ) :
                              Prop

                              A function f : (Fin m → X) → ℝ has bounded differences if changing any one coordinate changes f by at most c. This is the hypothesis for McDiarmid's inequality. Prior art: Zhang's efronStein gives Var(f) ≤ Σ cᵢ² under this condition.
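                              A sketch of the definition the docstring describes, using Function.update for the single-coordinate change (illustrative; the unrendered equations may phrase this differently):

```lean
-- |f xs - f (update xs i x')| ≤ c for every coordinate i and replacement x'.
def hasBoundedDifferencesSketch {X : Type u} {m : ℕ}
    (f : (Fin m → X) → ℝ) (c : ℝ) : Prop :=
  ∀ (xs : Fin m → X) (i : Fin m) (x' : X),
    |f (Function.update xs i x') - f xs| ≤ c
```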

                              Equations
                              Instances For
                                theorem empiricalError_bounded_diff {X : Type u} [MeasurableSpace X] (h c : Concept X Bool) (m : ℕ) (hm : 0 < m) :
                                HasBoundedDifferences (fun (xs : Fin m → X) => EmpiricalError X Bool h (fun (i : Fin m) => (xs i, c (xs i))) (zeroOneLoss Bool)) (1 / m)

                                EmpiricalError of a fixed hypothesis h is a bounded-difference function of the sample, with constant 1/m. Changing one sample point changes the average loss by at most 1/m (since each term is in [0,1] for zeroOneLoss).

                                theorem prob_compl_ge_of_le {α : Type u_1} [MeasurableSpace α] (μ : MeasureTheory.Measure α) [MeasureTheory.IsProbabilityMeasure μ] (s : Set α) (hs : MeasurableSet s) (δ : ℝ) (hδ : 0 < δ) (_hδ1 : δ ≤ 1) (hbound : μ s ≤ ENNReal.ofReal δ) :
                                μ sᶜ ≥ ENNReal.ofReal (1 - δ)

                                Complement probability bound: if μ(bad) ≤ ofReal δ with 0 < δ ≤ 1 and μ is a probability measure, then μ(good) ≥ ofReal(1-δ) where good = compl(bad).

                                theorem consistent_tail_bound {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] (h c : Concept X Bool) (m : ℕ) (ε : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1) (herr : D {x : X | h x ≠ c x} ≥ ENNReal.ofReal ε) (hmeas : MeasurableSet {x : X | h x ≠ c x}) :
                                (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin m → X | ∀ (i : Fin m), h (xs i) = c (xs i)} ≤ ENNReal.ofReal ((1 - ε) ^ m)

                                One-sided Hoeffding: for a fixed h with TrueError(h,c,D) = p > ε, the probability that h is consistent on all m IID samples is ≤ (1-ε)^m.

                                This is the Google formal-ml route (pac_finite_bound, nnreal_exp_bound2). Each sample point xᵢ ∈ {x | h(x) = c(x)} with probability 1-p ≤ 1-ε. IID: Pr[all m correct] = (1-p)^m ≤ (1-ε)^m.

                                Mathlib: Real.one_sub_div_pow_le_exp_neg gives (1-ε)^m ≤ exp(-εm).

                                This is SIMPLER than McDiarmid (one-sided, no expectation computation).
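                                Chaining the tail bound with the Mathlib inequality gives the realizable sample complexity in one line:

$$\Pr\big[h \text{ consistent on all } m \text{ samples}\big] \;\le\; (1-\varepsilon)^m \;\le\; e^{-\varepsilon m} \;\le\; \delta \quad\text{once}\quad m \ge \tfrac{1}{\varepsilon}\ln\tfrac{1}{\delta}.$$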

                                theorem growth_function_cover {X : Type u} [MeasurableSpace X] (D : MeasureTheory.Measure X) (C : ConceptClass X Bool) (c : Concept X Bool) (hcC : c ∈ C) (m : ℕ) (ε : ℝ) (xs : Fin m → X) (hGF : 0 < GrowthFunction X C m) :
                                ∃ (n : ℕ) (_ : n ≤ GrowthFunction X C m) (reps : Fin n → Concept X Bool), (∀ (j : Fin n), reps j ∈ C) ∧ ∀ h ∈ C, (∀ (i : Fin m), h (xs i) = c (xs i)) → D {x : X | h x ≠ c x} > ENNReal.ofReal ε → ∃ (j : Fin n), ∀ (i : Fin m), reps j (xs i) = c (xs i)

                                Sample-dependent covering lemma: for a FIXED sample xs : Fin m → X, the set of "bad" hypotheses in C (those consistent with c on xs but with true error > ε) can be covered by at most GrowthFunction(C,m) representative hypotheses.

                                The key insight: for a fixed sample xs, the consistency condition ∀ i, h(xs i) = c(xs i) depends only on h's restriction to {xs i | i}. Two hypotheses with the same restriction produce identical consistency predicates on xs. The number of distinct restrictions of C to any m-point set is at most GrowthFunction(C,m) by definition.

                                This is the SAMPLE-DEPENDENT version (Γ₆₅-fix): representatives are chosen per-sample, which is the standard approach in PAC learning proofs. The previous sample-independent version was unprovable for infinite concept classes.

                                theorem pow_mul_exp_neg_le_factorial_div {d : ℕ} {t : ℝ} (ht : 0 < t) :
                                t ^ d * Real.exp (-t) ≤ ↑(d + 1).factorial / t

                                Key arithmetic lemma for PAC bound: for t > 0, t^d * exp(-t) ≤ (d+1)!/t. Follows from exp(t) ≥ t^(d+1)/(d+1)! (partial sum of Taylor series).
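                                The derivation, for $t > 0$:

$$e^{t} \;\ge\; \frac{t^{d+1}}{(d+1)!} \quad\Longrightarrow\quad t^{d}\,e^{-t} \;\le\; t^{d}\cdot\frac{(d+1)!}{t^{d+1}} \;=\; \frac{(d+1)!}{t}.$$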

                                theorem vcdim_finite_imp_growth_bounded (X : Type u) (C : ConceptClass X Bool) (hC : VCDim X C < ⊤) :
                                ∃ (d : ℕ), ∀ (m : ℕ), d ≤ m → GrowthFunction X C m ≤ ∑ i ∈ Finset.range (d + 1), m.choose i

                                VCDim < ⊤ → growth function polynomially bounded by partial binomial sum. Forward direction of fundamental_theorem conjunct 5. Uses Sauer-Shelah: GrowthFunction(m) ≤ ∑_{i≤d} C(m,i) where d = VCDim.

                                UniformConvergence: The bridge from VCDim to PAC #

                                Uniform convergence is the key property that makes finite VCDim imply PAC learnability. It says: with high probability over an iid sample, the empirical error of EVERY hypothesis in H is close to its true error.

                                ∀ ε > 0, ∃ m₀, ∀ m ≥ m₀, D^m { xs | ∀ h ∈ H, |TrueError(h) - EmpError(h)| < ε } ≥ 1 - δ

                                This is STRONGER than PAC learnability (which only needs the ERM hypothesis to be good). Uniform convergence → PAC learnability (via ERM). The converse fails in general (agnostic PAC ≠ uniform convergence).

                                HC at this joint: The quantifier structure is critical. PACLearnable has ∃ L, ∀ D, ∀ c ∈ C, while UniformConvergence has ∀ D, ∀ h ∈ H. The universal quantifier over h IN the probability event (inside the measure) makes uniform convergence strictly stronger.

                                Counterdefinition (COUNTER-3): If we need a non-iid variant: Replace Measure.pi with a general product measure (martingale convergence). Swap condition: Online-to-batch conversion proofs or non-iid PAC extensions.

                                KU₈: The definition below uses TrueError (ENNReal) and requires converting to ℝ for the absolute value. Is there a cleaner formulation in pure ENNReal? UK₃: Uniform convergence over UNCOUNTABLE hypothesis classes requires measurability of the supremum — a deep issue in empirical process theory. What is the Lean4 type-theoretic status of this?

                                Uniform convergence of empirical error to true error over a hypothesis class. This is the property that makes finite VCDim → PAC learnability work. BP₅ connects here: this is ONE of the five characterizations.

                                M-DefinitionRepair (Γ₃₅ → Γ₄₁): The m₀ must be INDEPENDENT of D and c. That's what "uniform" means — convergence is uniform over all distributions and all target concepts. The original definition had m₀ depending on D and c, making uc_imp_pac unprovable (PACLearnable's mf must be independent of D, c). Repaired: ∃ m₀ is now BEFORE ∀ D, ∀ c. This STRENGTHENS the definition (A5-valid: adds content, doesn't simplify).
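                                The repaired quantifier order can be sketched as follows (illustrative Lean; TrueErrorReal and EmpiricalError are names from this file, but the exact packaging of the event is an assumption):

```lean
-- ∃ m₀ comes BEFORE ∀ D and ∀ c: the threshold is uniform in both.
def hasUniformConvergenceSketch (X : Type u) [MeasurableSpace X]
    (H : ConceptClass X Bool) : Prop :=
  ∀ ε δ : ℝ, 0 < ε → 0 < δ →
    ∃ m₀ : ℕ, ∀ m ≥ m₀, ∀ D : MeasureTheory.Measure X,
      MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ H,
        (MeasureTheory.Measure.pi fun _ : Fin m => D)
          {xs : Fin m → X | ∀ h ∈ H,
            |TrueErrorReal X h c D -
              EmpiricalError X Bool h (fun i => (xs i, c (xs i)))
                (zeroOneLoss Bool)| < ε}
          ≥ ENNReal.ofReal (1 - δ)
```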

                                Equations
                                • One or more equations did not get rendered due to their size.
                                Instances For

                                  Quantitative uniform convergence: with explicit sample complexity bound. m ≥ (8/ε²)(d·ln(2em/d) + ln(4/δ)) suffices for VC classes of dimension d. KU₉: The exact constant depends on the proof technique (symmetrization, chaining, Dudley entropy integral). Which gives the tightest bound?

                                  Instances For
                                    theorem uc_imp_pac (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) (hC : Set.Nonempty C) (hUC : HasUniformConvergence X C) :

                                    Uniform convergence implies PAC learnability via ERM. The ERM learner (which exists by ermLearn) achieves PAC learning when uniform convergence holds. This is the second half of vcdim_finite_imp_pac.

                                    Concentration Inequality Bridge #

                                    Mathlib provides the exponential inequality chain:

                                    These dissolve the K4 obstruction. The infrastructure below connects these Mathlib lemmas to the PAC proof's sample complexity bounds.

                                    KU₁₃: Hoeffding's inequality for sums of bounded random variables is NOT in Mathlib as of 2026-03. But measure_sum_ge_le_of_iIndepFun provides the core concentration bound for independent random variables. What is the gap between what Mathlib provides and what we need?

                                    UK₆: The PAC proof's concentration step requires a chain: (1) Sauer-Shelah gives growth function bound (2) Union bound over growth function many effective hypotheses (3) Concentration for each fixed hypothesis (Hoeffding or Chebyshev) (4) Combine via (2) and (3) The union bound step (2) is trivial. Step (3) is where Mathlib helps. The question is whether step (3) needs Hoeffding specifically or whether Chebyshev + Sauer-Shelah polynomial bound suffices.

                                    noncomputable def PACsampleComplexity (d : ℕ) (ε δ : ℝ) :

                                    Sample complexity for PAC learning with VCDim = d. The standard bound: m ≥ (C/ε)(d · log(1/ε) + log(1/δ)) for a universal constant C. KU₁₄: The exact constant C depends on the proof technique. Symmetrization gives C = 8, chaining gives C = 4. UK₇: Is there a proof-theoretic reason to prefer one constant over another?
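                                    A sketch matching the stated bound, with the symmetrization constant C = 8 (the constant choice is an assumption here, per KU₁₄):

```lean
noncomputable def pacSampleComplexitySketch (d : ℕ) (ε δ : ℝ) : ℕ :=
  ⌈(8 / ε) * (d * Real.log (1 / ε) + Real.log (1 / δ))⌉₊
```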

                                    Equations
                                    Instances For
                                      theorem pac_sample_complexity_pos (d : ℕ) (ε δ : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1) (hδ : 0 < δ) (hδ1 : δ ≤ 1) (hd : 0 < d) :
                                      0 < PACsampleComplexity d ε δ

                                      The sample complexity bound is positive for valid parameters. This is a prerequisite for all PAC bounds.

                                      Gold identification does NOT imply PAC learnability. There exist concept classes that are EX-learnable (identifiable in the limit) but not PAC-learnable (no finite sample suffices for (ε,δ) bounds). Example: the class of all computable functions is EX-learnable but has VCDim = ∞ (hence not PAC-learnable). This is the PAC/Gold paradigm separation — HC > 0 at this joint.

                                      noncomputable def Regret (X : Type u) (Y : Type v) (L : OnlineLearner X Y) (H : HypothesisSpace X Y) (seq : ℕ → X × Y) (T : ℕ) (loss : LossFunction Y) :
                                      ℝ

                                      Regret: cumulative excess loss of online learner vs best fixed hypothesis.

                                      Equations
                                      Instances For

                                        NFL Counting Infrastructure for vcdim_infinite_not_pac #

                                        The core counting argument: for any hypothesis h on a finite set, the average number of disagreements with a uniformly random labeling is exactly half the set size. This implies existence of a labeling with many disagreements, which drives the NFL/PAC lower bound proofs.

                                        theorem disagreement_sum_eq {α : Type u_1} [Fintype α] [DecidableEq α] (h : α → Bool) :
                                        ∑ f : α → Bool, {x : α | f x ≠ h x}.card = Fintype.card α * 2 ^ (Fintype.card α - 1)

                                        For any h : α → Bool on a Fintype, the sum over all functions f : α → Bool of #{x | f x ≠ h x} equals |α| * 2^(|α| - 1). This is the key counting identity for the NFL theorem: each point x contributes 2^(|α|-1) to the sum (exactly half the functions disagree with h at x).
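                                        The identity is a Fubini-style swap of the two finite sums:

$$\sum_{f : \alpha \to \mathrm{Bool}} \#\{x \mid f(x) \ne h(x)\} \;=\; \sum_{x \in \alpha} \#\{f \mid f(x) \ne h(x)\} \;=\; \sum_{x \in \alpha} 2^{|\alpha|-1} \;=\; |\alpha|\cdot 2^{|\alpha|-1},$$

since for each fixed $x$ exactly half of the $2^{|\alpha|}$ functions $f$ disagree with $h$ at $x$.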

                                        theorem exists_many_disagreements {α : Type u_1} [Fintype α] [DecidableEq α] (h : α → Bool) (hcard : 2 ≤ Fintype.card α) :
                                        ∃ (f : α → Bool), Fintype.card α < 4 * {x : α | f x ≠ h x}.card

                                        Pigeonhole consequence: for any h on a Fintype with card ≥ 2, there exists a function f disagreeing with h on more than |α|/4 points. This is the per-sample NFL counting lemma.
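                                        The averaging step, spelled out: by disagreement_sum_eq the mean disagreement count over all $2^{|\alpha|}$ labelings is

$$\frac{|\alpha| \cdot 2^{|\alpha|-1}}{2^{|\alpha|}} \;=\; \frac{|\alpha|}{2},$$

so some $f$ attains at least the mean, and $4 \cdot \tfrac{|\alpha|}{2} = 2|\alpha| > |\alpha|$ because $|\alpha| \ge 2 > 0$.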

                                        theorem agreement_count_markov {α : Type u_1} [Fintype α] [DecidableEq α] (h : α → Bool) (hn : 1 ≤ Fintype.card α) :
                                        4 * {f : α → Bool | {x : α | f x ≠ h x}.card ≤ Fintype.card α / 4}.card < 3 * 2 ^ Fintype.card α

                                        Markov-type bound on the number of labelings with few disagreements. For any h : α → Bool on a Fintype with |α| ≥ 1: 4 · #{f : #{x | f x ≠ h x} ≤ |α|/4} < 3 · 2^|α|. This is the core of the double-averaging argument for NFL/PAC lower bounds. Proof: Markov's inequality on agreements, using disagreement_sum_eq.

                                        Uniform Measure Infrastructure #

                                        For NFL and PAC lower bound proofs, we need uniform probability measures on finite sets. Given a Finset S ⊆ X with |S| > 0, the uniform measure on S is (1/|S|) · Σ_{x ∈ S} δ_x.

                                        This is a special case of EmpiricalMeasure where all sample points are distinct. The key property: IsProbabilityMeasure for the uniform measure on a nonempty finite set.

                                        KU₁₉ (from Google formal-ml): Google uses a bespoke probability_space wrapper. We use Mathlib's MeasureTheory.Measure directly. The uniform measure construction needs Measure.count normalized by Fintype.card, or a manual Dirac sum.

                                        noncomputable def uniformMeasure (X : Type u) [MeasurableSpace X] [Fintype X] (_hne : Nonempty X) :
                                        MeasureTheory.Measure X

                                        Uniform probability measure on a Fintype: (1/|X|) · count. This gives each point probability 1/|X|. Requires |X| > 0 (nonempty).
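                                        A sketch of the normalized-counting-measure construction mentioned in KU₁₉ (illustrative, not the rendered equations):

```lean
noncomputable def uniformMeasureSketch (X : Type u) [MeasurableSpace X]
    [Fintype X] : MeasureTheory.Measure X :=
  (Fintype.card X : ENNReal)⁻¹ • MeasureTheory.Measure.count
```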

                                        Equations
                                        Instances For

                                          The uniform measure is a probability measure when X is nonempty and finite.

                                          theorem nfl_core (X : Type u) [MeasurableSpace X] [Fintype X] [MeasurableSingletonClass X] (hX : 2 ≤ Fintype.card X) (m : ℕ) (hm : 2 * m ≤ Fintype.card X) (L : BatchLearner X Bool) :
                                          ∃ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D ∧ ∃ (c : X → Bool), (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin m → X | D {x : X | L.learn (fun (i : Fin m) => (xs i, c (xs i))) x ≠ c x} > ENNReal.ofReal (1 / 8)} > 0

                                          NFL core: for any learner and any finite domain, there exists a hard distribution and concept. Factors through uniformMeasure construction. This is the core argument used by both nfl_fixed_sample and pac_lower_bound.

                                          theorem pac_lower_bound_core (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (C : ConceptClass X Bool) (d : ℕ) (hd_pos : 1 ≤ d) (hd : VCDim X C = ↑d) (ε : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (L : BatchLearner X Bool) (mf : ℝ → ℝ → ℕ) :
                                          (∀ (δ : ℝ), 0 < δ → δ ≤ 1 → ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C, have m := mf ε δ; (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin m → X | D {x : X | L.learn (fun (i : Fin m) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)) → ⌈(↑d - 1) / 2⌉₊ ≤ mf ε (1 / 7)

                                          PAC lower bound core: sample complexity is at least (d-1)/2. For any PAC learner with VCDim = d, at least ⌈(d-1)/2⌉ samples needed. Proof: construct d shattered points, uniform distribution, counting argument. Note: the tight constant is (d-1)/(2ε) (EHKV 1989); see EHKV.lean.

                                          theorem compress_injective_on_labelings {X : Type u} {n : ℕ} {C : ConceptClass X Bool} (cs : CompressionScheme X Bool C) (pts : Fin n → X) (_hpts : Function.Injective pts) (f g : Fin n → Bool) (hf_real : ∃ c ∈ C, ∀ (i : Fin n), c (pts i) = f i) (hg_real : ∃ c ∈ C, ∀ (i : Fin n), c (pts i) = g i) (hfg : (cs.compress fun (i : Fin n) => (pts i, f i)) = cs.compress fun (i : Fin n) => (pts i, g i)) :
                                          f = g

                                          Pigeonhole core: compress is injective on C-realizable labelings. If two C-realizable samples over the same points with different labelings produce the same compressed set, correctness forces the labelings to agree. Γ₇₃: now requires realizability hypotheses for both samples.

                                          theorem compression_imp_vcdim_finite (X : Type u) (C : ConceptClass X Bool) (hcomp : ∃ (k : ℕ) (cs : CompressionScheme X Bool C), cs.size = k) :
                                          VCDim X C < ⊤

                                          ∃ compression scheme → VCDim < ⊤. Pigeonhole: compress is injective on C-realizable labelings (by correctness), but compressed subsets of an n-point sample are bounded. Shatters X C T guarantees ALL labelings are C-realizable, so injectivity holds on all 2^n labelings. Contradiction for large n. Γ₇₃ RESOLVED: CompressionScheme parameterized by C with realizability guard. Shattered sets guarantee C-realizability of every labeling, so the pigeonhole argument is genuinely non-vacuous.

                                          theorem growth_bounded_imp_vcdim_finite (X : Type u) (C : ConceptClass X Bool) (hgrowth : ∃ (d : ℕ), ∀ (m : ℕ), d ≤ m → GrowthFunction X C m ≤ ∑ i ∈ Finset.range (d + 1), m.choose i) :
                                          VCDim X C < ⊤

                                          Growth function polynomially bounded → VCDim < ⊤. Reverse direction: if GrowthFunction m ≤ ∑_{i≤d} C(m,i) for all m ≥ d, then VCDim ≤ d (otherwise GrowthFunction = 2^m for m = VCDim > d).

                                          theorem pac_lower_bound_member (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (C : ConceptClass X Bool) (d : ℕ) (hd : VCDim X C = ↑d) (ε δ : ℝ) (_hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (hδ : 0 < δ) (_hδ1 : δ ≤ 1) (hδ2 : δ ≤ 1 / 7) (hd_pos : 1 ≤ d) (m : ℕ) (hm : m ∈ {m : ℕ | ∃ (L : BatchLearner X Bool), ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C, (MeasureTheory.Measure.pi fun (x : Fin m) => D) {xs : Fin m → X | D {x : X | L.learn (fun (i : Fin m) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)}) :
                                          ⌈(↑d - 1) / 2⌉₊ ≤ m

                                          PAC lower bound membership: if m achieves PAC for C with VCDim = d, then m ≥ ⌈(d-1)/2⌉, matching the statement above. This is the core adversarial counting argument factored for PAC.lean assembly. Note: the tight constant is (d-1)/(2ε) (EHKV 1989); see EHKV.lean.

                                          Proof route (double-averaging on shattered set):

                                          1. VCDim = d → ∃ shattered S with |S| = d
                                          2. D = uniform on S (probability measure, each point has weight 1/d)
                                          3. m < ⌈(d-1)/(64ε)⌉ → 2m < d → NFL counting applies
                                          4. Double-averaging over 2^d labelings: E_f[E_xs[error]] ≥ (d-m)/(2d) > 1/4
                                          5. Reversed Markov: ∃ c₀ ∈ C with Pr[error ≤ 1/8] ≤ 6/7
                                          6. For ε ≤ 1/8: Pr[error ≤ ε] ≤ 6/7 = 1 - 1/7, contradicting PAC
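
The double-averaging step (4) can be checked numerically on a small instance. The sketch below is illustrative Python, not part of the Lean development: it models a shattered set of d points under the uniform distribution, a learner that replays its sample and predicts `False` on unseen points (an arbitrary choice; the bound holds for any learner), and averages the error over all 2^d labelings and all m-samples.

```python
from itertools import product

def avg_error(d, m):
    """Average error over all 2^d labelings f and all m-samples xs,
    for a learner that replays seen labels and guesses False elsewhere."""
    points = range(d)
    total, count = 0.0, 0
    for f in product([False, True], repeat=d):    # every labeling of the d points
        for xs in product(points, repeat=m):      # every m-sample (with repetition)
            seen = set(xs)
            # hypothesis: agrees with f on seen points, False on unseen points
            h = [f[x] if x in seen else False for x in points]
            total += sum(h[x] != f[x] for x in points) / d  # error under uniform D
            count += 1
    return total / count

d, m = 4, 1
avg = avg_error(d, m)
assert avg >= (d - m) / (2 * d)  # double-averaging bound: E_f[E_xs[error]] >= (d-m)/(2d)
print(avg)                       # 0.375 = (d-m)/(2d) for this instance
```

Each unseen point disagrees with the random labeling half the time, and at least d - m points go unseen, which is exactly where the (d - m)/(2d) lower bound comes from.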
                                          def block_extract {α : Type u_1} (k m : ℕ) (S : Fin (k * m) → α) (j : Fin k) :
                                          Fin m → α

                                          Extract block j from a flat array of k*m elements, using finProdFinEquiv.

                                          Equations
                                          Instances For
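
In plain terms, `block_extract` views a flat (k·m)-tuple as k blocks of length m and returns the j-th block. A small Python model (illustrative only; the consecutive-block index convention used here is an assumption, since the actual layout is fixed by mathlib's `finProdFinEquiv` in the Lean code):

```python
def block_extract(k, m, S, j):
    """Return block j (length m) of a flat list S of length k*m.
    Assumes blocks are laid out consecutively: block j covers
    indices j*m, ..., j*m + m - 1 (an illustrative convention)."""
    assert len(S) == k * m and 0 <= j < k
    return [S[j * m + i] for i in range(m)]

flat = list(range(6))                 # k = 3 blocks of m = 2
print(block_extract(3, 2, flat, 1))   # → [2, 3]
```

Note that distinct blocks read from disjoint index ranges, which is the content of `block_extract_disjoint` below.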
                                            def majority_vote (k : ℕ) (votes : Fin k → Bool) :
                                            Bool

                                            Boolean majority vote: returns true iff strictly more than half the votes are true.

                                            Equations
                                            Instances For
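
The "strictly more than half" condition can be stated without division: 2 · (#true votes) > k. An illustrative Python model of the definition:

```python
def majority_vote(k, votes):
    """True iff strictly more than half of the k votes are True.
    Ties (possible only for even k) do not count as a majority."""
    assert len(votes) == k
    return 2 * sum(votes) > k  # sum counts True votes

print(majority_vote(3, [True, True, False]))         # → True
print(majority_vote(4, [True, True, False, False]))  # tie → False
```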
                                              theorem block_extract_disjoint (k m : ℕ) (j₁ j₂ : Fin k) (hne : j₁ ≠ j₂) :

                                              Block index sets are disjoint for distinct blocks.

                                              theorem block_extract_measurable {X : Type u_1} [MeasurableSpace X] (k m : ℕ) (j : Fin k) :
                                              Measurable fun (ω : Fin (k * m) → X) => block_extract k m ω j

                                              Block extraction is measurable: extracting block j from a pi-type is measurable.

                                              theorem iIndepFun_block_extract {X : Type u_1} [MeasurableSpace X] (k m : ℕ) (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] :
                                              ProbabilityTheory.iIndepFun (fun (j : Fin k) (ω : Fin (k * m) → X) => block_extract k m ω j) (MeasureTheory.Measure.pi fun (x : Fin (k * m)) => D)

                                              Block extractions are independent under the product measure. Key infrastructure for boosting (D4) and probability amplification.
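
Why independence matters for amplification: if each of k independent blocks yields a good hypothesis with probability p > 1/2, a majority vote over the blocks is good with probability tending to 1 in k. An exact binomial computation (illustrative Python; p = 2/3 and the helper name `majority_success` are assumptions for the example, not part of the library):

```python
from math import comb

def majority_success(k, p):
    """P(strictly more than half of k independent trials succeed),
    each trial succeeding independently with probability p."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

p = 2 / 3
for k in (1, 5, 25):
    print(k, majority_success(k, p))  # success probability grows with k
```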

                                              theorem shatters_realize {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) (f : ↥T → Bool) :
                                              ∃ c ∈ C, ∀ (x : ↥T), c ↑x = f x

                                              Shattering lifting: if T is shattered by C and f : ↥T → Bool, then there exists c ∈ C that agrees with f on all of T.

                                              theorem shatters_hard_labeling {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) (h : ↥T → Bool) (hcard : 2 ≤ T.card) :
                                              ∃ c ∈ C, T.card < 4 * {x : ↥T | c ↑x ≠ h x}.card

                                              Key counting lemma: for any h : ↥T → Bool on a shattered set T with |T| ≥ 2, there exists c ∈ C with #{x ∈ T | c x ≠ h x} > |T|/4. Lifts exists_many_disagreements through shattering.
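
When C shatters T, every labeling of T is realized by some concept, so in particular one disagreeing with h on many points is. A brute-force check with C taken to be all labelings (illustrative Python; `max_disagreements` is a hypothetical helper name):

```python
from itertools import product

def max_disagreements(T_card, h):
    """Over all labelings c of a T_card-point shattered set, the largest
    number of points where c differs from the hypothesis h."""
    return max(sum(c[i] != h[i] for i in range(T_card))
               for c in product([False, True], repeat=T_card))

# For every hypothesis h on a shattered 4-point set, some concept disagrees
# on more than |T|/4 points (the complement of h disagrees on all of T).
for h in product([False, True], repeat=4):
    assert 4 < 4 * max_disagreements(4, h)
print("checked all 16 labelings")
```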

                                              theorem nfl_per_sample_shattered {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) {m : ℕ} (hTcard : 2 * m < T.card) (xs : Fin m → X) (h : X → Bool) :
                                              ∃ c ∈ C, (∀ (i : Fin m), xs i ∈ T → c (xs i) = h (xs i)) ∧ T.card < 4 * {x ∈ T | c x ≠ h x}.card

                                              NFL per-sample lemma for shattered sets: for ANY fixed sample xs and ANY hypothesis h, there exists c ∈ C agreeing with h on sample points but with high error (> 1/4) on the shattered set T. Uses the counting argument on unseen points via disagreement_sum_eq.

                                              If VCDim = ⊤, then C is not PAC learnable. Proof: for any learner L with sample function mf, pick ε = 1/4, δ = 1/4. Let m = mf(1/4, 1/4). Since VCDim = ⊤, ∃ shattered set S with |S| ≥ 2m. Put D = uniform on S. For random labeling, any m-sample learner has expected error ≥ 1/4 on unseen points. This is the core of pac_imp_vcdim_finite (contrapositive direction).

                                              @[reducible, inline]
                                              abbrev DriftRate :

                                              Drift rate: how fast the target concept changes over time.

                                              Equations
                                              Instances For