Documentation

FLT_Proofs.PureMath.FiniteVCApprox

Finite Sample Approximation for FinitePMF #

Infrastructure for approximating FinitePMFs by empirical distributions of finite samples.

Main results #

Expectations #

def trueExpectation {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : H) :

Expected value ∑ h, μ.prob h * f h of a real-valued test under a finitely supported distribution. The base expectation primitive of the approximation layer; specialised to indicator tests in boolTestExpectation.

Equations
Instances For
    def boolTestExpectation {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : HBool) :

    Expected value of a Bool-valued test under a finite distribution, via the indicator embedding if f h then 1 else 0. The central quantity of the finite-VC approximation layer: a TV bound on distributions translates to a uniform bound on test expectations via expectation_approx_of_tv.

    Equations
    Instances For
      theorem boolTestExpectation_nonneg {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : HBool) :

      A convex combination of values in {0, 1} is nonnegative.

      theorem boolTestExpectation_le_one {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : HBool) :

      A convex combination of values in {0, 1} is at most 1.

      Total Variation Distance #

      def tvDistance {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :

      Total variation distance ∑ h, |μ.prob h - ν.prob h| between two finitely supported distributions, in its L1 form. (The probabilists' normalised TV is half of this; the L1 form is what composes directly with the bound in expectation_approx_of_tv and so is carried through the rest of the layer.) The approximation metric for finite VC approximation: two distributions close in tvDistance have nearly identical expectations on every Bool-valued test, which is what the compression argument needs to substitute an empirical distribution for the true one.

      Equations
      Instances For
        theorem tvDistance_nonneg {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :
        0 tvDistance μ ν

        Total variation distance is nonnegative.

        theorem tvDistance_comm {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :
        tvDistance μ ν = tvDistance ν μ

        Total variation distance is symmetric.

        theorem tvDistance_self {H : Type u_1} [Fintype H] (μ : FinitePMF H) :
        tvDistance μ μ = 0

        Total variation distance from a distribution to itself is zero.

        Key Approximation Lemma #

        theorem expectation_approx_of_tv {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) (f : HBool) (δ : ) ( : tvDistance μ ν δ) :

        Transfer principle. If tvDistance μ ν ≤ δ, then for every Bool-valued test f, |E_μ[f] - E_ν[f]| ≤ δ. This is the inequality that lets the approximate-minimax route to compression replace a target distribution by an empirical distribution: any high-probability event under μ is approximately high-probability under ν, with slack controlled by the TV distance.

        theorem tv_bound_implies_all_tests {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) (ε : ) ( : tvDistance μ ν ε) (tests : Finset (HBool)) (f : HBool) :

        Uniform variant of expectation_approx_of_tv: a single tvDistance bound suffices for every test in any finite family simultaneously, with no union bound. The form actually used by the compression characterisation, where the family is the restricted concept class.

        Expectation of empiricalPMF equals average #

        theorem boolTestExpectation_empirical_eq_avg {H : Type u_1} [Fintype H] [DecidableEq H] {T : } (hT : 0 < T) (hs : Fin TH) (f : HBool) :
        boolTestExpectation (empiricalPMF hT hs) f = (∑ t : Fin T, if f (hs t) = true then 1 else 0) / T

        Bridges the FinitePMF view and the sample-average view: the expectation of a Bool-valued test under the empirical PMF of a sample equals the sample average (1/T) ∑_t f (s_t). This lets the MWU updates and the approximation transfer principle live in the same distributional framework.

        Approximate Minimax Connection #

        theorem boolGamePayoff_eq_boolTestExpectation {R : Type u_1} [Fintype R] [DecidableEq R] {C : Type u_2} (M : RCBool) (p : FinitePMF R) (c : C) :
        boolGamePayoff M p c = boolTestExpectation p fun (r : R) => M r c

        Identifies the game-theoretic payoff (a row distribution against a fixed column in the Bool game) with the corresponding test expectation. The translation that lets the MWU regret bound be applied directly to the compression problem.

        VC Dimension for Boolean Function Families #

        def boolFamilyToFinsetFamily {H : Type u_1} [Fintype H] [DecidableEq H] (A : Finset (HBool)) :

        Maps a finite family of Bool-valued functions to its image as a family of accepting sets. The set-system view is what Mathlib's Finset.Shatters and Finset.vcDim consume, so this is the entry point from the function-class view to the combinatorial VC machinery.

        Equations
        Instances For
          noncomputable def Finset.boolVCDim {H : Type u_1} [Fintype H] [DecidableEq H] (A : Finset (HBool)) :

          VC dimension of a finite Bool-valued family, computed via the set-system image boolFamilyToFinsetFamily and Mathlib's Finset.vcDim. Declared noncomputable because the underlying vcDim is.

          Equations
          Instances For

            Note #

            The VC uniform approximation theorem is in FLT_Proofs.Complexity.FiniteSupportUC as finite_support_vc_approx, proved via the H ⊕ ℕ infinite-envelope route.