Finite Sample Approximation for FinitePMF #

Infrastructure for approximating FinitePMFs by empirical distributions of finite samples.

Main results #

expectation_approx_of_tv : TV bound implies test approximation
tv_bound_implies_all_tests : TV ≤ ε implies all tests ε-approximated
boolFamilyToFinsetFamily / Finset.boolVCDim : VC dimension for Boolean function families

Expectations #

def trueExpectation {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : H → ℝ) :

Expected value ∑ h, μ.prob h * f h of a real-valued test under a finitely supported distribution. The base expectation primitive of the approximation layer; specialised to indicator tests in boolTestExpectation.

Equations

trueExpectation μ f = ∑ h : H, μ.prob h * f h

Instances For

source

def boolTestExpectation {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : H → Bool) :

ℝ

Expected value of a Bool-valued test under a finite distribution, via the indicator embedding if f h then 1 else 0. The central quantity of the finite-VC approximation layer: a TV bound on distributions translates to a uniform bound on test expectations via expectation_approx_of_tv.

Equations

boolTestExpectation μ f = trueExpectation μ fun (h : H) => if f h = true then 1 else 0

Instances For

source

theorem boolTestExpectation_nonneg {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : H → Bool) :

0 ≤ boolTestExpectation μ f

A convex combination of values in {0, 1} is nonnegative.

source

theorem boolTestExpectation_le_one {H : Type u_1} [Fintype H] (μ : FinitePMF H) (f : H → Bool) :

boolTestExpectation μ f ≤ 1

A convex combination of values in {0, 1} is at most 1.

Total Variation Distance #

source

def tvDistance {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :

ℝ

Total variation distance ∑ h, |μ.prob h - ν.prob h| between two finitely supported distributions, in its L1 form. (The probabilists' normalised TV is half of this; the L1 form is what composes directly with the bound in expectation_approx_of_tv and so is carried through the rest of the layer.) The approximation metric for finite VC approximation: two distributions close in tvDistance have nearly identical expectations on every Bool-valued test, which is what the compression argument needs to substitute an empirical distribution for the true one.

Equations

tvDistance μ ν = ∑ h : H, |μ.prob h - ν.prob h|

Instances For

source

theorem tvDistance_nonneg {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :

0 ≤ tvDistance μ ν

Total variation distance is nonnegative.

source

theorem tvDistance_comm {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) :

tvDistance μ ν = tvDistance ν μ

Total variation distance is symmetric.

source

theorem tvDistance_self {H : Type u_1} [Fintype H] (μ : FinitePMF H) :

tvDistance μ μ = 0

Total variation distance from a distribution to itself is zero.

Key Approximation Lemma #

source

theorem expectation_approx_of_tv {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) (f : H → Bool) (δ : ℝ) (hδ : tvDistance μ ν ≤ δ) :

|boolTestExpectation μ f - boolTestExpectation ν f| ≤ δ

Transfer principle. If tvDistance μ ν ≤ δ, then for every Bool-valued test f, |E_μ[f] - E_ν[f]| ≤ δ. This is the inequality that lets the approximate-minimax route to compression replace a target distribution by an empirical distribution: any high-probability event under μ is approximately high-probability under ν, with slack controlled by the TV distance.

source

theorem tv_bound_implies_all_tests {H : Type u_1} [Fintype H] (μ ν : FinitePMF H) (ε : ℝ) (hε : tvDistance μ ν ≤ ε) (tests : Finset (H → Bool)) (f : H → Bool) :

f ∈ tests → |boolTestExpectation μ f - boolTestExpectation ν f| ≤ ε

Uniform variant of expectation_approx_of_tv: a single tvDistance bound suffices for every test in any finite family simultaneously, with no union bound. The form actually used by the compression characterisation, where the family is the restricted concept class.

Expectation of empiricalPMF equals average #

source

theorem boolTestExpectation_empirical_eq_avg {H : Type u_1} [Fintype H] [DecidableEq H] {T : ℕ} (hT : 0 < T) (hs : Fin T → H) (f : H → Bool) :

boolTestExpectation (empiricalPMF hT hs) f = (∑ t : Fin T, if f (hs t) = true then 1 else 0) / ↑T

Bridges the FinitePMF view and the sample-average view: the expectation of a Bool-valued test under the empirical PMF of a sample equals the sample average (1/T) ∑_t f (s_t). This lets the MWU updates and the approximation transfer principle live in the same distributional framework.

Approximate Minimax Connection #

source

theorem boolGamePayoff_eq_boolTestExpectation {R : Type u_1} [Fintype R] [DecidableEq R] {C : Type u_2} (M : R → C → Bool) (p : FinitePMF R) (c : C) :

boolGamePayoff M p c = boolTestExpectation p fun (r : R) => M r c

Identifies the game-theoretic payoff (a row distribution against a fixed column in the Bool game) with the corresponding test expectation. The translation that lets the MWU regret bound be applied directly to the compression problem.

VC Dimension for Boolean Function Families #

source

def boolFamilyToFinsetFamily {H : Type u_1} [Fintype H] [DecidableEq H] (A : Finset (H → Bool)) :

Finset (Finset H)

Maps a finite family of Bool-valued functions to its image as a family of accepting sets. The set-system view is what Mathlib's Finset.Shatters and Finset.vcDim consume, so this is the entry point from the function-class view to the combinatorial VC machinery.

Equations

boolFamilyToFinsetFamily A = Finset.image (fun (f : H → Bool) => {h : H | f h = true}) A

Instances For

source

noncomputable def Finset.boolVCDim {H : Type u_1} [Fintype H] [DecidableEq H] (A : Finset (H → Bool)) :

ℕ

VC dimension of a finite Bool-valued family, computed via the set-system image boolFamilyToFinsetFamily and Mathlib's Finset.vcDim. Declared noncomputable because the underlying vcDim is.

Equations

A.boolVCDim = (boolFamilyToFinsetFamily A).vcDim

Instances For

Note #

The VC uniform approximation theorem is in FLT_Proofs.Complexity.FiniteSupportUC as finite_support_vc_approx, proved via the H ⊕ ℕ infinite-envelope route.

Documentation

FLT_Proofs.PureMath.FiniteVCApprox

Finite Sample Approximation for FinitePMF #

Main results #

Expectations #

Total Variation Distance #

Key Approximation Lemma #

Expectation of empiricalPMF equals average #

Approximate Minimax Connection #

VC Dimension for Boolean Function Families #

Note #