Documentation

FLT_Proofs.Theorem.Extended

Extended Theorems #

Universal trichotomy, computational hardness, advice reduction, meta-PAC bound, and separation results for compression and SQ dimension.

Advice Elimination Infrastructure #

Joint measurability of a sample-dependent advice learner's evaluation map.

Equations
Instances For

    PAC learnability with finite advice, plus measurability for holdout validation.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      def adviceTrainSample {X : Type u} {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :
      Fin m₁ → X × Bool

      First m₁ coordinates of a sample of size m₁ + m₂.

      Equations
      Instances For
        def adviceValSample {X : Type u} {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :
        Fin m₂ → X × Bool

        Next m₂ coordinates of a sample of size m₁ + m₂.

        Equations
        Instances For
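          A minimal sketch of how these two splitting maps could be defined, using Mathlib's `Fin.castAdd` and `Fin.natAdd` embeddings (hypothetical: the actual definitions in the file may differ, so the sketch uses primed names):

          ```lean
          -- Hypothetical implementations of the sample-splitting maps above.
          def adviceTrainSample' {X : Type u} {m₁ m₂ : ℕ}
              (S : Fin (m₁ + m₂) → X × Bool) : Fin m₁ → X × Bool :=
            fun i => S (Fin.castAdd m₂ i)  -- embed into the first m₁ coordinates

          def adviceValSample' {X : Type u} {m₁ m₂ : ℕ}
              (S : Fin (m₁ + m₂) → X × Bool) : Fin m₂ → X × Bool :=
            fun j => S (Fin.natAdd m₁ j)   -- shift past the first m₁ coordinates
          ```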
          noncomputable def bestAdvice {X : Type u} [MeasurableSpace X] {A : Type u_1} [Fintype A] [Nonempty A] (cand : A → Concept X Bool) {m : ℕ} (Sval : Fin m → X × Bool) :
          A

          Choose the advice value with minimum validation empirical error.

          Equations
          Instances For
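            A hedged sketch of one way such a selector could pick the empirical-risk minimizer over the finite advice set. Here `empiricalError` is a hypothetical helper, hypotheses are simplified to `X → Bool` in place of `Concept X Bool`, and primed names mark that this is illustration, not the file's definition:

            ```lean
            -- Hypothetical: fraction of validation points the candidate mislabels.
            noncomputable def empiricalError {X : Type u} (h : X → Bool) {m : ℕ}
                (Sval : Fin m → X × Bool) : ℚ :=
              ((Finset.univ.filter fun i => h (Sval i).1 ≠ (Sval i).2).card : ℚ) / m

            -- Pick an advice value minimizing validation error (A finite, nonempty).
            noncomputable def bestAdvice' {X A : Type u} [Fintype A] [Nonempty A]
                (cand : A → X → Bool) {m : ℕ} (Sval : Fin m → X × Bool) : A :=
              Classical.choose (Finset.exists_min_image Finset.univ
                (fun a => empiricalError (cand a) Sval) Finset.univ_nonempty)
            ```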
            noncomputable def adviceSelectedHypothesis {X : Type u} [MeasurableSpace X] {A : Type u_1} [Fintype A] [Nonempty A] (LA : LearnerWithAdvice X Bool A) {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :

            The advice-elimination learner applied to a labeled sample.

            Equations
            Instances For
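              Putting the pieces together, the advice-elimination learner plausibly composes the maps above as follows (a sketch using the document's `adviceTrainSample`, `adviceValSample`, and `bestAdvice`; the parameter `run`, standing for training `LA` with a fixed advice value, is a hypothetical interface):

              ```lean
              -- Sketch: train with every advice value, then select on the validation half.
              noncomputable def adviceSelectedHypothesis' {X : Type u} [MeasurableSpace X]
                  {A : Type u_1} [Fintype A] [Nonempty A] {m₁ m₂ : ℕ}
                  (run : A → (Fin m₁ → X × Bool) → Concept X Bool)  -- hypothetical
                  (S : Fin (m₁ + m₂) → X × Bool) : Concept X Bool :=
                let cand := fun a => run a (adviceTrainSample S)  -- |A| candidates
                cand (bestAdvice cand (adviceValSample S))       -- min validation error
              ```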

              Advice elimination (Ben-David & Dichterman 1998): If C is PAC-learnable with concept-dependent advice from a FINITE set A (with measurability regularity), then C is PAC-learnable without advice.

              Proof strategy: run the advice-augmented learner with each a ∈ A on a training portion of the sample, producing |A| candidate hypotheses. Use a validation portion to select the candidate with lowest empirical error. Union bound over |A| advice values + Hoeffding on validation controls total failure probability. Sample complexity: O(m_orig(ε/2, δ/(2|A|)) + log(|A|/δ)/ε²).
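              Under standard Hoeffding constants (a sketch; the file's exact constants may differ), the failure-probability accounting behind that sample complexity works out as:

              • Training: each of the |A| runs gets m_orig(ε/2, δ/(2|A|)) samples, so P[the run with the true advice value fails] ≤ |A| · δ/(2|A|) = δ/2.
              • Validation: by Hoeffding plus a union bound over the |A| candidates, P[some candidate's empirical error deviates from its true error by more than ε/4] ≤ 2|A| · exp(−2·m₂·(ε/4)²) ≤ δ/2 once m₂ ≥ (8/ε²) · log(4|A|/δ).
              • Total failure probability ≤ δ; on success the selected hypothesis has true error ≤ ε/2 + 2·(ε/4) = ε.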

              The [Fintype A] constraint is essential: for infinite A, the theorem is false (no finite union bound). [Nonempty A] ensures the advice space is inhabited.

              theorem meta_pac_bound (X : Type u) [MeasurableSpace X] (_ML : MetaLearner X Bool) (numTasks : ℕ) (_tasks : Fin numTasks → ConceptClass X Bool) (ε δ : ℝ) (_hε : 0 < ε) (_hδ : 0 < δ) :
              ∃ (t₀ : ℕ), t₀ ≤ numTasks → ∀ (C_new : ConceptClass X Bool), VCDim X C_new < ⊤ → ∃ (mf : ℝ → ℝ → ℕ), ∀ (ε' δ' : ℝ), 0 < ε' → 0 < δ' → mf ε' δ' ≥ SampleComplexity X C_new ε' δ'

              Meta-PAC bound: after seeing enough tasks, the meta-learner's output learner generalizes to new tasks from the same environment. The meta-learner's sample complexity over tasks is bounded by a function of ε, δ, and the complexity of the task environment.

              Multi-Task Meta-Learning Infrastructure #

              structure TaskEnvironment (X : Type u) :

              A task environment: a finite collection of concept classes (tasks) that a meta-learner is trained on. Each task is a concept class over the same domain X.

              This is the formalization of Baxter (2000)'s "learning environment." In the full theory, tasks are drawn i.i.d. from a distribution over concept classes; here we use a finite deterministic collection as the base case.

              Instances For
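                Consistent with this description, the structure's fields are presumably along these lines (a hypothetical sketch with a primed name; the actual field names may differ):

                ```lean
                structure TaskEnvironment' (X : Type u) where
                  numTasks : ℕ                                -- finite collection of tasks
                  tasks : Fin numTasks → ConceptClass X Bool  -- each task is a concept class
                ```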
                structure MetaLearnerPAC (X : Type u) [MeasurableSpace X] :

                A meta-learner with PAC guarantees: given a task environment (training tasks), produces a BatchLearner and sample complexity function for new tasks.

                Compared to MetaLearner (in Active.lean), this structure:

                • takes a TaskEnvironment (multiple training tasks) rather than a single ConceptClass
                • exposes the sample complexity function (not just the learner)
                • is designed for quantitative PAC bounds, not just learnability

                The key question: does seeing n training tasks reduce the per-task sample complexity on new tasks? Baxter (2000) shows the answer is yes under task similarity, but the NFL lower bound still applies per-task.

                Instances For
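                  The usage in `baxter_base_case` (`ML.learn env` and `ML.sampleComplexity env ε δ`) suggests fields of roughly this shape (a hypothetical sketch with a primed name):

                  ```lean
                  structure MetaLearnerPAC' (X : Type u) [MeasurableSpace X] where
                    learn : TaskEnvironment X → BatchLearner X Bool   -- learner for new tasks
                    sampleComplexity : TaskEnvironment X → ℝ → ℝ → ℕ  -- samples needed at (ε, δ)
                  ```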

                  A task sample environment: n training tasks, each with m samples. The meta-learner observes labeled samples from each task and must produce a learner for a new (unseen) task.

                  This extends TaskEnvironment by specifying sample sizes and the actual samples drawn. The meta-learner's output may depend on the samples but not on the true concepts.

                  Instances For
                    structure SampleMetaLearner (X : Type u) [MeasurableSpace X] :

                    A sample-based meta-learner: sees labeled samples from n training tasks, produces a BatchLearner for new tasks. Unlike MetaLearnerPAC (which takes a TaskEnvironment directly), this meta-learner only sees the data, not the concept classes.

                    • learn {n m : ℕ} : (Fin n → Fin m → X × Bool) → BatchLearner X Bool

                      Given n × m labeled samples, produce a learner

                    • sampleComplexity : ℕ → ℕ → ℝ → ℝ → ℕ

                      Given n × m, produce sample complexity for the new task

                    Instances For
                      theorem baxter_base_case (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (ML : MetaLearnerPAC X) (env : TaskEnvironment X) (C_new : ConceptClass X Bool) (d : ℕ) (hd : VCDim X C_new = d) (hd_pos : 1 ≤ d) (ε δ : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (hδ : 0 < δ) (hδ1 : δ ≤ 1) (hδ2 : δ ≤ 1 / 7) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C_new, (MeasureTheory.Measure.pi fun (x : Fin (ML.sampleComplexity env ε δ)) => D) {xs : Fin (ML.sampleComplexity env ε δ) → X | D {x : X | (ML.learn env).learn (fun (i : Fin (ML.sampleComplexity env ε δ)) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)) :
                      ⌈(d - 1) / 2⌉₊ ≤ ML.sampleComplexity env ε δ

                      Baxter base case: any meta-learner's output is subject to the NFL lower bound. Even after seeing arbitrarily many training tasks, the meta-learner's output learner on a NEW task C_new with VCDim = d requires at least ⌈(d-1)/2⌉ samples.

                      This is the n=1 (single environment) base case of Baxter (2000). The full Baxter bound (n environments, per-task m ≥ d/(ε²·n)) requires multi-environment product measure infrastructure not yet built.

                      Proof: the meta-learner produces a BatchLearner L and sample complexity mf. If (L, mf) achieves PAC on C_new, then mf ε δ is a PAC-valid sample size, so pac_lower_bound_member gives ⌈(d-1)/2⌉ ≤ mf ε δ.

                      TODO: Strengthen to the full Baxter bound with n training tasks giving per-task improvement m ≥ Ω(d/(ε²·n)). Requires:

                      • TaskEnvironment distribution
                      • multi-task product measure infrastructure.
                      theorem baxter_full (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (SML : SampleMetaLearner X) (C_new : ConceptClass X Bool) (d : ℕ) (hd : VCDim X C_new = d) (hd_pos : 1 ≤ d) (ε δ : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (hδ : 0 < δ) (hδ1 : δ ≤ 1) (hδ2 : δ ≤ 1 / 7) (n m : ℕ) (training_data : Fin n → Fin m → X × Bool) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C_new, have mf := SML.sampleComplexity n m ε δ; (MeasureTheory.Measure.pi fun (x : Fin mf) => D) {xs : Fin mf → X | D {x : X | (SML.learn training_data).learn (fun (i : Fin mf) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)) :
                      ⌈(d - 1) / 2⌉₊ ≤ SML.sampleComplexity n m ε δ

                      Baxter's multi-task lower bound: any sample-based meta-learner that achieves PAC on a new task C_new with VCDim = d, after seeing n training tasks with m samples each, requires ⌈(d-1)/2⌉ samples for the new task.

                      This is the n-independent version. The n-dependent improvement m ≥ Ω(d/(ε²·n)) requires the product-measure information-theoretic argument.

                      Key insight: the meta-learner's output (L, mf) is a PAC witness for C_new. By pac_lower_bound_member, any PAC witness requires at least ⌈(d-1)/2⌉ samples. The meta-learner's training phase (seeing n tasks) cannot reduce this bound because the new task's concept class is adversarially chosen AFTER training.

                      The n-dependent improvement (Baxter 2000, Theorem 3): For the PRODUCT measure over n tasks × m samples per task, the adversary argument gives m ≥ Ω(d/(ε²·n)). This requires:

                      • TaskDistribution: a measure over concept classes
                      • Product measure: D^(n×m) decomposed as (D^m)^n
                      • Information-theoretic counting: n·m bits vs 2^d labelings

                      These are future infrastructure targets. The current theorem proves the n-INDEPENDENT base case, which is already non-trivial.
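                      As a heuristic back-of-envelope version of that counting argument (not the formal statement): the meta-learner observes n·m labeled examples in total, while the adversarial environment realizes on the order of 2^d distinct labelings; estimating them to accuracy ε incurs the usual Hoeffding factor 1/ε², so the total budget must satisfy

                      n·m ≥ Ω(d/ε²), i.e. m ≥ Ω(d/(ε²·n)) per task.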
                      theorem vcdim_not_implies_hardness :
                      ∃ (X : Type) (x : MeasurableSpace X) (C : ConceptClass X Bool), VCDim X C < ⊤ ∧ ∃ (D : MeasureTheory.Measure X) (x_1 : MeasureTheory.IsProbabilityMeasure D) (τ : ℝ), 0 < τ ∧ SQDimension X C D τ = ⊤

                      VC dimension does not determine SQ hardness: there exists a concept class with finite VC dimension but infinite SQ dimension under some distribution at some positive correlation threshold.

                      Witness: singleton indicators on ℕ, with SQDimension = ⊤ at τ = 1. For any probability measure D on ℕ, the correlation between distinct indicators 1_i, 1_j is |1 - 2(D({i}) + D({j}))| ≤ 1, so every finite subset of C qualifies at τ = 1. Since C is infinite, SQDimension = ⊤.

                      M-DefinitionRepair (Γ₈₄): added MeasurableSpace instance and existentials over D and τ. The previous statement had a True placeholder due to missing SQDimension parameters.
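                      The correlation bound in the witness is a two-line computation. Writing the indicators as ±1-valued functions x ↦ 2·1_i(x) − 1 and using 1_i·1_j = 0 for i ≠ j:

                      E_D[(2·1_i − 1)(2·1_j − 1)] = E_D[4·1_i·1_j − 2·1_i − 2·1_j + 1] = 1 − 2(D({i}) + D({j})),

                      and since 0 ≤ D({i}) + D({j}) ≤ 1, the absolute value is at most 1, as claimed.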