Documentation

FLT_Proofs.Data

Data Presentations #

The interfaces through which learners receive data. The three main paradigms (PAC, Online, Gold) present data in fundamentally different ways:

These are typed separately because no common interface captures all three without losing the structural properties that theorems depend on.

Also includes query-learning interfaces (MembershipOracle, EquivalenceOracle), noisy data, and advice.

Time and Streams #

@[reducible, inline]
abbrev TimeStep :

Time index for sequential learning. Just ℕ.

Equations
Instances For
    structure DataStream (X : Type u) (Y : Type v) :
    Type (max u v)

    A data stream: an infinite sequence of labeled examples. Primary data interface for Gold-style learning. The stream is an enumeration - it must eventually cover the relevant domain.

    • observe : TimeStepX × Y

      The stream of examples at each time step

    Instances For
      structure TextPresentation (X : Type u) (c : XBool) extends DataStream X Bool :

      Text presentation: a stream of POSITIVE examples only. For a concept c : X → Bool, a text for c enumerates all x with c(x) = true. Used in Gold's theorem: identification from text.

      Instances For
        structure InformantPresentation (X : Type u) (c : XBool) extends DataStream X Bool :

        Informant presentation: a stream of (example, membership) pairs covering all of X. The learner sees both positive and negative examples. Strictly more informative than text presentation.

        Instances For
          structure NoisyDataStream (X : Type u) (Y : Type v) extends DataStream X Y :
          Type (max u v)

          Noisy input: a data stream where labels may be corrupted. Instance_of DataStream with noise model.

          • observe : TimeStepX × Y
          • trueLabel : TimeStepY

            The true labels (unobserved by learner)

          • noiseRate :

            Noise rate: probability of corruption at each step

          • noiseRate_pos : 0 self.noiseRate

            Noise rate is bounded

          • noiseRate_lt : self.noiseRate < 1 / 2
          Instances For

            IID Samples (PAC paradigm) #

            This is where the PAC/Gold break manifests at the data level. IID samples are meaningless in the Gold setting (no distribution). IID samples are irrelevant in the online setting (adversarial).

            structure IIDSample (X : Type u) (Y : Type v) [MeasurableSpace X] [MeasurableSpace Y] :
            Type (max u v)

            An i.i.d. sample from a distribution over X. This is the data interface for PAC and statistical learning. Requires measure theory - MeasurableSpace on X.

            Instances For

              All Bool-valued functions on X are measurable. Domain-level property: the σ-algebra is fine enough that concept measurability is never an issue.

              Instances
                @[instance 50]

                MeasurableBoolSpace implies MeasurableHypotheses for every C.

                noncomputable def IIDSample.marginalX {X : Type u} {Y : Type v} [MeasurableSpace X] [MeasurableSpace Y] (S : IIDSample X Y) :

                The marginal distribution over X (ignoring labels). Extracted from an IIDSample. Needed for generalization error definitions.

                Equations
                Instances For

                  Query Learning Interfaces #

                  Active learning paradigm: the learner ASKS questions rather than passively receiving data. These are function types - the oracle is a callable interface.

                  structure MembershipOracle (X : Type u) (Y : Type v) :
                  Type (max u v)

                  Membership oracle: given x, returns c(x). Used in exact learning (Angluin's framework) and active learning.

                  • query : XY

                    The oracle's response function

                  • target : Concept X Y

                    The target concept the oracle answers about

                  • truthful (x : X) : self.query x = self.target x

                    Oracle is truthful

                  Instances For
                    structure EquivalenceOracle (X : Type u) (Y : Type v) :
                    Type (max u v)

                    Equivalence oracle: given a hypothesis h, either confirms h = c or returns a counterexample x where h(x) ≠ c(x).

                    • target : Concept X Y

                      The target concept

                    • query : Concept X YOption X

                      Query: given hypothesis, either confirm or give counterexample

                    • correct_none (h : Concept X Y) : self.query h = noneh = self.target

                      If query returns none, hypothesis equals target

                    • correct_some (h : Concept X Y) (x : X) : self.query h = some xh x self.target x

                      If query returns some x, it's a genuine counterexample

                    Instances For
                      def Counterexample (X : Type u) (Y : Type v) (h c : Concept X Y) :

                      A counterexample is a domain element where the hypothesis disagrees with the target.

                      Equations
                      Instances For
                        def Advice (A : Type u_1) :
                        Type u_1

                        Advice: additional information given to a learner beyond the data. Used in advice_reduction theorems and learner_with_advice.

                        Equations
                        Instances For