Data Presentations #
The interfaces through which learners receive data. The three main paradigms (PAC, Online, Gold) present data in fundamentally different ways:
- PAC: i.i.d. sample from a distribution (IIDSample)
- Gold: infinite stream enumerating the concept (DataStream, TextPresentation)
- Online: adversary-chosen sequence (no distributional assumptions)
These are typed separately because no common interface captures all three without losing the structural properties that theorems depend on.
Also includes query-learning interfaces (MembershipOracle, EquivalenceOracle), noisy data, and advice.
Time and Streams #
A data stream: an infinite sequence of labeled examples. Primary data interface for Gold-style learning. The stream is an enumeration - it must eventually cover the relevant domain.
The stream of examples at each time step
Instances For
Text presentation: a stream of POSITIVE examples only. For a concept c : X → Bool, a text for c enumerates all x with c(x) = true. Used in Gold's theorem: identification from text.
Every element in the stream is a positive example
Every element in the stream satisfies c
Every positive element eventually appears
Instances For
Informant presentation: a stream of (example, membership) pairs covering all of X. The learner sees both positive and negative examples. Strictly more informative than text presentation.
Labels are correct: the stream tells the truth about membership
Every element of X eventually appears
Instances For
IID Samples (PAC paradigm) #
This is where the PAC/Gold break manifests at the data level. IID samples are meaningless in the Gold setting (no distribution). IID samples are irrelevant in the online setting (adversarial).
An i.i.d. sample from a distribution over X. This is the data interface for PAC and statistical learning. Requires measure theory - MeasurableSpace on X.
- distribution : MeasureTheory.Measure (X × Y)
The underlying distribution over labeled examples
- isProbability : MeasureTheory.IsProbabilityMeasure self.distribution
The distribution is a probability measure
- sampleSize : ℕ
Sample size
- sample : Fin self.sampleSize → X × Y
The sample itself: a function from index to labeled example
Instances For
All Bool-valued functions on X are measurable. Domain-level property: the σ-algebra is fine enough that concept measurability is never an issue.
- all_bool_measurable (f : X → Bool) : Measurable f
Instances
MeasurableBoolSpace implies MeasurableHypotheses for every C.
The marginal distribution over X (ignoring labels). Extracted from an IIDSample. Needed for generalization error definitions.
Equations
Instances For
Query Learning Interfaces #
Active learning paradigm: the learner ASKS questions rather than passively receiving data. These are function types - the oracle is a callable interface.
Membership oracle: given x, returns c(x). Used in exact learning (Angluin's framework) and active learning.
- query : X → Y
The oracle's response function
- target : Concept X Y
The target concept the oracle answers about
Oracle is truthful
Instances For
Equivalence oracle: given a hypothesis h, either confirms h = c or returns a counterexample x where h(x) ≠ c(x).
- target : Concept X Y
The target concept
Query: given hypothesis, either confirm or give counterexample
If query returns none, hypothesis equals target
If query returns some x, it's a genuine counterexample