Basic Types for Formal Learning Theory #
Foundational vocabulary: domains, labels, concepts, concept classes, hypothesis spaces, inductive bias, version spaces, and loss functions. These are the base types that all learners, criteria, and complexity measures are defined in terms of.
Alternative definitions for ConceptClass and InductiveBias are provided as commented-out variants for different proof contexts (decidable, RE, measurable, multiclass, Bayesian).
Atomic Types #
A concept class is a set of concepts. Used by every paradigm, complexity measure, and criterion.
Primary definition: Set of functions. Used for PAC/agnostic PAC where concept classes are sets over which VC dimension, Rademacher complexity, covering numbers, etc. are measured. Alternative definitions below for contexts requiring decidability, enumerability, or measurability.
Equations
- ConceptClass X Y = Set (Concept X Y)
Instances For
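As a sketch of how a concrete class is built (not part of this file; assumes `ConceptClass X Y` unfolds to `Set (X → Y)` per the equation above):

```lean
-- Hypothetical example: the class of threshold concepts on ℕ.
def thresholds : ConceptClass ℕ Bool :=
  { h | ∃ t : ℕ, h = fun x => decide (t ≤ x) }

-- The threshold at 5 is a member of the class.
example : (fun x => decide (5 ≤ x)) ∈ thresholds := ⟨5, rfl⟩
```

VC dimension, covering numbers, and the other complexity measures mentioned above are then measured over such a set.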
Every concept in C is a measurable function. Krapp-Wirth precondition: Γ(h) ∈ Σ_Z for all h ∈ H.
- mem_measurable (h : Concept X Bool) : h ∈ C → Measurable h
Instances
A hypothesis space is a set of candidate concepts that the learner searches over. When H = C (the proper case), exactly the concepts of the target class are available. When H ⊃ C, the learner may output hypotheses outside C and we are in the improper regime; dropping the assumption that the target lies in H gives the agnostic setting.
Structurally identical to ConceptClass but semantically distinct: ConceptClass is the ground truth collection; HypothesisSpace is what the learner has access to.
Equations
- HypothesisSpace X Y = Set (Concept X Y)
Instances For
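A minimal sketch of an improper hypothesis space (the name `allHypotheses` is illustrative, not from this file; it assumes `HypothesisSpace` and `ConceptClass` unfold to sets as the equations show):

```lean
-- Hypothetical example: the largest hypothesis space, containing every concept.
def allHypotheses (X Y : Type*) : HypothesisSpace X Y := Set.univ

-- Any concept class is contained in it, so the learner can always represent the target.
example {X Y : Type*} (C : ConceptClass X Y) : C ⊆ allHypotheses X Y :=
  fun _ _ => Set.mem_univ _
```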
A single hypothesis - an element of the hypothesis space. Just a Concept by another name, but the semantic distinction matters: a hypothesis is the learner's GUESS; a concept is ground truth.
Equations
- Hypothesis X Y = Concept X Y
Instances For
The target concept is the specific concept the learner is trying to identify/approximate. It's an element of the concept class.
Instances For
The proper learning flag: whether the learner's hypothesis space equals the concept class. Proper: H = C. Improper: H ⊃ C (learner can output hypotheses outside C). This distinction matters for computational hardness results (proper_improper_separation).
Instances For
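The flag's actual name is not shown here; under the hypothetical name `ProperLearning`, the H = C condition can be sketched as:

```lean
-- Hypothetical sketch of the proper-learning flag: H equals C as sets.
def ProperLearning {X Y : Type*}
    (C : ConceptClass X Y) (H : HypothesisSpace X Y) : Prop :=
  H = C

-- A learner whose hypothesis space is the concept class itself is proper.
example {X Y : Type*} (C : ConceptClass X Y) : ProperLearning C C := rfl
```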
Structured Types #
Inductive bias: a preference ordering or scoring over hypotheses. This is NOT just a set; it bundles a hypothesis space with a way to RANK hypotheses. In Bayesian learning: the prior distribution. In SRM: the complexity hierarchy. In NFL theorem: the object whose absence makes learning impossible.
- hypotheses : HypothesisSpace X Y
The hypothesis space the bias operates over
- preference : Concept X Y → ℝ
Preference score: lower is more preferred. Could be complexity, prior probability, etc.
- not_preferred (h : Concept X Y) : h ∉ self.hypotheses → ∀ h' ∈ self.hypotheses, self.preference h' ≤ self.preference h
Hypotheses outside the space are not preferred over those inside: everything outside H is penalized at least as much as anything inside H. This works for both MDL (description length) and Bayesian (-log prior) readings.
Instances For
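A minimal sketch of constructing a bias, assuming the fields shown above (in particular a real-valued `preference`, whose exact type is not shown here):

```lean
-- Hypothetical example: a flat bias over the full hypothesis space.
def flatBias (X Y : Type*) : InductiveBias X Y where
  hypotheses := Set.univ
  preference := fun _ => 0
  -- Nothing lies outside `Set.univ`, so the side condition holds vacuously.
  not_preferred := fun h hh => absurd (Set.mem_univ h) hh
```

A flat bias is exactly the "absence of bias" scenario of the NFL theorem; an MDL-style bias would replace the constant with a description-length function.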
Version space: the set of hypotheses consistent with all data seen so far. Central to Gold-style learning; the learner narrows the version space as data arrives, and convergence = version space shrinks to the target.
Defined using HypothesisSpace and a consistency predicate over data.
- hypotheses : HypothesisSpace X Y
The full hypothesis space
- data : List (X × Y)
Data seen so far: a list of (input, label) pairs.
- consistent : Set (Concept X Y)
The consistent subset - hypotheses that agree with all data seen so far.
- consistent_agrees (h : Concept X Y) : h ∈ self.consistent → ∀ p ∈ self.data, h p.1 = p.2
Consistency condition: every hypothesis in the consistent set agrees with every observed (input, label) pair.
Instances For
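A sketch of the version space before any data arrives (field names are read off the consistency condition above; their exact types may differ in the file):

```lean
-- Hypothetical example: the initial version space over H with no observations.
def initialVS {X Y : Type*} (H : HypothesisSpace X Y) : VersionSpace X Y where
  hypotheses := H
  data := []
  consistent := H
  -- With no data, every hypothesis vacuously agrees with all observations:
  -- `p ∈ []` has no proof, so `nomatch` closes the goal.
  consistent_agrees := fun _ _ _ hp => nomatch hp
```

Gold-style convergence then amounts to `consistent` shrinking to a singleton containing the target as `data` grows.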
Realizability Flags #
Realizability assumption: every candidate target concept lies in the hypothesis space (C ⊆ H), so whichever concept is the target, the learner can represent it exactly. This is a Prop because it's a condition on the learning setup, not a type.
Equations
- Realizable X Y C H = (C ⊆ H)
Instances For
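Since `Realizable X Y C H` unfolds to `C ⊆ H` per the equation above, realizability in the proper case (H = C) should follow by reflexivity of inclusion (a sketch, assuming the definition is reducible):

```lean
-- A proper learner trivially satisfies realizability: C ⊆ C.
example {X Y : Type*} (C : ConceptClass X Y) : Realizable X Y C C :=
  fun _ hh => hh
```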
Agnostic setting: no assumption that target ∈ H. The learner competes against the best hypothesis in H.
Instances For
Loss Functions #
Loss functions bridge the domain/label types to ℝ for measuring error. They are universal across paradigms (PAC uses expected loss; online learning uses cumulative loss).
A loss function measures the discrepancy between a prediction and true label.
Equations
- LossFunction Y = (Y → Y → ℝ)
Instances For
The 0-1 loss for classification.
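A sketch of a 0-1 loss under the hypothetical name `zeroOneLoss'`, to avoid clashing with the file's own definition documented above (assumes `LossFunction Y` unfolds to `Y → Y → ℝ` per the equation):

```lean
-- Hypothetical example: 1 if prediction and label disagree, 0 otherwise.
def zeroOneLoss' {Y : Type*} [DecidableEq Y] : LossFunction Y :=
  fun y y' => if y = y' then 0 else 1

-- A disagreement costs exactly 1.
example : zeroOneLoss' true false = (1 : ℝ) := rfl
```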