Separation Theorems #
These results prove that the paradigms are genuinely different: the learnability criteria do NOT all imply one another.
Online learnable → PAC learnable (Γ₄₈): requires the LittlestoneDim → VCDim bridge or an online-to-batch conversion.
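A minimal Lean sketch of how the bridge transfers finiteness; `LittlestoneDim` and `VCDim` below are placeholder ℕ∞-valued functions, not necessarily the formalization's actual names or definitions:

```lean
import Mathlib

variable {X : Type*}

-- Placeholder ℕ∞-valued dimensions (assumed names, not the real API):
variable (LittlestoneDim VCDim : Set (X → Bool) → ℕ∞)

-- Bridge direction: a VC-shattered set of size d induces a full mistake
-- tree of depth d, so VCDim ≤ LittlestoneDim and finiteness transfers.
example (bridge : ∀ C, VCDim C ≤ LittlestoneDim C)
    (C : Set (X → Bool)) (h : LittlestoneDim C < ⊤) : VCDim C < ⊤ :=
  lt_of_le_of_lt (bridge C) h
```

From finite VCDim, PAC learnability follows by the fundamental theorem of statistical learning; the online-to-batch route instead converts the mistake-bounded online learner directly into a batch learner.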
Joint measurability of a batch learner's evaluation map.
Equations
- LearnEvalMeasurable L = ∀ (m : ℕ), Measurable fun (p : (Fin m → X × Bool) × X) => L.learn p.1 p.2
Instances For
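For context, a self-contained sketch of the shape of this definition; the `BatchLearner` structure below is a hypothetical stand-in for the formalization's learner type, and its field name is an assumption:

```lean
import Mathlib

-- Hypothetical stand-in for the formalization's batch learner type.
structure BatchLearner (X : Type*) where
  learn : ∀ {m : ℕ}, (Fin m → X × Bool) → X → Bool

-- Joint measurability of the evaluation map, mirroring the equation above.
-- Without it, the error probability of `L.learn s` under the data
-- distribution need not be defined, so PAC statements cannot even be posed.
def LearnEvalMeasurable {X : Type*} [MeasurableSpace X]
    (L : BatchLearner X) : Prop :=
  ∀ m : ℕ, Measurable fun (p : (Fin m → X × Bool) × X) => L.learn p.1 p.2
```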
Universal learnable → PAC learnable. Proof sketch: UniversalLearnable provides a learner L with rate → 0 and Pr[error ≤ rate(m)] ≥ 2/3. Two components:
- Event containment: rate(m) < ε ⟹ {error ≤ rate(m)} ⊆ {error ≤ ε} (monotonicity).
- Confidence boosting: 2/3 → 1-δ via median-of-means, routed through boost_two_thirds_to_pac (Γ₆₇, currently sorry'd), which encapsulates the Chernoff-based boosting; a worked bound follows this list.
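The Chernoff calculation behind the boosting step, as a sketch (the exact constants used in boost_two_thirds_to_pac may differ): run k independent copies of the weak learner, so each copy is good (error ≤ rate(m)) with probability ≥ 2/3 and the number S of good copies has mean ≥ 2k/3. Hoeffding's inequality gives

```latex
\Pr\!\left[S \le \tfrac{k}{2}\right] \;\le\; \exp\!\left(-\frac{2(k/6)^2}{k}\right) \;=\; e^{-k/18},
\qquad k \ge 18\ln(1/\delta) \;\Longrightarrow\; e^{-k/18} \le \delta,
```

so with probability ≥ 1-δ a majority of the copies are good, and the median-of-means selection step extracts a good hypothesis from that majority.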
EX does not imply PAC. Witness: X = ℕ, C = all indicator functions of finite subsets of ℕ. EX-learnable: the learner outputs "true on everything seen so far", which converges on any text. Not PAC-learnable: VCDim = ⊤, since every finite set is shattered (each subset of a finite set is itself finite, so its indicator lies in C).
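A minimal Lean sketch of this witness; `FiniteIndicators` and `conjecture` are illustrative names, not the formalization's identifiers:

```lean
import Mathlib

-- The witness class: indicators of finite subsets of ℕ.
def FiniteIndicators : Set (ℕ → Bool) :=
  {f | {n | f n = true}.Finite}

-- The EX learner's conjecture: true exactly on the positives seen so far.
-- On any text for a finite set, this stabilizes once all elements appear.
def conjecture (seen : Finset ℕ) : ℕ → Bool :=
  fun n => decide (n ∈ seen)

-- Shattering: the indicator of any T ⊆ F (F finite) is itself finite and
-- hence lies in the class, so every finite set is shattered.
example (T : Finset ℕ) : conjecture T ∈ FiniteIndicators := by
  simp only [FiniteIndicators, conjecture, Set.mem_setOf_eq,
    decide_eq_true_eq]
  exact T.finite_toSet
```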
Online learning is strictly stronger than PAC learning.
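A standard witness for this separation (hedged: pac_not_implies_online may formalize a different class) is the threshold class on ℕ:

```latex
\mathcal{H} \;=\; \{\, h_a : x \mapsto \mathbf{1}[x \ge a] \mid a \in \mathbb{N} \,\},
\qquad \mathrm{VCdim}(\mathcal{H}) = 1, \qquad \mathrm{Ldim}(\mathcal{H}) = \infty .
```

VCdim = 1 because for x < y the labeling (x ↦ true, y ↦ false) is unrealizable; Ldim = ∞ because on {1, …, 2^d} an adversary can binary-search the threshold and force d mistakes for every d. The class is therefore PAC-learnable but not online learnable.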
EX learning is strictly stronger than finite learning.
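A hedged sketch of a witness (the formalization may use another): the finite-subset class from the EX example above already separates these. The "everything seen so far" learner EX-converges, but a one-shot learner can never safely commit, because any finite prefix whose positives form S is also consistent with every finite superset of S:

```lean
import Mathlib

-- Two distinct concepts consistent with the same positive data S: the
-- indicators of S and of S ∪ {m} for any fresh m. A finite learner that
-- commits after seeing only S cannot be correct for both.
example (S : Finset ℕ) (m : ℕ) (hm : m ∉ S) :
    (fun n => decide (n ∈ S)) ≠ (fun n => decide (n ∈ insert m S)) := by
  intro h
  have hm' : decide (m ∈ S) = decide (m ∈ insert m S) := congrFun h m
  rw [decide_eq_decide] at hm'
  exact hm (hm'.mpr (Finset.mem_insert_self m S))
```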
Online-PAC-Gold three-way separation. Pl-REPAIR: the first conjunct carried a [Fintype X] hypothesis that made the statement false; fixed to match the pac_not_implies_online repair.