PAC-Bayes Bounds #
McAllester's PAC-Bayes bound for finite hypothesis classes. The Gibbs learner draws h ~ Q (posterior) and classifies with h. The bound relates the Gibbs learner's true error to its empirical error plus a complexity term involving KL(Q‖P).
Main results #
pac_bayes_per_hypothesis: per-hypothesis Hoeffding with prior-weighted tailpac_bayes_all_hypotheses: simultaneous bound for all h via union boundpac_bayes_finite: the PAC-Bayes bound (Jensen over Q)
References #
- McAllester, "PAC-Bayesian Model Averaging", COLT 1999
- McAllester, "Simplified PAC-Bayesian Margin Bounds", COLT 2003
The Gibbs error: expected true error under posterior Q. E_{h~Q}[D{x | h(x) ≠ c(x)}].
Equations
- gibbsError Q hs c D = expectFinitePMF Q fun (h : H) => TrueErrorReal X (hs h) c D
Instances For
Empirical Gibbs error: expected empirical error under Q. E_{h~Q}[EmpErr(h, S)].
Equations
- gibbsEmpError Q hs c S = expectFinitePMF Q fun (h : H) => EmpiricalError X Bool (hs h) (fun (i : Fin m) => (S i, c (S i))) (zeroOneLoss Bool)
Instances For
Per-hypothesis Hoeffding with prior-weighted tail. For each h with prior weight P(h), the probability that TrueErr(h) exceeds EmpErr(h,S) + √(log(1/(P(h)·δ))/(2m)) is at most P(h)·δ.
This is Hoeffding's inequality with t = √(log(1/(P(h)·δ))/(2m)).
We require the bound parameter t ≤ 1 for Hoeffding's applicability.
Simultaneous PAC-Bayes bound: with probability ≥ 1-δ, ALL hypotheses h simultaneously satisfy TrueErr(h) ≤ EmpErr(h,S) + √(log(1/(P(h)·δ))/(2m)).
McAllester's PAC-Bayes bound (finite hypothesis class, union-bound version).
With probability ≥ 1-δ over the sample S of size m, for ALL posteriors Q:
E_{h~Q}[TrueErr(h)] ≤ E_{h~Q}[EmpErr(h,S)] + √((crossEntropyFinitePMF Q P + log(1/δ)) / (2m))
where crossEntropyFinitePMF Q P = ∑_h Q(h)·log(1/P(h)) = KL(Q‖P) + H(Q).
This is the union-bound version. The tight change-of-measure version replaces crossEntropyFinitePMF with klDivFinitePMF and adds log(m).
TODO: Prove the tight version via change of measure (Catoni 2007): E_Q[TrueErr] ≤ E_Q[EmpErr] + √((KL(Q‖P) + log(2√m/δ))/(2m))
Reference: McAllester, COLT 1999.