Formal Learning Theory Kernel: Blueprint v1

3.2.3 Stage 3: Uniform Convergence Implies ERM Succeeds

Proposition 3.14 ERM is a PAC learner under uniform convergence

If \(\mathcal{H}\) has the uniform convergence property, then \(\operatorname{ERM}_\mathcal {H}\) is a PAC learner for \(\mathcal{H}\).

Proof

Suppose \(m \geq m_{\mathrm{UC}}(\varepsilon /2, \delta )\), so that with probability \(\geq 1 - \delta \),

\[ \sup _{h \in \mathcal{H}} |R_D(h) - \hat{R}_S(h)| \leq \varepsilon /2. \]

Let \(h_S = \operatorname{ERM}_\mathcal {H}(S)\). In the realizable case, \(\hat{R}_S(c^*) = 0\), so \(\hat{R}_S(h_S) = 0\) as well. Then:

\[ R_D(h_S) = \underbrace{(R_D(h_S) - \hat{R}_S(h_S))}_{\leq \varepsilon /2} + \underbrace{\hat{R}_S(h_S)}_{= 0} \leq \varepsilon /2 {\lt} \varepsilon . \]

In the general (non-realizable) case, let \(h^* = \arg \min _{h \in \mathcal{H}} R_D(h)\). Then:

\begin{align*} R_D(h_S) & \leq \hat{R}_S(h_S) + \varepsilon /2 \leq \hat{R}_S(h^*) + \varepsilon /2 \leq R_D(h^*) + \varepsilon . \qedhere \end{align*}