3.2.3 Stage 3: Uniform Convergence Implies ERM Succeeds
Proposition
3.14
ERM is a PAC learner under uniform convergence
If \(\mathcal{H}\) has the uniform convergence property, then \(\operatorname{ERM}_\mathcal {H}\) is a PAC learner for \(\mathcal{H}\).
Proof
Suppose \(m \geq m_{\mathrm{UC}}(\varepsilon /2, \delta )\), so that with probability \(\geq 1 - \delta \),
\[ \sup _{h \in \mathcal{H}} |R_D(h) - \hat{R}_S(h)| \leq \varepsilon /2. \]
Let \(h_S = \operatorname{ERM}_\mathcal {H}(S)\). In the realizable case, \(\hat{R}_S(c^*) = 0\), so \(\hat{R}_S(h_S) = 0\) as well. Then:
\[ R_D(h_S) = \underbrace{(R_D(h_S) - \hat{R}_S(h_S))}_{\leq \varepsilon /2} + \underbrace{\hat{R}_S(h_S)}_{= 0} \leq \varepsilon /2 {\lt} \varepsilon . \]
In the general (non-realizable) case, let \(h^* = \arg \min _{h \in \mathcal{H}} R_D(h)\). Then:
\begin{align*} R_D(h_S) & \leq \hat{R}_S(h_S) + \varepsilon /2 \leq \hat{R}_S(h^*) + \varepsilon /2 \leq R_D(h^*) + \varepsilon . \qedhere \end{align*}