3.5.1 The PAC Lower Bound
The upper bound of Theorem 3.7 gives \(m = O(d/\varepsilon )\) in the realizable case. The following theorem shows that this is tight up to constant factors.
For any hypothesis class \(\mathcal{H}\) with \(\mathrm{VCdim}(\mathcal{H}) = d \geq 1\), any PAC learner for \(\mathcal{H}\) requires sample complexity
\[
m = \Omega\!\left(\frac{d}{\varepsilon}\right)
\]
for \(\varepsilon \leq 1/8\) and \(\delta \leq 1/7\).
Since \(\mathrm{VCdim}(\mathcal{H}) = d\), there exists a shattered set \(C = \{ x_1, \ldots , x_d\} \). For each subset \(T \subseteq C\), let \(h_T \in \mathcal{H}\) be a hypothesis that labels exactly \(T\) as positive on \(C\) (one exists because \(C\) is shattered). These \(2^d\) hypotheses realize all \(2^d\) labelings of \(C\).
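For intuition, the correspondence between subsets \(T \subseteq C\) and hypotheses \(h_T\) can be enumerated explicitly. A minimal Python sketch, using a toy shattered set \(C = \{0, 1, 2\}\) (chosen here purely for illustration):

```python
from itertools import combinations

d = 3
C = list(range(d))  # toy shattered set standing in for x_1, ..., x_d

# One hypothesis h_T per subset T of C: h_T labels exactly T as positive.
hypotheses = [
    (lambda x, T=frozenset(T): int(x in T))
    for r in range(d + 1)
    for T in combinations(C, r)
]

# The 2^d hypotheses realize all 2^d distinct labelings of C.
behaviors = {tuple(h(x) for x in C) for h in hypotheses}
print(len(hypotheses), len(behaviors))  # 8 8
```

Each distinct subset yields a distinct indicator pattern on \(C\), which is exactly what shattering asserts.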
Fix \(\varepsilon \leq 1/8\) and write \(C' = \{ x_2, \ldots , x_d\} \). For each \(T \subseteq C'\), define the distribution \(D_T\): it places mass \(1 - 8\varepsilon \) on \(x_1\) and mass \(8\varepsilon /(d-1)\) on each point of \(C'\), with target \(h_T\). Every target agrees on \(x_1\), so all the information sits on the rare points. The learner receives \(m\) i.i.d. samples from \(D_T\).
Each sample reveals the label of one point, but a sample lands in \(C'\) only with probability \(8\varepsilon \). After \(m\) samples, the expected number of samples from \(C'\) is \(8\varepsilon m\), so for \(m \leq (d-1)/(32\varepsilon )\) at most \((d-1)/4\) points of \(C'\) are seen, and at least \(3(d-1)/4\) remain unseen in expectation.
On each unseen point, the learner must guess the label. Since \(C\) is shattered, both labels on every unseen point are consistent with some target, so against a uniformly random \(T\) no strategy can beat chance: the learner errs on each unseen point with probability \(1/2\), and each such error contributes \(8\varepsilon /(d-1)\) to the risk. The expected risk is therefore at least \(\tfrac{1}{2} \cdot \tfrac{3(d-1)}{4} \cdot \tfrac{8\varepsilon }{d-1} = 3\varepsilon \). Since the risk incurred on \(C'\) never exceeds \(8\varepsilon \), a reverse Markov argument converts this expectation bound into a high-probability statement: with probability at least \((3\varepsilon - \varepsilon )/(8\varepsilon - \varepsilon ) = 2/7 > \delta \), the risk exceeds \(\varepsilon \). This shows \(m = \Omega (d/\varepsilon )\).
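The \(\varepsilon \)-dependence of the lower bound comes from placing only mass \(8\varepsilon \) on the label-carrying points, and the unseen-point count can be checked numerically. The following Monte Carlo sketch uses illustrative constants (\(d = 41\), \(\varepsilon = 1/8\)); the helper name `expected_unseen` is my own:

```python
import random

def expected_unseen(d, eps, m, trials=3000, seed=0):
    """Estimate E[# rare points never sampled] when each of the d-1
    rare points carries mass 8*eps/(d-1) and m i.i.d. draws are taken."""
    rng = random.Random(seed)
    rare = d - 1
    total = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m):
            # With probability 8*eps the draw lands on a rare point,
            # chosen uniformly among the d-1 of them.
            if rng.random() < 8 * eps:
                seen.add(rng.randrange(rare))
        total += rare - len(seen)
    return total / trials

d, eps = 41, 1 / 8
m = int((d - 1) / (32 * eps))  # = 10 samples, the threshold in the sketch
print(expected_unseen(d, eps, m))  # ≈ 31, i.e. above 3(d-1)/4 = 30 unseen
```

At this sample size roughly three quarters of the rare points are never observed, matching the expectation bound used in the argument.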