3.5.1 The PAC Lower Bound
The upper bound of Theorem 3.7 gives \(m = O(d/\varepsilon )\) in the realizable case. The following theorem shows that this is tight up to constant factors.
For any hypothesis class \(\mathcal{H}\) with \(\mathrm{VCdim}(\mathcal{H}) = d \geq 1\), any PAC learner for \(\mathcal{H}\) requires sample complexity
\[
m = \Omega\!\left(\frac{d}{\varepsilon}\right)
\]
for \(\varepsilon \leq 1/8\) and \(\delta \leq 1/7\).
Since \(\mathrm{VCdim}(\mathcal{H}) = d\), there exists a shattered set \(C = \{ x_1, \ldots , x_d\} \). For each subset \(T \subseteq C\), let \(h_T \in \mathcal{H}\) be a hypothesis that labels exactly \(T\) as positive on \(C\) (one exists because \(C\) is shattered). These \(2^d\) hypotheses realize all \(2^d\) labelings of \(C\).
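For intuition, the correspondence between subsets \(T \subseteq C\) and hypotheses \(h_T\) can be enumerated explicitly. A minimal Python sketch, using a toy shattered set \(C = \{0, 1, 2\}\) (chosen here purely for illustration):

```python
from itertools import combinations

d = 3
C = list(range(d))  # toy shattered set standing in for x_1, ..., x_d

# One hypothesis h_T per subset T of C: h_T labels exactly T as positive.
hypotheses = [
    (lambda x, T=frozenset(T): int(x in T))
    for r in range(d + 1)
    for T in combinations(C, r)
]

# The 2^d hypotheses realize all 2^d distinct labelings of C.
behaviors = {tuple(h(x) for x in C) for h in hypotheses}
print(len(hypotheses), len(behaviors))  # 8 8
```

Each distinct subset yields a distinct indicator pattern on \(C\), which is exactly what shattering asserts.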
Fix \(\varepsilon \leq 1/8\) and write \(C' = \{ x_2, \ldots , x_d\} \). For each \(T \subseteq C'\), define the distribution \(D_T\): it places mass \(1 - 8\varepsilon \) on \(x_1\) and mass \(8\varepsilon /(d-1)\) on each point of \(C'\), with target \(h_T\). Every target agrees on \(x_1\), so all the information sits on the rare points. The learner receives \(m\) i.i.d. samples from \(D_T\).
Each sample reveals the label of one point, but a sample lands in \(C'\) only with probability \(8\varepsilon \). After \(m\) samples, the expected number of samples from \(C'\) is \(8\varepsilon m\), so for \(m \leq (d-1)/(32\varepsilon )\) at most \((d-1)/4\) points of \(C'\) are seen, and at least \(3(d-1)/4\) remain unseen in expectation.
On each unseen point, the learner must guess the label. Since \(C\) is shattered, both labels on every unseen point are consistent with some target, so against a uniformly random \(T\) no strategy can beat chance: the learner errs on each unseen point with probability \(1/2\), and each such error contributes \(8\varepsilon /(d-1)\) to the risk. The expected risk is therefore at least \(\tfrac{1}{2} \cdot \tfrac{3(d-1)}{4} \cdot \tfrac{8\varepsilon }{d-1} = 3\varepsilon \). Since the risk incurred on \(C'\) never exceeds \(8\varepsilon \), a reverse Markov argument converts this expectation bound into a high-probability statement: with probability at least \((3\varepsilon - \varepsilon )/(8\varepsilon - \varepsilon ) = 2/7 > \delta \), the risk exceeds \(\varepsilon \). This shows \(m = \Omega (d/\varepsilon )\).
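The \(\varepsilon \)-dependence of the lower bound comes from placing only mass \(8\varepsilon \) on the label-carrying points, and the unseen-point count can be checked numerically. The following Monte Carlo sketch uses illustrative constants (\(d = 41\), \(\varepsilon = 1/8\)); the helper name `expected_unseen` is my own:

```python
import random

def expected_unseen(d, eps, m, trials=3000, seed=0):
    """Estimate E[# rare points never sampled] when each of the d-1
    rare points carries mass 8*eps/(d-1) and m i.i.d. draws are taken."""
    rng = random.Random(seed)
    rare = d - 1
    total = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m):
            # With probability 8*eps the draw lands on a rare point,
            # chosen uniformly among the d-1 of them.
            if rng.random() < 8 * eps:
                seen.add(rng.randrange(rare))
        total += rare - len(seen)
    return total / trials

d, eps = 41, 1 / 8
m = int((d - 1) / (32 * eps))  # = 10 samples, the threshold in the sketch
print(expected_unseen(d, eps, m))  # ≈ 31, i.e. above 3(d-1)/4 = 30 unseen
```

At this sample size roughly three quarters of the rare points are never observed, matching the expectation bound used in the argument.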