Exercises
VC dimension of convex polygons. Let \(\mathcal{H}_k\) be the class of convex \(k\)-gons in \(\mathbb {R}^2\): \(h_P(x) = 1\) iff \(x\) lies inside a convex polygon \(P\) with at most \(k\) vertices. Assume \(k \geq 3\) (for \(k < 3\) the "polygon" degenerates to a point or segment and the count changes). Prove that \(\mathrm{VCdim}(\mathcal{H}_k) = 2k + 1\).
Lower bound: Place \(2k+1\) points in convex position (say, on a circle). For any subset \(T\) with \(|T| \leq k\), the (possibly degenerate) polygon whose vertex set is exactly \(T\) contains exactly \(T\). For \(|T| \geq k+1\), work with the complement instead: the remaining \(2k+1 - |T| \leq k\) points can each be cut off by a single edge, since the chord joining the two circle-neighbors of an excluded point separates it from all the other points.
Upper bound: Show that no set of \(2k+2\) points can be shattered. If some point lies in the convex hull of the others, the labeling that assigns \(1\) to the hull points and \(0\) to the interior point is unrealizable. Otherwise the points are in convex position; argue that the boundary of a convex \(k\)-gon, having \(k\) edges, crosses the circle through the points at most \(2k\) times, while the alternating labeling of \(2k+2\) points forces a crossing in each of the \(2k+2\) arcs between consecutive points.
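The two bounds together say that a labeling of points in convex position is realizable by a convex \(k\)-gon exactly when the \(0\)-labeled points form at most \(k\) contiguous cyclic runs (each run consumes one cutting edge). Taking that criterion as given — it is what the exercise asks you to prove — a brute-force sketch confirms both bounds for small \(k\):

```python
from itertools import product

def zero_runs(labels):
    """Count maximal cyclic runs of 0-labeled points."""
    if all(labels):
        return 0
    if not any(labels):
        return 1
    # a run starts wherever a 0 follows a 1 (cyclically, via labels[i - 1])
    return sum(1 for i in range(len(labels))
               if labels[i] == 0 and labels[i - 1] == 1)

def shattered(n, k):
    """True iff every cyclic labeling of n circle points has <= k zero-runs,
    i.e. is realizable by a convex k-gon under the run criterion."""
    return all(zero_runs(lab) <= k for lab in product([0, 1], repeat=n))

for k in range(3, 7):
    assert shattered(2 * k + 1, k)       # 2k+1 points: every labeling realizable
    assert not shattered(2 * k + 2, k)   # 2k+2 points: alternating labeling fails
print("run criterion consistent with VCdim = 2k+1 for k = 3..6")
```

The second assertion fails precisely on the alternating labeling, which has \(k+1\) zero-runs, matching the crossing count in the upper-bound argument.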
The distribution-free assumption is essential. The Fundamental Theorem characterizes distribution-free PAC learnability. Show that the equivalence breaks if “distribution-free” is replaced by “distribution-dependent.”
Construct a class \(\mathcal{H}\) with \(\mathrm{VCdim}(\mathcal{H}) = \infty \) and a specific distribution \(D\) such that \(\mathcal{H}\) is PAC learnable under \(D\) with \(m(\varepsilon , \delta ) = O(1)\) samples.
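One concrete choice (an assumption of this sketch; the exercise admits others): let \(\mathcal{H}\) be the indicators of finite subsets of \([0,1]\), which shatters every finite set, and let \(D\) be uniform on \([0,1]\). Every \(h \in \mathcal{H}\) has \(D\)-measure zero, so the learner that ignores its sample and outputs the all-zero hypothesis already has true error \(0\):

```python
import random

random.seed(0)

# Hypothesis class: indicators of finite subsets of [0,1] (infinite VCdim).
# Fixed distribution D = Uniform[0,1]. Target h = indicator of a finite set S.
target_support = {0.1, 0.25, 0.7}   # an arbitrary finite target support

def target(x):
    return 1 if x in target_support else 0

def learner(sample):
    """Ignores the data entirely and returns the all-zero hypothesis."""
    return lambda x: 0

h = learner([])
# Monte Carlo estimate of the true error of h under D: since D is continuous
# and the target's support is finite, D(h != target) = 0.
draws = [random.random() for _ in range(100_000)]
errors = sum(h(x) != target(x) for x in draws)
print(errors)
```

With this seed no draw hits the finite support, so the estimated error is \(0\), matching \(m(\varepsilon, \delta) = O(1)\): no samples are needed at all.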
More subtly: construct a class \(\mathcal{H}\) with \(\mathrm{VCdim}(\mathcal{H}) = \infty \) that is PAC learnable under every fixed distribution \(D\) (with sample complexity depending on \(D\)), yet is not distribution-free PAC learnable. Hint: Let \(\mathcal{H}\) be the indicators of finite subsets of \([0,1]\). For any fixed \(D\), all but finitely many atoms of \(D\) carry total mass less than \(\varepsilon \), so the metric entropy of \(\mathcal{H}\) under \(L^1(D)\) is finite at every scale: a finite \(\varepsilon \)-net, of size depending on \(D\), suffices for PAC learning under \(D\). The distribution-free failure comes from the adversary’s ability to concentrate \(D\) on the points where the learner has not yet gathered information.
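A minimal simulation of the distribution-specific learner, assuming the finite-subset class from the hint and a fixed discrete \(D\) with atom masses \(2^{-i}\): memorizing the observed positives and predicting \(0\) elsewhere drives the error down at a rate governed by \(D\)'s tail, which is exactly the \(D\)-dependence of the sample complexity.

```python
import random

random.seed(1)

def draw():
    """Sample an atom index from the fixed distribution D(i) = 2**-i."""
    u, i, acc = random.random(), 1, 0.5
    while u > acc:
        i += 1
        acc += 2.0 ** -i
    return i

S = {1, 3, 6}                      # target hypothesis: indicator of a finite set
def target(i):
    return 1 if i in S else 0

def memorize(sample):
    """Predict 1 exactly on the positives seen; 0 everywhere else."""
    positives = {x for x, y in sample if y == 1}
    return lambda i: 1 if i in positives else 0

def true_error(h):
    # h never produces false positives, so the error is exactly the
    # D-mass of the target atoms the learner has not yet observed.
    return sum(2.0 ** -i for i in S if h(i) != 1)

for m in (1, 10, 100):
    sample = [(x, target(x)) for x in (draw() for _ in range(m))]
    h = memorize(sample)
    print(m, true_error(h))
```

Note that the same learner fails distribution-free: an adversary choosing \(D\) after seeing the sample size can hide the target's mass on atoms the sample is unlikely to touch.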
Tight agnostic lower bound via Assouad’s lemma. Prove the lower bound \(m(\varepsilon , \delta ) = \Omega (d/\varepsilon ^2)\) for agnostic PAC learning of any class \(\mathcal{H}\) with \(\mathrm{VCdim}(\mathcal{H}) = d\), using Assouad’s lemma rather than Le Cam’s method.
Construction: Let \(C = \{ x_1, \ldots , x_d\} \) be a shattered set. For each \(b \in \{ 0,1\} ^d\), define \(D_b\) as follows: the marginal places mass \(1/d\) on each \(x_i\), and the label of \(x_i\) is drawn from \(\mathrm{Bernoulli}(1/2 + 2\varepsilon \sigma_i)\), where \(\sigma_i = 2b_i - 1 \in \{ -1, +1\} \). The best hypothesis in \(\mathcal{H}\) labels \(x_i\) as \(b_i\) (it exists because \(C\) is shattered) and has risk \(1/2 - 2\varepsilon \) under \(D_b\); a hypothesis that disagrees with \(b\) on a fraction \(\alpha \) of the points incurs excess risk \(4\varepsilon \alpha \).
Show that any algorithm distinguishing \(D_b\) from \(D_{b'}\), where \(b\) and \(b'\) differ in coordinate \(i\), must distinguish \(\mathrm{Bernoulli}(1/2 + 2\varepsilon )\) from \(\mathrm{Bernoulli}(1/2 - 2\varepsilon )\) at \(x_i\), which requires \(\Omega (1/\varepsilon ^2)\) observations of \(x_i\); since each sample lands on \(x_i\) with probability \(1/d\), Assouad’s lemma combines the \(d\) coordinates into the bound \(\Omega (d/\varepsilon ^2)\). Verify that this matches the agnostic gap of Theorem 3.21.