Formal Learning Theory Kernel: Blueprint v1

7.1 Borel parameterization and the symmetrization route

The symmetrization argument at the heart of the uniform convergence proof (Stage 2 of Section 3.2) rewrites the one-sided bad event

\[ \mathrm{Bad}_1(C, m, \varepsilon ) \; =\; \Bigl\{ \, S \in X^m \; :\; \sup _{h \in C}\, \bigl|\, R_D(h) - \hat{R}_S(h) \, \bigr| > \varepsilon \, \Bigr\} \]

in terms of a ghost sample \(S' \sim D^m\) and the two-sided comparison event

\[ \mathrm{Bad}_2(C, m, \varepsilon ) \; =\; \Bigl\{ \, (S, S') \in X^m \times X^m \; :\; \sup _{h \in C}\, \bigl|\, \hat{R}_{S'}(h) - \hat{R}_S(h) \, \bigr| > \varepsilon \, \Bigr\} . \]

The bound \(\Pr _S[\mathrm{Bad}_1] \leq 2\, \Pr _{S,S'}[\mathrm{Bad}_2]\) transfers control of the deviation between empirical and true error over to the two-sample event. The union-bound-over-labelings step that follows requires the inner event at each fixed labeling to carry a well-defined probability: for each split of the combined \(2m\) points into \(S\) and \(S'\), Hoeffding’s inequality is applied to the \(2m\) centered Bernoulli contributions, and the resulting tail bound is summed over the \(\Pi _C(2m)\) distinct effective labelings.
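Assembled, the chain reads as follows, modulo constants, which vary across textbook treatments; the exponent below comes from applying Hoeffding's inequality to the difference of two \(m\)-sample averages of \([0,1]\)-valued losses at a fixed labeling:

\[ \Pr _{S \sim D^m}\bigl[\mathrm{Bad}_1(C, m, \varepsilon )\bigr] \;\leq\; 2\, \Pr _{(S,S') \sim D^{2m}}\bigl[\mathrm{Bad}_2(C, m, \varepsilon )\bigr] \;\leq\; 2\, \Pi _C(2m) \cdot 2 \exp \bigl( - m \varepsilon ^2 / 2 \bigr) . \]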

What does this argument actually require of \(\mathrm{Bad}_2\)? Three things:

  1. A probability for \(\mathrm{Bad}_2\). The outer symmetrization bound compares \(\Pr _{S,S'}[\mathrm{Bad}_2]\) against a sum of inner tail bounds. For this comparison to be meaningful, \(\mathrm{Bad}_2\) must be assigned a probability with respect to the product measure \(D^{\otimes 2m}\).

  2. Measurability of each slice at a fixed labeling. The inner Hoeffding bound is applied to the event that a specific (fixed) labeling produces a large gap between the \(S\)-average and the \(S'\)-average. This inner event lives in the product Borel structure and is Borel automatically.

  3. Closure of the probability under countable operations. The supremum over \(h \in C\) is rewritten as a finite union over the growth-function-many effective labelings on the combined sample, so the only closure needed is under finite union inside the product \(\sigma \)-algebra extended with the \(D^{\otimes 2m}\)-null sets.

Requirement (1) is exactly null-measurability with respect to the completion of \(D^{\otimes 2m}\). Requirements (2) and (3) are automatic for any \(\mathrm{Bad}_2\) built from a class with a Borel parameter space. Nowhere does the proof need \(\mathrm{Bad}_2\) itself to be Borel with respect to the product Borel structure.

Remark 7.1 The Krapp and Wirth hypothesis, diagnosed

The natural place to plant a measurability hypothesis in the proof is after (1): require \(\mathrm{Bad}_2\) to sit in whatever ambient \(\sigma \)-algebra the argument is using. Krapp and Wirth plant it inside the product Borel \(\sigma \)-algebra. That is a strictly stronger requirement than (1) alone. The gap between “Borel on the product space” and “null-measurable for every product measure built from \(D\)” is invisible in every Borel-parameterized example, where both happen to hold, but it is a real gap in general.

The kernel names the refined hypothesis.

Definition 7.2 WellBehavedVCMeasTarget

Lean: WellBehavedVCMeasTarget

A concept class \(C\) over a measurable space \((X, \Sigma )\) satisfies WellBehavedVCMeasTarget if, for every sample size \(m\), every gap threshold \(\varepsilon \), and every probability distribution \(D\) on \(X\), the two-sided ghost-gap bad event \(\mathrm{Bad}_2(C, m, \varepsilon )\) is a null-measurable subset of \((X^m \times X^m,\, \Sigma ^{\otimes 2m})\) with respect to the product measure \(D^{\otimes 2m}\). Here null-measurability means that the event differs from a set in \(\Sigma ^{\otimes 2m}\) by a \(D^{\otimes 2m}\)-null set; equivalently, it sits in the completion of \(\Sigma ^{\otimes 2m}\) under \(D^{\otimes 2m}\).
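A plausible Lean rendering of this definition, assuming mathlib's measure-theory library: `NullMeasurableSet`, `Measure.pi`, `Measure.prod`, and `IsProbabilityMeasure` are mathlib's; the name `WellBehavedVCMeasTargetSketch` and the encoding of samples as tuples are illustrative assumptions, not the kernel's actual code.

```lean
open MeasureTheory

/-- Hypothetical sketch of `WellBehavedVCMeasTarget`: for every sample size
`m`, threshold `ε`, and probability measure `D` on `X`, the two-sided
ghost-gap bad event is null-measurable for the `2m`-fold product of `D`,
encoded here as the product of two `m`-fold pi-measures. The kernel's real
definition may encode samples differently. -/
def WellBehavedVCMeasTargetSketch {X : Type*} [MeasurableSpace X]
    (Bad₂ : (m : ℕ) → ℝ → Set ((Fin m → X) × (Fin m → X))) : Prop :=
  ∀ (m : ℕ) (ε : ℝ) (D : Measure X), IsProbabilityMeasure D →
    NullMeasurableSet (Bad₂ m ε)
      ((Measure.pi fun _ : Fin m => D).prod (Measure.pi fun _ : Fin m => D))
```

Note that `Bad₂` takes no measure argument: the event itself is determined by the class alone, and only the null-measurability requirement quantifies over \(D\).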

Definition 7.3 KrappWirthWellBehaved

Lean: KrappWirthWellBehaved

A concept class \(C\) satisfies the KrappWirthWellBehaved hypothesis if, for every \(m\) and \(\varepsilon \), the two-sided ghost-gap bad event \(\mathrm{Bad}_2(C, m, \varepsilon )\) is a Borel subset of \(X^m \times X^m\) in the product Borel structure.

Proposition 7.4 KrappWirth is stronger than WellBehavedVCMeasTarget

Lean: KrappWirthWellBehaved.toWellBehavedVC

Every Borel set is null-measurable with respect to any Borel probability measure. Consequently any concept class carrying KrappWirthWellBehaved also carries WellBehavedVCMeasTarget.

Proof

\(\mathcal{B} \subseteq \mathcal{B}_\mu ^\ast \) for every Borel probability measure \(\mu \), because the completion only enlarges the \(\sigma \)-algebra by \(\mu \)-null sets. Apply this to the product structure \(\Sigma ^{\otimes 2m}\) with the product measure \(D^{\otimes 2m}\) and the statement follows immediately.
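In mathlib the content of this proof is a single existing lemma; a sketch, where the lemma name `MeasurableSet.nullMeasurableSet` is mathlib's and the surrounding framing is illustrative:

```lean
open MeasureTheory

-- Any measurable set sits in the completion of any measure on the same
-- σ-algebra: this is exactly the inclusion 𝓑 ⊆ 𝓑*_μ used above.
example {α : Type*} [MeasurableSpace α] (μ : Measure α)
    (s : Set α) (hs : MeasurableSet s) : NullMeasurableSet s μ :=
  hs.nullMeasurableSet
```

Applying this with \(\alpha = X^m \times X^m\), the product \(\sigma \)-algebra, and \(\mu = D^{\otimes 2m}\) gives the proposition.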

What Proposition 7.4 leaves open is the reverse implication; that is the subject of Section 7.3. We prepare the ground by introducing the regularity the kernel uses in practice to produce instances of WellBehavedVCMeasTarget.

Definition 7.5 Borel-parameterized concept class

A concept class \(C \subseteq \{ 0,1\} ^X\) over a measurable space \((X, \Sigma )\) is Borel-parameterized by a standard Borel space \((\Theta , \mathcal{B}_\Theta )\) if there exists a jointly measurable evaluation map \(\mathrm{eval} : \Theta \times X \to \{ 0,1\} \) such that \(C = \bigl\{ \, \mathrm{eval}(\theta , \cdot )\, :\, \theta \in \Theta \, \bigr\} \).

Borel parameterization is the standard regularity assumption in descriptive-set-theoretic approaches to learning theory. Essentially every concept class considered in practice admits a Borel parameterization: finite classes, neural networks with measurable weights, decision trees of bounded depth, thresholds on \(\mathbb {R}\), halfspaces, axis-aligned rectangles, and so on. When the parameter space \(\Theta \) is countable (in particular, finite), joint measurability is automatic once each \(\mathrm{eval}(\theta , \cdot )\) is measurable. When \(\Theta = \mathbb {R}^k\) for some \(k\), joint measurability amounts to the set \(\mathrm{eval}^{-1}(\{ 1\} ) \subseteq \Theta \times X\) being Borel in the product; for thresholds and halfspaces this set is cut out by closed inequalities and is Borel outright. The next section proves that WellBehavedVCMeasTarget holds automatically for every Borel-parameterized class.
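For a concrete instance of Definition 7.5, take thresholds on the real line: \(\Theta = \mathbb {R}\) with

\[ \mathrm{eval}(\theta , x) \; =\; \mathbf{1}[\, x \geq \theta \, ], \qquad \mathrm{eval}^{-1}(\{ 1\} ) \; =\; \bigl\{ \, (\theta , x) \in \mathbb {R}^2 \; :\; x \geq \theta \, \bigr\} . \]

The preimage of \(\{ 1\} \) is a closed half-plane, hence Borel in \(\mathbb {R}^2\), so \(\mathrm{eval}\) is jointly measurable and the thresholds form a Borel-parameterized class.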

Definition 7.6 Parameterized ghost-gap bad event

Lean: paramBadEvent

For a Borel-parameterized class with evaluation map \(\mathrm{eval}\), sample size \(m\), and gap threshold \(\varepsilon \), the parameterized ghost-gap bad event is

\[ \mathrm{Bad}_{\mathrm{eval}}(m, \varepsilon ) \; =\; \bigcup _{\theta \in \Theta } \Bigl\{ \, (S, S') \in X^m \times X^m \; :\; \bigl| \hat{R}_{S'}(\mathrm{eval}(\theta ,\cdot )) - \hat{R}_S(\mathrm{eval}(\theta ,\cdot )) \bigr| > \varepsilon \, \Bigr\} . \]

Equivalently, \(\mathrm{Bad}_{\mathrm{eval}}(m, \varepsilon )\) is the projection onto the last two factors of a jointly Borel subset of \(\Theta \times X^m \times X^m\).
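Concretely, set

\[ B \; =\; \Bigl\{ \, (\theta , S, S') \in \Theta \times X^m \times X^m \; :\; \bigl| \hat{R}_{S'}(\mathrm{eval}(\theta ,\cdot )) - \hat{R}_S(\mathrm{eval}(\theta ,\cdot )) \bigr| > \varepsilon \, \Bigr\} . \]

Joint measurability of \(\mathrm{eval}\) makes each empirical average, hence the gap, jointly Borel in \((\theta , S, S')\), so \(B\) is Borel, and \(\mathrm{Bad}_{\mathrm{eval}}(m, \varepsilon )\) is the image of \(B\) under the projection that forgets the \(\theta \) coordinate.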

The “projection of a jointly Borel event” reading is the key. A projection of a Borel set onto a product factor is not Borel in general. It is, however, always analytic. This observation is what connects the kernel’s refinement to classical descriptive set theory, and it is the topic of Section 7.2.