2.1 Atomic vocabulary
Lean: Concept
A domain \(X\) is a set whose elements are called instances. A label set \(Y\) is a set of possible outputs; in binary classification, \(Y = \{ 0, 1\} \). A concept is a function \(c : X \to Y\).
Equivalently, when \(Y = \{ 0,1\} \), a concept \(c\) can be identified with the subset \(\{ x \in X : c(x) = 1\} \subseteq X\). Both perspectives are used throughout the literature. We adopt the function view as primary and the set view when it simplifies combinatorial arguments (as in shattering, Chapter 3).
A hypothesis \(h : X \to Y\) is a candidate prediction rule that a learning algorithm might output. The target concept \(c^* \in \mathcal{C}\) is the specific concept that generated the training data. Learning is proper if the algorithm’s output is constrained to lie in \(\mathcal{C}\), and improper if it may use a larger hypothesis space \(\mathcal{H} \supseteq \mathcal{C}\).