5.2 Ex-Learning and Gold’s Impossibility Theorem
Lean: EXLearnable
A learner \(M\) \(\mathbf{Ex}\)-identifies a language \(L\) from text if, for every text \(t\) for \(L\), there exists an index \(e\) and a time \(n_0\) such that \(M(t_0, \ldots , t_n) = e\) for all \(n \geq n_0\), and \(W_e = L\) (where \(W_e\) is the language computed by program \(e\)).
A class \(\mathcal{L}\) of languages is \(\mathbf{Ex}\)-identifiable if there exists a single learner \(M\) that \(\mathbf{Ex}\)-identifies every \(L \in \mathcal{L}\) from text.
The definition requires syntactic convergence: the learner must eventually output the same index forever. It is not enough to output a sequence of indices that all happen to compute the correct language; the index itself must stabilize. (Relaxing this requirement gives BC-learning; see Section 5.3.2.)
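The notion of syntactic convergence can be made concrete on a finite run of hypotheses. The following is a minimal Python sketch, not part of the Lean formalization: given the sequence of indices a learner outputs along a text prefix, it finds the earliest point after which the hypothesis never changes. (On an infinite text this point would witness \(n_0\); on a finite prefix it is only suggestive.)

```python
def stabilization_point(hypotheses):
    """Smallest n0 such that hypotheses[n] == hypotheses[-1] for all n >= n0.

    A learner Ex-converges on an infinite text exactly when such an n0
    exists with the hypothesis frozen from n0 onward; here we can only
    inspect a finite prefix of the run.
    """
    n0 = len(hypotheses) - 1
    while n0 > 0 and hypotheses[n0 - 1] == hypotheses[-1]:
        n0 -= 1
    return n0
```

For example, the run `[1, 2, 3, 3, 3]` stabilizes at time 2, while `[1, 2, 1, 2]` has not stabilized at all: its "stabilization point" is the last position, and a longer prefix could move it further out. This is the sense in which a learner never knows it has converged.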
Let \(\mathcal{L}_{\mathrm{fin}}\) be the class of all finite subsets of \(\mathbb {N}\). Here is an \(\mathbf{Ex}\)-learner for \(\mathcal{L}_{\mathrm{fin}}\): at time \(n\), output an index for \(\{ t_0, \ldots , t_n\} \). After all elements of \(L\) have appeared (which happens at some finite time \(n_0\), since \(L\) is finite), the set \(\{ t_0, \ldots , t_n\} \) equals \(L\) for all \(n \geq n_0\). The hypothesis stabilizes.
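The learner just described can be implemented directly. In this sketch, hypotheses are represented as `frozenset`s rather than program indices, an assumption that sidesteps the numbering \(W_e\): for a finite set, a canonical index always exists, so nothing essential is lost.

```python
def finite_learner(prefix):
    """At time n, conjecture (an index for) the set of elements seen so far."""
    return frozenset(prefix)

# Simulate on a text for L = {2, 5, 9}; texts may repeat elements.
text = [2, 5, 2, 9, 5, 2, 9, 9]
hypotheses = [finite_learner(text[:n + 1]) for n in range(len(text))]
# Once all of L has appeared (here at time n0 = 3), the hypothesis
# equals L forever after: the learner has Ex-converged.
```

Running this, the hypothesis at time 2 is still \(\{2, 5\}\), and from time 3 onward it is \(\{2, 5, 9\} = L\) at every step.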
This simple algorithm illustrates a crucial asymmetry: the learner does not know when it has converged. It has no way to announce “I am done.” It merely stabilizes, silently. This lack of a convergence signal is what makes Gold’s impossibility theorem possible.
Lean: gold_theorem
Let \(\mathcal{L}\) be any class of languages that contains all finite languages and at least one infinite language. Then \(\mathcal{L}\) is not \(\mathbf{Ex}\)-identifiable from text.
This is the centerpiece of the chapter. The proof is a diagonalization argument: we defeat every candidate learner by constructing a text that forces it to fail. The construction is adversarial, not probabilistic: there is no distribution, no \(\varepsilon \)-net, no covering number. The adversary simply watches the learner and manipulates the future of the stream to prevent convergence.
Let \(M\) be any learner. We construct a text \(t\) for some language \(L \in \mathcal{L}\) such that \(M\) fails to \(\mathbf{Ex}\)-identify \(L\) on \(t\).
We build \(t\) in stages. Let \(L_\infty \in \mathcal{L}\) be the infinite language that \(\mathcal{L}\) contains by assumption. Enumerate \(L_\infty = \{ w_0, w_1, w_2, \ldots \} \).
Stage 0. Present \(w_0\) to \(M\). The learner outputs some hypothesis \(h_0 = M(w_0)\).
Stage \(k \geq 1\). At this point we have presented the finite sequence \((w_0, \ldots , w_{n_{k-1}})\) for some \(n_{k-1}\), and \(M\) has output hypothesis \(h_{n_{k-1}}\). Let \(F_k = \{ w_0, \ldots , w_{n_{k-1}}\} \) be the set of strings presented so far.
Consider two cases:
Case A: \(W_{h_{n_{k-1}}} = F_k\) (the current hypothesis computes exactly the finite set seen so far). Then present \(w_{n_{k-1}+1}\) (the next element of \(L_\infty \)), enlarging \(F_k\). Let \(n_k = n_{k-1} + 1\). Proceed to stage \(k+1\).
Case B: \(W_{h_{n_{k-1}}} \neq F_k\). Then the current hypothesis is already wrong for the finite language \(F_k\). Since \(F_k\) is finite and \(\mathcal{L}\) contains all finite languages, \(F_k \in \mathcal{L}\), so we may take \(L = F_k\) and extend the text by repeating elements of \(F_k\) forever. If \(M\) keeps its hypothesis, it has converged to an index \(h_{n_{k-1}}\) with \(W_{h_{n_{k-1}}} \neq F_k = L\), and fails; if it ever revises to a correct index for \(F_k\), we switch back to Case A and proceed to the next stage.
The key observation is the diagonalization: we never let the learner rest.
If Case A occurs at every stage, then every element of \(L_\infty \) is eventually presented, so \(t\) is a text for \(L_\infty \). At each stage, \(M\)’s current hypothesis \(h_{n_k}\) is checked: does \(W_{h_{n_k}} = F_k\)? If yes, we expand. But \(F_k \subsetneq L_\infty \) for all \(k\) (since \(L_\infty \) is infinite), so \(W_{h_{n_k}} = F_k \neq L_\infty \). Therefore \(M\) outputs a hypothesis that is wrong at every stage; it cannot converge to a correct index for \(L_\infty \).
If Case B occurs at some stage \(k\), then \(M\) has already failed for \(F_k\). We commit to \(L = F_k\) and repeat elements of \(F_k\) forever. But we can do more: before committing, we wait to see if \(M\) changes its mind. If \(M\) later outputs a hypothesis \(h'\) with \(W_{h'} = F_k\), we switch to Case A and add the next element of \(L_\infty \). This forces \(M\) to fail for \(L_\infty \) instead.
In either case, \(M\) fails. Since \(M\) was an arbitrary learner, no learner \(\mathbf{Ex}\)-identifies \(\mathcal{L}\).
The proof has a recursive structure that repays careful study. The adversary maintains a “threat” at every stage: either the current language is the finite set seen so far, or it is the infinite language \(L_\infty \). Whenever the learner commits to one possibility, the adversary switches to the other. The learner is forced to change its mind infinitely often, and \(\mathbf{Ex}\)-identification requires that it eventually stop changing its mind.
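The switching dynamic can be watched in miniature by running the adversary against the finite-set learner from earlier in the section. Because that learner's hypothesis always equals the finite set seen so far, Case A fires at every stage, the adversary keeps presenting \(L_\infty \) (here, the even numbers), and the learner changes its mind forever. This is a toy sketch: hypotheses are literal sets, so the generally undecidable check \(W_{h} = F_k\) becomes a trivial set comparison, and all function names are illustrative.

```python
def finite_learner(prefix):
    """The Ex-learner for finite languages: conjecture the set seen so far."""
    return frozenset(prefix)

def adversary_text(learner, l_inf_enum, stages):
    """Build a text stage by stage, following the two cases of the proof.

    Hypotheses are represented as sets, so "W_h = F_k" is an equality
    test here; for real program indices this check is undecidable, which
    is why the adversary need not be computable.
    """
    text = []
    for w in l_inf_enum[:stages]:
        seen = frozenset(text)
        h = learner(text) if text else None
        if h is None or h == seen:
            text.append(w)        # Case A: hypothesis matches F_k; grow toward L_inf
        else:
            text.append(text[0])  # Case B: already wrong for F_k; repeat, commit to L = F_k
    return text

evens = [2 * k for k in range(20)]
t = adversary_text(finite_learner, evens, 20)
hyps = [finite_learner(t[:n + 1]) for n in range(len(t))]
mind_changes = sum(hyps[i] != hyps[i + 1] for i in range(len(hyps) - 1))
# Against this learner, Case A fires every stage: the hypothesis changes
# at every single step, so it can never stabilize on the text for L_inf.
```

Against a different learner one would see Case B fire, the adversary commit to the finite language, and a later revision trigger the switch back, exactly the "threat" alternation described above.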
Compare this proof to the lower bound for PAC learning (Chapter 3). The PAC lower bound constructs a distribution that fools the learner with positive probability; it is probabilistic. Gold’s proof constructs a text that fools the learner with certainty; it is adversarial. The PAC proof uses counting (how many hypotheses can be distinguished by \(m\) samples); Gold’s proof uses the halting problem implicitly (the adversary must check whether \(W_e = F\) for arbitrary \(e\)). These are entirely different mathematical worlds.
The proof technique of Theorem 5.4 is a priority argument in the sense of recursion theory, though a simple one. More sophisticated priority arguments appear in the study of learning with resource bounds (Case and Smith [ ]). The connection between Gold-style learning and recursion theory runs deep: Barzdinš [ ] showed that the class of languages \(\mathbf{Ex}\)-identifiable from informant is exactly the class of \(\Sigma ^0_2\)-definable families, establishing a precise link to the arithmetical hierarchy.