Formal Learning Theory Kernel: Blueprint v1

5.2 Ex-Learning and Gold’s Impossibility Theorem

Definition 5.2 Explanatory Learning (\(\mathbf{Ex}\))

Lean: EXLearnable

A learner \(M\) \(\mathbf{Ex}\)-identifies a language \(L\) from text if, for every text \(t\) for \(L\), there exists an index \(e\) and a time \(n_0\) such that \(M(t_0, \ldots , t_n) = e\) for all \(n \geq n_0\), and \(W_e = L\) (where \(W_e\) is the language computed by program \(e\)).

A class \(\mathcal{L}\) of languages is \(\mathbf{Ex}\)-identifiable if there exists a single learner \(M\) that \(\mathbf{Ex}\)-identifies every \(L \in \mathcal{L}\) from text.

The definition requires syntactic convergence: the learner must eventually output the same index forever. It is not enough to output a sequence of indices that all happen to compute the correct language; the index itself must stabilize. (Relaxing this requirement gives BC-learning; see Section 5.3.2.)
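The distinction between syntactic and semantic convergence can be made concrete with a small sketch (the names and the `lang` table below are illustrative, not part of the formal development):

```python
# Illustrative sketch: indices are plain ints; `lang` is a hypothetical
# map from an index e to the language W_e that program e computes.
# Two distinct indices may compute the same language.
lang = {3: frozenset({0, 1}), 7: frozenset({0, 1})}

def last_mind_change(hyps):
    """Last time n at which the hypothesis stream changes (0 if never).
    Ex-identification demands that this stream become constant."""
    n0 = 0
    for n in range(1, len(hyps)):
        if hyps[n] != hyps[n - 1]:
            n0 = n
    return n0

# Syntactic convergence: the index itself stabilizes at time 1.
settled = last_mind_change([3, 7, 7, 7, 7])       # 1

# Every index in [3, 7, 3, 7, 3] computes the correct language
# (lang[3] == lang[7]), yet the index never stabilizes: this is
# BC-style behavior, not Ex-identification.
oscillating = last_mind_change([3, 7, 3, 7, 3])   # 4
```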

Example 5.3 \(\mathbf{Ex}\)-identification of finite languages

Let \(\mathcal{L}_{\mathrm{fin}}\) be the class of all finite subsets of \(\mathbb {N}\). Here is an \(\mathbf{Ex}\)-learner for \(\mathcal{L}_{\mathrm{fin}}\): at time \(n\), output a canonical index for \(\{ t_0, \ldots , t_n\} \). After all elements of \(L\) have appeared (which happens at some finite time \(n_0\), since \(L\) is finite), the set \(\{ t_0, \ldots , t_n\} \) equals \(L\) for all \(n \geq n_0\). Because the index is computed canonically from the set, the hypothesis stabilizes.
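A minimal sketch of this learner, using a frozenset of the data seen so far as a stand-in for a canonical program index (the function names and the sample text are mine, not the text's):

```python
def finite_learner(prefix):
    """At time n, conjecture (a canonical index for) {t_0, ..., t_n}.
    A frozenset stands in for a canonical index W_e."""
    return frozenset(prefix)

L = {2, 5, 9}                      # target finite language
text = [2, 5, 2, 9, 5, 2, 9, 2]    # a text for L (repetitions allowed)

hyps = [finite_learner(text[:n + 1]) for n in range(len(text))]
# Once every element of L has appeared (here, by time n = 3), the
# hypothesis equals frozenset({2, 5, 9}) forever after: it stabilizes.
```

Note that the learner stabilizes without ever signaling that it has done so, exactly the silent convergence the surrounding discussion emphasizes.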

This simple algorithm illustrates a crucial asymmetry: the learner does not know when it has converged. It has no way to announce “I am done.” It merely stabilizes, silently. This lack of a convergence signal is what makes Gold’s impossibility theorem possible.

Theorem 5.4 Gold’s Impossibility Theorem

Lean: gold_theorem

Let \(\mathcal{L}\) be any class of languages that contains all finite languages and at least one infinite language. Then \(\mathcal{L}\) is not \(\mathbf{Ex}\)-identifiable from text.

This is the centerpiece of the chapter. The proof is a diagonalization argument: we defeat every candidate learner by constructing a text that forces it to fail. The construction is adversarial, not probabilistic: there is no distribution, no \(\varepsilon \)-net, no covering number. The adversary simply watches the learner and manipulates the future of the stream to prevent convergence.

Proof

Let \(M\) be any learner. We construct a text \(t\) for some language \(L \in \mathcal{L}\) such that \(M\) fails to \(\mathbf{Ex}\)-identify \(L\) on \(t\).

We build \(t\) in stages. Let \(L_\infty \in \mathcal{L}\) be the infinite language that \(\mathcal{L}\) contains by assumption. Enumerate \(L_\infty = \{ w_0, w_1, w_2, \ldots \} \).

Stage 0. Present \(w_0\) to \(M\). The learner outputs some hypothesis \(h_0 = M(w_0)\).

Stage \(k \geq 1\). At this point we have presented the finite sequence \((w_0, \ldots , w_{n_{k-1}})\) for some \(n_{k-1}\), and \(M\) has output hypothesis \(h_{n_{k-1}}\). Let \(F_k = \{ w_0, \ldots , w_{n_{k-1}}\} \) be the set of strings presented so far.

Consider two cases:

  • Case A: \(W_{h_{n_{k-1}}} = F_k\) (the current hypothesis computes exactly the finite set seen so far). Then present \(w_{n_{k-1}+1}\) (the next element of \(L_\infty \)), enlarging \(F_k\). Let \(n_k = n_{k-1} + 1\). Proceed to stage \(k+1\).

  • Case B: \(W_{h_{n_{k-1}}} \neq F_k\). Then the current hypothesis is already wrong for the finite language \(F_k\). We may define \(L = F_k\): since \(F_k\) is finite and \(\mathcal{L}\) contains all finite languages, \(F_k \in \mathcal{L}\). We extend the text by repeating elements of \(F_k\); as long as \(M\)’s hypothesis remains wrong for \(F_k\), this is a text for \(L\) on which \(M\) has not converged to a correct index. We do not commit permanently, however: we continue to the next stage rather than stopping, as described below.
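The staged construction can be simulated directly. The sketch below is my rendering of the proof's stages, under one simplification flagged in the comments: hypotheses are finite sets themselves, so the test \(W_{h} = F_k\) is decidable (in general this test is not computable, but the adversary is not required to be computable):

```python
def adversary(learner, w, stages=6):
    """Run the stage construction of the proof against `learner`.
    `w` enumerates the infinite language: w(0), w(1), ... are its elements.
    Simplification: hypotheses are finite sets, so 'W_h == F_k' is decidable.
    Returns the text built so far and the hypothesis at each stage."""
    text = [w(0)]                  # Stage 0: present w_0
    hyps = [learner(text)]
    i = 0                          # index of the last element of L_inf presented
    for _ in range(1, stages):
        F = set(text)              # F_k: the set of strings presented so far
        if hyps[-1] == F:          # Case A: hypothesis computes exactly F_k
            i += 1
            text.append(w(i))      # present the next element of L_inf
        else:                      # Case B: hypothesis already wrong for F_k
            text.append(text[0])   # repeat an element of F_k
        hyps.append(learner(text))
    return text, hyps

# Against the canonical finite-set learner, Case A fires at every stage:
# the hypothesis is enlarged each time and never stabilizes.
text, hyps = adversary(lambda p: set(p), lambda i: i, stages=5)  # L_inf = N
```

Running this, the learner produces a strictly growing chain of hypotheses \(\{0\}, \{0,1\}, \{0,1,2\}, \ldots\), each wrong for \(\mathbb{N}\): the adversary never lets it rest.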

The key observation is the diagonalization: we never let the learner rest.

If Case A occurs at every stage, then every element of \(L_\infty \) is eventually presented, so \(t\) is a text for \(L_\infty \). At each stage, \(M\)’s current hypothesis \(h_{n_k}\) is checked: does \(W_{h_{n_k}} = F_k\)? If yes, we expand. But \(F_k \subsetneq L_\infty \) for all \(k\) (since \(L_\infty \) is infinite), so \(W_{h_{n_k}} = F_k \neq L_\infty \). Therefore \(M\) outputs a hypothesis that is wrong at every stage; it cannot converge to a correct index for \(L_\infty \).

If Case B occurs at some stage \(k\), then \(M\) has already failed for \(F_k\). We commit to \(L = F_k\) and repeat elements of \(F_k\) forever. But we can do more: before committing, we wait to see if \(M\) changes its mind. If \(M\) later outputs a hypothesis \(h'\) with \(W_{h'} = F_k\), we switch to Case A and add the next element of \(L_\infty \). This forces \(M\) to fail for \(L_\infty \) instead.

In either case, \(M\) fails. Since \(M\) was an arbitrary learner, no learner \(\mathbf{Ex}\)-identifies \(\mathcal{L}\).

The proof has a recursive structure that repays careful study. The adversary maintains a “threat” at every stage: either the current language is the finite set seen so far, or it is the infinite language \(L_\infty \). Whenever the learner commits to one possibility, the adversary switches to the other. The learner is forced to change its mind infinitely often, and \(\mathbf{Ex}\)-identification requires that it eventually stop changing its mind.

Remark 5.5 Diagonalization vs. concentration

Compare this proof to the lower bound for PAC learning (Chapter 3). The PAC lower bound constructs a distribution that fools the learner with positive probability; it is probabilistic. Gold’s proof constructs a text that fools the learner with certainty; it is adversarial. The PAC proof uses counting (how many hypotheses can be distinguished by \(m\) samples); Gold’s proof uses the halting problem implicitly (the adversary must check whether \(W_e = F\) for arbitrary \(e\)). These are entirely different mathematical worlds.

The proof technique of Theorem 5.4 is a priority argument in the sense of recursion theory, though a simple one. More sophisticated priority arguments appear in the study of learning with resource bounds (Case and Smith). The connection between Gold-style learning and recursion theory runs deep: Barzdinš showed that the class of languages \(\mathbf{Ex}\)-identifiable from informant is exactly the class of \(\Sigma ^0_2\)-definable families, establishing a precise link to the arithmetical hierarchy.