zetesis-puremath Blueprint

3.2 Binary KL and Pinsker’s inequality

Definition 3.4 Binary KL

Lean: InformationTheory.klBin

For \(p, q \in [0, 1]\),

\[ \mathrm{KL}_{\mathrm{bin}}(p, q) \;:=\; p \log \tfrac{p}{q} + (1-p) \log \tfrac{1-p}{1-q} \]

with the convention \(0 \cdot \log(0/\cdot) = 0\) [7, §2.3, eq. (2.26)], [16, §2.1].
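In Lean terms, the definition can be sketched as follows. The name `klBinSketch` and its signature are illustrative, not necessarily those of `InformationTheory.klBin`; note that Mathlib's junk values (`Real.log 0 = 0`, `x / 0 = 0`) make the edge cases \(p = 0\) and \(p = 1\) evaluate as the convention above intends.

```lean
import Mathlib.Analysis.SpecialFunctions.Log.Basic

/-- A minimal sketch of the binary KL divergence for `p q : ℝ`.
With Mathlib's conventions `Real.log 0 = 0` and `x / 0 = 0`,
the terms with a zero coefficient vanish, matching the
convention `0 · log (0/·) = 0` in the blueprint. -/
noncomputable def klBinSketch (p q : ℝ) : ℝ :=
  p * Real.log (p / q) + (1 - p) * Real.log ((1 - p) / (1 - q))
```

A real development would typically restrict to `p q : ℝ` in `[0, 1]` via hypotheses on the lemmas rather than in the definition itself.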

Lemma 3.5 Derivative in \(q\)

Lean: InformationTheory.hasDerivAt_klBin_q

For \(q \in (0, 1)\), \(\partial_q \mathrm{KL}_{\mathrm{bin}}(p, q) = (q - p) / (q(1 - q))\) [16, §2.1, eq. (2.6)], [5, §4.4].
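The formula follows by direct differentiation of Definition 3.4, treating \(p\) as a constant:

\[ \partial_q \mathrm{KL}_{\mathrm{bin}}(p, q) \;=\; -\frac{p}{q} + \frac{1-p}{1-q} \;=\; \frac{-p(1-q) + (1-p)q}{q(1-q)} \;=\; \frac{q - p}{q(1-q)}. \]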

Theorem 3.6 Binary Pinsker, sharp constant \(2\)

Lean: InformationTheory.binary_pinsker

For \(p \in [0, 1]\) and \(q \in (0, 1)\), \(2(p - q)^2 \;\le\; \mathrm{KL}_{\mathrm{bin}}(p, q)\) [15], [7, Lemma 11.6.1].
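A standard proof, consistent with Lemma 3.5, compares the two sides along \(q\). Fix \(p\) and set \(g(q) := \mathrm{KL}_{\mathrm{bin}}(p, q) - 2(p - q)^2\), so \(g(p) = 0\) and

\[ g'(q) \;=\; \frac{q - p}{q(1-q)} + 4(p - q) \;=\; (q - p)\left( \frac{1}{q(1-q)} - 4 \right). \]

Since \(q(1-q) \le 1/4\) on \((0,1)\), the bracket is nonnegative, so \(g'\) has the sign of \(q - p\): \(g\) decreases on \((0, p)\) and increases on \((p, 1)\), attaining its minimum \(g(p) = 0\). Hence \(g \ge 0\), which is the claimed inequality.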

Theorem 3.7 Pinsker’s inequality

Lean: InformationTheory.pinsker_proof

For probability measures \(P \ll Q\) with \(\mathrm{KL}_{\mathbb{R}}(P, Q) < \infty\),

\[ \mathrm{TV}(P, Q) \;\le\; \sqrt{\mathrm{KL}_{\mathbb{R}}(P, Q) / 2} \]

[7, Thm 11.6.1]. The TV convention matches Definition 2.22 (half-sup TV); with the \(L^1\) convention, where \(\|P - Q\|_1 = 2\,\mathrm{TV}(P, Q)\), the bound reads \(\|P - Q\|_1 \le \sqrt{2\,\mathrm{KL}}\).

Theorem 3.8 Binary-to-general coarsening

Lean: InformationTheory.klBin_le_klDivReal

For every measurable set \(A\) with \(P(A) = p\) and \(Q(A) = q\), \(\mathrm{KL}_{\mathrm{bin}}(p, q) \le \mathrm{KL}_{\mathbb{R}}(P, Q)\). This is the two-point case of the data-processing inequality, and it is the step linking the binary Pinsker inequality (Theorem 3.6) to the general measure-theoretic Pinsker inequality (Theorem 3.7) [7, Thm 2.8.1].
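The three results compose to give Theorem 3.7. Taking \(A := \{\mathrm{d}P/\mathrm{d}Q \ge 1\}\), a maximizing set for the half-sup TV, with \(p = P(A)\) and \(q = Q(A)\), Theorem 3.6 and Theorem 3.8 give (modulo the boundary cases \(q \in \{0, 1\}\), which are handled separately)

\[ 2\,\mathrm{TV}(P, Q)^2 \;=\; 2(p - q)^2 \;\le\; \mathrm{KL}_{\mathrm{bin}}(p, q) \;\le\; \mathrm{KL}_{\mathbb{R}}(P, Q), \]

and taking square roots yields \(\mathrm{TV}(P, Q) \le \sqrt{\mathrm{KL}_{\mathbb{R}}(P, Q)/2}\).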