Bridging klBin to the general KL: Jensen + DPI for indicators #
Three steps bridge the binary KL to the full KL via klFun:
- Algebraic identity:
klBin p q = q · klFun(p/q) + (1 − q) · klFun((1 − p)/(1 − q)). - Jensen on a subset:
Q(A) · klFun(avg_A f) ≤ ∫_A klFun(f) dQ. - Data-processing inequality for indicators:
klBin(P(A), Q(A)) ≤ (klDiv P Q).toReal.
These are the three pieces that assemble into the per-set squared bound
2 (P(A) − Q(A))² ≤ klDivReal P Q in the next shard.
theorem
InformationTheory.klFun_integral_ge_of_measurableSet
{α : Type u_1}
[MeasurableSpace α]
{Q : MeasureTheory.Measure α}
[MeasureTheory.IsFiniteMeasure Q]
(f : α → ℝ)
(hf_int : MeasureTheory.Integrable f Q)
(hf_int_kl : MeasureTheory.Integrable (fun (x : α) => klFun (f x)) Q)
(hf_nn : ∀ᵐ (x : α) ∂Q, 0 ≤ f x)
{A : Set α}
(hQA_pos : 0 < Q.real A)
:
Jensen on a subset. For a finite measure Q and a measurable set A with
positive mass, if f and klFun ∘ f are integrable and f ≥ 0 almost everywhere,
then Q(A) · klFun((1/Q(A)) · ∫_A f dQ) ≤ ∫_A klFun(f) dQ.
theorem
InformationTheory.klBin_le_klDivReal
{α : Type u_1}
[MeasurableSpace α]
(P Q : MeasureTheory.Measure α)
[MeasureTheory.IsProbabilityMeasure P]
[MeasureTheory.IsProbabilityMeasure Q]
(hac : P.AbsolutelyContinuous Q)
(h_int : MeasureTheory.Integrable (MeasureTheory.llr P Q) P)
(A : Set α)
(hA : MeasurableSet A)
:
Data processing inequality for indicators. For probability measures P ≪ Q
with finite KL divergence and any measurable set A, the binary KL between the
marginals on (A, Aᶜ) is bounded by the full KL:
klBin(P(A), Q(A)) ≤ klDivReal P Q.