Documentation

ZPM.Probability.Decision.Minimax.MWU.Update

MWU update step and one-step potential bound #

Given a column c that the row player hits (M r c = true), the MWU update multiplies weights c by 1 - η and leaves all other weights unchanged. The one-step potential bound Φ' ≤ Φ · (1 - η · v) follows from the row player's best-response property composed with the hypothesis that every PMF-valued column mixture admits a pure response with payoff at least v.

def ProbabilityTheory.mwuUpdateWeights {C : Type u_1} [Fintype C] {R : Type u_2} (M : RCBool) (η : ) (hη1 : η < 1) (cfg : MWUConfig C) (r : R) :

One step of the multiplicative-weights update: columns c with M r c = true have their weights scaled by 1 - η; all other weights are unchanged.

Equations
Instances For
    theorem ProbabilityTheory.best_response_payoff_weights {R : Type u_1} {C : Type u_2} [Fintype R] [Fintype C] [Nonempty C] (M : RCBool) (v : ) (hrow : ∀ (q : FintypePMF C), ∃ (r : R), v c : C, q.prob c * if M r c = true then 1 else 0) (cfg : MWUConfig C) :
    v * cfg.potential c : C, cfg.weights c * if M .choose c = true then 1 else 0

    Best-response payoff in terms of weights: a pure response against the normalized PMF gives expected payoff at least v · Φ when weighed by the un-normalized weight vector.

    theorem ProbabilityTheory.potential_one_step_bound {R : Type u_1} {C : Type u_2} [Fintype R] [Fintype C] [Nonempty C] (M : RCBool) (η : ) ( : 0 η) (hη1 : η < 1) (v : ) (hrow : ∀ (q : FintypePMF C), ∃ (r : R), v c : C, q.prob c * if M r c = true then 1 else 0) (cfg : MWUConfig C) :
    (mwuUpdateWeights M η hη1 cfg .choose).potential cfg.potential * (1 - η * v)

    One-step MWU potential bound: after one update, the potential is multiplied by (1 - η · v) or less.