Exploration by Optimisation in Partial Monitoring

07/12/2019
by   Tor Lattimore, et al.

We provide a simple and efficient algorithm for adversarial k-action d-outcome non-degenerate locally observable partial monitoring games for which the n-round minimax regret is bounded by $3(d+1)\,k^{3/2}\sqrt{8n\log(k)}$, matching the best known information-theoretic upper bounds.
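
To get a feel for how the stated bound scales, the following minimal sketch evaluates it numerically, reading the bound as 3(d+1) k^(3/2) sqrt(8 n log k). The function name `regret_bound` and the use of the natural logarithm are assumptions made for illustration; this is not the algorithm from the paper, only arithmetic on the bound.

```python
import math


def regret_bound(n: int, k: int, d: int) -> float:
    """Evaluate 3(d+1) * k^(3/2) * sqrt(8 * n * log(k)).

    n: number of rounds, k: number of actions, d: number of outcomes.
    Assumption: log is the natural logarithm.
    """
    if n < 0 or k < 1 or d < 1:
        raise ValueError("expected n >= 0, k >= 1, d >= 1")
    return 3 * (d + 1) * k ** 1.5 * math.sqrt(8 * n * math.log(k))


if __name__ == "__main__":
    # The bound grows like sqrt(n), so the per-round value shrinks with n.
    for n in (10**3, 10**4, 10**5, 10**6):
        b = regret_bound(n, k=3, d=2)
        print(f"n={n:>8}  bound={b:12.1f}  bound/n={b / n:.4f}")
```

Because the expression grows as the square root of n, quadrupling the number of rounds doubles the bound, and the bound divided by n tends to zero as n grows.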
