Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action d-outcome non-degenerate locally observable partial monitoring games for which the n-round minimax regret is bounded by $3(d+1)\, k^{3/2} \sqrt{8 n \log(k)}$, matching the best known information-theoretic upper bounds.
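To get a feel for how the stated bound scales, the following minimal sketch evaluates it numerically. It assumes the bare "(k)" under the square root is a natural logarithm, log(k), which the extracted text does not confirm; the function name `regret_bound` and its argument checks are illustrative only and not part of the paper.

```python
import math


def regret_bound(n: int, k: int, d: int) -> float:
    """Evaluate 3(d+1) * k^(3/2) * sqrt(8 * n * log(k)).

    n: number of rounds, k: number of actions, d: number of outcomes.
    Assumption: the logarithm is natural; the abstract does not state the base.
    """
    if n < 1 or k < 2 or d < 1:
        raise ValueError("expected n >= 1, k >= 2, d >= 1")
    return 3 * (d + 1) * k ** 1.5 * math.sqrt(8 * n * math.log(k))


if __name__ == "__main__":
    # Per-round regret implied by the bound shrinks as the horizon grows.
    for n in (10**4, 10**6, 10**8):
        print(n, regret_bound(n, k=3, d=2) / n)
```

Under this reading the bound grows as the square root of n, so quadrupling the horizon doubles the bound and the per-round value tends to zero. For small n the bound can exceed n itself, in which case it carries no information.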