Alternative Function Approximation Parameterizations for Solving Games: An Analysis of f-Regression Counterfactual Regret Minimization

12/06/2019
by Ryan D'Orazio, et al.

Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a flexible and simple algorithm for approximately solving imperfect information games with policies parameterized by a normalized rectified linear unit (ReLU). In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and has a regret bound with a better dependence on the number of actions in the tabular case. We derive approximation error-aware regret bounds for (Φ, f)-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations (f-RCFR), including softmax. We provide exploitability bounds for f-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games, and examine empirically how the link function interacts with the severity of the approximation to determine exploitability performance in practice. Although a ReLU-parameterized policy is typically the best choice, a softmax parameterization can perform as well or better in settings that require aggressive approximation.
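The contrast between ReLU and softmax policies can be illustrated in the tabular case, where an f-regret-matching policy plays each action with probability proportional to a link function f applied to that action's cumulative regret. The Python/NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It omits function approximation (the part RCFR adds by regressing regrets), uses rock-paper-scissors as a toy zero-sum matrix game rather than an imperfect information game, and all names (`relu_link`, `exp_link`, `link_policy`, `self_play`, `exploitability`) and the `p` and `temperature` parameters are hypothetical. The exponent convention is also an assumption: here `p = 1` gives the normalized-ReLU policy (classic regret matching), while some texts write the polynomial link as (x⁺)^(p−1).

```python
import numpy as np

# Toy zero-sum game (assumption): row player's payoffs for rock, paper, scissors.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def relu_link(regrets, p=1.0):
    """Hypothetical polynomial link max(x, 0)^p; p = 1 gives the normalized-ReLU policy."""
    return np.maximum(regrets, 0.0) ** p


def exp_link(regrets, temperature=1.0):
    """Hypothetical exponential link exp(x / temperature); normalizing it yields a softmax."""
    # Shifting by the max leaves the normalized policy unchanged but avoids overflow.
    return np.exp((regrets - np.max(regrets)) / temperature)


def link_policy(regrets, link):
    """Policy proportional to link(regrets), falling back to uniform if all weights are zero."""
    weights = link(regrets)
    total = weights.sum()
    if total <= 0.0:
        # Only the polynomial link can produce all-zero weights (no positive regret).
        return np.full(len(regrets), 1.0 / len(regrets))
    return weights / total


def self_play(payoff, link, iterations=1000):
    """Tabular self-play with external regrets; returns both players' average policies."""
    n_rows, n_cols = payoff.shape
    regrets_row, regrets_col = np.zeros(n_rows), np.zeros(n_cols)
    avg_row, avg_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x = link_policy(regrets_row, link)
        y = link_policy(regrets_col, link)
        u_row = payoff @ y        # value of each row action against y
        u_col = -(x @ payoff)     # column player's values (zero-sum)
        # Instantaneous regret: action value minus the value of the current policy.
        regrets_row += u_row - x @ u_row
        regrets_col += u_col - y @ u_col
        avg_row += x
        avg_col += y
    return avg_row / iterations, avg_col / iterations


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against the profile (x, y)."""
    return float(np.max(payoff @ y) - np.min(x @ payoff))


if __name__ == "__main__":
    for name, link in [("ReLU (regret matching)", relu_link), ("softmax", exp_link)]:
        x, y = self_play(RPS, link, iterations=10000)
        print(name, exploitability(RPS, x, y))
```

The only difference between the two runs is the link function passed to `link_policy`, which mirrors how f-RCFR swaps parameterizations while keeping the regret-matching update. With the ReLU link, the standard regret-matching bound guarantees that exploitability of the average policies shrinks on the order of 1/√T; the fixed-temperature softmax here is a Hedge-style update whose guarantee depends on how `temperature` is tuned relative to the number of iterations.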


research
10/03/2019

Bounds for Approximate Regret-Matching Algorithms

A dominant approach to solving large imperfect-information games is Coun...
research
05/03/2012

No-Regret Learning in Extensive-Form Games with Imperfect Recall

Counterfactual Regret Minimization (CFR) is an efficient no-regret learn...
research
09/11/2018

Solving Imperfect-Information Games via Discounted Regret Minimization

Counterfactual regret minimization (CFR) is a family of iterative algori...
research
10/10/2018

Lazy-CFR: a fast regret minimization algorithm for extensive games with imperfect information

In this paper, we focus on solving two-player zero-sum extensive games w...
research
04/24/2019

Solving zero-sum extensive-form games with arbitrary payoff uncertainty models

Modeling strategic conflict from a game theoretical perspective involves...
research
09/10/2020

RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning

Counterfactual regret minimization (CFR) is a popular method to deal wit...
research
10/15/2021

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information

Counterfactual regret Minimization (CFR) is an effective algorithm for s...
