
RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning

09/10/2020
by Huale Li, et al.

Counterfactual regret minimization (CFR) is a popular method for decision-making problems in two-player zero-sum games with imperfect information. Unlike existing studies, which mostly focus on solving larger-scale problems or accelerating convergence, we propose a framework, RLCFR, that aims to improve the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework, and the dynamic procedure of iterative, interactive strategy updating is modeled as a Markov decision process (MDP). RLCFR then learns a policy that selects an appropriate regret-update rule at each step of the iteration. In addition, a stepwise reward function, proportional to how good the iterated strategy is at each step, is formulated to learn the action policy. Extensive experimental results on various games show that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.
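As a rough illustration of the idea sketched in the abstract, the snippet below casts an iterative regret-update loop as a sequential decision problem in which a selector chooses, at every iteration, which regret-update rule to apply and receives a stepwise reward. All concrete details here are assumptions for illustration only and are not taken from the paper: the toy rock-paper-scissors game, the three candidate update rules (vanilla, CFR+-style, linear), the epsilon-greedy bandit standing in for the learned policy, and the exploitability-based reward.

import numpy as np

rng = np.random.default_rng(0)

# Toy two-player zero-sum matrix game (rock-paper-scissors, payoffs for player 1).
# This stands in for the imperfect-information games used in the paper.
PAYOFF = np.array([[0., -1., 1.],
                   [1., 0., -1.],
                   [-1., 1., 0.]])

def regret_matching(regret):
    """Turn cumulative regrets into a strategy (uniform if all are non-positive)."""
    pos = np.maximum(regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full_like(regret, 1.0 / len(regret))

# Hypothetical candidate regret-update rules the policy can choose between.
def update_vanilla(cum_regret, inst_regret, t):
    return cum_regret + inst_regret                    # plain CFR accumulation

def update_plus(cum_regret, inst_regret, t):
    return np.maximum(cum_regret + inst_regret, 0.0)   # CFR+-style flooring at zero

def update_linear(cum_regret, inst_regret, t):
    return cum_regret + t * inst_regret                # linear (iteration-weighted) CFR

ACTIONS = [update_vanilla, update_plus, update_linear]

def exploitability(strategy):
    """Best-response gain of both players against `strategy` (0 at equilibrium)."""
    return float(np.max(PAYOFF @ strategy) + np.max(-(strategy @ PAYOFF)))

# MDP-style loop: state = current cumulative regrets, action = choice of update
# rule, reward = stepwise drop in exploitability of the average strategy.
q_values = np.zeros(len(ACTIONS))   # crude bandit stand-in for a learned policy
counts = np.zeros(len(ACTIONS))
cum_regret = np.zeros(3)
strategy_sum = np.zeros(3)

for t in range(1, 2001):
    strategy = regret_matching(cum_regret)
    strategy_sum += strategy
    gap_before = exploitability(strategy_sum / strategy_sum.sum())

    # Instantaneous regret against the opponent mirroring `strategy` (self-play
    # in a symmetric game keeps the example small).
    action_values = PAYOFF @ strategy
    inst_regret = action_values - strategy @ action_values

    # Epsilon-greedy choice among the regret-update rules.
    a = int(rng.integers(len(ACTIONS))) if rng.random() < 0.1 else int(np.argmax(q_values))
    cum_regret = ACTIONS[a](cum_regret, inst_regret, t)

    # Stepwise reward: how much the chosen update improved the average strategy.
    avg_after = (strategy_sum + regret_matching(cum_regret)) / (strategy_sum.sum() + 1)
    reward = gap_before - exploitability(avg_after)

    counts[a] += 1
    q_values[a] += (reward - q_values[a]) / counts[a]   # incremental mean of rewards

print("average strategy:", strategy_sum / strategy_sum.sum())
print("preferred update rule:", ["vanilla", "cfr+", "linear"][int(np.argmax(q_values))])

Running this sketch drives the average strategy toward the uniform equilibrium of rock-paper-scissors; the bandit over update rules is only a simplified stand-in for the deep RL policy the paper describes.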

