RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning

09/10/2020
by Huale Li, et al.

Counterfactual regret minimization (CFR) is a popular method for decision-making problems in two-player zero-sum games with imperfect information. Unlike existing studies, which mostly focus on solving larger-scale problems or improving solution efficiency, we propose a framework, RLCFR, that aims to improve the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework, and the dynamic procedure of iterative, interactive strategy updating is modeled as a Markov decision process (MDP). RLCFR then learns a policy that selects the appropriate regret-updating method at each iteration. In addition, a stepwise reward function, proportional to how well the iteration strategy performs at each step, is formulated to learn the action policy. Extensive experimental results on various games show that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.
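To make the idea of "selecting a regret-updating method per iteration" concrete, the following is a minimal sketch on rock-paper-scissors, not the authors' implementation. Several pieces are assumptions introduced only for illustration: the two candidate update rules (a vanilla accumulation and a CFR+-style clipped accumulation), the hypothetical `choose_action` callback standing in for the learned policy, and the use of the per-step drop in exploitability as a stand-in for the stepwise reward. The abstract does not specify any of these details.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum game).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def update_vanilla(cum_regret, inst_regret):
    """Vanilla rule: plain accumulation of instantaneous regret."""
    return cum_regret + inst_regret


def update_plus(cum_regret, inst_regret):
    """CFR+-style rule: accumulate, then clip negative entries to zero."""
    return np.maximum(cum_regret + inst_regret, 0.0)


# Assumed action set of the MDP: which update rule to apply this iteration.
UPDATES = [update_vanilla, update_plus]


def exploitability(x, y):
    """Sum of both players' best-response gains against (x, y)."""
    return (PAYOFF @ y).max() + (-(x @ PAYOFF)).max()


def run(choose_action, iterations=500):
    """Self-play where choose_action(t) picks the update rule at step t.

    choose_action is a hypothetical placeholder for a learned policy.
    """
    # Start the row player biased toward rock so the dynamics are non-trivial.
    cum = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]
    strategy_sum = [np.zeros(3), np.zeros(3)]
    rewards, previous = [], None
    for t in range(iterations):
        x, y = regret_matching(cum[0]), regret_matching(cum[1])
        strategy_sum[0] += x
        strategy_sum[1] += y
        u_row = PAYOFF @ y        # value of each row action against y
        u_col = -(x @ PAYOFF)     # value of each column action against x
        inst = [u_row - x @ u_row, u_col - y @ u_col]
        update = UPDATES[choose_action(t)]
        cum = [update(c, r) for c, r in zip(cum, inst)]
        avg_x = strategy_sum[0] / (t + 1)
        avg_y = strategy_sum[1] / (t + 1)
        current = exploitability(avg_x, avg_y)
        if previous is not None:
            # Assumed stepwise reward: how much exploitability dropped.
            rewards.append(previous - current)
        previous = current
    return avg_x, avg_y, rewards


if __name__ == "__main__":
    # Alternate between the two rules as a trivial stand-in policy.
    avg_x, avg_y, rewards = run(lambda t: t % 2)
    print("average strategies:", avg_x, avg_y)
    print("final exploitability:", exploitability(avg_x, avg_y))
```

In a full reinforcement learning setup, `choose_action` would be replaced by a policy that observes some state of the iteration and is trained on the collected rewards; the alternating rule here only exercises the interface.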


research
08/06/2020

Solving imperfect-information games via exponential counterfactual regret minimization

Two agents' decision-making problems can be modeled as the game with two...
research
05/26/2021

NNCFR: Minimize Counterfactual Regret with Neural Networks

Counterfactual Regret Minimization (CFR) is the popular method for findi...
research
07/22/2023

CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong

Counterfactual Regret Minimization (CFR) has shown its success in Texas H...
research
10/15/2021

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information

Counterfactual regret Minimization (CFR) is an effective algorithm for s...
research
12/06/2019

Alternative Function Approximation Parameterizations for Solving Games: An Analysis of f-Regression Counterfactual Regret Minimization

Function approximation is a powerful approach for structuring large deci...
research
01/31/2021

Fast Rates for the Regret of Offline Reinforcement Learning

We study the regret of reinforcement learning from offline data generate...
research
05/27/2019

Learning Policies from Human Data for Skat

Decision-making in large imperfect information games is difficult. Thank...
