I Introduction
Mahjong is a popular game in Asia and has been played for over a hundred years with different rule sets according to the country or region. Most rule sets of mahjong share common properties that make developing AI challenging, e.g., the number of players is three or four (mostly four), the size of the game tree is huge, the size and number of information sets are large, and uncertainty strongly influences gameplay. Though the performance of AI has exceeded that of human experts in most two-player perfect-information games and some multiplayer imperfect-information games, this has not been the case for mahjong.
We propose a method of constructing an AI mahjong player and demonstrate that its performance is better than that of current AI players. We abstract the game of mahjong and treat it as multiple Markov decision processes (MDPs). We consider the averaged behavioral strategies of a variety of experts to replace three of the four players with a chance player. We introduce four MDPs as abstractions of mahjong and formulate value functions by using these MDPs. The action probabilities of the chance player acting on behalf of the three players are inferred from game records of experts and the authors' experience. We also verify the performance of greedy players that always choose the action with the greatest value.
This paper is organized as follows. We explain the rules and features of mahjong in Sec. II. We review related research in Sec. III and explain the contributions of our research in Sec. IV. We briefly outline our method in Sec. V and give further details of it in Sec. VI. We discuss the performance evaluation of our method using gameplays vis-à-vis the current strongest AI player in Sec. VII.
This paper extends our work on an AI mahjong player presented at a domestic conference [1] by organizing the theoretical framework of the method, adding a new computer experiment, and rewriting the presentation.
II Rules and Features of Mahjong
II-A Outline of Rules
There are variations in the rules of mahjong, but this section outlines the most basic mahjong rules commonly used in Japan (see [2]). Mahjong is a game played by four people. They use four copies each of 34 distinct tiles, 136 tiles in total. Each player starts with 25,000 points. One gameplay of mahjong is a sequence of multiple hands (a hand also means the set of tiles owned by a player), and points move from player to player with each hand. A standard way to earn points is to form a winning hand earlier than the other players. A typical winning hand consists of four combinations of three tiles satisfying specific conditions (each combination is called a mentsu) and one pair of tiles of the same kind. The final rank of each player is determined by the final points of the game.
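The standard winning-hand shape described above (four mentsu plus one pair) can be checked with a small backtracking routine. The following is a minimal sketch, not the authors' code; the 0–33 integer tile encoding and all names are our own assumptions:

```python
# Tiles are integers 0..33: 0..8, 9..17, 18..26 are the three number
# suits, 27..33 are honors (which cannot form sequences).
from collections import Counter

def _remove_melds(counts, start):
    """Return True if the remaining tiles split into triplets/sequences."""
    while start < 34 and counts[start] == 0:
        start += 1
    if start == 34:
        return True
    # try a triplet (three identical tiles)
    if counts[start] >= 3:
        counts[start] -= 3
        if _remove_melds(counts, start):
            counts[start] += 3
            return True
        counts[start] += 3
    # try a sequence (three consecutive ranks within one suit)
    if start < 27 and start % 9 <= 6 and counts[start + 1] and counts[start + 2]:
        for t in (start, start + 1, start + 2):
            counts[t] -= 1
        if _remove_melds(counts, start):
            for t in (start, start + 1, start + 2):
                counts[t] += 1
            return True
        for t in (start, start + 1, start + 2):
            counts[t] += 1
    return False

def is_standard_win(hand):
    """Check whether a 14-tile hand splits into four melds and one pair."""
    assert len(hand) == 14
    counts = Counter(hand)
    for pair in [t for t in counts if counts[t] >= 2]:
        counts[pair] -= 2
        ok = _remove_melds([counts.get(t, 0) for t in range(34)], 0)
        counts[pair] += 2
        if ok:
            return True
    return False
```

For example, a hand with three sequences, one triplet, and a pair passes the check, while breaking the pair makes it fail.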
In addition to the four players, we consider a chance player who introduces contingency into gameplay. The actions of the chance player are classified into the following two types.

deal: The chance player distributes hands from the draw pile to each player. Each player receives a hand composed of 13 tiles, and one player receives an additional tile. This player is called the parent player.

draw: The chance player distributes one tile from the draw pile to a player. The tile is not revealed to the other players.
The information sets of each player are categorized into two types. Any information set of the first type follows a deal or a draw. The player who received the tile is the player to choose an action from one of the following action types.

win: The player declares a win when his/her hand (13 + 1 tiles) satisfies specific conditions. The player then discloses the hand and earns points, depending on the hand, from the other players. All players then discard their hands.

discard: The player discards a tile (therefore, the size of the hand is kept at 13). The tile is now revealed to the others.
The second type follows a discard or a take (explained below) by a player who discarded a tile, where the other players sometimes gain the right to choose one of the following action types.

win: A player declares a win when his/her hand (13 tiles) and the discarded tile satisfy specific conditions. The winner then earns points from the discarder, and all players discard their hands.

take: A player assembles a mentsu using the discarded tile, discloses the mentsu, and then discards another tile. Take actions are classified into a few classes, such as pon (also known as pung) and chi (also known as chow), depending on the mentsu assembled.

pass: A player does not declare anything. If all players pass, the next action is a draw by the player next to the one who discarded.
These action types form the bulk of the branching points in a hand. Each hand starts with a deal, and the hand ends when one of the players chooses a win action, or when the number of tiles in the draw pile decreases to a specific number. Fig. 1 illustrates some branches of the gameplay from the draw of one player to the draw of the next player. A hand consists of about 60 such parts.
The rules specify whether the next hand starts or the entire game ends when a hand ends. The chance player determines one parent from the four players for the first hand, and the parent for each subsequent hand is specified by the rules. If the rule called tonpu match is applied, a player usually plays four to six hands in a gameplay and usually plays one or more hands as the parent.
We now describe several important terms in mahjong that we use in this paper.

tenpai: When a hand (13 tiles) becomes a winning hand by adding one more tile, the hand is called tenpai.

shanten number: The minimum number of tiles that need to be exchanged to make the hand tenpai.
II-B Features of Mahjong
Mahjong's gameplay consists of playing multiple hands in a row. The game situation before a hand can be described by only a small amount of shared information (the points of the four players, etc.). Also, the four behavior strategies and the shared information determine the expected value of the final ranking of each player. Since it is possible to represent the game situation before a hand compactly and to obtain a sufficient number of expert game records, the final ranking in a given game situation is easily predicted by regression. Therefore, it is reasonable to represent a hand as a truncated partial game. The game tree handles the end of the hand (i.e., the beginning of the next hand) as terminal nodes to which the expected values of the final rank are given. Similar methods of treating the entire game as a sequence of truncated partial games are used in other games. For example, it is common to play one game of a point-match in backgammon as an individual game on the basis of the rewards of the match equity table [3].
Let $p_i(r \mid g)$ be the probability that the event that player $i$ acquires rank $r$, for $r = 1, \ldots, 4$, occurs under the game situation $g$. Then the expected value of player $i$'s payoff at $g$ is given by

$v_i(g) = \sum_{r=1}^{4} p_i(r \mid g)\, u_r$.   (1)

Here, $u_r$ is the payoff of rank $r$, which is defined by the rules of the tournament (normally, the higher the rank, the higher the payoff).
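Eq. (1) is a plain expectation over the four final ranks; a sketch with hypothetical names:

```python
def expected_payoff(rank_probs, payoffs):
    """Eq. (1): expected payoff = sum over ranks of P(rank) * payoff(rank).

    rank_probs: probabilities of finishing 1st..4th (must sum to 1).
    payoffs: tournament payoff for each rank (higher rank, higher payoff).
    """
    assert abs(sum(rank_probs) - 1.0) < 1e-9
    return sum(p * u for p, u in zip(rank_probs, payoffs))
```

With a symmetric payoff vector such as (30, 10, -10, -30) and uniform rank probabilities, the expectation is zero.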
We roughly estimate the number of information sets (i.e., decision points) of a player in a truncated partial game of one hand by ignoring take actions and some rare events. There are a vast number of ways to distribute hands to a player by the deal. The number of legal actions in a discard is about ten. After that, the player can see about 30 kinds of tiles at each of the discards of the other three players. Then, the player can see about 30 kinds of tiles at his/her own draw. We call a partial gameplay from a discard to the next draw of the same player a turn. Since the number of turns to play one hand is about 20 at most, we obtain the rough estimate given in Eq. (2).
The exponent value is a little smaller than that of the Go state space [4].
A hand falls into one of five scenarios from a player's point of view.

win: The player chooses a win action. Usually this is the most favorable scenario.

lose: Another player declares a win against a tile discarded by the player. Usually this is the most unfavorable scenario.

other win: Another player wins on a self-draw or against a tile discarded by a third player. It is difficult for the player to bring about this scenario at will.

tenpai washout: The hand ends owing to a shortage of the draw pile when the player has a tenpai hand.

noten washout: The hand ends owing to a shortage of the draw pile when the player does not have a tenpai hand.

We ignore other scenarios because they are rare. Choosing one of these scenarios according to the current game situation is one of the most important strategies for playing mahjong.
III Previous Research
Due to research over the past 20 years, AI has exceeded human ability in many two-player zero-sum games with perfect information, e.g., backgammon [5], checkers [6], chess [7], shogi [8, 9], and Go [10]. One of the techniques that has played a central role in the development of these AI players is heuristic search, which exploits the property that the two players share symmetric information [11]. However, heuristic search has not been powerful in games with three or more players and imperfect information. The reason for this is that it is difficult to construct a search tree that is both tractable to search and effective for representing game situations.

There are also interesting research results on two-player games with imperfect information. Counterfactual regret minimization (CFR) is a powerful technique based on self-play for constructing a strong player of a game belonging to such a class [12]. In fact, a near-Nash equilibrium of heads-up limit Texas hold'em was computed using a variant of CFR [13]. Moreover, an expert-level AI player of heads-up no-limit Texas hold'em, which has far more decision points, has been developed using tree search with bet abstraction and deep learning of counterfactual values [14]. In research other than on poker AI, an expert-level AI of Scrabble has been developed using a selective move generator, simulations of likely game scenarios, and a heuristic search algorithm [15].

Relatively few studies have been reported on multiplayer imperfect-information games such as mahjong. Even in such games, one of the research objectives may be to compute approximations to Nash equilibrium points. A case study on limit Texas hold'em with three players was conducted [16], in which an AI player based on CFR outperformed other AI players, although this method loses the theoretical guarantees that hold for two-player zero-sum games. However, applying CFR variants to other multiplayer games is not easy. Implementation of a mahjong player based on CFR is difficult because the size of the game tree is too large to search, and an abstraction for reducing the search space is unknown.
Another research objective in multiplayer imperfect-information games is to construct an AI player by using heuristic methods, which are known to be effective in two-player perfect-information games. There are AI players for multiplayer Texas hold'em. Poki, an AI player for Texas hold'em with multiple players, adopts a betting strategy based on heuristic evaluation of hand strength [17]. Commercial software called Snowie is considered to have the same strength as experts, but its algorithm is unpublished.
Besides poker games, an expert-level AI player of Skat has been constructed on the basis of heuristic search algorithms for perfect-information games. The search algorithms are applied using game-state inference and a static evaluation obtained by regression over game records [18]. It is interesting to build AI players based on such heuristic search algorithms in other games with multiple players and imperfect information, but it is difficult to construct an effective search tree. In fact, it has been reported that an AI player of The Settlers of Catan applying Monte-Carlo tree-search methods is not as strong as human players [19].
There has been research on AI players of mahjong. There is an open-source beginner-level player based on Monte-Carlo simulation called manue (developed by Hiroshi Ichikawa, https://github.com/gimite/mjai-manue). To model the actions of opponent players statistically, it uses inferred probabilities that a discard by a player induces a win of another player. Bakuuchi is another player that carries out Monte-Carlo simulations. Early Bakuuchi uses such probabilities with higher accuracy, together with Eq. (1), to evaluate each simulation at the end of the hand, and simulation policies learned from game records [20]. That study reported that the policy's dependence on the points was inappropriate and that the player had reached only the intermediate level. Note that recent Bakuuchi, which is unpublished, has reached the advanced level. To the best of our knowledge, no effective search tree for making better decisions has yet been discovered.
Our method abstracts mahjong to construct effective search trees that appropriately deal with various game situations. Game abstraction is known to be an effective means of reducing the huge search space of an extensive-form game with imperfect information [21]. For example, the effectiveness of information and action abstraction has been shown in the aforementioned poker games and in patrolling security games [22].
IV Contributions
The contributions of this paper are as follows.
(1) We define an abstraction of mahjong, Inclusive Policy Solitary Mahjong. This is an MDP that is expected to be effective for evaluating short-term behavior strategies competing for the most favorable scenario, win. The three other players are replaced with a static environment, and the decision-making player goes through the process and ends with a win, lose, other win, tenpai washout, or noten washout scenario.
(2) We introduce several machine-learning features that are expected to be representative of long-term behavior strategies of a hand and to be useful for inferring state values. The features are computed using three other MDPs. The three other players are replaced with a static environment, and the decision-making player goes through each process and ends with a few specific scenarios.
(3) We propose a method for constructing an AI player using (1) and (2). We present the experimental results of 3557 gameplays with the state-of-the-art AI mahjong player, in which our AI player achieved a significantly higher average rank. We also show that our player makes each decision in a few seconds using realistic computational resources.
V Outline of Proposed Method
We discuss action values of mahjong by separating the cases in which a hand ends immediately. Let us consider the first few actions from an information set $I$. Recall that most actions belong to three types: win, discard, and take.
We first consider action type win. After such an action, $a$, a hand ends without any action by the other players. When the player at information set $I$ takes $a$, the action value is

$Q(I, a) = v_i(g_a)$,   (3)

where $g_a$ is the game situation at the end of the hand caused by $a$. We compute $v_i(g_a)$ using Eq. (1), where $p_i(r \mid g_a)$ is inferred using a multiclass logistic regression model, as in a previous study [20].

Next, we consider action type discard. Such an action, $a$, is accompanied by discarding a tile, and the hand also ends immediately if another player chooses win against the tile. Let us assume that the other players determine actions according to static probabilities and treat them as if they were also the chance player. When the player at $I$ takes $a$, we approximate the action value as

$Q(I, a) \approx P^{\mathrm{lose}}(a)\, v^{\mathrm{lose}}(a) + (1 - P^{\mathrm{lose}}(a))\, V(s_a)$,   (4)

where $P^{\mathrm{lose}}(a)$ is the probability that another player declares a win against the discarded tile, $v^{\mathrm{lose}}(a)$ is the corresponding expected payoff, and $V(s_a)$ is the value of the state reached when no one wins. The probability and the corresponding expected payoff can be inferred using orthodox machine learning methods because a hand immediately terminates if $a$ is followed by the win of another player. We discuss these methods in Sec. VI-C.
We then consider action type take. When the player at $I$ takes such an action, $a$, we approximate the action value as

$Q(I, a) \approx P^{\mathrm{lose}}(a)\, v^{\mathrm{lose}}(a) + (1 - P^{\mathrm{lose}}(a))\, V(s_a)$,   (5)

where the tile discarded after the take determines $P^{\mathrm{lose}}(a)$ and the corresponding expected payoff $v^{\mathrm{lose}}(a)$ as above.
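One plausible reading of the approximations above is a two-outcome split: the hand either ends immediately with a loss or continues with the MDP-estimated value. A sketch with assumed names:

```python
def action_value(p_lose, v_lose, v_continue):
    """Two-outcome split behind Eqs. (4)-(5): a discard (or the discard
    that follows a take) either ends the hand immediately when an opponent
    declares a win on the tile (probability p_lose, payoff v_lose), or play
    continues with value v_continue estimated by the MDP models."""
    return p_lose * v_lose + (1.0 - p_lose) * v_continue
```

For example, a 10% chance of losing 50 points against a continuation value of 10 points yields an action value of 4.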
After separating the cases of immediate ends of a hand, we need to compute the continuation value $V(s_a)$ to estimate the action value at $I$. Our method uses two models to compute these values. These models represent the game state $s$, which can be determined from $I$ to the end of a hand, as a tuple $(h, t, y, n)$. Here, $h$ is the player's hand, $t$ is the tile the player obtained most recently, $y$ is a type of state described below, and $n$ is the number of tiles discarded by the player since $I$. We omit part of this notation below.
To set up the first model, in addition to using the state representation, we define inclusive policy one-player mahjong, which is an MDP that takes into account as many scenarios as possible. This MDP requires a comprehensive search and is designed to predict, with high accuracy, hands ending within a relatively small number of steps.
To set up the second model, in addition to using the state representation, we define several one-player mahjong games, which are different MDPs that take into account different small subsets of all scenarios. The estimation of action values by any one of these one-player mahjong games is not accurate because each subset is restricted. However, these one-player mahjong games are amenable to long-term computation and can be used to provide good features for predicting the scenario of hands with a relatively large number of steps.
VI Proposed Method
This section is organized as follows. In Sec. VI-A, we define multiple MDPs as mahjong abstractions and formulate their action-value functions. We then represent action values of the original game in terms of these MDPs in Sec. VI-B. In Sec. VI-C, we describe methods of calculating the input parameters of the MDPs. In Sec. VI-D, we describe an efficient search algorithm for the MDPs.
VI-A Abstraction to MDPs
Consider a player at an information set of a hand, which is a truncated partial game of mahjong. We abstract the hand rooted at this information set to an MDP in four ways. Here, the player is the agent who makes decisions, and the decision making of the others is represented by transition probabilities of states. This section defines the four MDPs and formulas that approximately represent the expected value of the final ranking of the player.
VI-A1 Inclusive Policy Solitary Mahjong
This MDP covers various scenarios from the agent's point of view. The type of a state indicates one of the following:

First type: The agent in a state of this type can choose win to gain the win payoff only if the hand and the drawn tile satisfy the conditions used in the original game. Otherwise, the agent has to choose an action in discard to discard a tile from the hand and the drawn tile.

Second type: The agent in a state of this type can choose win to gain the win payoff, or an action in take, only if the hand and the discarded tile satisfy the conditions used in the original game. Otherwise, the agent chooses pass.

Third type: The agent in a state of this type chooses one of two actions, one of which terminates the MDP with a payoff.

The first two types correspond to the agent's information sets following the agent's draws and the other players' discards in the original game. Though no information set in the original game corresponds to the third type, we introduce this type of state for simplification.
The MDP terminates immediately if the agent chooses win or the terminating action of the third state type. Otherwise, the chance player chooses actions, which are categorized as follows.

Lose branch: The MDP terminates with the lose probability of the tile discarded by the agent's action in discard or take, and the agent gains the lose payoff. If the MDP does not terminate, the action number increases by one. The MDP then terminates if the action number reaches its maximum, and the agent gains the washout payoff. Otherwise, the state transfers to a state of the second type. (The maximum action number is an input parameter of the MDP, which will be described in Sec. VI-C.)

Other-win branch: The MDP terminates with a given probability after the agent chooses pass, and the agent gains the other-win payoff. Otherwise, the chance player chooses an action of another branch.

Draw branch: The chance player chooses a tile with a given probability, and the state transfers to a state of the first type.

Opponent-discard branch: The chance player deals a tile with a given probability after the agent chooses pass, and the state transfers to a state of the second type.

The draw branch corresponds to the draw of the original game, while the other types of branches correspond to the averaged actions of the other players in the original game. The lose and other-win branches correspond to the lose and other win scenarios in the original game, respectively. The flow of the MDP is shown schematically in Fig. 2.
When the state is of the first type, the action-value function is as follows. For an action in win, where win is legal, we have

(6)

and the MDP terminates with this action. When the action is in discard and a tile is selected, we have

(7)

Here, the next hand is obtained by discarding the selected tile from the current hand and the drawn tile.

When the state is of the second type and the action is pass, we have

(8)

where the sum runs over the set of all tile kinds.

When the state is of the third type and the action terminates the MDP, we have

(9)

and when the action is in take, we have

(10)

where the next state is determined by the discarded tile. The formulas of the payoff functions are outlined in Sec. VI-C.
VI-A2 Folding Solitary Mahjong
This MDP covers two scenarios (lose and other win) to represent folding strategies of the agent. Folding is a behavior strategy in which the agent abandons the most favorable scenario, win, and avoids the most unfavorable scenario, lose, of the current hand. The agent's win and take action types are ignored to simplify the game.
In this MDP, given the lose probability and lose payoff of each tile, the agent is only allowed to discard tiles from the hand. The only action type is discard, there is a single state type, and the number of tiles in the agent's hand decreases monotonically from the initial state because actions in take are ignored. There are two types of branches due to the chance player, as follows.

Lose branch: The MDP terminates with the lose probability of the tile discarded just before, and the agent gains the lose payoff. If the same tile kind has already been discarded since the initial state, this branch is not selected.

Washout branch: When the MDP does not terminate by the lose branch, it terminates with a constant probability (set tentatively) and the agent gains the washout payoff. Otherwise, the state transfers to the next discard.
The MDP also terminates when the agent discards all tiles, gaining the corresponding payoff. For the sake of simplicity, we assume a natural ordering condition holds for all tiles. Under this assumption, if the hand contains tiles that have been discarded once or more, the agent should always discard one of these tiles first.
Let $h$ be the agent's hand, $k$ be a tile kind to discard, and $n$ be the number of discards from the initial state. Action-value functions are formulated in terms of $(h, n)$, which specifies a state in the MDP, and $k$, which specifies an action, as

(11)

Here, $h - k$ denotes the hand where one tile of kind $k$ is subtracted from $h$.

The optimal policy is to discard the tiles in ascending order of the lose probability, i.e.,

(12)

where the multiplicity of each tile kind is the number of tiles of that kind in the agent's hand in the initial state, and the optimal value is given by

(13)

Here, the tile kinds are taken in the ascending order of Eq. (12), and the number of terms is the number of tile kinds in the agent's hand in the initial state. The optimal policy ends up with the scenario lose with the probability

(14)

Let the conditional optimal value be the optimal value under the condition of lose termination, which is discussed in Sec. VI-B. Eq. (13) can be transformed using this conditional value as

(15)
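The folding MDP admits a direct evaluation once the tiles are sorted by deal-in probability, as Eqs. (12)–(15) state. The following sketch follows that structure with our own symbols (per-tile deal-in probabilities, a constant per-step stop probability, and lose/washout/empty payoffs); it is an illustration of the recurrence, not the paper's exact formulas:

```python
def folding_value(p_lose_by_tile, v_lose, v_washout, beta=0.05, v_empty=0.0):
    """Evaluate a folding line of play, one discard per step.

    p_lose_by_tile: deal-in probability of each distinct tile kind in hand
                    (assumed symbol; tiles already discarded would have 0).
    v_lose: payoff when an opponent wins on our discard.
    v_washout: payoff when the hand ends by the constant-probability stop.
    beta: per-step constant termination probability (set tentatively).
    v_empty: payoff when every tile has been discarded.

    Optimal policy: discard in ascending order of deal-in probability;
    the value accumulates along the survival products.
    """
    order = sorted(p_lose_by_tile)           # ascending deal-in probability
    value, survive = 0.0, 1.0
    for p in order:
        value += survive * p * v_lose        # hand ends: we deal in
        survive *= (1.0 - p)
        value += survive * beta * v_washout  # hand ends: washout stop
        survive *= (1.0 - beta)
    value += survive * v_empty               # all tiles discarded safely
    return value
```

Sorting ascending is optimal here because the survival product shrinks with every step, so the riskiest discards are pushed to the steps least likely to be reached.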
VI-A3 Winning Solitary Mahjong and Tenpai Solitary Mahjong
These MDPs are specialized for representing win and tenpai strategies, respectively. Both are expected to have a smaller search space than Inclusive Policy Solitary Mahjong. Terminal nodes that do not have a direct relation to the purpose, i.e., win for the former and tenpai for the latter, are ignored. Specifically, terminal nodes related to the lose and other-win branches, among others, are ignored in both MDPs. Moreover, the payoff of a washout in the former does not depend on the final hand. Also, the agent of the latter is unable to take an action in win. We omit the formulas of the action values, but they are derived by replacing the probabilities of the ignored branches with zero.
VI-B Value Inference Using Multiple MDPs
In this section, we introduce two methods of inferring the values of legal actions of the original game using the multiple MDPs introduced in the previous section. The first method simply adopts the optimal value of Inclusive Policy Solitary Mahjong to calculate the approximate values in Eqs. (4) and (5) as

(16)

where the initial state of the MDP is specified by the player's information set and the hand after the action.
The second method uses the results of value evaluations using the Folding, Winning, and Tenpai Solitary Mahjong MDPs. Let $C$ be the set of hand scenarios. This method calculates the approximate values in Eqs. (4) and (5) as

$V(s) \approx \sum_{c \in C} P(c)\, u(c)$,   (17)

where $P(c)$ is the probability of scenario $c$ and $u(c)$ is the corresponding expected payoff. We calculate $P(c)$ in Eq. (17) using the product of probabilities obtained by playing these MDPs starting from the initial state. The relations between $P(c)$ and these component probabilities are

(18)
These probabilities are inferred by logistic regression using features that are the results of value evaluations of these MDPs. To explain the features, let us introduce the following quantities: from Winning Solitary Mahjong, a state value and the probability that the agent finally chooses an action in win; from Tenpai Solitary Mahjong, the probability that the agent will have a tenpai hand when the MDP terminates; and from Eqs. (14) and (15), the lose probability and conditional optimal value of Folding Solitary Mahjong, where the initial hand is the hand after the action and the parameters are adjusted accordingly. The features used for the regressions are as follows.

Features for the win probability:

the state value and final-win probability from Winning Solitary Mahjong;

the number of players declaring riich (riich is discussed in Sec. VI-E);

the sum of inferred tenpai probabilities, where the sum runs over all players other than the agent who have not declared riich.

Features for the lose probability:

the number of players declaring riich;

the sum of inferred tenpai probabilities over all players other than the agent who have not declared riich.

Features for the other-win probability:

the number of players declaring riich.

Features for the washout probabilities:

the number of actions in take the agent has chosen since the current information set.

Here, the inferred tenpai probability of another player at the current information set is modeled using logistic regression similar to that in a previous study [20], but the difference is that the model is fitted for each number of that player's past actions in discard and for each number of that player's past actions in take since the start of the hand.
We calculate the payoffs $u(c)$ in Eq. (17) as

(19)

We calculate the washout payoffs on the basis of the mahjong rules and the tenpai probabilities of the other players. Strictly speaking, these probabilities should be those at the time the hand ends, but they are inferred at the current information set.
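A minimal sketch of the scenario-weighted evaluation of Eq. (17), with the probability combination of Eq. (18) replaced by a simplified stand-in (all names and the combination rule are our assumptions, not the paper's formulas):

```python
def scenario_probabilities(p_win, p_lose, p_other, p_tenpai_at_end):
    """Combine separately inferred probabilities into a distribution over
    the five hand scenarios. p_win, p_lose, and p_other are assumed to come
    from the regression models; p_tenpai_at_end splits the remaining
    washout mass into tenpai and noten washouts."""
    p_washout = max(0.0, 1.0 - p_win - p_lose - p_other)
    return {
        "win": p_win,
        "lose": p_lose,
        "other win": p_other,
        "tenpai washout": p_washout * p_tenpai_at_end,
        "noten washout": p_washout * (1.0 - p_tenpai_at_end),
    }

def scenario_value(probs, payoffs):
    """Eq. (17): scenario-probability-weighted expected payoff."""
    return sum(probs[c] * payoffs[c] for c in probs)
```

The five probabilities sum to one by construction, so the value is a proper expectation over the scenarios of Sec. II-B.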
VI-C Parameters Used in MDPs
This section describes methods for determining the parameters of the MDPs. Let the agent of these MDPs be the player at the current information set of the original game, as before. The first parameter to be described is the maximum action number. Let the reference value be the maximum number of the agent's future discards until the current hand ends, assuming that no player chooses actions in take. We set the maximum action number on the basis of this reference value for Inclusive Policy Solitary Mahjong and set it separately for the other MDPs. We determine the termination ratio on the basis of logistic regression using the same features as those listed in Sec. VI-B, with the label being the number of the agent's future discards until the current hand ends. The training data (pairs of features and a label) are sampled from information sets that did not end up with a win of the corresponding player in the game records.
The next parameters to be described are those related to the lose scenario. These parameters, such as the lose probability in Eq. (4), can be determined by the probability that another player chooses win when the agent discards a given tile and the hand ends immediately in the current game situation. Because the other player's hand must be tenpai when he/she chooses win, the probability can be factorized as

(20)

We infer the conditional probability in two different ways. When the other player has chosen no action in take since the start of the hand, it is inferred by further factorizing the probability and drawing histograms from game records. When the player has chosen one or more actions in take, it is inferred by enumerating all possible tenpai hands for that player. When a player has chosen actions in take twice, the number of possible tenpai hands is on the order of 100 thousand, and enumerating all of them does not significantly affect the total calculation time. When the number of such actions is one, it is not realistic to enumerate all tenpai hands. However, it is possible to enumerate the remaining seven tiles by ignoring one mentsu.
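The factorization of Eq. (20) and the enumeration-based inference can be sketched as follows; the weighting scheme over enumerated tenpai hands is our assumption:

```python
def deal_in_probability(p_tenpai, tenpai_hands, tile):
    """Eq. (20)-style factorization: P(opponent wins on `tile`)
    = P(opponent is tenpai) * P(wait includes `tile` | tenpai).

    tenpai_hands: list of (weight, waits) pairs enumerating the opponent's
    possible tenpai hands (the waits are the tile kinds that complete the
    hand), as in the enumeration of Sec. VI-C. The weights are assumed
    relative likelihoods of each enumerated hand.
    """
    total = sum(w for w, _ in tenpai_hands)
    waiting = sum(w for w, waits in tenpai_hands if tile in waits)
    return p_tenpai * (waiting / total if total else 0.0)
```

For instance, if three quarters of the enumerated-hand mass waits on a tile and the tenpai probability is 0.5, the deal-in probability for that tile is 0.375.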
VI-D Outline of Search Algorithm of MDPs
Our search algorithm to compute the expected final rank of a player at an information set has computational complexity proportional to the number of states of the MDP. Even ignoring actions in take, there are a huge number of patterns of a player's hand, and it is not realistic to search all states related to each hand. It is therefore desirable to reduce the number of states and actions of the MDP sufficiently that the search ends with realistic computational resources while the error of the expected final rank does not increase.
For the purpose of such reductions, we focus on states and actions related only to hands that can realize tenpai with a relatively small number of tile exchanges. We construct a set of such hands by carrying out the following four steps: (1) consider a graph where a vertex represents a hand, an edge represents a tile exchange, and the graph takes into account all possible hands and tile exchanges; (2) enumerate the paths of length $d$ or less connecting the current hand and a tenpai hand; (3) construct a set of hands by enumerating the vertices along all such paths, including the two terminals (i.e., the current hand and a tenpai hand); and (4) construct the final set of hands consisting of all hands such that a hand in the set of step (3) exists from which $m$ or fewer mentsus are revealed by takes.
The two integers $d$ and $m$ are parameters that control the size of the search space. Parameter $d$ must be greater than or equal to the shanten number of the current hand because the space must contain some tenpai hands. The larger $d$ and $m$ are, the more accurate the final-rank prediction is expected to be. In our experiment, we adjusted these parameters according to the shanten number of the hand so that the AI player can make each decision in a few seconds on a lightweight desktop computer. In this way, the size of the hand set is controlled to be about 50,000. The search algorithm ignores any action that realizes a hand not belonging to this set. Our search algorithm is based on retrograde analysis [23], where the state values are determined starting from the states with the larger numbers of past discards.
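Because every transition in these MDPs increases the discard counter, retrograde analysis reduces to a single backward sweep over the counter, with no iteration to a fixed point. A generic sketch under that assumption (all names are ours):

```python
def retrograde_values(states, terminal_value, transitions, max_n):
    """Backward induction over states tagged with a step counter n.

    states: iterable of (state, n) pairs; every transition increases n.
    terminal_value(state) -> payoff for states at n = max_n.
    transitions(state) -> list of actions, each a list of
    (probability, next_state) pairs.

    Returns the optimal value of every state, filled in from n = max_n
    down to n = 0 in one pass, as in retrograde analysis.
    """
    by_n = {}
    for s, n in states:
        by_n.setdefault(n, []).append(s)
    value = {}
    for s in by_n.get(max_n, []):          # terminal layer
        value[s] = terminal_value(s)
    for n in range(max_n - 1, -1, -1):     # sweep backward through layers
        for s in by_n.get(n, []):
            value[s] = max(sum(p * value[t] for p, t in action)
                           for action in transitions(s))
    return value
```

The single backward sweep is what makes the per-decision cost proportional to the number of states, matching the complexity claim above.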
VI-E Dealing with Some Popular Rules
In this section, we describe how our AI player deals with some popular rules. A dora is a tile that increases the points of a hand if it is in the winning hand. The dora tile is indicated by a dora indicator tile, which is chosen by the chance player at the deal. This choice is shared by all players. The payoff of a win or lose is determined in accordance with the dora tiles.
A riich declaration is an action that can be chosen by a player who has formed a tenpai hand without choosing any action in take since the start of the hand. The player who declared riich is unable to change his/her hand but is able to earn more points when he/she wins. We deal with riich declarations by adding the hands after the declaration to the search space of Inclusive Policy Solitary Mahjong and modifying the payoff of win for hands after the declaration. In addition, the folding tendency, i.e., that other players tend to fold their hands when one declares riich, is reflected by modifying the lose-related probabilities and payoffs accordingly.
VII Experiments
This section presents the results of gameplays vis-à-vis existing AI players. We constructed the AI player with the proposed method as follows. When the shanten number of the hand is zero or one, we use Eq. (16) to evaluate the values of legal actions. When the shanten number of the hand is two or three, we use Eq. (17) to evaluate these values. In both cases, the player is greedy, i.e., the action with the highest value is selected. We tentatively set the constant termination probability in Eq. (11). When the shanten number of the hand is greater than three, we adopt a simple rule-based strategy. The rules used in this strategy basically determine whether to decrease the shanten number toward a win or to fold the current hand. To decrease the shanten number, the rules choose one of the isolated tiles to discard. To fold the hand, the rules choose a tile on the basis of value estimation using Folding Solitary Mahjong. The three opposing AI players were one Bakuuchi and two copies of manue. The version of Bakuuchi is the one that achieved its highest grade and rating (R2206) in tenhou (http://tenhou.net/), and it is stronger than the version published in a previous paper [20]. Table I lists the results of 3557 gameplays of mahjong with the tonpu rule (this took several months using an ordinary desktop PC). Because manue is clearly weak, we pay attention to the difference between the ranks of our AI player and Bakuuchi in each gameplay; the mean and standard deviation of the difference were 0.0574 and 1.822, respectively. Given that the sample size was 3557, the sample mean was 0.0574, and the sample standard deviation was 1.822, the mean is significantly positive in a one-tailed test using the standard error of the mean. This indicates that the performance of the AI player constructed with the proposed method has reached the world's highest level.
                 1st    2nd    3rd    4th    Average rank
Our AI player    0.33   0.28   0.21   0.17   2.23 ± 0.04
Bakuuchi         0.32   0.27   0.21   0.20   2.29 ± 0.04
manue            0.17   0.22   0.29   0.31   2.74 ± 0.02
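The significance analysis in the text can be reproduced with a one-sample z-test on the per-game rank difference; a sketch assuming the normal approximation:

```python
import math

def one_tailed_p(mean_diff, sd, n):
    """One-sample z-test using the standard error of the mean, as in the
    analysis of Sec. VII. Returns the z statistic and the one-tailed
    p-value under the normal approximation."""
    z = mean_diff / (sd / math.sqrt(n))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-tailed normal tail
    return z, p
```

With the reported numbers (mean 0.0574, s.d. 1.822, n = 3557), this gives z ≈ 1.9 and a one-tailed p of roughly 0.03.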
VIII Conclusion
We proposed a method of building a state-of-the-art AI mahjong player. With this method, multiple MDPs related to the scenarios of a hand are introduced. When the shanten number of the hand is less than two, the Inclusive Policy Solitary Mahjong MDP plays an essential role in estimating action values in the original game. It takes into account as many scenarios as possible, and the analysis results are directly used for the evaluation of actions in the original game. When the shanten number of the hand is two or more, we use the results of the Folding, Winning, and Tenpai Solitary Mahjong MDPs. These MDPs focus on a few specific scenarios, and the analysis results are used as features for inferring state values. We reduced the number of MDP states to the extent that the expected final-rank error does not increase, so that the calculation ends in a few seconds.
We presented the results of 3557 gameplays of mahjong between the AI player constructed with the proposed method and existing AI players: one version of Bakuuchi, the strongest current player, and two copies of manue, whose source code is publicly available. The results indicate the effectiveness of the proposed method.
Acknowledgment
The authors would like to thank Naoki Mizukami for his fruitful comments and his support with the experiments. This work was supported by JSPS KAKENHI Grant Numbers JP16K00503 and JP18H03347.
References
 [1] M. Kurita and K. Hoki, “Development of mahjong player on the basis of kyoku abstraction for multiple goals and value inference (in Japanese),” in The 22nd Game Programming Workshop, vol. 2017, Nov. 2017, pp. 72–79.
 [2] S. D. Miller, Riichi Mahjong: The Ultimate Guide to the Japanese Game Taking the World by Storm. Psionic Press, 2015.
 [3] K. Woolsey, How to Play Tournament Backgammon. The Gammon Press, 1993.
 [4] L. V. Allis, Searching for Solutions in Games and Artificial Intelligence. Ph.D. thesis, University of Limburg, 1994.
 [5] G. Tesauro, “Temporal difference learning and TD-Gammon,” Commun. ACM, vol. 38, no. 3, pp. 58–68, Mar. 1995.
 [6] J. Schaeffer, One Jump Ahead: Computer Perfection at Checkers (2nd Edition). Springer, 2008.
 [7] M. Campbell, A. Hoane, and F.-h. Hsu, “Deep Blue,” Artificial Intelligence, vol. 134, no. 1, pp. 57–83, 2002.
 [8] K. Hoki, D. Yokoyama, T. Obata, H. Yamashita, T. Kaneko, Y. Tsuruoka, and T. Ito, “Distributed-shogi-system Akara 2010 and its demonstration,” International Journal of Computer and Information Science, vol. 14, no. 2, pp. 55–63, 2013.
 [9] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
 [10] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–503, 2016.
 [11] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.
 [12] M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione, “Regret minimization in games with incomplete information,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 1729–1736.
 [13] M. Bowling, N. Burch, M. Johanson, and O. Tammelin, “Heads-up limit hold’em poker is solved,” Commun. ACM, vol. 60, no. 11, pp. 81–88, Oct. 2017.
 [14] M. Moravčík, M. Schmid, N. Burch, V. Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling, “DeepStack: Expert-level artificial intelligence in heads-up no-limit poker,” Science, vol. 356, no. 6337, pp. 508–513, 2017.
 [15] B. Sheppard, “World-championship-caliber Scrabble,” Artificial Intelligence, vol. 134, no. 1, pp. 241–275, 2002.
 [16] N. A. Risk and D. Szafron, “Using counterfactual regret minimization to create competitive multiplayer poker agents,” in Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, ser. AAMAS ’10. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2010, pp. 159–166.
 [17] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron, “The challenge of poker,” Artificial Intelligence, vol. 134, no. 1, pp. 201 – 240, 2002.
 [18] M. Buro, J. R. Long, T. Furtak, and N. Sturtevant, “Improving state evaluation, inference, and search in trick-based card games,” in Proceedings of the 21st International Joint Conference on Artificial Intelligence, ser. IJCAI’09. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2009, pp. 1407–1413.
 [19] I. Szita, G. Chaslot, and P. Spronck, “Monte-Carlo tree search in Settlers of Catan,” in Advances in Computer Games, H. J. van den Herik and P. Spronck, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 21–32.
 [20] N. Mizukami and Y. Tsuruoka, “Building a computer mahjong player based on Monte Carlo simulation and opponent models,” in 2015 IEEE Conference on Computational Intelligence and Games (CIG), 2015, pp. 275–283.
 [21] T. Sandholm, “Abstraction for solving large incomplete-information games,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, ser. AAAI’15. AAAI Press, 2015, pp. 4127–4131.
 [22] N. Basilico and N. Gatti, “Automated abstractions for patrolling security games,” 2011.
 [23] J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M. Müller, R. Lake, P. Lu, and S. Sutphen, “Checkers is solved,” Science, vol. 317, no. 5844, pp. 1518–1522, 2007.