I. Introduction
Board games have always attracted attention in AI due to their clear rules, mathematical elegance, and simplicity. Since the early works of Claude Shannon on Chess [1] and Arthur Samuel on Checkers [2], much research has been conducted in the area of board games towards finding either perfect players (Connect Four [3]) or stronger-than-human players (Othello [4]). The bottom line is that board games still constitute valuable testbeds for improving general artificial and computational intelligence game-playing methods such as reinforcement learning, Monte Carlo tree search, branch and bound, and (co)evolutionary algorithms.
Most of these techniques employ a position evaluation function to quantify the value of a given game state. In the context of Othello, one of the most successful position evaluation functions is the tabular value function [5], also known as the n-tuple network [6]. It consists of a number of tuples, each associated with a lookup table, which maps the contents of board fields to a real value. The effectiveness of an n-tuple network depends strongly on the placement of its n-tuples [7]. Typically, n-tuple architectures consist of a small number of long, randomly generated, snake-shaped n-tuples [8, 7, 9].
Despite the importance of network architecture, to the best of our knowledge no study exists that systematically evaluates different ways of placing n-tuples on the board.
In this paper, we propose an n-tuple network architecture consisting of a large number of short, straight n-tuples, generated in a systematic way. In extensive computational experiments, we show that for learning position evaluation for Othello, such an architecture is significantly more effective than one involving randomly generated n-tuples. We also investigate how the length of the n-tuples affects the learning results. Finally, the performance of the best evolved n-tuple network is evaluated in the online Othello League.
II. Methods
II-A. Othello
Othello (a.k.a. Reversi) is a two-player, deterministic, perfect-information strategic game played on an 8×8 board. There are 64 identical pieces, black on one side and white on the other. The game starts with two white and two black pieces forming an askew cross in the center of the board. The players take turns putting one piece on the board with their color facing up. A legal move consists in placing a piece on a field so that it forms a vertical, horizontal, or diagonal line with another piece of the mover's color, with a continuous, nonempty sequence of the opponent's pieces in between (see Fig. 1); these pieces are reversed after the piece is placed. A player passes if and only if no legal move is available. The game ends when both players pass consecutively. Then, the player having more pieces with their color facing up wins.
II-B. Position Evaluation Functions
In this paper, our goal is not to design state-of-the-art Othello players, but to evaluate position evaluation functions. That is why our players are simple state evaluators in a 1-ply setup: given the current state of the board, a player generates all legal moves and applies the position evaluation function to the resulting states. The state gauged as the most desirable determines the move to be played. Ties are resolved at random.
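A 1-ply state evaluator of this kind can be sketched as follows. This is an illustrative Python sketch, not the paper's Java implementation; the function and parameter names (`select_move_1ply`, `apply_move`, `evaluate`) are ours.

```python
import random

def select_move_1ply(board, legal_moves, apply_move, evaluate):
    """Pick the move whose resulting state the evaluation function rates highest.

    board       -- current game state (any representation)
    legal_moves -- list of legal moves for the player to move
    apply_move  -- function (board, move) -> resulting board
    evaluate    -- position evaluation function, board -> float
    Ties are resolved at random, as in the paper.
    """
    scored = [(evaluate(apply_move(board, m)), m) for m in legal_moves]
    best = max(s for s, _ in scored)
    return random.choice([m for s, m in scored if s == best])
```

Note that the player never searches deeper than one move; its strength comes entirely from the evaluation function.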
The simplest position evaluation function is the position-weighted piece counter (WPC), a linear weighted board function. It assigns a weight $w_i$ to each board location $i$ and uses a scalar product to calculate the utility of a board state $b$:

$$f(b) = \sum_{i=1}^{64} w_i b_i,$$

where $b_i$ is $0$ in the case of an empty location, $+1$ if a black piece is present, or $-1$ in the case of a white piece.
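In code, WPC evaluation is a plain dot product. The sketch below assumes the conventional encoding (+1 black, −1 white, 0 empty); the function name is ours.

```python
def wpc_evaluate(board, weights):
    """Position-weighted piece counter: scalar product of weights and board.

    board   -- 64 values per location: +1 (black), -1 (white), 0 (empty)
    weights -- 64 real-valued weights, one per board location
    """
    return sum(w * b for w, b in zip(weights, board))
```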
A WPC player often used in Othello research as an expert opponent [18, 6, 19, 13, 16, 7] is the Standard WPC Heuristic Player (SWH). Its weights, hand-crafted by Yoshioka et al. [20], are presented in Table I.

II-C. Othello Position Evaluation Function League
WPC is only one of the possible position evaluation functions. Other popular ones include neural networks and n-tuple networks. To allow direct comparison between various position evaluation functions and algorithms capable of learning their parameters, Lucas and Runarsson [18] established the Othello Position Evaluation Function League (http://algoval.essex.ac.uk:8080/othello/League.jsp). The Othello League, for short, is an online ranking of 1-ply Othello state-evaluator players. The players submitted to the league are evaluated against SWH (the Standard WPC Heuristic Player). Both the game itself and the players are deterministic (with the exception of the rare situation when at least two positions have the same evaluation value). Therefore, to provide a more continuous performance measure, the Othello League introduces some randomization: both players are forced to make random moves with a fixed probability. As a consequence, the players no longer play (deterministic) Othello, but stochastic Othello. However, it has been argued that the ability to play stochastic Othello is highly correlated with the ability to play deterministic Othello [18]. The performance in the Othello League is determined by the number of wins against the SWH player in a number of double games, each consisting of two single games, played once as white and once as black. To aggregate the performance into a scalar value, we assume that a win counts as 1 point and a draw as 0.5 points. The average score obtained in this way against SWH constitutes the Othello League performance measure, which we adopt in this paper.
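The aggregation of game outcomes into the scalar performance measure can be sketched as follows, assuming the win = 1, draw = 0.5, loss = 0 scoring described above; the function name is ours.

```python
def league_performance(results):
    """Average score over games against SWH: win = 1 point, draw = 0.5, loss = 0.

    results -- iterable of 'win' / 'draw' / 'loss' outcomes
    """
    points = {'win': 1.0, 'draw': 0.5, 'loss': 0.0}
    results = list(results)
    return sum(points[r] for r in results) / len(results)
```

For example, two wins, one draw, and one loss over four games yield a performance of 0.625.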
II-D. N-tuple Network
The best-performing evaluation function in the Othello League is an n-tuple network [16]. N-tuple networks were first applied to the optical character recognition problem by Bledsoe and Browning [21]. In games, they were first used by Buro under the name of tabular value functions [5], and later popularized by Lucas [6]. According to Szubert et al., the main advantages of n-tuple networks “include conceptual simplicity, speed of operation, and capability of realizing nonlinear mappings to spaces of higher dimensionality” [7].
An n-tuple network consists of $m$ tuples, where the $i$-th tuple has size $n_i$. For a given board position $b$, it returns the sum of the values returned by the individual n-tuples. The $i$-th n-tuple, for $i = 1, \dots, m$, consists of a predetermined sequence of board locations $(loc_{ij})_{j=1 \dots n_i}$ and a lookup table $LUT_i$. The latter contains a value for each board pattern that can be observed on that sequence of board locations. Thus, an n-tuple network is a function

$$f(b) = \sum_{i=1}^{m} LUT_i\!\left[\mathrm{index}_i(b)\right].$$
Among the possible ways to map the observed sequence to an index in the lookup table, the following one is arguably convenient and computationally efficient:

$$\mathrm{index}_i(b) = \sum_{j=1}^{n_i} b_{loc_{ij}} \, c^{\,j-1},$$

where $c$ is a constant denoting the number of possible values on a single board square, and $(b_{loc_{ij}})_{j=1 \dots n_i}$ is the sequence of board values (the observed pattern), such that $0 \le b_{loc_{ij}} < c$ for every $j$. In the case of Othello, $c = 3$, and white, empty, and black squares are encoded as $0$, $1$, and $2$, respectively.
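The indexing scheme and network evaluation can be sketched directly from these formulas. This is an illustrative sketch (function names ours), assuming the encoding white = 0, empty = 1, black = 2 and a flat board array.

```python
def ntuple_index(board, locations, c=3):
    """Map the pattern observed at `locations` to a lookup-table index.

    index = sum_j board[loc_j] * c**j (0-based j), with c = 3 for Othello,
    where board values are encoded 0 (white), 1 (empty), 2 (black).
    """
    index = 0
    for j, loc in enumerate(locations):
        index += board[loc] * c ** j
    return index

def ntuple_network_evaluate(board, tuples):
    """Sum the lookup-table values of all n-tuples in the network.

    tuples -- list of (locations, lut) pairs, one per n-tuple
    """
    return sum(lut[ntuple_index(board, locs)] for locs, lut in tuples)
```

A 2-tuple thus needs a lookup table of 3² = 9 weights, a 3-tuple 27 weights, and so on.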
The effectiveness of n-tuple networks is improved by symmetric sampling, which exploits the inherent symmetries of the Othello board [11]. In symmetric sampling, a single n-tuple is employed eight times, returning one value for each possible board rotation and reflection. See Fig. 2 for an illustration.
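The eight symmetric placements of a tuple (four rotations, each optionally mirrored) can be enumerated as below. This is a sketch of the idea, not the paper's actual implementation; the (row, col) location representation and function names are ours.

```python
def symmetry_expansions(locations, n=8):
    """Return the 8 symmetric variants (4 rotations x optional reflection)
    of a tuple's board locations on an n x n board.

    locations -- list of (row, col) pairs defining one n-tuple
    """
    def rotate(locs):   # rotate the board 90 degrees
        return [(c, n - 1 - r) for r, c in locs]

    def reflect(locs):  # mirror the board horizontally
        return [(r, n - 1 - c) for r, c in locs]

    variants = []
    cur = list(locations)
    for _ in range(4):
        variants.append(cur)
        variants.append(reflect(cur))
        cur = rotate(cur)
    return variants
```

With symmetric sampling, all eight variants share one lookup table: the tuple's value for a board is the sum of the table entries indexed by the eight observed patterns.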
II-E. N-tuple Network Architecture
Due to the spatial nature of game boards, n-tuples are usually consecutive, snake-shaped sequences of locations, although this is not a formal requirement. If all m n-tuples in a network are of the same size n, we denote it as an m×n-tuple network; it has m × 3ⁿ weights. Apart from choosing m and n, an important design issue of an n-tuple network architecture is the location of the individual n-tuples on the board [7].
II-E1. Random Snake-shaped N-tuple Networks
It is thus surprising that so many investigations in game strategy learning have involved randomly generated snake-shaped n-tuple networks. Lucas [6] generated individual n-tuples by starting from a random board location and then taking a random walk in any of the eight orthogonal or diagonal directions. Repeated locations were ignored, so the resulting n-tuples varied in length. The same method was used by Krawiec and Szubert for generating their n-tuple networks [22, 7], and by Thill et al. [23] for generating n-tuple networks playing Connect Four.
An n-tuple network generated in this way will be denoted as rand (see Fig. 3a for an example).
II-E2. Systematic Straight N-tuple Networks
Alternatively, we propose a deterministic method of constructing n-tuple networks. Our systematic straight n-tuple networks consist of all possible vertical, horizontal, and diagonal n-tuples placed on the board; its smallest representative is a network of 1-tuples. Thanks to symmetric sampling, only a fraction of the tuples is required for an Othello board. We denote such networks as all; an example all network is shown in Fig. 3b. Table II shows the number of weights in selected architectures of the rand* and all* networks.
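Enumerating all straight n-tuples is straightforward; a sketch (before any symmetry-based reduction, which the paper applies on top) could look like this. The function name and (row, col) representation are ours.

```python
def all_straight_tuples(n, board_size=8):
    """Enumerate every horizontal, vertical, and diagonal straight n-tuple
    that fits on a board_size x board_size board.

    Returns a list of location sequences, each a list of (row, col) pairs.
    """
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, down-right, down-left
    tuples, seen = [], set()
    for r in range(board_size):
        for c in range(board_size):
            for dr, dc in directions:
                locs = [(r + k * dr, c + k * dc) for k in range(n)]
                if not all(0 <= rr < board_size and 0 <= cc < board_size
                           for rr, cc in locs):
                    continue  # tuple falls off the board
                key = tuple(locs)
                if key not in seen:  # avoid duplicates (relevant for n = 1)
                    seen.add(key)
                    tuples.append(locs)
    return tuples
```

For instance, on an 8×8 board there are 64 straight 1-tuples and 210 straight 2-tuples before symmetry reduction.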
II-E3. Other Approaches
Logistello [4], the computer player which beat the human Othello world champion in 1997, used tuples hand-crafted by an expert. External knowledge has also been used by Manning [8], who generated a diverse n-tuple network using the random-inputs method from Breiman's Random Forests, based on a set of labeled random games.

II-F. Learning to Play Both Sides
When a single player defined by its evaluation function is meant to play both as black and as white, it must interpret the result of the evaluation function in a complementary way depending on the color it plays. There are three methods serving this purpose.
The first one is the doubled function (e.g., [23]), which simply employs two separate functions: one for playing white and the other for playing black. It allows the strategies for the white and black players to be fully separated. Its disadvantage, however, is that twice as many weights must be learned, and the experience gained when playing as black is not used when playing as white, and vice versa.
Output negation and board inversion (e.g., [9]) are alternatives to the doubled function. They use only a single set of weights, reducing the search space and allowing experience to be transferred between the white and black players. When using output negation, black selects the move leading to the position with the maximal value of the evaluation function, whereas white selects the move leading to the position with the minimal value.
A player using board inversion learns only to play black. As the best black move, it selects the one leading to the position with the maximal value. If it has to play white, it temporarily flips all the pieces on the board, so that it can interpret the board as if it played black. It then selects the best ‘black’ move, flips all the pieces back, and plays a white piece in the selected location.
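The board-inversion idea can be sketched in a few lines, assuming a board encoding of +1 (black), −1 (white), 0 (empty); the function names are ours.

```python
def evaluate_as_color(board, color, evaluate_for_black):
    """Board inversion: a single evaluation function trained for black
    evaluates white positions by temporarily flipping every piece.

    board -- flat list with +1 (black), -1 (white), 0 (empty) entries
    color -- 'black' or 'white', the color the player is to evaluate for
    evaluate_for_black -- evaluation function learned from black's perspective
    """
    if color == 'black':
        return evaluate_for_black(board)
    inverted = [-x for x in board]  # swap black and white pieces
    return evaluate_for_black(inverted)
```

The single set of weights thus serves both colors, which halves the number of parameters compared to the doubled function.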
The SWH player uses output negation.
III. Experiments and Results
Table II: Number of weights in selected architectures.

architecture | weights | architecture | weights
all2 | | rand10x3 |
all3 | | rand8x4 |
all4 | | rand7x5 |
III-A. Common Settings
III-A1. Evolutionary Setup
In order to compare different n-tuple network architectures, we performed several computational experiments. In each of them, the weights of the n-tuple networks were learned by an evolution strategy [24]. The weights of individuals in the initial population were drawn uniformly at random from a fixed interval, and the evolution strategy used Gaussian mutation. An individual's fitness was calculated using the Othello League performance measure estimated over a number of double games (cf. Sec. II-C). The total number of games played in each evolutionary run makes our experiments exceptionally large compared to previous studies: both a recent study concerning n-tuple networks [7] and the experiments of Samothrakis et al. [16], who used the much simpler WPC representation, involved substantially fewer games per run.
Such an extensive experiment was possible due to an efficient n-tuple network and Othello implementation in Java, capable of playing a large number of games per second on a single CPU core. Thanks to it, we were able to finish one evolutionary run in a matter of hours on a 4-core Intel(R) Core(TM) i7-2600 CPU @ 3.4 GHz.
III-A2. Performance Evaluation
We repeated each evolutionary run multiple times. At regular intervals, we measured the (Othello League) performance of the fittest individual in the population using a large number of double games. The performance of the fittest individual from the last generation is identified with the method's performance. Since the sample size per method is small, for statistical analysis of the following experiments we used the nonparametric Wilcoxon rank-sum test (a.k.a. the Mann-Whitney U test) with a fixed significance level and Holm's correction when comparing more than two methods at once.
III-B. Preliminary: Board Inversion vs. Output Negation
Figure 4 presents the results of learning with board inversion versus output negation for representatives of both types of n-tuple network architectures: rand8x4 and all1 (cf. Table IV).
The figure shows that board inversion surpasses output negation regardless of the player architecture, which confirms a previous study of the two methods for preference learning [9]. The differences between the methods are statistically significant (see also the detailed results in Table IV).
Moreover, visual inspection of the violin plots reveals that board inversion leads to more robust learning, since the variance of performances is lower. Therefore, in the following experiments we employ exclusively board inversion.
III-C. All Short Straight vs. Random Long Snake-shaped N-tuples
In the main experiment, we compare n-tuple networks consisting of all possible short straight n-tuples (all2, all3, and all4) with long random snake-shaped ones (rand10x3, rand8x4, and rand7x5). We chose the number and size of the n-tuples so that the numbers of weights in corresponding architectures are equal or, when that was impossible, similar (see Table II).
The results of the experiment are shown in Figure 5 as violin plots. Statistical analysis of the three pairs having an equal or similar number of weights reveals that:

- all2 is better than rand10x3,
- all3 is better than rand8x4, and
- all4 is better than rand7x5.
Let us notice that the differences in performance are substantial: even for the pair where the difference in performance is the smallest, the best result obtained by the rand network is still lower than the worst result obtained by the corresponding all network (see Table IV for details).
All* architectures are also more robust, having lower variances than rand* architectures (cf. Fig. 5). This is because the variance of rand* architectures is attributed to both their random initialization and the nondeterministic learning process, while the variance of all* is due only to the latter.
III-D. 2-tuples are Long Enough
Intuitively, longer n-tuples should lead to higher network performance, since they can ‘react’ to patterns that shorter ones cannot. However, the results presented in Fig. 5 show no evidence that this is the case. Despite having two times more weights, all3 does not provide better performance than all2 (no statistically significant difference). Furthermore, all4 is significantly worse than both all2 and all3.
Figure 6 shows the pace of learning for each of the six analyzed architectures. It plots the methods' performance as a function of computational effort, which is proportional to the number of generations.
The figure suggests that all2 is not only the best (together with all3) in the long run, but also the method that learns the quickest. all3 eventually catches up with all2, but does not seem able to surpass it. all4 learns even slower than all3; although the gap between all4 and all2 decreases over time, it is still noticeable at the end of the runs.
Thus, our results suggest that for Othello, all2, with just 288 weights, the smallest among the six considered n-tuple network architectures, is also the best one.
III-E. Othello League Results
The best player obtained in this research consists of all straight 2-tuples; its performance is 0.9592. This result is significantly higher than the best results reported to date in the Othello League (see Table III). Notice also how small this player is (in terms of the number of weights) compared to the other players in the league. Unfortunately, the online Othello League accepts only players employing output negation; it does not allow board inversion. Thus, our best player could not be submitted to the Othello League.
date | player name | encoding | weights | performance
n/a | allinv | n-tuple network | 288 | 0.9592
2013-09-17 | wj123tuples | n-tuple network | 966 | 0.9149
2011-01-30 | epTDLmpx_12x6 [7] | n-tuple network | |
2011-01-28 | prb_nt15_001 | n-tuple network | |
2011-01-25 | epTDLxover [7] | n-tuple network | |
2008-05-03 | t15x6x8 | n-tuple network | |
2008-05-03 | x30x6x8 | n-tuple network | |
2008-03-28 | Stunner | n-tuple network | |
2007-09-14 | MLP(…)312ties0.FF | neural network | |
To be accepted in the Othello League, we also performed some experiments with output negation. The best output-negation player we were able to evolve was submitted under the name of wj123tuples. It consists of all straight 1-, 2-, and 3-tuples, and has 966 weights in total.
wj123tuples took the lead in the league, exceeding the performance of all players submitted to date. Its league result should, however, be taken with care, since to evaluate a player's performance the Othello League plays only a small number of games; we estimate its performance at 0.9149 on the basis of a larger number of double games.
We suspect that the performance level against the Standard WPC Heuristic player to which all2 and all3 converge cannot be significantly improved at 1 ply. The random moves forced in all games lead to situations in which even a perfectly playing player cannot guarantee not losing a game.
Despite the first place obtained in the Othello League, the evolved player is not good ‘in general’, i.e., against a variety of opponents, because it was evolved specifically to play against the Standard WPC Heuristic player. When evaluated against random WPC players (the expected utility measure [14, 25]), the best all2 player obtains only a low score; with considerably less computational effort than used in this paper, it is possible to evolve an n-tuple player scoring higher [26, 7]. However, our goal here was not to design good players in general, but to compare different position evaluation functions.
The best all2 player evolved in this paper is presented in Fig. 7.
IV. Discussion: The More Weights, the Worse for Evolution?
We have shown that among the all* methods, the more weights, the worse the results; the same applies to the rand* methods (see Fig. 5). This finding confirms that of Szubert et al. [7], who found that among rand networks of increasing size, it is the smallest one that allows a (co)evolutionary algorithm to obtain the best results. The authors stated that this effect is due to the higher dimensionality of the search space, for which “the weight mutation operator is not sufficiently efficient to elaborate fast progress”.
Although we do not challenge this claim, our results suggest that the number of weights in a network is not the only performance factor. The dimensionality of the search space of an all* network can be considerably higher than those of rand* networks with fewer weights; nonetheless, it is the all* network that obtains the highest performance (see Fig. 5). Therefore, the second factor influencing performance when learning an n-tuple network is its architecture.
Finally, let us notice that an alternative to a fixed n-tuple network architecture is a self-adaptive one, which can change in response to variation operators [7], such as mutation or crossover. Although such an architecture is, in principle, more flexible, it adds another dimension to the search space, making the learning problem even harder.
V. Conclusions
In this paper, we have analyzed n-tuple network architectures for the position evaluation function in board games. We have shown that a network consisting of all possible, systematically generated, short n-tuples leads to significantly better play than the long, random, snake-shaped tuples originally used by Lucas [11]. With a simple network consisting of all possible straight 2-tuples (with just 288 weights), we were able to beat the best result in the online Othello League, whose players usually have many times more weights.
Moreover, our results suggest that tuples longer than 2 give no advantage, while at the same time slowing down learning. This is surprising, since capturing the opponent's pieces in Othello requires a line of at least three pieces (e.g., white, black, white).
Let us emphasize that our results have been obtained in an intensive computational experiment, an order of magnitude larger than other studies in this domain. Nevertheless, it remains to be seen whether they hold for different experimental settings. We used evolution against an expert player in 1-ply Othello. The interesting open questions are: i) whether our systematic short-tuple network is also advantageous for reinforcement learning methods, such as temporal difference learning, and ii) whether such networks are also profitable for other board games, e.g., Connect Four.
Acknowledgment
This work has been supported by the Polish National Science Centre grant no. DEC2013/09/D/ST6/03932. The computations have been performed in Poznań Supercomputing and Networking Center. The author would like to thank Marcin Szubert for his helpful remarks on an earlier version of this article.
Table IV: Mean and median performance of the analyzed methods.

method | mean | median
all2inv | |
all3inv | |
all4inv | |
rand10x3inv | |
rand8x4inv | |
rand7x5inv | |
rand8x4neg | |
all1inv | |
all1neg | |
References
 [1] C. E. Shannon, “XXII. Programming a computer for playing chess,” Philosophical magazine, vol. 41, no. 314, pp. 256–275, 1950.

[2] A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development, vol. 3, no. 3, pp. 211–229, 1959.
[3] L. V. Allis, A knowledge-based approach of connect-four. Vrije Universiteit, Subfaculteit Wiskunde en Informatica, 1988.
 [4] M. Buro, “Experiments with MultiProbCut and a new highquality evaluation function for Othello,” Games in AI Research, pp. 77–96, 2000.
[5] ——, “An evaluation function for Othello based on statistics,” NEC, Princeton, NJ, NECI 31, Tech. Rep., 1997.
[6] S. Lucas, “Learning to play Othello with n-tuple systems,” Australian Journal of Intelligent Information Processing Systems, no. 4, pp. 1–20, 2008.
 [7] M. Szubert, W. Jaśkowski, and K. Krawiec, “On Scalability, Generalization, and Hybridization of Coevolutionary Learning: a Case Study for Othello,” IEEE Transactions on Computational Intelligence and AI in Games, 2013.
[8] E. P. Manning, “Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 1, pp. 40–53, 2010.
 [9] T. Runarsson and S. Lucas, “Preference Learning for Move Prediction and Evaluation Function Approximation in Othello,” Computational Intelligence and AI in Games, IEEE Transactions on, 2014.

[10] V. L. Allis, “Searching for solutions in games and artificial intelligence,” Ph.D. dissertation, University of Limburg, Maastricht, The Netherlands, 1994.
 [11] S. M. Lucas, “Learning to play Othello with Ntuple systems,” Australian Journal of Intelligent Information Processing Systems, Special Issue on Game Technology, vol. 9, no. 4, pp. 01–20, 2007.
 [12] Y. Osaki, K. Shibahara, Y. Tajima, and Y. Kotani, “An Othello evaluation function based on Temporal Difference Learning using probability of winning,” 2008 IEEE Symposium On Computational Intelligence and Games, pp. 205–211, Dec. 2008
 [13] E. P. Manning, “Using resourcelimited nash memory to improve an othello evaluation function,” Computational Intelligence and AI in Games, IEEE Transactions on, vol. 2, no. 1, pp. 40 –53, march 2010.

[14] S. Y. Chong, P. Tino, D. C. Ku, and Y. Xin, “Improving Generalization Performance in Co-Evolutionary Learning,” IEEE Transactions on Evolutionary Computation, vol. 16, no. 1, pp. 70–85, 2012.
[15] S. van den Dries and M. A. Wiering, “Neural-Fitted TD-Leaf Learning for Playing Othello With Structured Neural Networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 11, pp. 1701–1713, Nov. 2012.
 [16] S. Samothrakis, S. Lucas, T. Runarsson, and D. Robles, “Coevolving GamePlaying Agents: Measuring Performance and Intransitivities,” IEEE Transactions on Evolutionary Computation, no. 99, pp. 1–15, 2012
 [17] W. Jaśkowski, M. Szubert, and P. Liskowski, “Multicriteria comparison of coevolution and temporal difference learning on othello,” in EvoGames, ser. Lectures Notes in Computer Science, 2014.
 [18] S. M. Lucas and T. P. Runarsson, “Temporal difference learning versus coevolution for acquiring othello position evaluation,” in IEEE Symposium on Computational Intelligence and Games. IEEE, 2006, pp. 52–59.
 [19] M. Szubert, W. Jaśkowski, and K. Krawiec, “Coevolutionary Temporal Difference Learning for Othello,” in IEEE Symposium on Computational Intelligence and Games, 2009, Conference proceedings (article), pp. 104–111
 [20] T. Yoshioka, S. Ishii, and M. Ito, “Strategy acquisition for the game,” Strategy Acquisition for the Game "Othello" Based on Reinforcement Learning, vol. 82, no. 12, pp. 1618–1626, 1999.

[21] W. W. Bledsoe and I. Browning, “Pattern recognition and reading by machine,” in Proc. Eastern Joint Comput. Conf., 1959, pp. 225–232.
[22] K. Krawiec and M. Szubert, “Learning n-tuple networks for Othello by coevolutionary gradient search,” in GECCO 2011 Proceedings. ACM, 2011, pp. 355–362.
 [23] M. Thill, P. Koch, and W. Konen, “Reinforcement Learning with Ntuples on the Game Connect4,” in Parallel Problem Solving from Nature  PPSN XII, ser. Lecture Notes in Computer Science, C. A. C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone, Eds., vol. 7491. Springer, 2012, pp. 184–194.
 [24] H.G. Beyer and H.P. Schwefel, “Evolution strategies–a comprehensive introduction,” Natural computing, vol. 1, no. 1, pp. 3–52, 2002.
 [25] W. Jaśkowski, P. Liskowski, M. Szubert, and K. Krawiec, “Improving coevolution by random sampling,” in GECCO’13: Proceedings of the 15th annual conference on Genetic and Evolutionary Computation, C. Blum, Ed. Amsterdam, The Netherlands: ACM, July 2013, pp. 1141–1148.
 [26] P. Liskowski, “CoEvolution Versus Evolution with Random Sampling for Acquiring Othello Position Evaluation,” Ph.D. dissertation, Poznan University of Technology, 2012.