1.1 Scrabble
Scrabble[wiki:scrabble] is a board game in which two players alternate forming words on a 15x15 board by placing tiles, each bearing a single letter. The tiles must form words which, in crossword fashion, read left to right in rows or downwards in columns, and which are defined in a standard dictionary or lexicon. In addition, at least one of the tiles placed must be adjacent to an existing tile on the board. The board is also marked with “premium” squares, which multiply the number of points awarded for a given word.
The game begins with a total of 100 tiles, placed in a bag in which the letters are not visible. The set of tiles that a player holds is called the rack, and each player starts by drawing seven tiles from the bag. The set of tiles remaining after a player has moved is called the rack leave, and for every letter placed on the board a replacement is drawn from the bag. The game ends when one player is out of tiles and no tiles are left to draw, or when both players pass. The interested reader can find more details about the rules of Scrabble here.
1.2 Why Scrabble?
Scrabble is a game of imperfect information, i.e. the current player is unaware of the rack of the opponent, making it very hard to guess the opponent’s next move until the end of the game. There is also inherent randomness in Scrabble, as random letters are drawn from the bag to the current player’s rack each round. The state space of Scrabble is also quite complex, since the tiles are marked with specific letters as opposed to being simply black and white. All these factors contribute to the difficulty of creating a perfect AI agent for Scrabble.
1.3 Outline
As a prerequisite to our exposition, Chapter 2 provides an overview of current state-of-the-art Scrabble agents with a high-level description of the techniques they employ. Chapter 3 provides a detailed discussion of our approach towards improving the evaluation function using black-box optimization methods. In Chapter 4, we shed light on our current framework, which approaches the problem with supervised learning. We conclude with a brief summary and directions for future work in Chapter 5.
2.1 Maven
Maven’s[sheppard2002world] game play is subdivided into three phases:

Midgame: This phase lasts from the beginning of the game up until there are 9 or fewer tiles left in the bag. In this phase, all possible moves from the given rack are generated and then sorted using some simple heuristics. The most promising of these moves are evaluated using 2-ply Monte-Carlo simulation, in which both players are assumed to draw tiles from the bag on each turn and to play the best move according to the aforementioned heuristics. The move evaluated to be best after the simulations is played during the game.

Pre-endgame: This phase is similar to the midgame phase and is designed to attempt to yield a good endgame situation.

Endgame: During this phase, there are no tiles left in the bag, and Scrabble thus becomes a game of perfect information. Maven uses the B* search algorithm to analyze the game tree during the endgame phase.
2.2 Quackle
Quackle[katzbrown_o'laughlin] uses a similar approach to Maven. Quackle’s heuristic function (henceforth referred to as the static evaluation function) used in the midgame phase is the sum of the move score and the value of the rack leave. This leave value is computed using a precomputed database: it favors both the chance of playing a Bingo on the next turn and leaves that garnered high scores when they appeared on a player’s rack in a large sample of Quackle versus Quackle games. The win percentage of each move is estimated, and the move with the best win percentage is played. Figure 2.1 presents a high-level flow chart of the algorithm used in Quackle. There are mainly two types of players present in the Quackle open-source code, which was used for our experiments:

Speedy Player: This player uses only static evaluations throughout the game, with no Monte-Carlo simulations. The move predicted to be best by the static evaluation function is played at a given board configuration.

Championship Player: This player uses simulations in addition to static evaluations, along with a perfect endgame player. It can be provided with a particular amount of think time to run the truncated Monte-Carlo simulations for a single turn. For example, a “Five Minute Championship Player” can take as long as five minutes to decide upon the move to play in a single turn. (The win rate of the second player when the first player is a “Twenty Second Championship Player”, i.e. one with a think time of 20 seconds/move, and the second player is a “Speedy Player” is calculated using 50000 games.)
Please note that the win rate of the second player when both players are Speedy Players is , calculated using 50000 self-play games.
3.1 Fitness Function
3.1.1 True Fitness Function
The true fitness function for a given evaluation function is given by the win percentage of the Scrabble agent using that evaluation function against another agent (usually the Speedy Player) using another, fixed evaluation function. The win percentage should be estimated using a large number of games in order to keep the error bars around it small, leading to a reasonable estimate of the function .
Using Hoeffding’s inequality[wiki:hoeffding], the true mean $\mu$ of a Bernoulli random variable that has been sampled $N$ times lies, with probability at least $1-\delta$, within $\sqrt{\ln(2/\delta)/(2N)}$ of the empirical mean $\hat{\mu}$. The win rate approximated using N self-play games is thus estimated with an error bound given by the above inequality. Using , and N = 5000 leads to an error bar of . Further, increasing N to 50000 games still leads to an error of . We can achieve slightly tighter error bounds based on the KL divergence[kaufmann2013information], given by equations 3.1 and 3.2:

$$\Pr\Big(\mu > \max\{q \ge \hat{\mu} : N\, d(\hat{\mu}, q) \le \log(1/\delta)\}\Big) \le \delta \quad (3.1)$$

$$\Pr\Big(\mu < \min\{q \le \hat{\mu} : N\, d(\hat{\mu}, q) \le \log(1/\delta)\}\Big) \le \delta \quad (3.2)$$

where $d(p, q) = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$ is the KL divergence between two Bernoulli distributions, $\mu$ denotes the true mean of the Bernoulli, $\hat{\mu}$ its empirical mean, and N is the number of self-play games. These equations also give an error of for games for .
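As an illustration, both the Hoeffding radius and the KL-based upper confidence bound can be computed numerically. The confidence level delta = 0.05 below is an assumption for illustration only, since the concrete value used in the experiments is not restated here; the bisection solves the implicit KL condition above.

```python
import math

def hoeffding_radius(n, delta):
    # Hoeffding: |mu_hat - mu| <= sqrt(ln(2/delta) / (2n)) w.p. >= 1 - delta
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q)
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(mu_hat, n, delta, tol=1e-9):
    # Largest q >= mu_hat satisfying n * KL(mu_hat, q) <= log(1/delta),
    # found by bisection (KL is monotone in q on [mu_hat, 1]).
    target = math.log(1.0 / delta) / n
    lo, hi = mu_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mu_hat, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

radius_5000 = hoeffding_radius(5000, 0.05)   # delta = 0.05 is assumed
ucb = kl_upper_bound(0.5, 5000, 0.05)        # KL bound around mu_hat = 0.5
```

For a win rate near 0.5, the KL-based bound is slightly tighter than the Hoeffding radius, which matches the claim above.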
3.1.2 Other fitness functions
The idea of imitating the simulation agent (“Championship Player”) in order to improve the static evaluation function also seemed quite reasonable, and to explore it further we experimented with the fitness function , which utilizes the ranking of moves by the “Five Minute Championship Player”. We generated 70000 board configurations, and for each board configuration b, we calculated a ranked list of size min{10, } out of all possible moves using the “Championship Player”. For each board configuration, a list of the top min{10, } moves was also generated using the given static evaluation function. The player was provided a score of , where k is the index of the best move of among those 10 moves. The function is given by the sum of all such scores.
We also tested the fitness function , given by the sum of the final score differences for a “Speedy Player” over all games played between a fixed Quackle player and the “Speedy Player” using the given static evaluation function. However, the preliminary results obtained using were worse than those using , and we therefore did not use in any further experiments. ( obtained a win rate of 46%, compared to the win rate of 44.5% obtained using CMA-ES with 6 generations and a population size of 50. Please note that these results are correct only to within , as they were obtained using fitness function evaluations over 5000 games only.)
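A ranking-agreement fitness of this kind can be sketched as follows. The exact per-board scoring scheme is elided above, so the "10 − k" scoring and the argument names are hypothetical stand-ins for illustration:

```python
def rank_agreement_fitness(static_top10_lists, champ_best_moves):
    # static_top10_lists: per board, the top-10 moves ranked by the static
    # evaluation function; champ_best_moves: per board, the move the
    # simulation ("Championship") player considers best.
    # Hypothetical scoring: 10 - k if the champion's move sits at index k
    # of the static top-10 list, 0 if it is absent; higher is better.
    total = 0
    for top10, best in zip(static_top10_lists, champ_best_moves):
        if best in top10:
            total += 10 - top10.index(best)
    return total

# two toy boards: champion's move is 2nd on the first list, absent on the second
score = rank_agreement_fitness([["WORD", "PLAY"], ["QI", "ZA"]],
                               ["PLAY", "XU"])
```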
3.2 CMA-ES
We used an evolutionary algorithm[spears1993overview], called CMA-ES (Covariance Matrix Adaptation Evolution Strategy)[hansen2016cma], to find the optimal weights for a feature-based evaluation function. An evolutionary algorithm is broadly based on the principle of biological evolution: in each generation, new individuals are generated by stochastic variation of the current parental individuals. Then, the top $\mu$ individuals out of the $\lambda$ offspring are selected to become the parents of the next generation, based on their fitness function values. In this manner, over the generation sequence, individuals with better and better fitness values are generated. Figure 3.1 presents a high-level overview of the CMA-ES algorithm.
For our experiments, we ran 46 generations of CMA-ES with and agents, using the fitness function calculated through self-play games between two “Speedy Players” as described in section 3.1.
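The generational loop can be sketched with a simplified (μ, λ) evolution strategy in place of full CMA-ES; this toy version keeps the step size and covariance fixed, whereas CMA-ES adapts both, and the quadratic toy fitness below merely stands in for the expensive self-play win rate:

```python
import random

def evolve(fitness, x0, sigma=0.5, lam=20, mu=5, generations=30, seed=0):
    """Minimal (mu, lambda) evolution strategy: sample lam offspring around
    the parent mean, keep the mu fittest, recombine them into the next mean.
    Full CMA-ES additionally adapts a covariance matrix and step size."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            cand = [m + sigma * rng.gauss(0, 1) for m in mean]
            offspring.append((fitness(cand), cand))
        offspring.sort(key=lambda t: t[0], reverse=True)  # maximize fitness
        elite = [c for _, c in offspring[:mu]]
        mean = [sum(xs) / mu for xs in zip(*elite)]       # recombination
    return mean

# toy stand-in for the self-play win-rate fitness, peaked at weights (1, 1)
toy_fitness = lambda w: -((w[0] - 1.0) ** 2 + (w[1] - 1.0) ** 2)
best = evolve(toy_fitness, [0.0, 0.0])
```

In the real experiments each fitness evaluation is itself thousands of self-play games, which is why keeping the number of individuals per generation small matters.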
3.2.1 Linear Evaluation Function
The linear static evaluation function scores a move based on state-action features generated using the current board configuration and the move to be evaluated. The computed score is the dot product of the feature values with a weight vector. Note that state-only features are not used, since they would be the same for all moves at a given state and thus would not affect the ranking under a linear function. The move with the highest value of the evaluation function is played. The feature set consists of the following features:
Move Score: The score we obtain by playing the given move on the board.

Leave Value: A precomputed value for each rack leave generated by the Quackle code

Leave Playability: A value calculated for each rack leave, giving the expected move score obtainable after sampling letters from the bag to complete the rack and using that rack to form a word

Difference in consonants and vowels left on the rack

Number of blanks left on the rack after playing the move
The current static evaluation function in Quackle only uses the top two features mentioned above, i.e. the move score and the leave value, with a weight of for each of the two features.
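In code, such a linear static evaluation amounts to a dot product followed by an argmax over the candidate moves; the feature values and weights below are hypothetical:

```python
def linear_eval(features, weights):
    # dot product of state-action feature values with the weight vector
    return sum(f * w for f, w in zip(features, weights))

def best_move(moves, featurize, weights):
    # play the move whose static evaluation is highest
    return max(moves, key=lambda m: linear_eval(featurize(m), weights))

# hypothetical candidate moves described by (move score, leave value) pairs
candidates = [(30, -5.0), (24, 2.0), (18, 6.0)]
weights = [1.0, 1.0]  # equal weight on move score and leave value
chosen = best_move(candidates, lambda m: list(m), weights)
```

Note how the middle candidate wins despite a lower raw score, because its leave value compensates; this is exactly the trade-off the weight vector encodes.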
Figure 3.2 shows the results we obtained after running CMA-ES with the above-mentioned features. As shown in the upper left plot in this figure, the value of −min() increases with the generation number, except for the last generation. The final weights with the min() (or max()) result in a win percentage of (an improvement of 0.3% over , calculated over 50000 games for the second player).
We also experimented with other features, such as the length of the rack leave and features corresponding to how the board configuration changes after playing a move. For example, when a player plays a move on the board, new openings (positions which can be used to play a valid move) are created for the opponent. Some of these openings can also utilize premium squares. We experimented with features counting the number of such openings for each kind of premium square, leading to a total of such features. However, these new features deteriorated the maximum fitness value obtained using CMA-ES and were therefore dropped from further experiments. The fact that the above-mentioned features did not lead to any improvement is surprising, and we believe this was due to the limited representation ability of the linear evaluation function.
Using CMA-ES with , as described in section 3.1.2, also did not lead to an improvement with the above-mentioned features. (Surprisingly, this experiment resulted in a win rate of 41.8% when was evaluated using 50000 games.) The baseline value for indicates that the best move generated by the “Championship Player” is expected to lie within the top 5 moves predicted by .
3.2.2 Non-Linear Evaluation Function
Since a linear model has limited representation ability, we also experimented with evaluation functions represented by a neural network with 1–2 hidden layers and non-linear activation functions such as tanh and ReLU[nair2010rectified]. The neural network is only used to introduce the non-linearity in a structured manner. State-only features pertaining to the board, such as the difference between the numbers of vowels and consonants on the board, the number of blanks on the board, etc., were also used as input to the network, in addition to the state-action features mentioned in section 3.2.1. The results from this experiment were not encouraging, and the given class of non-linear functions was not able to improve upon .
3.3 Bayesian Optimization
Bayesian optimization[brochu2010tutorial] is a method for optimizing expensive cost functions without calculating derivatives. It employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to obtain a posterior. This allows a utility-based selection of the next observation using an acquisition function[snoek2012practical] that determines what the next query point should be. The acquisition function balances sampling from areas of high uncertainty against sampling from areas likely to offer improvement over the current best observation.
Since our fitness functions are noisy (as only a finite number of games is used to compute them), the idea of applying Bayesian optimization to find the optimal static evaluation function looked promising. However, this experiment also did not result in a significant improvement over the current win rate of obtained by .
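To make the loop concrete, here is a minimal one-dimensional sketch assuming a Gaussian-process surrogate with an RBF kernel and an upper-confidence-bound acquisition function; the surrogate, acquisition, and hyperparameters used by off-the-shelf Bayesian optimization packages will differ, and the quadratic toy objective stands in for the noisy win rate:

```python
import math, random

def solve(A, b):
    # Gaussian elimination with partial pivoting for A x = b
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, xq, ell=0.3, noise=1e-4):
    # posterior mean/std of a GP with RBF kernel at query point xq
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * ell ** 2))
    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    kq = [k(x, xq) for x in xs]
    mean = sum(a * kv for a, kv in zip(solve(K, ys), kq))
    var = 1.0 - sum(kv * vi for kv, vi in zip(kq, solve(K, kq)))
    return mean, math.sqrt(max(var, 1e-12))

def bayes_opt(f, lo, hi, iters=12, kappa=2.0, seed=0):
    # query where mean + kappa * std (upper confidence bound) is largest:
    # high std favors exploration, high mean favors exploitation
    rng = random.Random(seed)
    xs = [lo + (hi - lo) * rng.random() for _ in range(3)]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / 100.0 for i in range(101)]
    for _ in range(iters):
        def ucb(g):
            m, s = gp_posterior(xs, ys, g)
            return m + kappa * s
        xq = max(grid, key=ucb)
        xs.append(xq)
        ys.append(f(xq))
    return xs[max(range(len(ys)), key=lambda i: ys[i])]

# toy stand-in for the win-rate objective, maximized at x = 0.6
best_x = bayes_opt(lambda x: -(x - 0.6) ** 2, 0.0, 1.0)
```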
3.4 Multiple evaluation functions
Instead of using the same evaluation function for all stages of the game (refer to section 2.1 for detailed information about the different phases), it seemed plausible to us that a combination of evaluation functions, one per game stage, could work better in Scrabble. This idea has previously been employed successfully in robot soccer[macalpine2012ut] and has already been proposed in the context of Scrabble[romero2009] as well. Keeping the evaluation function weights fixed for the early-game (defined as the phase when the number of tiles in the bag was above 80) and endgame phases, we only evolved the weights pertaining to the midgame, defined as the phase when the number of tiles in the bag was between 20 and 80. This experiment also proved futile and did not result in any improvement over the function .
4.1 Learning to Rank
We initially experimented with a neural network approximator as the static evaluation function, whose input is a move represented by the various state-action features (including state-only features) described in section 3.2.2. We trained this neural network using a fixed dataset of state-action pairs with 70000 board configurations. For each board configuration , the Quackle “Five Minute Championship Player” provides a sorted list of size min{10, }, where is the list of all possible moves at configuration . The dataset D is further split into training and validation datasets and with a ratio of 97:3.
Figure 4.1 presents the RankNet[burges2005learning] architecture for , which invokes the network . For each board configuration in dataset , the network is provided with pairs of moves (, ), where L[i] denotes the i-th element of the list . The output label is for all such pairs. The binary cross-entropy loss function used for training is given by

$$C = -\bar{P}\,\log P - (1-\bar{P})\,\log(1-P),$$

where $P$ is the modeled probability that the first move of the pair should be ranked above the second and $\bar{P}$ is the target probability given by the output label.
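In code, the pairwise binary cross-entropy ranking loss takes the two scalar scores produced by the evaluation network for a move pair (hypothetical floats s_i, s_j here) and the target label:

```python
import math

def ranknet_loss(s_i, s_j, label=1.0):
    # P = sigmoid(s_i - s_j): modeled probability that the first move of
    # the pair should be ranked above the second; label is the target
    # probability taken from the Championship Player's ordering.
    p = 1.0 / (1.0 + math.exp(-(s_i - s_j)))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```

Minimizing this loss pushes the score of the higher-ranked move of each pair above the score of the lower-ranked one, which is all a ranking objective requires.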
However, the results from this experiment were not very encouraging. The function approximator was only able to achieve a slightly better win rate than . Further, the validation accuracy achieved using the ranking loss was close to the baseline accuracy () produced by . We believe the limited representation ability of the features used was the limiting factor in this experiment.
4.2 Raw Board Representation
The problem of designing good features for the game of Scrabble is quite important. The goodness of a board configuration depends both on the spatial arrangement of tiles and on the presence or absence of particular sets of letters, and ultimately requires consulting the Scrabble dictionary as well. We think that combining these dimensions into a small number of discriminative signals is not an easy task. To tackle this problem, we decided to use the raw representation of the Scrabble board as input to our evaluation function, instead of using only a fixed number of hand-coded features.
AlphaGo Zero[silver2017mastering] used the raw board representation as input to neural networks trained via self-play, and mastered the game of Go without using any human knowledge. Unlike Go, which only has white and black stones, Scrabble has 27 distinct tiles (26 letters and the blank), each with a different value, with combinations of letters having a value different from the sum of their constituents. This makes the Scrabble board a much harder proposition for a convolutional neural net (henceforth abbreviated CNN)[lecun2015deep] to model well. However, given the enormous success of AlphaGo Zero, we decided to further investigate the idea of using raw board representations for Scrabble.
A Scrabble board is encoded into a feature vector whose third dimension corresponds to the different types of features used in our encoding of the board. For a particular feature, each board position is given a value. We used the following features for our encoding:

Whether a position is blank or not

Whether a position contains a particular letter, leading to a total of 26 features for the letters A–Z

The score of the tile placed on a position. All positions not containing any tiles are given a score of zero, except premium squares, which are given a score of 1.
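Under these conventions the encoding can be sketched as follows; the board representation (None for empty squares, '?' for a blank tile, uppercase letters otherwise) and the argument names are our own, not Quackle's:

```python
def encode_board(board, tile_scores, premium_squares):
    # Returns a 15 x 15 x 28 nested list: plane 0 marks blank tiles,
    # planes 1-26 mark the letters A-Z, and plane 27 holds the tile
    # score (empty premium squares get 1, other empty squares 0).
    enc = [[[0.0] * 28 for _ in range(15)] for _ in range(15)]
    for r in range(15):
        for c in range(15):
            t = board[r][c]
            if t == '?':                       # blank tile
                enc[r][c][0] = 1.0
            elif t is not None:                # ordinary letter tile
                enc[r][c][1 + ord(t) - ord('A')] = 1.0
                enc[r][c][27] = float(tile_scores[t])
            elif (r, c) in premium_squares:    # empty premium square
                enc[r][c][27] = 1.0
    return enc

board = [[None] * 15 for _ in range(15)]
board[7][7] = 'Q'   # a Q tile on the centre square
board[7][8] = '?'   # a blank tile next to it
enc = encode_board(board, {'Q': 10}, {(0, 0)})
```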
The move at a given board configuration is provided as input via the concatenation of the feature vectors and (call it input1), where is the configuration obtained after playing the given move on . In addition to input1, the two features used in (denoted input2) are also provided as input to our evaluation function. The neural network consists of residual blocks[he2016deep] of convolutional layers with batch normalization[ioffe2015batch] and ReLU activations, and is trained in a similar manner as described by figure 4.1. Specifically, consists of a single convolutional block followed by 2 residual blocks and a final block. The convolutional block applies the following modules sequentially to input1:


A convolution of 16 filters of kernel size with stride 1

Batch normalization

A rectifier nonlinearity
Each residual block applies the following modules sequentially to its input:


A convolution of 16 filters of kernel size with stride 1

Batch normalization

A rectifier nonlinearity

A convolution of 16 filters of kernel size with stride 1

Batch normalization

A skip connection that adds the input to the block

A rectifier nonlinearity
The final block applies the following modules sequentially to its input:


A convolution of 1 filter of kernel size with stride 1

Batch normalization

A rectifier nonlinearity

A fully connected linear layer to a hidden layer of size 32

A rectifier nonlinearity

A fully connected linear layer to a hidden layer of size 4

A rectifier nonlinearity

A concatenation layer which combines the previous layer input and input2

A fully connected linear layer to a scalar
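The skip connection at the heart of each residual block can be illustrated with a toy one-dimensional convolution in plain Python; the real network uses two-dimensional convolutions over the 15x15 planes plus batch normalization, which this sketch omits:

```python
def conv1d(x, kernel):
    # 'same' 1-D convolution with zero padding and stride 1
    k = len(kernel) // 2
    padded = [0.0] * k + list(x) + [0.0] * k
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, kernel1, kernel2):
    # conv -> relu -> conv -> skip connection -> relu
    # (batch normalization omitted in this toy sketch)
    y = relu(conv1d(x, kernel1))
    y = conv1d(y, kernel2)
    return relu([a + b for a, b in zip(y, x)])  # skip adds the block input
```

The skip connection lets each block learn a residual correction to its input rather than a full transformation, which is what makes deeper stacks of such blocks trainable.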
The number of weight parameters in was approximately 1/20 of the number of training example pairs used to train .
Figure 4.2 shows the learning curves for the network . This approach led to a validation accuracy of on , which was not used for training. Given these initial results, this approach looks very promising, and future work may involve exploring it further.