In all perfect information, discrete games, there exists a policy which takes in the current game state and returns the optimal move for that state for the current player, which will result in the highest utility under perfect play. Although neural networks can theoretically approximate any function, including this policy, doing so in practice is essentially impossible. The ancient board game “Go” is one such perfect information game which has been studied for years in computer science and has been one of the hardest turn-based perfect information games to get good performance in.
A game of Go consists of two players alternating placing stones on a 19 by 19 board, with the player with black stones starting. Any stones that are directly adjacent to each other are part of the same group. If any group is completely surrounded by opponent stones, it is taken off of the board, as shown in Figure 1. The goal of the game is to surround as much territory as possible.
While the rules of go are simple, mastering the game is not. Almost all commercial Go playing programs rely heavily on some form of tree search to explore outcomes of certain moves in a game, most commonly Monte Carlo Tree Search (MCTS) . However, these programs don’t seem to fully understand the game, instead relying on brute force search to make good moves.
In the past two years, many companies have tried to create their own Go playing programs, such as DeepZen, FineArt, and, most notably, DeepMind’s AlphaGo. Each of these programs uses a combination of convolutional neural networks and Monte Carlo Tree Search to obtain a level of play at or above top professionals . However, like commercial go players, these programs are exploring millions of states per second across many computers and countless GPUs, far surpassing the number of positions calculated out by the human player they are playing against.
2 Related Work
Computer Go, the creation of Go-playing agents for computers, has existed as early as 1968 . As mentioned previously, before convolutional neural networks became popular, MCTS was the most powerful method to play Go. These techniques generally required a lot of enhancements and optimizations. For example, MCTS Solver to detect forced as described by 
allowed Go bots to quickly solve games. Rapid Action Value Estimation used in allowed bots to identify duplicate game states, thereby significantly reducing the number of computations necessary. The problem with these techniques was that there were too many states to evaluate. Therefore, programs such as introduced in  and  were only truly competitive on boards while only capable of achieving moderate success on boards.
The field incrementally grew steadily in a similar fashion until 2015, when AlphaGo first beat Fan Hui, a professional player, making a huge skill jump that hadn’t yet been seen  in Computer Go. This led to a huge boom in convolutional neural network-based architectures such as DeepZenGo, FineArt, and of course AlphaGo . The inspiration for using convolutional neural networks comes from the fact that the state of a Go board can be treated as an image. While they rely on a policy network much like this paper describes, their strength comes from exploration of millions of board positions to gain an advantage over their human opponents.
Previous work has been done on using only convolutional neural network to play Go. They offered boosts over traditional MCTS but were not able to achieve the same level of play as AlphaGo. In 2008,  created a Convolutional Neural Network to play Go using an ensemble of networks. They were only able to achieve a then state-of-the-art 36.9% accuracy with a relatively small number of parameters ()., in 2014, improved this to 41 and 44% validation accuracy on different data sets. Concurrently,  devised a deeper and larger network that achieved 55% accuracy. The first two papers used supervised learning, tied weights, and a relatively shallow network, which resulted in accuracies much below what we and the creators of AlphaGo and  have observed.  had a model around the size of AlphaGo’s policy network (13 convolution layers), and had comparable results.
More recent approaches such as  have introduced fancier convolutional neural networks that rely on long term predictions for extended play.  achieved a slightly better accuracy of around 56%-57% and placed 3rd overall at the KGS Computer Go Competition.
It’s important to remember that deep convolutional neural networks are not just used to play Go. Many games such as Chess, Stratego, Hexagon and, more obviously, Atari games can be treated as images with labels being where to move next    . This shows the flexibility of deep convolutional neural networks as a tool to model many hard to play (and hard to understand) games.
3 Problem Statement
The goal of this project is to create human-level understanding of the game by creating a player which does no tree search whatsoever; to simply rely on understanding the board position and “intuition”, rather than brute force calculation. We plan to train a convolutional neural network which tries to approximate the previously discussed optimal policy function and as a result create a strong go player which is able to beat traditional MCTS-based programs.
The rankings in Go scale from 25 kyu (worst rank) to 1kyu (best kyu rank), and from 1 dan to 9 dan (best rank). The 1 dan rank is one rank above the 1 kyu rank. Hopefully, a rank of at least 1 dan can be achieved with our program, tested through websites such as the Online Go Server (OGS), as well as with in-person matches to ballpark its strength. Playing our player against existing go programs with known ranks can also be a good way of determining an approximate rank for our player.
4.1 Cross-Shaped Convolutions
In addition to rectangular convolution filters, we implemented a novel cross-shaped convolution filter as shown in Figure 2.
These convolutions were motivated by patterns in Go called “Ladders”. Ladders are a capturing concept in Go that only appear in low-amateur level play, with some exceptions , but are considered in all levels of play. They are never played out because they are disastrous for one of the players involved to continue. The ladder shape continues across the board in a diagonal fashion, as seen in Figure 3.
However, if a white stone is present, the ladder can be broken, allowing white to escape, as seen in Figure 4. The board position in all diagonal directions must be considered in a game of go, and thus we aimed to solve this problem by developing a diagonal-shaped convolution.
These cross convolutions are built inside traditional convolutions with the unnecessary weights zeroed out. To further describe cross convolutions, we first define the notion of the cross-width and cross-area. The cross-width, , of a cross convolution is the thickness of the line as it spans from one corner to another. The cross-area is defined as the set that contains all the coordinates covered by a block moving from one corner to the opposite corner. The cross-area contains the active regions of the convolution. The cross width must fall within the range . Anything beyond this range will create a zero or normal convolution respectively.
In practice, it’s easier and to zero out values based off of points that are not in because a block moving from one corner to another tends to create a lot of overlap. We realize this task is made even easier by the fact that the points not in simply form four triangles on each side. Our problem is therefore simplified down to creating a filter that zeros out the coordinates of the four triangles. Once the filter has been created, it is then used throughout the duration of the model to ensure only the relevant weights are utilized.
By creating these cross convolutions, we introduced the cross-width as another hyper-parameter that we can tune. Adjusting these hyper-parameters determines how our player visualizes the board with respect to the ladder structure. Previous non-MCTS based techniques usually represent ladders and other structures as feed-input features (and only these features), but we wish to focus on representing the general state of the board so our player can play effectively from beginning to end . However, we might consider analyzing patterns as  does and inserting another layer in a way that doesn’t conflict with our current inputs.
4.2 Cross Layer
We did not directly incorporate our cross convolution directly in the network as its own individual layer. Instead, inspired by Google’s Inception Module as shown in, we decided to incorporate our model in a modular “cross layer” as shown in Figure 10. Cross convolutions are generally very sparse. Therefore using cross convolutions in isolation would more destructive than constructive. As such, the cross layer introduces the cross convolution as a concatenation to our original filters. Therefore, we do not loose any meaningful data. Additionally, concatenating the pre-cross convolution input is very reminiscent of ResNet potentially helped in strengthening gradient flow .
Our final model consists of a 23-layer convolutional neural network, using a completely new architecture. Our first layer consists of alayers, as well as a 1-width cross convolution, as discussed below. The convolutions surrounding the cross convolutions serve to lower the number of parameters in the layer, a technique similar to the popular SqueezeNet architecture . The input is also fed directly into a squeeze and 5-width cross convolution layer. All outputs are then concatenated and sent to the next layer. Layers 5-11 are all convolutional filters with stride 1 and pad 1, each with 256 layers. We add another convolution and squeeze cross convolutions because there is some evaluation of board positions present in the data once we are through layer 11 that we would like to know for the situation of the ladder. Layers 16-22 are similarly all convolutional filters with stride 1 and pad 1, each with 256 layers. The final layer is a convolution with only 1 filter, used to flatten the baord position into a
We previously had a fully connected layer to evaluate final board positions. However, we discovered that the a significant portion of training was spent at this layer. Therefore, we instead took the results of the final layer as the scores for each move at the given point on the board. We select from the list of scores the highest legal move.
We began training our network by method of supervised learning, using the 9 million of the 10 million state-move pairs discussed in section 5: Dataset and Features. 1 million of the 10 million state-move pairs were separated off to be used as a test set. We trained the network using vanilla stochastic gradient descent using a learning rate of
, and decayed the learning rate by 0.5 every epoch. We ran the training using an NVIDIA 960. Training the network to its current state, over the course of four epochs, took five days.
We were unfortunately out of time by the end of the project to train our network using reinforcement learning. However, future work over this summer will likely be conducted to further strengthen the player in real-game scenarios as opposed to the mimicking strategy currently employed in the supervised learning network.
4.5 Interfaces and Evaluation
As part of the project, we developed a visual interface to be able to play against the player as well as use it as a tool to evaluate board positions. The interface, shown in Figure 7
, communicates with the network to both give it the rotational symmetries, as discussed in the next section, and any other board features that are necessary. Once the network finishes its forward pass, the interface iterates through the scores given by the player, removing any illegal moves, and plays the highest-scoring move. We also implemented an interface that shows the top 10 probabilities given to certain moves on the board, as shown in the results section. These implementations allowed us to visualize and understand what our player was “thinking” in any given board position, as well as play against it to evaluate its playing strength.
Because all board positions in Go play out the same way when flipped or rotated, our player employs an ensemble of eight passes through the neural network using all eight symmetries of the two-dimensional board. Because we did not employ weights that were locked to each other in a symmetric fashion, as in , this helped our player play in the early game more generally, as some games start in a different corner, but are still equivalent in terms of symmetry.
5 Dataset and Features
Our data set consists of 53,000 professional games from as early as the 11th century to 2017 in Smart Game Format (SGF) . We implemented parsers using sgfmill , a python library for simulating sgf games, into 10 million state-move pairs to be used in the training process.
Each state is represented as a vector. Each position on the board has 24 one-hot features, as shown in Table 1:
Number of Feature Layers Feature Layer Contents 1 Our Stone Present 1 Opponent Stone Present 1 Blank Space 1 Legal Move 4 Liberties After Move Is Played 8 Time Since Last Move 8 Number of Liberties
The first three layers simply indicate presence of one of our stones, an opponent stone, or a blank space on the board. The fourth layer simply shows the legality of a move to the network.
Features 5-24 follow a slightly different structure. To preserve the non-linearities between different numbers of liberties (See Figure 8 for an example of liberties), number of moves since, etc, we opted for a system where a feature would be “on” if a stone had one liberty, the next feature layer would be “on” if a stone had two liberties, and so on and so forth. Any stone with more than eight liberties would have the eighth feature layer “on”. A similar idea holds for number of liberties after playing on a certain position, which may well be our only “lookahead search”, as well as times since move was played, which simply turns on the feature for the point that was played one move ago, two moves ago, etc.
This feature structure is similar to what is used by AlphaGo .
We developed a method of visualizing features, as shown in Figure 9. These features, while showing no more than basic rules of the game to the player, such as liberties and legality of moves, allowed the player to look deeper into positions much in the same way that a professional player would.
Through supervised learning and our novel cross convolutional architecture, we were able to attain a classification accuracy of 47% on the test set, better than the standard set in . Given the massive amount of test data we had, we were able to achieve this result over 5 epochs (1 million iterations). We also tested the player on a popular go server OGS (Online Go Server), where the player was able to beat 10kyu amateur players. It was also able to beat the well-known bot GnuGo, an MCTS-based algorithm which has a skill level of about 11kyu.
We were even able to evaluate our player by seeing what moves it predicted in the recent Ke Jie vs. AlphaGo matches. Only a couple of the moves played by both players were outside the top 10 considered moves by our network, indicating the strong skill in Go it has acquired by only looking at the current board position.
We have successfully created a player that is capable of playing intuitively along. Amazingly, we have shown through experimentation that our player is also capable of understanding basic patterns, formations, and playing aggressively.
As mentioned previously, we were not able to finish implementing reinforcement learning. Therefore, finishing reinforcement learning will be our next milestone to improve our player. Future work also involves expanding the model so it plays more effectively, understands the game at a deeper level, and knows when to pass or resign.
We also plan to host our player on an online Go server such as OGS or KGS so we can gather more evaluations for our player.
-  Sensei’s Library: Go Databases. http://senseis.xmp.net/?GoDatabases.
-  Sensei’s Library: Lee Sedol - Hong Chang Sik - ladder game. http://senseis.xmp.net/?LeeSedolHongChangSikLadderGame.
-  C. Clark and A. Storkey. Teaching deep convolutional neural networks to play go, 2014.
-  M. Enzenberger, M. Muller, B. Arneson, and R. Segal. Fuego—an open-source framework for board games and go engine based on monte carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):259–270, 2010.
-  S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver, C. Szepesvári, and O. Teytaud. The grand challenge of computer go: Monte carlo tree search and extensions. Communications of the ACM, 55(3):106–113, 2012.
-  T. Graepel, M. Goutrié, M. Krüger, and R. Herbrich. Learning on Graphs in the Game of Go, pages 347–352. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
-  R. Herbrich. Machine learning in industry. http://mlss.tuebingen.mpg.de/2015/slides/herbrich/herbrich.pdf. 68–87.
-  Z. Huang. Google’s alpha go now has a serious game-playing rival from tencent. https://qz.com/936654/googles-alpha-go-now-has-a-serious-game-playing-rival-with-tencents-jueyi-or-fineart/.
-  F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and ¡0.5mb model size, 2016.
-  C.-S. Lee, M.-H. Wang, G. Chaslot, J.-B. Hoock, A. Rimmel, O. Teytaud, S.-R. Tsai, S.-C. Hsu, and T.-P. Hong. The computational intelligence of mogo revealed in taiwan’s computer go tournaments. IEEE Transactions on Computational Intelligence and AI in games, 1(1):73–89, 2009.
-  J. Lewis. Playing super hexagon with convolutional neural networks (milestone).
-  C. J. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in go using deep convolutional neural networks. CoRR, abs/1412.6564, 2014.
-  V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
-  B. Oshri and N. Khandwala. Predicting moves in chess using convolutional neural networks.
-  A. Rimmel, F. Teytaud, and O. Teytaud. Biasing Monte-Carlo Simulations through RAVE Values, pages 59–68. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
-  D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016.
-  S. Smith. Learning to play stratego with convolutional neural networks.
-  I. Sutskever and V. Nair. Mimicking Go Experts with Convolutional Neural Networks, pages 101–110. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
-  C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In , June 2015.
-  Y. Tian and Y. Zhu. Better computer go player with neural network and long-term prediction. CoRR, abs/1511.06410, 2015.
-  M. H. Winands, Y. Björnsson, and J.-T. Saito. Monte-carlo tree search solver. In Proceedings of the 6th International Conference on Computers and Games, CG ’08, pages 25–36, Berlin, Heidelberg, 2008. Springer-Verlag.
-  M. Woodcraft. Sgfmill. https://github.com/mattheww/sgfmill, 2017.
-  A. Zobrist. Feature extraction and representation for pattern recognition and the game of go. 1970. 152.