
Bayesian Optimization in AlphaGo
During the development of AlphaGo, its many hyperparameters were tuned with Bayesian optimization multiple times. This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent and this improved its win-rate from 50% to 66.5% in self-play games. This tuned version was deployed in the final match. Of course, since we tuned AlphaGo many times during its development cycle, the compounded contribution was even higher than this percentage. It is our hope that this brief case study will be of interest to Go fans, and also provide Bayesian optimization practitioners with some insights and inspiration.
12/17/2018 ∙ by Yutian Chen, et al.
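The tuning loop the abstract describes can be illustrated with a minimal Bayesian optimization sketch: fit a Gaussian process to the evaluations seen so far, then pick the next hyperparameter setting by maximizing expected improvement. This is not AlphaGo's actual tuning code; the objective below is a hypothetical stand-in for a hyperparameter-to-win-rate mapping, and the kernel lengthscale, grid, and iteration counts are illustrative assumptions.

```python
import math
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, jitter=1e-6):
    """GP posterior mean and variance at test points Xs given data (X, y)."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.clip(1.0 - np.sum(Ks * (Kinv @ Ks), axis=0), 1e-12, None)
    return mu, var

def expected_improvement(mu, var, best):
    """EI acquisition for maximization."""
    sigma = np.sqrt(var)
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * Phi + sigma * phi

def bayes_opt(f, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n_init)       # a few random initial evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(0.0, 1.0, 201)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, var, y.max()))]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)]

# Hypothetical smooth objective with its optimum at x = 0.6, standing in
# for an (unknown) hyperparameter -> playing-strength mapping.
best_x = bayes_opt(lambda x: -(x - 0.6) ** 2)
```

With only a handful of evaluations, EI concentrates sampling near the peak, which is the property that makes this approach attractive when each evaluation (a batch of self-play games) is expensive.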

Playing Atari with Deep Reinforcement Learning
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
12/19/2013 ∙ by Volodymyr Mnih, et al.
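The value function mentioned in the abstract is learned with the classic Q-learning target, y = r + γ·max_a′ Q(s′, a′). The paper trains a convolutional network on raw pixels; as a minimal sketch of the same update rule, the toy example below uses a Q-table on a 5-state chain (the table stands in for the network, and the chain, rewards, and learning rate are illustrative assumptions, not the paper's setup).

```python
import numpy as np

# Toy 5-state chain: action 0 = left, action 1 = right; reward 1 only for
# entering the rightmost (terminal) state. Q for the terminal state stays 0.
n_states, gamma, alpha = 5, 0.9, 0.5
Q = np.zeros((n_states, 2))
rng = np.random.default_rng(0)

for _ in range(500):
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(2))  # random behaviour policy; Q-learning is off-policy
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning target: y = r + gamma * max_a' Q(s', a')
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```

After training, the greedy policy moves right from every non-terminal state; the deep variant replaces the table lookup with a forward pass of the network.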

Fast Non-Parametric Tests of Relative Dependency and Similarity
We introduce two novel non-parametric statistical hypothesis tests. The first test, called the relative test of dependency, enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC). The second test, called the relative test of similarity, is used to determine which of two samples from arbitrary distributions is significantly closer to a reference sample of interest; the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD). To construct these tests, we use as our test statistics the difference of HSIC statistics and of MMD statistics, respectively. The resulting tests are consistent and unbiased, and have favorable convergence properties. The effectiveness of the relative dependency test is demonstrated on several real-world problems: we identify language groups from a multilingual parallel corpus, and we show that tumor location is more dependent on gene expression than on chromosome imbalance. We also demonstrate the performance of the relative test of similarity over a broad selection of model comparison problems in deep generative models.
11/17/2016 ∙ by Wacha Bounliphone, et al.
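The relative dependency statistic is just a difference of two HSIC estimates that share the source variable. The sketch below computes the biased HSIC estimator, tr(KHLH)/(n−1)², and the difference; it shows only the raw statistic, not the paper's calibrated test (which derives p-values from the joint asymptotic distribution). The fixed RBF bandwidth and sample sizes are illustrative assumptions.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """RBF Gram matrix for a 1-D sample (fixed bandwidth for illustration)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

def hsic(x, y):
    """Biased HSIC estimator: tr(K H L H) / (n - 1)^2, H = I - 11^T / n."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x), rbf_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)                 # source variable
y1 = x + 0.25 * rng.normal(size=200)     # first target: strongly dependent on x
y2 = rng.normal(size=200)                # second target: independent of x
stat = hsic(x, y1) - hsic(x, y2)         # > 0 suggests x is more dependent on y1
```

A positive difference points toward the first target; the actual test additionally asks whether the difference is significant given the statistics' joint variance.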

A Test of Relative Similarity For Model Selection in Generative Models
Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches. Model selection in this generative setting can be challenging, however, particularly when likelihoods are not easily accessible. To address this issue, we introduce a statistical test of relative similarity, which is used to determine which of two models generates samples that are significantly closer to a real-world reference dataset of interest. We use as our test statistic the difference in maximum mean discrepancies (MMDs) between the reference dataset and each model dataset, and derive a powerful, low-variance test based on the joint asymptotic distribution of the MMDs between each reference-model pair. In experiments on deep generative models, including the variational autoencoder and generative moment matching network, the tests provide a meaningful ranking of model performance as a function of parameter and training settings.
11/14/2015 ∙ by Wacha Bounliphone, et al.
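The test statistic here is a difference of two squared MMD estimates against a shared reference sample. A minimal sketch of that statistic, using the standard unbiased MMD² estimator with an RBF kernel: the bandwidth and the three Gaussian samples are illustrative assumptions, and the paper's low-variance significance test (from the joint asymptotic distribution) is not implemented.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """RBF kernel matrix between two 1-D samples."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

def mmd2_u(x, y):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    Kxx, Kyy, Kxy = rbf(x, x), rbf(y, y), rbf(x, y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 300)        # "real-world" reference sample
model_a = rng.normal(0.0, 1.0, 300)    # model A: matches the reference
model_b = rng.normal(1.5, 1.0, 300)    # model B: shifted away from it
stat = mmd2_u(ref, model_a) - mmd2_u(ref, model_b)  # < 0: model A is closer
```

A negative statistic favors the first model; the paper's contribution is turning this difference into a calibrated hypothesis test rather than a raw comparison.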

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
12/05/2017 ∙ by David Silver, et al.

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and backup those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimized to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to backup those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.
02/13/2018 ∙ by Arthur Guez, et al.
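The hand-designed rules that MCTSnet replaces with learned components are the classic select / expand / simulate / back-up steps of UCT. As a point of reference, a minimal classic MCTS (not the paper's method) on a toy two-player subtraction game: take 1-3 stones, and whoever takes the last stone wins. The game, iteration count, and exploration constant are illustrative assumptions.

```python
import math
import random

MOVES = (1, 2, 3)  # take 1-3 stones; the player who takes the last stone wins

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones, self.player = stones, player       # player to move here
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits, self.wins = 0, 0.0                 # wins for the mover INTO this node

def uct_child(node, c=1.4):
    """Selection rule: pick the child maximising the UCB1 score."""
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    """Random playout to the end of the game; returns the winner."""
    while stones > 0:
        stones -= random.choice([m for m in MOVES if m <= stones])
        player = 1 - player
    return 1 - player  # the player who took the last stone

def mcts(stones, player, iters=2000):
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        while not node.untried and node.children:       # 1. select down the tree
            node = uct_child(node)
        if node.untried:                                # 2. expand one new child
            m = node.untried.pop()
            child = Node(node.stones - m, 1 - node.player, node, m)
            node.children.append(child)
            node = child
        winner = rollout(node.stones, node.player)      # 3. evaluate by simulation
        while node is not None:                         # 4. back up to the root
            node.visits += 1
            node.wins += winner == 1 - node.player      # credit the mover into node
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move

random.seed(0)
best_move = mcts(stones=5, player=0)  # optimal play leaves a multiple of 4
```

MCTSnet keeps this simulate-expand-backup skeleton but replaces the UCB rule, rollout evaluation, and scalar backup with learned networks operating on vector embeddings.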
Ioannis Antonoglou