
Stable Opponent Shaping in Differentiable Games
A growing number of learning methods are actually games which optimise multiple, interdependent objectives in parallel, from GANs and intrinsic curiosity to multi-agent RL. Opponent shaping is a powerful approach to improve learning dynamics in such games, accounting for the fact that the 'environment' includes agents adapting to one another's updates. Learning with Opponent-Learning Awareness (LOLA) is a recent algorithm which exploits this dynamic response and encourages cooperation in settings like the Iterated Prisoner's Dilemma. Although experimentally successful, we show that LOLA can exhibit 'arrogant' behaviour directly at odds with convergence. In fact, remarkably few algorithms have theoretical guarantees applying across all differentiable games. In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. We prove that LookAhead locally converges and avoids strict saddles in all differentiable games, the strongest results in the field so far. SOS inherits these desirable guarantees, while also shaping the learning of opponents and consistently either matching or outperforming LOLA experimentally.
11/20/2018 ∙ by Alistair Letcher, et al.
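The LookAhead idea underlying SOS can be illustrated on a tiny two-player game. The sketch below is ours, not the paper's code: it assumes a bilinear zero-sum game with losses L1(x, y) = x*y and L2(x, y) = -x*y, on which simultaneous gradient descent spirals outward while LookAhead (each player descending against the opponent's anticipated naive step, treated as a constant) spirals in.

```python
# Illustrative sketch of a LookAhead update on a bilinear zero-sum game:
# player 1 minimises L1(x, y) = x*y, player 2 minimises L2(x, y) = -x*y.
# Function and variable names here are our own, not the paper's.

def lookahead_step(x, y, lr=0.1):
    # Anticipated naive opponent updates: grad_y L2 = -x, grad_x L1 = y.
    y_pred = y - lr * (-x)
    x_pred = x - lr * y
    # Each player follows its own gradient at the predicted opponent point,
    # without differentiating through the prediction.
    x_new = x - lr * y_pred      # grad_x L1(x, y_pred) = y_pred
    y_new = y - lr * (-x_pred)   # grad_y L2(x_pred, y) = -x_pred
    return x_new, y_new

x, y = 1.0, 1.0
for _ in range(2000):
    x, y = lookahead_step(x, y)
# The iterates contract toward the unique equilibrium at the origin,
# whereas naive simultaneous gradient descent diverges on this game.
```

LOLA additionally differentiates through the opponent's anticipated step, which is what produces the shaping (and, as the paper shows, the 'arrogance'); SOS interpolates between the two.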

Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, non-transitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in non-transitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly non-transitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
01/23/2019 ∙ by David Balduzzi, et al.
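The rectified-Nash weighting at the heart of PSRO_rN can be sketched on a rock-paper-scissors population. This is an informal reconstruction with our own variable names: each agent with Nash support trains only against the opponents it currently beats, weighted by their Nash mass.

```python
import numpy as np

# A[i, j] is the expected payoff of agent i against agent j (antisymmetric).
# Rock-paper-scissors population: each pure strategy beats one, loses to one.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

# For this symmetric zero-sum game the Nash equilibrium over the population
# is uniform; in general it would be computed with a linear program.
p = np.ones(3) / 3

# Rectified Nash response weighting (reconstructed informally): agent i's
# training objective weights opponent j by p[j] * max(A[i, j], 0), i.e.
# it amplifies strengths against beaten opponents rather than patching losses.
weights = p[np.newaxis, :] * np.maximum(A, 0.0)
```

Row i of `weights` then defines the mixture of opponents that agent i's best-response oracle trains against, which is what produces the game-theoretic niching described above.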

Strongly-Typed Recurrent Neural Networks
Recurrent neural networks are increasingly popular models for sequential learning. Unfortunately, although the most effective RNN architectures are perhaps excessively complicated, extensive searches have not found simpler alternatives. This paper imports ideas from physics and functional programming into RNN design to provide guiding principles. From physics, we introduce type constraints, analogous to the constraints that forbid adding meters to seconds. From functional programming, we require that strongly-typed architectures factorize into stateless learnware and state-dependent firmware, reducing the impact of side-effects. The features learned by strongly-typed nets have a simple semantic interpretation via dynamic average-pooling on one-dimensional convolutions. We also show that strongly-typed gradients are better behaved than in classical architectures, and characterize the representational power of strongly-typed nets. Finally, experiments show that, despite being more constrained, strongly-typed architectures achieve lower training error and comparable generalization error to classical architectures.
02/06/2016 ∙ by David Balduzzi, et al.
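The factorization into stateless learnware and state-dependent firmware can be sketched with a minimal strongly-typed recurrent cell. This is our toy rendering of the idea, with illustrative parameter names: the projections are applied to the current input only, and state enters solely through a convex gate, making the hidden state a dynamic average of input features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal strongly-typed recurrent cell (a sketch, not the paper's code).
# Learnware (W, V) is stateless: applied to the current input only.
# State enters only through the gate f_t in (0, 1), so h_t is a dynamic
# average of the projections z_1..z_t (dynamic average-pooling over time).
d_in, d_h, T = 4, 3, 10
W = rng.normal(size=(d_h, d_in)) * 0.5   # feature projection ("learnware")
V = rng.normal(size=(d_h, d_in)) * 0.5   # gate parameters

x = rng.normal(size=(T, d_in))
h = np.zeros(d_h)
for t in range(T):
    z = W @ x[t]                 # stateless feature of the current input
    f = sigmoid(V @ x[t])        # gate computed from the input alone
    h = f * h + (1.0 - f) * z    # convex update: dynamic average-pooling
```

Because each update is a convex combination, every coordinate of `h` stays within the range of the corresponding features, which is one sense in which strongly-typed gradients and activations are better behaved.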

Deep Online Convex Optimization by Putting Forecaster to Sleep
Methods from convex optimization such as accelerated gradient descent are widely used as building blocks for deep learning algorithms. However, the reasons for their empirical success are unclear, since neural networks are not convex and standard guarantees do not apply. This paper develops the first rigorous link between online convex optimization and error backpropagation on convolutional networks. The first step is to introduce circadian games, a mild generalization of convex games with similar convergence properties. The main result is that error backpropagation on a convolutional network is equivalent to playing out a circadian game. It follows immediately that the waking-regret of players in the game (the units in the neural network) controls the overall rate of convergence of the network. Finally, we explore some implications of the results: (i) we describe the representations learned by a neural network game-theoretically, (ii) propose a learning setting at the level of individual units that can be plugged into deep architectures, and (iii) propose a new approach to adaptive model selection by applying bandit algorithms to choose which players to wake on each round.
09/06/2015 ∙ by David Balduzzi, et al.

Strongly-Typed Agents are Guaranteed to Interact Safely
As artificial agents proliferate, it is becoming increasingly important to ensure that their interactions with one another are well-behaved. In this paper, we formalize a common-sense notion of when algorithms are well-behaved: an algorithm is safe if it does no harm. Motivated by recent progress in deep learning, we focus on the specific case where agents update their actions according to gradient descent. The first result is that gradient descent converges to a Nash equilibrium in safe games. The paper then provides sufficient conditions that guarantee safe interactions. The main contribution is to define strongly-typed agents and show that they are guaranteed to interact safely. A series of examples shows that strong-typing generalizes certain key features of convexity and is closely related to blind source separation. The analysis introduces a new perspective on classical multilinear games based on tensor decomposition.
02/24/2017 ∙ by David Balduzzi, et al.

The Shattered Gradients Problem: If resnets are the answer, then what is the question?
A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. The problem has largely been overcome through the introduction of carefully constructed initializations and batch normalization. Nevertheless, architectures incorporating skip-connections, such as resnets, perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering. Preliminary experiments show that the new initialization makes it possible to train very deep networks without the addition of skip-connections.
02/28/2017 ∙ by David Balduzzi, et al.
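The "looks linear" construction can be sketched in a few lines. This is a minimal rendering under our own names, pairing a concatenated-ReLU activation with mirrored weights: since relu(x) - relu(-x) = x, each layer computes an exactly linear map at initialization, so a deep stack starts out linear and its gradients do not shatter.

```python
import numpy as np

rng = np.random.default_rng(0)

def crelu(x):
    # Concatenated ReLU: [relu(x), relu(-x)] along the feature axis.
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

# "Looks linear" initialization, sketched: mirror the weight matrix as
# [W, -W] so the layer output is W @ relu(x) - W @ relu(-x) = W @ x.
d = 5
W = rng.normal(size=(d, d)) / np.sqrt(d)
W_ll = np.concatenate([W, -W], axis=1)   # shape (d, 2d)

x = rng.normal(size=d)
layer_out = W_ll @ crelu(x)              # equals W @ x at initialization
```

Nonlinearity then emerges during training as the two halves of the mirrored weights drift apart.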

On the informationtheoretic structure of distributed measurements
The internal structure of a measuring device, which depends on what its components are and how they are organized, determines how it categorizes its inputs. This paper presents a geometric approach to studying the internal structure of measurements performed by distributed systems such as probabilistic cellular automata. It constructs the quale, a family of sections of a suitably defined presheaf, whose elements correspond to the measurements performed by all subsystems of a distributed system. Using the quale we quantify (i) the information generated by a measurement; (ii) the extent to which a measurement is contextdependent; and (iii) whether a measurement is decomposable into independent submeasurements, which turns out to be equivalent to contextdependence. Finally, we show that only indecomposable measurements are more informative than the sum of their submeasurements.
07/06/2011 ∙ by David Balduzzi, et al. ∙ 0 ∙ shareread it

Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks
Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex. Standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets. The guarantee matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation (a straightforward application of Taylor expansions to neural networks) and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets: that gradients are shattered. We investigate the hypothesis that, by exploring the space of activation configurations more thoroughly, adaptive optimizers such as RMSProp and Adam are able to converge to better solutions.
11/07/2016 ∙ by David Balduzzi, et al.
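The neural Taylor approximation can be sketched on a one-layer rectifier net. The toy example below is ours, not the paper's code: within a fixed configuration of active/inactive gates the net is linear in its weights, so the first-order Taylor expansion of the output around the current weights is exact until a gate flips.

```python
import numpy as np

# One ReLU layer with summed output; gates = which units are active.
def net(w, x):
    return np.maximum(w @ x, 0.0).sum()

def grad_w(w, x):
    gates = (w @ x > 0).astype(float)   # active/inactive configuration
    return np.outer(gates, x)           # exact gradient within this configuration

# Weights chosen so pre-activations (1, -2, -1) sit far from the gate boundary.
w0 = np.array([[1., 0., 0., 0.],
               [0., -1., 0., 0.],
               [0., 0., 1., 0.]])
x = np.array([1., 2., -1., 0.5])

rng = np.random.default_rng(1)
dw = 1e-4 * rng.normal(size=w0.shape)               # small perturbation
taylor = net(w0, x) + np.sum(grad_w(w0, x) * dw)    # f(w0) + <grad, dw>
exact = net(w0 + dw, x)                             # agrees while no gate flips
```

The Taylor loss is the loss evaluated on this linearized model; the convergence guarantee in the abstract is stated in terms of regret against the sequence of Taylor losses.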

Deep Online Convex Optimization with Gated Games
Methods from convex optimization are widely used as building blocks for deep learning algorithms. However, the reasons for their empirical success are unclear, since modern convolutional networks (convnets), incorporating rectifier units and max-pooling, are neither smooth nor convex, so standard guarantees do not apply. This paper provides the first convergence rates for gradient descent on rectifier convnets. The proof utilizes the particular structure of rectifier networks, which consists of binary active/inactive gates applied on top of an underlying linear network. The approach generalizes to max-pooling, dropout and maxout; in other words, to precisely the neural networks that perform best empirically. The key step is to introduce gated games, an extension of convex games with similar convergence properties that captures the gating function of rectifiers. The main result is that rectifier convnets converge to a critical point at a rate controlled by the gated-regret of the units in the network. Corollaries of the main result include: (i) a game-theoretic description of the representations learned by a neural network; (ii) a logarithmic-regret algorithm for training neural nets; and (iii) a formal setting for analyzing conditional computation in neural nets that can be applied to recently developed models of attention.
04/07/2016 ∙ by David Balduzzi, et al.

Compliance-Aware Bandits
Motivated by clinical trials, we study bandits with observable non-compliance. At each step, the learner chooses an arm, after which, instead of observing only the reward, it also observes the action that actually took place. We show that such non-compliance can be helpful or hurtful to the learner in general. Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret. We present hybrid algorithms that maintain regret bounds up to a multiplicative factor and can incorporate compliance information. Simulations based on real data from the International Stroke Trial show the practical potential of these algorithms.
02/09/2016 ∙ by Nicolás Della Penna, et al.
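The non-compliance setting can be simulated in a few lines. The toy below is purely illustrative and is not the paper's hybrid algorithm: the learner chooses an arm, the "patient" sometimes takes the other action, and a simple epsilon-greedy learner credits the reward to the action actually observed rather than the arm chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.8]     # expected reward of each *taken* action
comply_prob = 0.7           # chance the chosen arm is actually taken
eps = 0.1                   # exploration rate

counts = np.zeros(2)
values = np.zeros(2)
for t in range(2000):
    if rng.random() < eps:                 # explore
        chosen = int(rng.integers(2))
    else:                                  # exploit current estimates
        chosen = int(np.argmax(values))
    # Observable non-compliance: the taken action may differ from the choice.
    taken = chosen if rng.random() < comply_prob else 1 - chosen
    reward = float(rng.random() < true_means[taken])
    # Credit the observed action, not the chosen arm.
    counts[taken] += 1
    values[taken] += (reward - values[taken]) / counts[taken]
```

As the abstract notes, naive crediting schemes like this one can lose sublinear-regret guarantees; the paper's contribution is hybrid algorithms that use compliance information while keeping regret bounds up to a multiplicative factor.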

Cortical prediction markets
We investigate cortical learning from the perspective of mechanism design. First, we show that discretizing standard models of neurons and synaptic plasticity leads to rational agents maximizing simple scoring rules. Second, our main result is that the scoring rules are proper, implying that neurons faithfully encode expected utilities in their synaptic weights and encode high-scoring outcomes in their spikes. Third, with this foundation in hand, we propose a biologically plausible mechanism whereby neurons backpropagate incentives, allowing them to optimize their usefulness to the rest of cortex. Finally, experiments show that networks that backpropagate incentives can learn simple tasks.
01/07/2014 ∙ by David Balduzzi, et al.
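Properness, the key property in the result above, is easy to check numerically for a standard example. The snippet uses the classical logarithmic scoring rule as an illustration (the paper derives neuron-specific scoring rules, not this one): an agent maximizes its expected score exactly by reporting its true belief.

```python
import numpy as np

# Expected log score of reporting probability `report` for a binary event
# whose true probability is `truth`.
def expected_log_score(report, truth):
    return truth * np.log(report) + (1.0 - truth) * np.log(1.0 - report)

truth = 0.7
reports = np.linspace(0.01, 0.99, 99)          # candidate reports on a grid
scores = [expected_log_score(r, truth) for r in reports]
best = reports[int(np.argmax(scores))]         # maximized at report = truth
```

Properness is what licenses reading a neuron's synaptic weights as faithful encodings of expected utilities: misreporting (mis-weighting) can only lower the expected score.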