
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
05/08/2019 ∙ by Pedro A. Ortega, et al.
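To make "a state machine of sufficient statistics" concrete, here is a minimal sketch for the simplest possible task class. We assume coins with an unknown bias and a uniform Beta(1, 1) prior, which is our choice and not taken from the report. The Bayes-optimal sequential predictor for this class only needs to remember two counts. The class name `BetaBernoulliPredictor` is hypothetical. A memory-based meta-learner trained on this task class would be expected to approximate this behavior without ever being given the update rule.

```python
from dataclasses import dataclass


@dataclass
class BetaBernoulliPredictor:
    """Bayes-optimal predictor for coin flips with an unknown bias.

    Hypothetical illustration: the memory state is the pair (heads, tails),
    a sufficient statistic for the Beta posterior over the coin's bias.
    """
    alpha: float = 1.0  # prior pseudo-count for outcome 1 (uniform prior)
    beta: float = 1.0   # prior pseudo-count for outcome 0
    heads: int = 0
    tails: int = 0

    def predict(self) -> float:
        """Posterior predictive probability that the next outcome is 1."""
        total = self.heads + self.tails + self.alpha + self.beta
        return (self.heads + self.alpha) / total

    def update(self, outcome: int) -> None:
        """State transition: increment the count of the observed outcome."""
        if outcome == 1:
            self.heads += 1
        else:
            self.tails += 1


if __name__ == "__main__":
    predictor = BetaBernoulliPredictor()
    for x in [1, 1, 0, 1]:
        print(f"P(next=1) = {predictor.predict():.3f}")
        predictor.update(x)
    # Laplace's rule of succession: (3 + 1) / (4 + 2)
    print(f"P(next=1) = {predictor.predict():.3f}")
```

The predictor never stores the raw sequence. Any two histories with the same counts lead to the same memory state and therefore to the same predictions.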

Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There has been considerable interest in designing reinforcement learning algorithms with similar properties. This includes several proposals to learn the learning algorithm itself, an idea also referred to as meta-learning. One formal interpretation of this idea is in terms of a partially observable multi-task reinforcement learning problem in which information about the task is hidden from the agent. Although agents that solve partially observable environments can be trained from rewards alone, shaping an agent's memory with additional supervision has been shown to boost learning efficiency. It is thus natural to ask what kind of supervision, if any, facilitates meta-learning. Here we explore several choices and develop an architecture that separates learning of the belief about the unknown task from learning of the policy, and that can be used effectively with privileged information about the task during training. We show that this approach can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment in which a simulated robot has to execute various movement sequences.
05/15/2019 ∙ by Jan Humplik, et al.

Human Decision-Making under Limited Time
Subjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints, i.e., decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statistical mechanics and information theory. We systematically tested human subjects in their ability to solve combinatorial puzzles under different time limitations. We found that our bounded-rational model accounts well for the data. The decomposition of the fitted model parameter into the subjects' expected utility function and resource parameter provides interesting insight into the subjects' information capacity limits. Our results confirm that humans gradually fall back on their learned prior choice patterns when confronted with increasing resource limitations.
10/06/2016 ∙ by Pedro A. Ortega, et al.
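A minimal sketch of how "falling back on prior choice patterns" looks in an information-theoretic bounded-rational model. We assume the standard form in which choice probabilities are proportional to prior(x) · exp(β U(x)), where a larger resource parameter β means more resources. The mapping from a time limit to β, the function name, and all numbers below are hypothetical and for illustration only. They are not data or parameters from the study.

```python
import math


def bounded_rational_policy(prior, utility, beta):
    """Return p(x) proportional to prior(x) * exp(beta * utility(x)).

    beta acts as a resource parameter: beta = 0 means no resources
    (choices follow the prior), large beta approaches expected-utility
    maximization.
    """
    scores = [beta * u for u in utility]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [p * math.exp(s - m) for p, s in zip(prior, scores)]
    z = sum(weights)
    return [w / z for w in weights]


prior = [0.6, 0.3, 0.1]    # hypothetical learned choice habits
utility = [0.0, 0.5, 1.0]  # hypothetical payoffs; option 2 is best
for beta in [0.0, 1.0, 10.0, 100.0]:
    policy = bounded_rational_policy(prior, utility, beta)
    print(beta, [round(p, 3) for p in policy])
```

With β = 0 the printed policy equals the prior. As β grows, probability mass moves to the highest-utility option.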

Memory shapes time perception and intertemporal choices
There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of sensorimotor representation. In particular, the model implies that interactions that constrain future behavior are perceived as being both longer in duration and more valuable. Furthermore, using simulations of artificial agents, we investigate how memory constraints enforce a renormalization of the perceived timescales. Our results show that qualitatively different discount functions, such as exponential and hyperbolic discounting, arise as a consequence of an agent's probabilistic model of the world.
04/18/2016 ∙ by Pedro A. Ortega, et al.
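The derivation below shows one standard way in which the shape of a discount function can follow from an agent's probabilistic model. It is given for illustration only and is not necessarily the construction used in the paper. Suppose the agent believes that a future outcome is lost at a constant hazard rate λ. It then discounts exponentially. Now suppose the agent is uncertain about λ and holds an exponential prior with mean k. Averaging over that prior yields hyperbolic discounting:

```latex
D_\lambda(t) = e^{-\lambda t},
\qquad
D(t) = \int_0^\infty \frac{1}{k}\, e^{-\lambda/k}\, e^{-\lambda t}\, d\lambda
     = \frac{1/k}{1/k + t}
     = \frac{1}{1 + k t}.
```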

An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it is still unclear how it relates to decision-making against adversarial environments. This connection has previously been suggested in work relating the free energy to risk-sensitive control and to extensive-form games. Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary. The adversary can, by paying an exponential penalty, generate costs that diminish the decision maker's payoffs. It turns out that the optimal strategy of the adversary consists in choosing costs so as to render the decision maker indifferent among its choices, which is a defining property of a Nash equilibrium, thus tightening the connection between free energy optimization and game theory.
04/22/2014 ∙ by Pedro A. Ortega, et al.
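The indifference property can be checked numerically. The free-energy-optimal policy is p(x) ∝ p0(x) · exp(β U(x)). We assume the adversary's costs take the form C(x) = (1/β) log(p(x)/p0(x)), which is exactly the regularizer term of the free energy. This form is our assumption for illustration and is not quoted from the paper. Under that assumption, the net payoff U(x) − C(x) is the same for every option and equals the free energy (1/β) log Z. The function name and all numbers are hypothetical.

```python
import math


def free_energy_policy(prior, utility, beta):
    """Return the policy p(x) ∝ prior(x) exp(beta U(x)) and the free energy (1/beta) log Z."""
    weights = [p * math.exp(beta * u) for p, u in zip(prior, utility)]
    z = sum(weights)
    return [w / z for w in weights], math.log(z) / beta


prior = [0.5, 0.25, 0.25]  # hypothetical prior policy p0
utility = [1.0, 2.0, -1.0]  # hypothetical utilities U
beta = 2.0
policy, free_energy = free_energy_policy(prior, utility, beta)

# Assumed adversarial costs C(x) = (1/beta) log(p(x) / p0(x)).
costs = [math.log(p / q) / beta for p, q in zip(policy, prior)]
net = [u - c for u, c in zip(utility, costs)]
print(net, free_energy)  # every net payoff equals free_energy up to rounding
```

Algebraically, log(p(x)/p0(x)) = β U(x) − log Z. Each net payoff therefore reduces to (1/β) log Z, independent of x.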

Information-Theoretic Bounded Rationality
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
12/21/2015 ∙ by Pedro A. Ortega, et al.
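In the single-step case, the value of the free energy functional interpolates between familiar decision rules as the inverse temperature β varies:

- β → +∞ gives the maximum, as at the agent's own choice node.
- β → −∞ gives the minimum, as against a worst-case adversary.
- β → 0 gives the expectation, as at a chance node.

Below is a minimal sketch of this single-step value only. The function name and numbers are hypothetical. The sequential construction via equivalence transformations described in the paper goes beyond this sketch.

```python
import math


def free_energy_value(prior, utility, beta):
    """Certainty-equivalent value (1/beta) log sum_x prior(x) exp(beta U(x)).

    beta > 0 is optimistic, beta < 0 is pessimistic, and beta = 0 is
    treated as its limit, the expected utility.
    """
    if beta == 0.0:
        return sum(p * u for p, u in zip(prior, utility))
    # Log-sum-exp: factor out the largest exponent for numerical stability.
    scores = [beta * u for u in utility]
    m = max(scores)
    total = sum(p * math.exp(s - m) for p, s in zip(prior, scores))
    return (m + math.log(total)) / beta


prior = [0.2, 0.5, 0.3]
utility = [1.0, 3.0, 2.0]
for beta in [-200.0, -1.0, 0.0, 1.0, 200.0]:
    print(beta, round(free_energy_value(prior, utility, beta), 4))
```

The printed values move from close to min U = 1 through the expectation 2.3 to close to max U = 3 as β increases.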

Belief Flows of Robust Online Learning
This paper introduces a new probabilistic model for online learning which dynamically incorporates information from stochastic gradients of an arbitrary loss function. Similar to probabilistic filtering, the model maintains a Gaussian belief over the optimal weight parameters. Unlike traditional Bayesian updates, the model incorporates a small number of gradient evaluations at locations chosen using Thompson sampling, making it computationally tractable. The belief is then transformed via a linear flow field which optimally updates the belief distribution using rules derived from information-theoretic principles. Several versions of the algorithm are shown using different constraints on the flow field and compared with conventional online learning algorithms. Results are given for several classification tasks including logistic regression and multilayer neural networks.
05/26/2015 ∙ by Pedro A. Ortega, et al.

Subjectivity, Bayesianism, and Causality
Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction.
07/15/2014 ∙ by Pedro A. Ortega, et al.

A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a nonparametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. We illustrate the effectiveness of our model by optimizing a noisy, high-dimensional, non-convex objective function.
06/09/2012 ∙ by Pedro A. Ortega, et al.

Information, Utility & Bounded Rationality
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded-rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded-rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered.
07/28/2011 ∙ by Pedro A. Ortega, et al.

Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action (sometimes called Thompson sampling) can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss to what extent such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
03/18/2013 ∙ by Pedro A. Ortega, et al.
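Below is a minimal sketch of Thompson sampling for a deliberately simple environment class. We assume Bernoulli bandits with independent Beta(1, 1) priors on each arm, a choice made here for illustration. Drawing one environment from the posterior and acting optimally for it samples each action with its posterior probability of being optimal. This corresponds to the Bayesian superposition of optimal policies described in the abstract. The function name and arm probabilities are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    Each round: sample a hypothetical environment (one success probability
    per arm) from the posterior, play the arm that is optimal for that
    sample, then update the posterior with the observed reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Sample an environment from the current posterior.
        sampled = [rng.betavariate(1 + successes[a], 1 + failures[a])
                   for a in range(n_arms)]
        # Act optimally as if the sampled environment were the true one.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Bayesian update for the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Only the observed rewards update the posterior. The agent's own choice of arm is not used as evidence about the environment, in line with treating actions as interventions rather than observations.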