Information-Theoretic Bounded Rationality

12/21/2015 ∙ by Pedro A. Ortega, et al. ∙ Google, Max Planck Society, KAIST Department of Mathematical Sciences, Hebrew University of Jerusalem, University of Pennsylvania

Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.

1 Introduction

It is hard to overstate the influence that the economic idea of perfect rationality has had on our way of designing artificial agents (RussellNorvig2010). Today, many of us in the fields of artificial intelligence, control theory, and reinforcement learning design our agents by encoding the desired behavior into an objective function that the agents must optimize in expectation. By doing so, we are relying on the theory of subjective expected utility (SEU), the standard economic theory of decision making under uncertainty (Neumann1944; Savage1954). SEU theory has an immense intuitive appeal, and its pervasiveness in today’s mindset is reflected in many widespread beliefs: for example, that probabilities and utilities are orthogonal concepts; that two options with the same expected utility are equivalent; and that randomizing can never improve upon an optimal deterministic choice. Put simply, if we find ourselves violating SEU theory, we feel strongly compelled to revise our choice.

At the same time, it is well understood that SEU theory prescribes policies that are intractable to calculate save for very restricted problem classes. This was recognized soon after expected utility theory was formulated (Simon1956). In agent design, it has become especially apparent more recently, as we continue to struggle to tackle problems of moderate complexity in spite of our deeper understanding of the planning problem (Duff2002; Hutter2004; Legg2008; Ortega2011) and the vast computing power available to us. For instance, there are efficient algorithms to calculate the optimal policy of a known Markov decision process (MDP) (Bertsekas1996), but no efficient algorithm to calculate the exact optimal policy of an unknown MDP or a partially observable MDP (PapadimitriouTsitsiklis1987). Because of this, in practice we either make severe domain-specific simplifications, as in linear-quadratic-Gaussian control problems (Stengel1994); or we approximate the “gold standard” prescribed by SEU theory, as exemplified by the reinforcement learning algorithms based on stochastic approximations (Sutton1998; Szepesvari2010) and Monte-Carlo tree search (Kocsis2006; Veness2011; Mnih2015).

Recently, there has been a renewed interest in models of bounded rationality (Simon1972). Rather than approximating perfect rationality, these models seek to formalize decision-making with limited resources such as the time, energy, memory, and computational effort allocated for arriving at a decision. The specific way in which this is achieved varies across these accounts. For instance, epsilon-optimality only requires policies to be “close enough” to the optimum (Dixon2001); metalevel rationality proposes optimizing a trade-off between utilities and computational costs (Zilberstein2008); bounded optimality restricts the computational complexity of the programs implementing the optimal policy (Russell1995b); an approach that we might label procedural bounded rationality attempts to explicitly model the limitations in the decision-making procedures (Rubinstein1998); and finally, the heuristics approach argues that general optimality principles ought to be abandoned altogether in favor of collections of simple heuristics (Gigerenzer2001).

Here we are concerned with a particular flavor of bounded rationality, which we might call “information-theoretic” due to its underlying rationale. While this approach solves many of the shortcomings of perfect rationality in a simple and elegant way, it has not yet attained widespread acceptance from the mainstream community in spite of roughly a decade of research in the machine learning literature. As is the case with many emerging fields of research, this is partly due to the lack of consensus on the interpretation of the mathematical quantities involved. Nonetheless, a great deal of the basics are well-established and ready for their widespread adoption; in particular, some of the algorithmic implications are much better understood today. Our goal here is to provide a consolidated view of some of the basic ideas of the theory and to sketch the intimate connections to other fields.

1.1 A Short Algorithmic Illustration

Perfect Rationality.

Let $\Pi$ be a finite set of candidate choices or policies, and let $U: \Pi \to [0, 1]$ be a utility function mapping each policy into the unit interval. Consider the problem of finding a maximizing element $\pi^* \in \Pi$. For simplicity we assume that, given an element $\pi \in \Pi$, its utility $U(\pi)$ can be evaluated in a constant number of computation steps. Imposing no particular structure on $\Pi$, we find $\pi^*$ by sequentially evaluating each utility $U(\pi)$ and returning the best found in the end.
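The exhaustive search just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `exhaustive_argmax`, the toy policy set, and the toy utility are all assumptions made for the example. It shows that, with no structure imposed on the candidate set, every element must be evaluated once, so the cost grows linearly with the number of candidates.

```python
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")  # type of a candidate choice or policy


def exhaustive_argmax(
    policies: Iterable[P], utility: Callable[[P], float]
) -> Tuple[P, float, int]:
    """Evaluate every candidate once and return the best one found.

    Returns (best policy, its utility, number of utility evaluations).
    The name and signature are hypothetical, chosen for illustration.
    """
    best_policy = None
    best_utility = float("-inf")
    evaluations = 0
    for policy in policies:
        value = utility(policy)  # assumed to cost a constant number of steps
        evaluations += 1
        if value > best_utility:  # strict ">" keeps the first maximizer on ties
            best_policy, best_utility = policy, value
    if evaluations == 0:
        raise ValueError("the set of candidate policies must be non-empty")
    return best_policy, best_utility, evaluations


# Toy example (assumed): five candidates, utility in [0, 1] peaking at 3.
toy_policies = range(5)


def toy_utility(p: int) -> float:
    return 1.0 - abs(p - 3) / 4.0


best, value, n_evals = exhaustive_argmax(toy_policies, toy_utility)
print(best, value, n_evals)
```

In this toy run the search touches all five candidates before returning the maximizer, which is the linear scan over the whole candidate set that the perfectly rational procedure requires.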