It is hard to overstate the influence that the economic idea of perfect rationality has had on our way of designing artificial agents (RussellNorvig2010). Today, many of us in the fields of artificial intelligence, control theory, and reinforcement learning design our agents by encoding the desired behavior into an objective function that the agents must optimize in expectation. By doing so, we are relying on the theory of subjective expected utility (SEU), the standard economic theory of decision making under uncertainty (Neumann1944; Savage1954). SEU theory has an immense intuitive appeal, and its pervasiveness in today’s mindset is reflected in many widespread beliefs: e.g.
that probabilities and utilities are orthogonal concepts; that two options with the same expected utility are equivalent; and that randomizing can never improve upon an optimal deterministic choice. Put simply, if we find ourselves violating SEU theory, we would feel strongly compelled to revise our choice.
At the same time, it is well understood that SEU theory prescribes policies that are intractable to calculate save for very restricted problem classes. This was recognized soon after expected utility theory was formulated (Simon1956). In agent design, it has become especially apparent more recently, as we continue to struggle with problems of moderate complexity in spite of our deeper understanding of the planning problem (Duff2002; Hutter2004; Legg2008; Ortega2011) and the vast computing power available to us. For instance, there are efficient algorithms to calculate the optimal policy of a known Markov decision process (MDP) (Bertsekas1996), but no efficient algorithm to calculate the exact optimal policy of an unknown MDP or a partially observable MDP (PapadimitriouTsitsiklis1987). Due to this, in practice we either make severe domain-specific simplifications, as in linear-quadratic-Gaussian control problems (Stengel1994); or we approximate the “gold standard” prescribed by SEU theory, exemplified by the reinforcement learning algorithms based on stochastic approximations (Sutton1998; Szepesvari2010) and Monte-Carlo tree search (Kocsis2006; Veness2011; Mnih2015).
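To make the contrast concrete, the tractable case mentioned above, computing the optimal policy of a known MDP, can be solved by dynamic programming. The following is a minimal sketch of value iteration; the tensor layout and tolerance are illustrative choices, not a prescription from the text:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal values and greedy policy of a known MDP.

    P: transition tensor, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a).
    R: expected rewards, shape (A, S); R[a, s] = reward for action a in state s.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * E[V(s')]
        Q = R + gamma * (P @ V)        # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For a known MDP this converges geometrically in the discount factor; the intractable cases in the text (unknown or partially observable MDPs) arise precisely because no such compact backup over states is available.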
Recently, there has been a renewed interest in models of bounded rationality (Simon1972). Rather than approximating perfect rationality, these models seek to formalize decision-making with limited resources such as the time, energy, memory, and computational effort allocated for arriving at a decision. The specific way in which this is achieved varies across these accounts. For instance, epsilon-optimality only requires policies to be “close enough” to the optimum (Dixon2001); metalevel rationality proposes optimizing a trade-off between utilities and computational costs (Zilberstein2008); bounded optimality restricts the computational complexity of the programs implementing the optimal policy (Russell1995b); an approach that we might label procedural bounded rationality attempts to explicitly model the limitations in the decision-making procedures (Rubinstein1998); and finally, the heuristics approach argues that general optimality principles ought to be abandoned altogether in favor of collections of simple heuristics (Gigerenzer2001).
Here we are concerned with a particular flavor of bounded rationality, which we might call “information-theoretic” due to its underlying rationale. While this approach solves many of the shortcomings of perfect rationality in a simple and elegant way, it has not yet attained widespread acceptance from the mainstream community in spite of roughly a decade of research in the machine learning literature. As is the case with many emerging fields of research, this is partly due to the lack of consensus on the interpretation of the mathematical quantities involved. Nonetheless, many of the basic results are well established and ready for widespread adoption; in particular, some of the algorithmic implications are much better understood today. Our goal here is to provide a consolidated view of some of the basic ideas of the theory and to sketch the intimate connections to other fields.
1.1 A Short Algorithmic Illustration
Let $\mathcal{X}$ be a finite set of candidate choices or policies, and let $U : \mathcal{X} \to [0, 1]$ be a utility function mapping each policy into the unit interval. Consider the problem of finding a maximizing element $x^\ast \in \arg\max_{x \in \mathcal{X}} U(x)$. For simplicity we assume that, given an element $x \in \mathcal{X}$, its utility $U(x)$ can be evaluated in a constant number of computation steps. Imposing no particular structure on $\mathcal{X}$, we find $x^\ast$ by sequentially evaluating each utility and returning the best found in the end.
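The exhaustive procedure just described can be sketched as follows; the particular policy set and utility function in the usage example are hypothetical placeholders:

```python
def argmax_exhaustive(policies, utility):
    """Find a maximizer of `utility` by evaluating every candidate once.

    Costs exactly |policies| utility evaluations, each assumed to take
    a constant number of computation steps, as in the setting above.
    """
    best, best_value = None, float("-inf")
    for x in policies:
        u = utility(x)              # one constant-time evaluation
        if u > best_value:
            best, best_value = x, u
    return best, best_value

# Hypothetical example: integer policies, utility peaking at x = 7.
policies = range(10)
utility = lambda x: 1.0 - abs(x - 7) / 10.0
best, value = argmax_exhaustive(policies, utility)  # best == 7, value == 1.0
```

The linear cost in $|\mathcal{X}|$ is exactly what makes this baseline a useful reference point when discussing bounded resources.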