Decision Making under Uncertainty: A Quasimetric Approach

12/31/2013
by Steve N'Guyen, et al.

Abstract

We propose a new approach for solving a class of discrete decision making problems under uncertainty with positive cost. This issue concerns multiple and diverse fields such as engineering, economics, artificial intelligence, cognitive science and many others. Basically, an agent has to choose a single action or a series of actions from a set of options, without knowing their consequences for sure. Schematically, two main approaches have been followed: either the agent learns which option to choose in a given situation by trial and error, or the agent already has some knowledge of the possible consequences of its decisions, this knowledge generally being expressed as a conditional probability distribution. In the latter case, several optimal or suboptimal methods have been proposed to exploit this uncertain knowledge in various contexts. In this work, we propose a different approach, based on the geometric intuition of distance. More precisely, we define a goal-independent quasimetric structure on the state space, taking into account both the cost function and the transition probability. We then compare precision and computation time with classical approaches.

Introduction

It’s Friday evening, and you are in a hurry to get home after a hard day’s work. Several options are available. You can hail a taxi, but it’s costly and you’re worried about traffic jams, common at this time of day. Or you might go on foot, but it’s slow and tiring. Moreover, the weather forecast predicted rain, and of course you forgot your umbrella. In the end you decide to take the subway, but unfortunately, you have to wait half an hour for the train at the connecting station due to a technical incident.

Situations like this one are typical in everyday life. It is also undoubtedly a problem encountered in logistics and control. The initial state and the goal are known (precisely or according to a probability distribution). The agent has to make a series of decisions about the best transport means, taking into account both uncertainty and cost. This is what we call optimal control under uncertainty.

Note that the agent might also have an intuitive notion of some abstract distance: how far am I from home? To what extent will it be difficult or time consuming to take a given path? The problem may become even more difficult if you do not know precisely what state you are in. For instance, you might be caught in a traffic jam in a completely unknown neighborhood.

The problem that we propose to deal with in this paper can be viewed as sequential decision making, usually expressed as a Markov Decision Process (MDP) [Bellman1957, Howard1960, Puterman1994, Boutilier1999] and its extension to Partially Observable cases (POMDP) [Drake1962, Astrom1965]. Knowing the transition probability of switching from one state to another by performing a particular action, as well as the associated instantaneous cost, the aim is to define an optimal policy, either deterministic or probabilistic, which maps the state space to the action space in order to minimize the mean cumulative cost from the initial state to a goal (goal-oriented MDPs).

This class of problems is usually solved by dynamic programming, using the Value Iteration (VI) or Policy Iteration (PI) algorithms and their numerous refinements. Contrasting with this model-based approach, various learning algorithms have also been proposed to progressively build either a value function, a policy, or both, from trial to trial. Reinforcement learning is the most widely used, especially when transition probabilities and the cost function are unknown (model-free case), but it suffers from the same tractability problem [Sutton1998]. Moreover, one significant drawback of these approaches is that they do not take advantage of preliminary knowledge of the cost function and transition probabilities.

MDPs have generated a substantial amount of work in engineering, economics, artificial intelligence and neuroscience, among others. Indeed, in recent years, Optimal Feedback Control theory has become quite popular in explaining certain aspects of human motor behavior [Todorov2002, Todorov2004]. This kind of method results in feedback laws, which allow for closed loop control.

However, aside from certain classes of problems with a convenient formulation, such as the Linear Quadratic case and its extensions [Stengel1986], or through linearization of the problem, achieved by adapting the immediate cost function [Todorov2009], the exact total optimal solution in the discrete case is intractable due to the curse of dimensionality [Bellman1957].

Thus, a lot of work in this field is devoted to finding approximate solutions and efficient methods for computing them.

Heuristic search methods try to speed up optimal probabilistic planning by considering only a subset of the state space (e.g. knowing the starting point and considering only reachable states). These algorithms can provide offline optimal solutions for the considered subspace [Barto1995, Hansen2001, Bonet2003].

Monte-Carlo planning methods, which do not manipulate probabilities explicitly, have also proven very successful for dealing with problems with large state spaces [Peret2004b, Kocsis2006].

Some methods try to reduce the dimensionality of the problem in order to avoid memory explosion by mapping the state space to a smaller parameter space [Buffet2006, Kolobov2009] or decomposing it hierarchically [Hauskrecht1998, Dietterich1998, Barry2011].

Another family of approximation methods, which has recently proven very successful [Little2007], is "determinization". Indeed, transforming the probabilistic problem into a deterministic one optimizing another criterion allows the use of very efficient deterministic planners [Yoon2007, Yoon2008, Teichteil-Konigsbuch2010].

What we propose here is rather different: we consider goal-independent distances between states. To compute the distance, we propose a kind of determinization of the problem using a one-step transition "mean cost per successful attempt" criterion, which can then be propagated by the triangle inequality. The obtained distance function thus confers a quasi-metric structure to the state space, which can be viewed as a Value function between all states. These distances can then be used to compute an offline policy using a gradient-descent-like method.

We show that, in spite of being formally suboptimal (except in the deterministic case and in a particular case described below), this method exhibits several good properties. We demonstrate the convergence of the method and the possibility of computing distances using standard deterministic shortest-path algorithms. Comparison with the optimal solution is described for different classes of problems, with a particular look at problems with prisons. Prisons, or absorbing sets of states, have recently been shown to be difficult cases for state-of-the-art methods [Kolobov2012], and we show how our method naturally deals with these cases.

Materials and Methods

Quasimetric

Let us consider a dynamic system described by its state $x \in X$ and $u \in U$, the action applied at state $x$, leading to an associated instantaneous cost $c(x, u)$. The dynamics can then be described by the Markov model:

$$P(X_{t+1} \mid X_t, U_t)$$

where the state of the system $X_t$ is a random variable defined by a probability distribution. Assuming stationary dynamics, a function $p$ exists, satisfying

$$p(x' \mid x, u) = P(X_{t+1} = x' \mid X_t = x, U_t = u) \qquad \forall t$$
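As a concrete illustration of such a finite stationary model, the following minimal sketch (in Python, with purely illustrative array names `P` and `C` and a random model that is not from the paper) stores $p(x' \mid x, u)$ and $c(x, u)$ as arrays; the later sketches reuse these arrays.

```python
import numpy as np

# Illustrative finite model: P[u, x, y] = p(y | x, u) and C[x, u] = c(x, u).
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)                        # each row sums to 1 for every (u, x)
C = rng.uniform(0.1, 1.0, size=(n_states, n_actions))    # strictly positive instantaneous costs
```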

This model enables us to capture uncertainties in the knowledge of the system's dynamics, and can be used in the Markov Decision Process (MDP) formalism. The aim is to find the optimal policy $\pi^* : X \to U$ allowing a goal state to be reached with minimum cumulative cost. The classic method of solving this is to use dynamic programming to build an optimal Value function $V(x)$, minimizing the total expected cumulative cost using the Bellman equation:

$$V(x) = \min_{u} \left[ c(x, u) + \sum_{x'} p(x' \mid x, u) \, V(x') \right] \qquad (1)$$

which can be used to specify an optimal control policy

$$\pi^*(x) = \underset{u}{\arg\min} \left[ c(x, u) + \sum_{x'} p(x' \mid x, u) \, V(x') \right] \qquad (2)$$

In general, this method requires either a specified goal state or a discount factor.
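For concreteness, a minimal value-iteration sketch of equations (1) and (2) for a single absorbing, cost-free goal state is given below; it assumes the illustrative `P` and `C` arrays introduced above, and its conventions (iteration cap, tolerance) are assumptions rather than the authors' implementation.

```python
def value_iteration(P, C, goal, n_iter=500, tol=1e-9):
    """Iterate Bellman equation (1) and return V together with the greedy policy (2)."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iter):
        # Q[x, u] = c(x, u) + sum_y p(y | x, u) V(y)
        Q = C + np.einsum('uxy,y->xu', P, V)
        V_new = Q.min(axis=1)
        V_new[goal] = 0.0                 # the goal state is treated as absorbing and cost-free
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)            # Value function and greedy policy (2)
```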

Here we propose a different approach by defining a goal independent quasimetric structure in the state space, defining for each state couple $(x, y)$ a distance function $d(x, y)$ reflecting a minimum cumulative cost.

This distance has to verify the following properties:

$$d(x, x) = 0 \qquad \text{and} \qquad d(x, y) \geq 0 \quad \forall (x, y) \in X^2$$

leading to the triangle inequality

$$d(x, z) \leq d(x, y) + d(y, z) \qquad \forall (x, y, z) \in X^3$$

Therefore, the resulting quasi-distance function $d$ confers the property of a quasimetric space to $X$.

Notice that this metric need not be symmetric (in general $d(x, y) \neq d(y, x)$). It is in fact a somewhat natural property, e.g. climbing stairs is (usually) harder than going down.

By then choosing the cost function $c(x, u)$, this distance can be computed iteratively (in the same way as the Value function).

For a deterministic problem, we initialize with:

$$d_0(x, y) = \begin{cases} 0 & \text{if } y = x \\ \min_{u : f(x, u) = y} c(x, u) & \text{if } \exists u \text{ such that } f(x, u) = y \\ +\infty & \text{otherwise} \end{cases}$$

with $f$ the discrete dynamic model giving the next state $x_{t+1} = f(x_t, u_t)$ obtained by applying action $u_t$ in state $x_t$. Then we apply the recurrence:

$$d_{k+1}(x, y) = \min_{z} \left[ d_k(x, z) + d_k(z, y) \right] \qquad (3)$$

(A direct implementation of this initialization and recurrence is sketched after the convergence argument below.)

We can show that this recurrence is guaranteed to converge in finite time for a finite state-space problem.

    1. $d_k(x, y) \geq 0$ for all $k$, by recurrence:

      • $d_0(x, y) \geq 0$, as $c(x, u) \geq 0$ and $d_0(x, x) = 0$ by definition.

      • and if $d_k(x, y) \geq 0$ for all $(x, y)$, then $d_{k+1}(x, y) = \min_z [d_k(x, z) + d_k(z, y)] \geq 0$.

    2. $d_{k+1}(x, y) \leq d_k(x, y)$, as:
      $d_{k+1}(x, y) = \min_z [d_k(x, z) + d_k(z, y)]$, then in particular if we take $z = y$ we have $d_{k+1}(x, y) \leq d_k(x, y) + d_k(y, y) = d_k(x, y)$.

    3. $(d_k(x, y))_k$ is a decreasing monotone sequence bounded below by $0$, and therefore converges.
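A direct implementation of the deterministic case might look like the sketch below; the successor function `f`, the cost function `c`, and the fixed-point loop are illustrative assumptions, not the authors' code.

```python
def deterministic_quasidistance(states, actions, f, c):
    """Quasi-distance for a deterministic model x' = f(x, u), obtained by propagating
    the triangle inequality (recurrence (3)) until a fixed point is reached."""
    INF = float('inf')
    d = {(x, y): (0.0 if x == y else INF) for x in states for y in states}
    # d0: best one-step cost between states directly connected by some action
    for x in states:
        for u in actions:
            y = f(x, u)
            d[(x, y)] = min(d[(x, y)], c(x, u))
    changed = True
    while changed:                         # reaches the fixed point in finitely many passes
        changed = False
        for x in states:
            for z in states:
                for y in states:
                    via_z = d[(x, z)] + d[(z, y)]
                    if via_z < d[(x, y)]:
                        d[(x, y)] = via_z
                        changed = True
    return d
```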

However, finding a way to initialize $d$ (more precisely $d_0$) while taking uncertainty into account presents a difficulty in probabilistic cases, as we cannot use the cumulative expected cost as in the Bellman equation.

For example we can choose:

$$d_0(x, y) = \min_{u} \frac{c(x, u)}{p(y \mid x, u)}$$

for the first iteration, with $d_0(x, y)$ as the one-step distance.

The quotient of cost over transition probability is chosen as it provides an estimate of the mean cost per successful attempt. If we attempt $n$ times the action $u$ in state $x$, the cost will be $n \, c(x, u)$ and the objective $y$ will be reached on average $n \, p(y \mid x, u)$ times. The mean cost per successful attempt is:

$$\frac{n \, c(x, u)}{n \, p(y \mid x, u)} = \frac{c(x, u)}{p(y \mid x, u)}$$

This choice of metric is therefore simple and fairly convenient. All the possible consequences of actions are clearly not taken into account here, which brings a huge computational gain but at the price of losing optimality. In fact, we are looking at the minimum over actions of the mean cost per successful attempt, which can be viewed as using the best mean cost while disregarding unsuccessful attempts, i.e. neglecting the probability of moving to an unwanted state.

In a one-step decision, this choice is a reasonable approximation of the optimal choice, which takes both cost and probability into account.
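In the probabilistic case, the same triangle-inequality propagation can be applied to this initialization. The sketch below reuses the illustrative `P` and `C` arrays from earlier; it is one possible implementation under those assumptions, not the authors' code.

```python
def one_step_distance(P, C):
    """d0(x, y) = min over u of c(x, u) / p(y | x, u): the mean cost per successful attempt.
    Assumes strictly positive costs; a zero transition probability yields an infinite distance."""
    n_actions, n_states, _ = P.shape
    d0 = np.full((n_states, n_states), np.inf)
    np.fill_diagonal(d0, 0.0)
    with np.errstate(divide='ignore'):
        for u in range(n_actions):
            d0 = np.minimum(d0, C[:, u][:, None] / P[u])
    return d0

def propagate(d0):
    """Propagate the triangle inequality: d_{k+1}(x, y) = min_z [d_k(x, z) + d_k(z, y)]."""
    d = d0.copy()
    for _ in range(d.shape[0]):            # the fixed point is reached in at most n passes
        d_new = np.minimum(d, (d[:, :, None] + d[None, :, :]).min(axis=1))
        if np.array_equal(d_new, d):
            break
        d = d_new
    return d

D = propagate(one_step_distance(P, C))      # quasi-distance between every pair of states
```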

This cost-probability quotient has been used before to determinize probabilistic dynamics and extract plans [Keyder2008, Barry2011, Kaelbling2011]. Here we generalize this method to construct an entire metric on the state space using the triangle inequality.

We also notice that, contrary to the dynamic programming approach, the quasimetric is not linked to a specific goal but instead provides a distance between any pair of states. Moreover, using this formalism, the instantaneous cost function is also totally goal independent and can more easily represent any objective physical quantity, such as consumed energy. This interesting property allows for much more adaptive control, since the goal can be changed without the need to recompute anything. As shown in the following, it is even possible to replace the goal state by a probability distribution over states. Another interesting property of the quasi-distance is that it has no local minima from the action point of view.
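To illustrate this goal independence, once the quasi-distance matrix `D` has been computed (for instance with the sketch above), a policy toward any goal can be read out without recomputing `D`. The greedy read-out below, which picks the action minimizing the immediate cost plus the expected distance of the successor state to the goal, is only one plausible rendering of the gradient-descent-like policy mentioned in the introduction; the names are illustrative.

```python
def greedy_policy(P, C, D, goal):
    """Greedy policy toward `goal` derived from a precomputed quasi-distance matrix D.
    Assumes every state can reach the goal (all distances to the goal are finite)."""
    expected_dist = np.einsum('uxy,y->xu', P, D[:, goal])   # E[d(X_{t+1}, goal)] for each (x, u)
    return (C + expected_dist).argmin(axis=1)

# Changing the goal only changes the read-out, not the distance computation:
policy_to_0 = greedy_policy(P, C, D, goal=0)
policy_to_3 = greedy_policy(P, C, D, goal=3)
```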

In fact, for any couple $(x, y)$, $(d_k(x, y))_k$ is a decreasing sequence of non-negative numbers (there is a finite number of states), which therefore converges to a non-negative number $d(x, y)$.


Note that if we multiply the cost function by any positive constant, the quasimetric is also multiplied by the same constant. This multiplication has no consequence on the structure of the state space and leaves the optimal policy unchanged; the cost function can therefore be rescaled by a convenient constant.

Let $R(\Omega)$ be the subset of $X$ associated with a goal $\Omega$ such that:

$$R(\Omega) = \{ x \in X : d(x, \Omega) < +\infty \}$$

and let $\bar{R}(\Omega)$ be the subset of $X$ associated with the goal such that:

$$\bar{R}(\Omega) = \{ x \in X : d(x, \Omega) = +\infty \}$$

The subset $R(\Omega)$ is the set of states from which the goal can be reached in a finite time with a finite cost. Starting from $\bar{R}(\Omega)$, the goal will never be reached, either because some step between $x$ and $\Omega$ requires an action with an infinite cost, or because there is a transition probability equal to $0$.
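With the quasi-distance matrix `D` from the earlier sketch (an illustrative name) and a goal state index `goal`, these two subsets can be read off directly:

```python
reachable = np.flatnonzero(np.isfinite(D[:, goal]))    # R(goal): the goal is reachable at finite cost
prison_side = np.flatnonzero(np.isinf(D[:, goal]))     # complement: the goal can never be reached
```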

Then the defined quasimetric admits no local minimum with respect to a given goal, in the sense that for a given goal $\Omega$, if $x$ is such that:

then