Learning Constraints from Locally-Optimal Demonstrations under Cost Function Uncertainty

01/25/2020 · by Glen Chou, et al. · University of Michigan

We present an algorithm for learning parametric constraints from locally-optimal demonstrations, where the cost function being optimized is uncertain to the learner. Our method uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations within a mixed integer linear program (MILP) to learn constraints which are consistent with the local optimality of the demonstrations, by either using a known constraint parameterization or by incrementally growing a parameterization that is consistent with the demonstrations. We provide theoretical guarantees on the conservativeness of the recovered safe/unsafe sets and analyze the limits of constraint learnability when using locally-optimal demonstrations. We evaluate our method on high-dimensional constraints and systems by learning constraints for 7-DOF arm and quadrotor examples, show that it outperforms competing constraint-learning approaches, and demonstrate that it can be effectively used to plan new constraint-satisfying trajectories in the environment.


I Introduction

Learning from demonstration has largely focused on learning cost and reward functions through the frameworks of inverse optimal control and inverse reinforcement learning (IOC/IRL) [irl_1, irl_2, lfd3, ng_irl], which replicate the behavior of an expert demonstrator when optimized. However, real-world planning problems, such as navigating a quadrotor in an urban environment, also require that the system obey hard constraints, that is, system trajectories must remain in a set of safe (constraint-satisfying) states. As constraints enforce safety more strictly than cost function penalties, which may “soften” a constraint and allow violations, they are better suited for planning in safety-critical situations. Furthermore, while different tasks may require different cost functions, safety constraints are often shared across tasks, and identifying them can help the robot generalize.

Initial work in [wafr] and [corl] has taken steps towards identifying constraints from approximately globally-optimal expert demonstrations, assuming that the demonstrator’s cost function is known exactly. However, as humans are not always experts at performing a task, requiring them to provide demonstrations which are nearly globally-optimal can be unreasonable. Furthermore, it is rare for the cost function being optimized to be known exactly by the learner. To address these shortcomings, we consider the problem of learning parametric constraints shared across tasks from approximately locally-optimal demonstrations under parametric cost function uncertainty. Our method is based on the insight that locally-optimal, constraint-satisfying demonstrations satisfy the Karush-Kuhn-Tucker (KKT) optimality conditions, which are first-order necessary conditions for local optimality of a solution to a constrained discrete-time optimal control problem. We solve a mixed integer linear program (MILP) to recover constraint and cost function parameters which make the demonstrations locally-optimal. We make the following specific contributions in this paper:

  • We develop a novel algorithm for learning parametric, potentially non-convex constraints from approximately locally-optimal demonstrations, where the parameterization can either be provided or grown incrementally to be consistent with the data. The method can extract volumes of safe/unsafe states (states which satisfy/do not satisfy the constraints) for future guaranteed safe planning and enable planners to query states for safety.

  • Our method can learn constraints despite uncertainty in the cost function and can also recover a cost function jointly with the constraint.

  • Under mild assumptions, we prove that our method recovers guaranteed conservative estimates (that is, inner approximations) of the true safe/unsafe sets, and we analyze the learnability of a constraint from locally-optimal demonstrations compared to globally-optimal ones.

  • We evaluate our method on difficult constraint learning problems in high-dimensional constraint spaces (23 dimensions) on systems with complex nonlinear dynamics and demonstrate that our method outperforms previous methods for parametric constraint inference [wafr, corl].

II Related Work

Previous work in IOC [boyd, toussaint, pontryagin] has used the KKT conditions to recover a cost function from demonstrations, assuming that the constraints are known. In contrast, our method explicitly learns the constraints. The risk-sensitive IRL approach in [sumeet] also uses the KKT conditions and is complementary to our work, which learns hard constraints. Perhaps the closest work to ours is [melanie], which aims to recover a cost function and a constraint simultaneously using the KKT conditions. However, to avoid non-convexity in the cost/constraint recovery problem, [melanie] restricts itself to recovering convex constraints and does not directly search for constraint parameters, instead enumerating an a priori fixed, finite set of candidate constraint parameters using a method which holds only for the convex case. In contrast, by formulating our problem as a MILP, our method avoids enumeration, directly searching the full continuous space of possible constraint parameters. This also enables us to learn non-convex, nonlinear constraints while retaining formal guarantees on the conservativeness of the recovered constraint.

There also exists prior work in learning geometric state space constraints [vijayakumar, shah], which our method generalizes by learning non-convex constraints not necessarily defined in the state space. Learning local trajectory-based constraints has also been explored in [dmitry, anca, lfdc1, lfdc2, lfdc3, lfdc4] by reasoning over the constraints within a single trajectory or task. These methods complement our approach, which learns a global constraint shared across tasks. In the constraint-learning literature, our work is closest to [wafr] and [corl], which learn a global shared constraint by sampling unsafe trajectories using globally-optimal demonstrations, a known cost function, and a system simulator. In addition to the previously mentioned drawbacks of assuming global optimality and a known cost, sampling unsafe trajectories can be difficult for systems with complicated dynamics. By using the KKT conditions, which define the unsafe set implicitly rather than explicitly through unsafe trajectories, our method sidesteps both the need for an exact cost function to classify the safety of sampled trajectories and any sampling difficulties.

III Preliminaries and Problem Setup

We consider discrete-time nonlinear systems $x_{t+1} = f(x_t, u_t, t)$, with states $x \in \mathcal{X}$ and controls $u \in \mathcal{U}$, performing tasks $\Pi$, which are represented as constrained optimization problems over state/control trajectories $\xi_{xu} \doteq (\xi_x, \xi_u)$:

Problem 1 (Forward problem / “task” $\Pi$)
$$\begin{array}{rl} \min_{\xi_{xu}} & c_\Pi(\xi_{xu}, \gamma) \\ \text{s.t.} & \phi(\xi_{xu}) \in \mathcal{S}(\theta) \subseteq \mathcal{C} \\ & \bar\phi(\xi_{xu}) \in \bar{\mathcal{S}} \subseteq \bar{\mathcal{C}} \\ & \phi_\Pi(\xi_{xu}) \in \mathcal{S}_\Pi \subseteq \mathcal{C}_\Pi \end{array} \qquad (1)$$

where $c_\Pi(\cdot, \gamma)$ is a potentially non-convex cost function for task $\Pi$, parameterized by $\gamma \in \Gamma$. In Sec. IV-A to IV-C, we assume that $\gamma$ is known (through possibly inaccurate prior knowledge) for clarity; we later relax this assumption and discuss how to learn $\gamma$ from the demonstrations. Further, $\phi(\cdot)$ is a known mapping from state-control trajectories to a constraint space $\mathcal{C}$, elements of which are referred to as constraint states $\kappa$. Mappings $\bar\phi(\cdot)$ and $\phi_\Pi(\cdot)$ are known and map to constraint spaces $\bar{\mathcal{C}}$ and $\mathcal{C}_\Pi$, containing a known shared safe set $\bar{\mathcal{S}}$ and a known task-dependent safe set $\mathcal{S}_\Pi$, respectively. In this paper, we encode the system dynamics in $\bar{\mathcal{S}}$ and start/goal constraints in $\mathcal{S}_\Pi$. Grouping the constraints of Problem 1 as equality/inequality (eq/ineq) constraints and known/unknown ($k$/$\neg k$) constraints, we can write:

$$h_k(\xi_{xu}) = 0, \qquad g_k(\xi_{xu}) \le 0, \qquad g_{\neg k}(\phi(\xi_{xu}), \theta) \le 0 \qquad (2)$$

where $h_k(\cdot)$, $g_k(\cdot)$, and $g_{\neg k}(\cdot, \theta)$ collect the known equality, known inequality, and unknown inequality constraints, respectively. Note that unknown equality constraints $h_{\neg k}(\cdot, \theta) = 0$ can be written equivalently as $g_{\neg k}(\cdot, \theta) \doteq [h_{\neg k}(\cdot, \theta)^\top, -h_{\neg k}(\cdot, \theta)^\top]^\top \le 0$. As shorthand, let $g(\cdot) \doteq [g_k(\cdot)^\top, g_{\neg k}(\cdot)^\top]^\top$. We now define

$$\mathcal{S}(\theta) \doteq \{\kappa \in \mathcal{C} \ :\ g_{\neg k}(\kappa, \theta) \le 0\} \qquad (3)$$
$$\mathcal{A}(\theta) \doteq \{\kappa \in \mathcal{C} \ :\ g_{\neg k}(\kappa, \theta) > 0\} \qquad (4)$$

as the unknown safe/unsafe sets defined by an unknown parameter $\theta \in \Theta$, for a possibly unknown parameterization $g_{\neg k}(\cdot, \cdot)$. Last, we restrict $\mathcal{S}(\theta)$ and $\mathcal{A}(\theta)$ to be unions of polytopes.

Intuitively, a trajectory $\xi_{xu}$ is locally-optimal if all trajectories within a neighborhood of $\xi_{xu}$ have cost greater than or equal to that of $\xi_{xu}$. More precisely, any locally-optimal trajectory necessarily satisfies the KKT conditions [cvxbook]. We define a demonstration as a state-control trajectory which we assume approximately solves Problem 1 to local optimality, i.e. it satisfies all constraints and is in the neighborhood of a local optimum.

Our goal is to recover the safe set $\mathcal{S}(\theta)$ and unsafe set $\mathcal{A}(\theta)$, given $N_s$ demonstrations $\{\xi_j\}_{j=1}^{N_s}$, the known shared safe set $\bar{\mathcal{S}}$, and the task-dependent constraints $\mathcal{S}_\Pi$. As a byproduct, our method can also recover unknown cost parameters $\gamma$.
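To make the setup concrete, the following is a minimal sketch (ours, not the paper's; assuming cvxpy is available) of a forward problem in the form of Problem 1 for the 2D kinematic system $x_{t+1} = x_t + u_t$ used later in Sec. VI-A, with only the known constraints; the unknown constraint $g_{\neg k}$ is deliberately omitted, since recovering it is the subject of Sec. IV.

```python
import cvxpy as cp
import numpy as np

T = 10                                   # horizon
x = cp.Variable((T + 1, 2))              # state trajectory xi_x
u = cp.Variable((T, 2))                  # control trajectory xi_u
start, goal = np.zeros(2), np.array([2.0, 0.0])

cons = [x[0] == start, x[T] == goal]                     # task constraints (S_Pi)
cons += [x[t + 1] == x[t] + u[t] for t in range(T)]      # dynamics (encoded in S_bar)
cons += [cp.norm(u[t], "inf") <= 0.5 for t in range(T)]  # control limits (in S_bar)

cp.Problem(cp.Minimize(cp.sum_squares(u)), cons).solve()
print(np.round(x.value, 2))              # a straight-line demonstration
```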

IV Method

We detail our constraint-learning algorithm. First, we formulate the general KKT-based constraint recovery problem (Sec. IV-A) and then develop specific optimization problems for the cases where the constraint is defined as a union of offset-parameterized (Sec. IV-B) or affinely-parameterized constraints (Sec. IV-D). We show how to extract guaranteed safe/unsafe states (Sec. IV-C), handle unknown constraint parameterizations (Sec. IV-E), and handle cost function uncertainty (Sec. IV-F). In closing, we show how our method can be used within a planner to guarantee safety (Sec. IV-G).

IV-A Constraint recovery via the KKT conditions

Recall that the KKT conditions are necessary conditions for local optimality of a solution of a constrained optimization problem [cvxbook]. For constraints (2) and Lagrange multipliers $\boldsymbol{\lambda}_j$ and $\boldsymbol{\nu}_j$, the KKT conditions for the $j$th locally-optimal demonstration $\xi_j$, denoted $\mathrm{KKT}(\xi_j)$, are:

Primal feasibility:    $h_k(\xi_j) = 0$    (5a)
                       $g_k(\xi_j) \le 0$    (5b)
                       $g_{\neg k}(\phi(\xi_j), \theta) \le 0$    (5c)
Lagrange mult.         $\boldsymbol{\lambda}_j^k \ge 0$    (5d)
nonnegativity:         $\boldsymbol{\lambda}_j^{\neg k} \ge 0$    (5e)
Complementary          $\boldsymbol{\lambda}_j^k \odot g_k(\xi_j) = 0$    (5f)
slackness:             $\boldsymbol{\lambda}_j^{\neg k} \odot g_{\neg k}(\phi(\xi_j), \theta) = 0$    (5g)
Stationarity:          $\nabla_{\xi_{xu}} c_\Pi(\xi_j) + (\boldsymbol{\lambda}_j^k)^\top \nabla_{\xi_{xu}} g_k(\xi_j) + (\boldsymbol{\lambda}_j^{\neg k})^\top \nabla_{\xi_{xu}} g_{\neg k}(\phi(\xi_j), \theta) + (\boldsymbol{\nu}_j^k)^\top \nabla_{\xi_{xu}} h_k(\xi_j) = 0$    (5h)

where $\nabla_{\xi_{xu}}$ takes the gradient with respect to a flattened trajectory $\xi_{xu}$ and $\odot$ denotes elementwise multiplication. For compactness, we vectorize the multipliers as $\boldsymbol{\lambda}_j^k$, $\boldsymbol{\lambda}_j^{\neg k}$, and $\boldsymbol{\nu}_j^k$. We drop the $\gamma$ dependency, as the cost is assumed known for now, and also drop (5a)-(5b), as they involve no decision variables. Then, finding a constraint consistent with the local optimality of the demonstrations amounts to finding a constraint parameter $\theta$ which satisfies the KKT conditions for each demonstration. That is, we can solve the following feasibility problem:

Problem 2 (KKT inverse, locally-optimal)
$$\text{find } \theta,\ \{\boldsymbol{\lambda}_j^k, \boldsymbol{\lambda}_j^{\neg k}, \boldsymbol{\nu}_j^k\}_{j=1}^{N_s} \quad \text{s.t. (5c)-(5h) hold for } \xi_j,\ \forall j \in \{1, \dots, N_s\} \qquad (6)$$

Further, to address suboptimality (i.e. approximate local-optimality) in the demonstrations, we can relax the stationarity (5h) and complementary slackness constraints (5f)-(5g) and place corresponding penalties into the objective function:

Problem 3 (KKT inverse, suboptimal)
$$\min_{\theta,\ \{\boldsymbol{\lambda}_j, \boldsymbol{\nu}_j\}} \ \sum_{j=1}^{N_s} \big( \|\mathrm{stat}(\xi_j)\|_1 + \|\mathrm{comp}(\xi_j)\|_1 \big) \quad \text{s.t. (5c)-(5e) hold for } \xi_j,\ \forall j \in \{1, \dots, N_s\} \qquad (7)$$

where $\mathrm{stat}(\xi_j)$ denotes the LHS of Eq. (5h) and $\mathrm{comp}(\xi_j)$ denotes the concatenated LHSs of Eqs. (5f) and (5g).
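As a concrete illustration of the quantities penalized in Problem 3, the following sketch (ours, not the authors' code) evaluates the stationarity and complementary slackness residuals for a toy 1D problem $\min_x (x-2)^2$ s.t. $x - 1 \le 0$; an exactly locally-optimal demonstration zeroes both residuals, while a suboptimal one does not.

```python
import numpy as np

def kkt_residuals(x, lam):
    """Residuals of (5d)-(5h) for min (x-2)^2 s.t. g(x) = x - 1 <= 0."""
    grad_f = 2.0 * (x - 2.0)              # cost gradient
    grad_g = 1.0                          # constraint gradient
    g = x - 1.0
    stat = grad_f + lam * grad_g          # stat(xi): LHS of (5h)
    comp = lam * g                        # comp(xi): LHS of (5f)/(5g)
    feas = (max(g, 0.0), max(-lam, 0.0))  # primal/dual feasibility violations
    return stat, comp, feas

print(kkt_residuals(x=1.0, lam=2.0))  # locally optimal: all residuals are zero
print(kkt_residuals(x=0.9, lam=2.0))  # suboptimal demo: nonzero stat and comp
```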

Denote the projection of the feasible set of Problem 2 onto $\Theta$ as $\mathcal{F}$. We define the sets of learned guaranteed safe/unsafe constraint states as $\mathcal{G}_s$/$\mathcal{G}_{\neg s}$, respectively. For Problem 2, a constraint state $\kappa$ is learned guaranteed safe/unsafe if $\kappa$ is marked safe/unsafe for all $\theta \in \mathcal{F}$. Formally, we have:

$$\mathcal{G}_s \doteq \{\kappa \in \mathcal{C} \ :\ \kappa \in \mathcal{S}(\theta),\ \forall \theta \in \mathcal{F}\} \qquad (8)$$
$$\mathcal{G}_{\neg s} \doteq \{\kappa \in \mathcal{C} \ :\ \kappa \in \mathcal{A}(\theta),\ \forall \theta \in \mathcal{F}\} \qquad (9)$$

We now formulate variants of Problem 2 which are efficiently solvable for specific constraint parameterizations. For legibility, we describe the method assuming $\mathcal{C} = \mathcal{X}$ and $\phi$ is the identity. Due to the bilinearity between decision variables in Problems 2 and 3 for some parameterizations, we describe exact (Sec. IV-B) and relaxed (Sec. IV-D) formulations for recovering the unknown parameters.

IV-B Unions of offset-parameterized constraints

Consider when Problem 1 involves avoiding an unsafe set described by the union and intersection of offset-parameterized half-spaces (i.e. $\theta$ does not multiply $\kappa$):

$$\mathcal{A}(\theta) \doteq \Big\{ \kappa \in \mathcal{C} \ :\ \bigvee_{m=1}^{M} \bigwedge_{n=1}^{N_m} a_{mn}^\top \kappa < b_{mn}(\theta) \Big\} \qquad (10)$$

This parameterization can represent any arbitrarily-shaped unsafe set if $M$ is sufficiently large (i.e. as a union of polytopes) [tao], though in practice our method may not be efficient for large $M$. We will often use the specific case of unions of axis-aligned $d$-dimensional hyper-rectangles,

$$\mathcal{A}(\theta) \doteq \bigcup_{m=1}^{M} \Big\{ \kappa \ :\ \bigwedge_{i=1}^{d} \underline{\theta}{}_m^i \le \kappa^i \le \bar{\theta}{}_m^i \Big\} \qquad (11)$$

where $\bar\theta{}_m^i$/$\underline\theta{}_m^i$ are the upper/lower extents of dimension $i$ of box $m$. We now modify the KKT conditions to handle the “or” constraints in (10). Writing $\kappa_{j,t} \doteq \phi(\xi_j)_t$ for the $t$th constraint state of demonstration $j$, primal feasibility (5c) changes to

$$\bigwedge_{m=1}^{M} \bigvee_{n=1}^{N_m} \big( a_{mn}^\top \kappa_{j,t} \ge b_{mn}(\theta) \big), \qquad \forall j, t \qquad (12)$$

which can be implemented using the big-M formulation [bertsimas]:

$$A_m \kappa_{j,t} \ge b_m(\theta) - M_{\text{big}}\,(\mathbf{1} - z_{j,t}^m), \qquad \mathbf{1}^\top z_{j,t}^m \ge 1, \qquad z_{j,t}^m \in \{0,1\}^{N_m}, \qquad \forall j, t, m \qquad (13)$$

where $M_{\text{big}}$ is a large positive number, $A_m$ and $b_m(\theta)$ are the vertical concatenations of $a_{mn}^\top$ and $b_{mn}(\theta)$ for all $n$, $z_{j,t}^m$ are binary variables encoding that at least one half-space constraint must hold, and $z_{j,t}^{m,n}$ is the $n$th entry of $z_{j,t}^m$. For demonstration $\xi_j$ to be locally-optimal, we know that for each $m$, the complementary slackness condition $\lambda_{j,t}^{m,n}\big(b_{mn}(\theta) - a_{mn}^\top \kappa_{j,t}\big) = 0$ must hold for at least one $n$ and for all $t$ in Eq. (12). Furthermore, in the stationarity condition (5h), terms should only be included for the $(m,n)$ pairs where the complementary slackness condition is enforced. Thus, we can enforce that complementary slackness holds for all $t$, for all $m$, and for some $n$ by writing:

$$\lambda_{j,t}^{m,n} \le M_{\text{big}}\, q_{j,t}^{m,n}, \qquad a_{mn}^\top \kappa_{j,t} - b_{mn}(\theta) \le M_{\text{big}}\,(1 - q_{j,t}^{m,n}), \qquad q_{j,t}^{m,n} \le z_{j,t}^{m,n}, \qquad \forall j, t, m, n \qquad (14)$$

together with (5d) and (5e), where we use a big-M formulation with binary variables $q_{j,t}^{m,n}$ (encoding whether the complementary slackness condition is active) and $z_{j,t}^{m,n}$ (encoding if the complementary slackness condition is being enforced). We have denoted $\lambda_{j,t}^{m,n}$ as the multiplier associated with half-space $(m,n)$ at time $t$. Next, we modify line 2 of constraint (5h) to enforce:

$$\sum_{m=1}^{M} \sum_{n=1}^{N_m} \lambda_{j,t}^{m,n}\, z_{j,t}^{m,n}\, \nabla_{\xi_{xu}}\big(b_{mn}(\theta) - a_{mn}^\top \kappa_{j,t}\big) \qquad (15)$$

for all $j, t$, where the $(m,n)$th entry of $\boldsymbol{\lambda}_{j,t}$, $\lambda_{j,t}^{m,n}$, refers to the multiplier on half-space $(m,n)$. Note that $\lambda_{j,t}^{m,n}$ (continuous variables) and $z_{j,t}^{m,n}$ (binary variables) appear bilinearly ($\nabla_{\xi_{xu}}\big(b_{mn}(\theta) - a_{mn}^\top \kappa_{j,t}\big)$ contains no decision variables, as $\theta$ does not multiply $\kappa$). By assuming bounds $0 \le \lambda_{j,t}^{m,n} \le M_{\text{big}}$, this can be reformulated exactly in a linear fashion (i.e. linearized) [mccormick] by replacing each bilinear product in (15) with a slack variable and adding constraints (writing $\lambda \doteq \lambda_{j,t}^{m,n}$, $z \doteq z_{j,t}^{m,n}$, $w \doteq w_{j,t}^{m,n}$ for short):

$$w \ge 0, \qquad w \le M_{\text{big}}\, z, \qquad w \le \lambda, \qquad w \ge \lambda - M_{\text{big}}\,(1 - z) \qquad (16)$$

Finally, let $\mathbf{z}_j$/$\mathbf{q}_j$/$\mathbf{w}_j$ be the horizontal concatenations of $z_{j,t}^{m,n}$/$q_{j,t}^{m,n}$/$w_{j,t}^{m,n}$ over all $t$, $m$, $n$. We can now pose the full problem:

Problem 4 (KKT inverse, unions)
$$\text{find } \theta,\ \{\boldsymbol{\lambda}_j^{k}, \boldsymbol{\lambda}_j^{\neg k}, \boldsymbol{\nu}_j^{k}, \mathbf{z}_j, \mathbf{q}_j, \mathbf{w}_j\}_{j=1}^{N_s} \ \ \text{s.t. (5d)-(5g), with (5c) implemented as (13), complementary slackness for } g_{\neg k} \text{ implemented as (14), and (5h) modified by (15)-(16)},\ \forall j \qquad (17)$$

IV-C Extraction of safe and unsafe states

Before moving on to affine parameterizations, we first detail how to check guaranteed safeness/unsafeness (as defined in (8)-(9)). One can check if a constraint state $\kappa \in \mathcal{G}_s$ or $\kappa \in \mathcal{G}_{\neg s}$ by adding the constraint $\kappa \in \mathcal{A}(\theta)$ or $\kappa \in \mathcal{S}(\theta)$, respectively, to Problem 2 and checking feasibility of the resulting program:

Problem 5 (Query if $\kappa$ is guaranteed safe or unsafe)
$$\text{find } \theta,\ \{\boldsymbol{\lambda}_j, \boldsymbol{\nu}_j\}_{j=1}^{N_s} \quad \text{s.t. the constraints of Problem 2},\ \ \kappa \in \mathcal{A}(\theta)\ \text{[resp. } \kappa \in \mathcal{S}(\theta)\text{]} \qquad (18)$$

If Problem 5 is infeasible, then $\kappa \in \mathcal{G}_s$ or $\kappa \in \mathcal{G}_{\neg s}$, respectively. Solving this problem is akin to querying an oracle about the safety of $\kappa$. The oracle can return that $\kappa$ is guaranteed safe (the program is infeasible after forcing $\kappa$ to be unsafe), guaranteed unsafe (the program is infeasible after forcing $\kappa$ to be safe), or unsure (the program remains feasible despite forcing $\kappa$ to be safe or unsafe).
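A sketch of this oracle, layered on top of whatever KKT-consistency constraints have been built (e.g. the box skeleton above), might look as follows; `kkt_cons`, `lo`, and `hi` are assumed to come from such a construction, and the function names are our own.

```python
import cvxpy as cp

def _feasible(cons):
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve(solver=cp.GLPK_MI)
    return prob.status not in ("infeasible", "infeasible_inaccurate")

def query(kappa, kkt_cons, lo, hi, M_BIG=100.0):
    """Problem 5: force kappa unsafe/safe and look for infeasibility."""
    inside = [lo <= kappa, kappa <= hi]              # kappa in A(theta)
    z = cp.Variable(2 * kappa.size, boolean=True)    # big-M "or", as in (13)
    outside = [cp.sum(z) >= 1]                       # kappa in S(theta)
    for i in range(kappa.size):
        outside += [kappa[i] <= lo[i] + M_BIG * (1 - z[i]),
                    kappa[i] >= hi[i] - M_BIG * (1 - z[kappa.size + i])]
    if not _feasible(kkt_cons + inside):
        return "guaranteed safe"
    if not _feasible(kkt_cons + outside):
        return "guaranteed unsafe"
    return "unsure"
```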

Since the constraint space $\mathcal{C}$ is continuous, it is not possible to check via enumeration whether each $\kappa \in \mathcal{G}_s$ or $\kappa \in \mathcal{G}_{\neg s}$. To address this, we can check a neighborhood of a constraint state $\kappa$ for membership in $\mathcal{G}_{\neg s}$ by solving the following:

Problem 6 (Volume extraction)
$$\max_{\varepsilon \ge 0}\ \varepsilon \quad \text{s.t.} \quad \{\kappa' : \|\kappa' - \kappa\|_\infty \le \varepsilon\} \subseteq \mathcal{G}_{\neg s} \qquad (19)$$

Intuitively, Problem 6 finds the largest box centered at $\kappa$ contained within $\mathcal{G}_{\neg s}$. An analogous problem can also be posed to recover the largest hypercube centered at $\kappa$ contained within $\mathcal{G}_s$. For some common parameterizations (axis-aligned hyper-rectangles, convex sets), subsets of $\mathcal{G}_s$ and $\mathcal{G}_{\neg s}$ can be recovered even more efficiently by performing line searches or taking convex hulls of guaranteed safe/unsafe states; details are in Appendix B of [corl]. Volumes of safe/unsafe space can thus be produced by repeatedly solving Problem 6 for different $\kappa$.
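For the axis-aligned case, the line-search alternative mentioned above can be sketched with the query oracle alone: bisect on a hypercube half-width and keep the largest cube whose corners all remain guaranteed unsafe. This is our own illustrative simplification of Problem 6, and the corner check is valid only when $\mathcal{G}_{\neg s}$ is convex (e.g. for a single-box parameterization, where it is an intersection of boxes).

```python
import itertools
import numpy as np

def largest_unsafe_cube(kappa, is_guaranteed_unsafe, eps_max=1.0, tol=1e-3):
    lo_eps, hi_eps = 0.0, eps_max
    while hi_eps - lo_eps > tol:
        eps = 0.5 * (lo_eps + hi_eps)
        corners = (kappa + eps * np.array(signs)
                   for signs in itertools.product((-1.0, 1.0), repeat=kappa.size))
        if all(is_guaranteed_unsafe(c) for c in corners):
            lo_eps = eps      # cube of half-width eps is certified unsafe
        else:
            hi_eps = eps
    return lo_eps
```

Here `is_guaranteed_unsafe` would wrap the `query` oracle from the previous sketch.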

IV-D KKT relaxation for unions of affine constraints

Now, consider when Problem 1 involves avoiding an unsafe set described by a union of affine constraints:

$$\mathcal{A}(\theta) \doteq \Big\{ \kappa \in \mathcal{C} \ :\ \bigvee_{m=1}^{M} g_m(\kappa, \theta) < 0 \Big\} \qquad (20)$$

where $g_m(\kappa, \theta)$ is an affine function of $\theta$ for fixed $\kappa$. Unlike in Sec. IV-B, formulating the recovery problem as in Problem 4 yields trilinearity in the stationarity condition between the continuous variables $\boldsymbol{\lambda}$ and $\theta$ and the binary variables $z$, since for the affine case, $\nabla_{\xi_{xu}} g_m(\kappa, \theta)$ remains a function of $\theta$. As the product of two continuous decision variables cannot be linearized exactly, one must solve a MINLP to recover $\mathcal{F}$ in this case, which can be inefficient. However, a relaxation which enables querying of guaranteed safeness/unsafeness via Problem 5 can be formulated as a MILP. For legibility, we present the $M = 1$ case, where there is only one affine constraint (and hence the binary variables $z$ seen in Problem 4 are all set to 1 and can thus be dropped). Each bilinear term $\lambda \theta_i$ is replaced with $\hat\lambda_i$, where $\hat\lambda_i$ is a variable which represents the bilinear term and $s$ is an indicator variable encoding that if $s$ is 0, then $\hat\lambda_i$ must be 0. Hence, by linearizing the bilinear terms in this way, there is no relaxation gap when the Lagrange multipliers are zero; the only loss is when the Lagrange multipliers are non-zero (i.e. when the demonstration touches the constraint boundary). In this case, the coupling between $\lambda$ and $\theta$ is lost by introducing the $\hat\lambda_i$ variables. We further linearize $\lambda s$ (a product of continuous and binary variables) with the same procedure as in Sec. IV-B, again introducing slack variables and constraining them according to (16), where the $z$ are replaced with $s$. Putting things together, we can write the following relaxed constraint recovery problem for $M = 1$:

Problem 7 (KKT relaxation, affine)
$$\text{find } \theta,\ \{\boldsymbol{\lambda}_j, \hat{\boldsymbol{\lambda}}_j, \boldsymbol{\nu}_j, \mathbf{s}_j, \mathbf{w}_j\}_{j=1}^{N_s} \quad \text{s.t. the relaxed KKT conditions hold for } \xi_j,\ \forall j \in \{1, \dots, N_s\} \qquad (21)$$

where $\hat{\boldsymbol{\lambda}}_j$, $\mathbf{s}_j$, $\mathbf{w}_j$ denote the horizontal concatenations of $\hat\lambda$, $s$, $w$ over demonstration $j$. The case where the constraint is a union of $M > 1$ affine constraints yields quadrilinearity and can be handled similarly, requiring one extra step to linearize the products of the binary variables $z$ and $s$, which can be done exactly.

While Problem 7 cannot recover the constraint parameter set $\mathcal{F}$ directly, one can still check if a constraint state is guaranteed safe/unsafe using Problem 5 (see Theorem 2 for the reasoning).
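The gating described above can be sketched as follows (our own illustrative encoding, assuming multiplier bounds $0 \le \lambda \le M$ and bounded $\theta$): each product $\lambda\theta_i$ in the stationarity condition is replaced by a free slack $\hat\lambda_i$ that a binary $s$ forces to zero exactly when $\lambda$ is zero, which is why the relaxation is tight for inactive constraints.

```python
import cvxpy as cp

def relaxed_bilinear(theta, M_BIG=100.0):
    """Stand-ins for the products lam * theta_i in the stationarity condition."""
    lam = cp.Variable(nonneg=True)     # multiplier lambda
    s = cp.Variable(boolean=True)      # indicator: s = 0  =>  lam = 0
    lam_hat = cp.Variable(theta.size)  # represents lam * theta (coupling dropped)
    cons = [lam <= M_BIG * s,
            lam_hat <= M_BIG * s,      # s = 0 forces lam_hat = 0 as well,
            lam_hat >= -M_BIG * s]     # so there is no gap when lam = 0
    return lam, lam_hat, s, cons       # use lam_hat wherever lam*theta appears
```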

IV-E Unknown constraint parameterization

In many applications, we may not know a constraint parameterization a priori. However, complex unsafe/safe sets can often be approximated as unions of many simple unsafe/safe sets. Thus, we adapt the method in [corl] for incrementally growing a parameterization based on the complexity of the provided demonstrations. More precisely, suppose the true parameterization of the unsafe set is unknown but can be exactly or approximately written as the union of $M^*$ simple sets. Each simple set has a known parameterization, but $M^*$, the minimum number of simple sets needed to reconstruct the unsafe set, is unknown. We can estimate a lower bound $\hat M$ on $M^*$ by incrementally adding simple sets until Problem 2 becomes feasible (i.e. there exists a sufficiently complex constraint which can satisfy the demonstrations' KKT conditions), as sketched below. Issues with conservativeness of the recovered constraint when $\hat M < M^*$ are discussed in [corl] and also hold here; we omit them for brevity.
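A sketch of this loop, with a hypothetical `build_problem2(demos, num_boxes)` that assembles the MILP of Problem 2 for a union of `num_boxes` simple sets (here, boxes):

```python
import cvxpy as cp

def estimate_min_num_boxes(demos, max_boxes=10):
    for num_boxes in range(1, max_boxes + 1):
        prob = build_problem2(demos, num_boxes=num_boxes)  # hypothetical builder
        prob.solve(solver=cp.GLPK_MI)
        if prob.status == "optimal":   # feasibility found: parameterization suffices
            return num_boxes           # a lower bound on the true complexity
    raise RuntimeError("no consistent parameterization within the cap")
```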

IV-F Handling cost function uncertainty

We now extend the KKT conditions presented in (5) and Problems 2 and 3 to learn constraints under parametric uncertainty in the cost function (i.e. when $\gamma$ in Problem 1 is unknown). To address this, the first term in the stationarity condition (5h) must be changed to $\nabla_{\xi_{xu}} c_\Pi(\xi_j, \gamma)$, with $\gamma$ added as a decision variable. Then, if $\nabla_{\xi_{xu}} c_\Pi(\xi_j, \gamma)$ is affine in $\gamma$, $\mathcal{F}$ can still be found using a MILP.

Querying and volume extraction work just as before; the only difference is that $\gamma$ is now a decision variable in Problems 5/6. Note that we extract constraint states that are guaranteed safe/unsafe for all possible cost parameters; that is, we extract safe/unsafe sets that are robust to cost uncertainty.
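For the common case of a cost that is a weighted sum of features, $c_\Pi(\xi, \gamma) = \sum_i \gamma_i c_i(\xi)$, the modified stationarity term is affine in $\gamma$, so adding $\gamma$ does not change the problem class. A sketch (our own; the normalization constraint is one standard way to exclude the degenerate all-zero cost and is an assumption here):

```python
import cvxpy as cp

def stationarity_with_cost_uncertainty(grad_features, known_terms):
    """grad_features[:, i]: gradient of the i-th cost feature at the demo."""
    gamma = cp.Variable(grad_features.shape[1], nonneg=True)
    stat = grad_features @ gamma + known_terms   # affine in gamma and multipliers
    return gamma, [stat == 0, cp.sum(gamma) == 1]
```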

We summarize below what we can solve for under various parameterizations. In the exact cases, we can recover $\mathcal{F}$ directly, while in the relaxed case, we can only recover $\mathcal{G}_s$/$\mathcal{G}_{\neg s}$ via queries. Note that the constraint/cost can be nonlinear in $\kappa$/$\xi_{xu}$ without inducing relaxation, though this precludes usage of Problem 6 (as $\kappa$ is a decision variable in Problem 6 but not in Problem 5):

Constraint param. | Cost param. | Recover $\mathcal{F}$? | Query $\mathcal{G}_s$/$\mathcal{G}_{\neg s}$
$g_{\neg k}$: form of (10), affine in $\kappa$ | $\nabla_\xi c$: affine in $\gamma$; $c$: nonlin. in $\xi$ | Yes: Prob. 4 | Prob. 5/6
$g_{\neg k}$: form of (10), nonlin. in $\kappa$ | $\nabla_\xi c$: affine in $\gamma$; $c$: nonlin. in $\xi$ | Yes: Prob. 4 | Prob. 5
$g_{\neg k}$: form of (20), nonlin. in $\kappa$ | $\nabla_\xi c$: affine in $\gamma$; $c$: nonlin. in $\xi$ | No | Prob. 5

This only describes what we can solve for; the actual accuracy of the recovered $\mathcal{F}$ and the size of the recovered $\mathcal{G}_s$/$\mathcal{G}_{\neg s}$ depend on how informative the demonstrations are, i.e. the demonstrations should interact with the constraint.

IV-G Applications to safe planning

As learned constraints can be reused for novel tasks with the same safety requirements, we end this section by describing how our method can be used within a planner to guarantee the safety of trajectories planned for such tasks. Recall that Problems 5 and 6 can be used to query if a constraint state $\kappa$, or a region around $\kappa$, is guaranteed safe/unsafe. The planner can use this information by either:

  • Extracting an explicit representation of the constraint by repeatedly solving Problem 6 for different $\kappa$ to cover $\mathcal{G}_s$ and $\mathcal{G}_{\neg s}$. Denote these extracted sets as $\mathcal{G}_s^{\text{ext}}$ and $\mathcal{G}_{\neg s}^{\text{ext}}$ (the conservativeness of our method is proved in Sec. V-A). Then, $\mathcal{G}_s^{\text{ext}}$ can be passed to a planner and quickly used for constraint/collision checking via set-containment checks (see the sketch after this list). A planned trajectory is guaranteed safe if each state on it lies in $\mathcal{G}_s^{\text{ext}}$, since $\mathcal{G}_s^{\text{ext}}$ is contained in the true safe set. If $\mathcal{G}_s^{\text{ext}}$ is small and the planner cannot find a feasible trajectory, we can at least guarantee that a trajectory is not definitely unsafe if it does not intersect $\mathcal{G}_{\neg s}^{\text{ext}}$, as $\mathcal{G}_{\neg s}^{\text{ext}}$ is contained in the true unsafe set.

  • Extracting an implicit representation of the constraint by solving Problem 5 as needed by the planner. This may be less computationally efficient than the explicit case, but we demonstrate in Sec. VI-C that we still achieve reasonable planning times for a 7-DOF arm.
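A sketch of the explicit check in the first bullet (box lists assumed precomputed by repeatedly solving Problem 6; names are ours):

```python
import numpy as np

def in_some_box(kappa, boxes):
    return any(np.all(lo <= kappa) and np.all(kappa <= hi) for lo, hi in boxes)

def certify_trajectory(constraint_states, safe_boxes):
    """Safe if every constraint state lies in an extracted guaranteed-safe box."""
    return all(in_some_box(k, safe_boxes) for k in constraint_states)
```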

V Theoretical Analysis

In this section, we prove that the learned guaranteed safe/unsafe sets are conservative estimates of the true safe/unsafe sets (Sec. V-A) and establish learnability results for locally-optimal demonstrations (Sec. V-B).

V-A Conservativeness

Definition 1 (Implied unsafe/safe set)

For some set $\Theta' \subseteq \Theta$, let $\mathcal{G}_{\neg s}(\Theta')$ be the set of states implied unsafe by restricting the parameter set to $\Theta'$, i.e. $\mathcal{G}_{\neg s}(\Theta')$ is the set of states that all $\theta \in \Theta'$ mark as unsafe. Similarly, let $\mathcal{G}_s(\Theta')$ be the set of states implied safe by restricting the parameter set to $\Theta'$.

We further introduce the following lemma:

Lemma 1 (Lemma C.1 in [corl])

Suppose $\Theta_1 \subseteq \Theta_2$, for some other set $\Theta_2 \subseteq \Theta$. Then, $\mathcal{G}_{\neg s}(\Theta_2) \subseteq \mathcal{G}_{\neg s}(\Theta_1)$ and $\mathcal{G}_s(\Theta_2) \subseteq \mathcal{G}_s(\Theta_1)$.

Theorem 1 (Conservativeness of Problem 2)

Suppose the constraint parameterization $g_{\neg k}(\cdot, \theta)$ is known exactly. Then, extracting $\mathcal{G}_s$ and $\mathcal{G}_{\neg s}$ (as defined in (8) and (9), respectively) from the feasible set of Problem 2 projected onto $\Theta$ (denoted $\mathcal{F}$) returns $\mathcal{G}_s \subseteq \mathcal{S}(\theta^*)$ and $\mathcal{G}_{\neg s} \subseteq \mathcal{A}(\theta^*)$, where $\theta^*$ is the true constraint parameter.

We first prove that $\mathcal{G}_{\neg s} \subseteq \mathcal{A}(\theta^*)$. Suppose that there exists $\kappa \in \mathcal{G}_{\neg s}$ such that $\kappa \notin \mathcal{A}(\theta^*)$. Then, by definition of $\mathcal{G}_{\neg s}$, $\kappa \in \mathcal{A}(\theta)$ for all $\theta \in \mathcal{F}$. However, we know that all locally-optimal demonstrations satisfy the KKT conditions with respect to the true parameter $\theta^*$; hence, $\theta^* \in \mathcal{F}$. Then, $\kappa \in \mathcal{A}(\theta^*)$. Contradiction. Similar logic proves that $\mathcal{G}_s \subseteq \mathcal{S}(\theta^*)$: suppose that there exists $\kappa \in \mathcal{G}_s$ such that $\kappa \notin \mathcal{S}(\theta^*)$. Then, by definition of $\mathcal{G}_s$, $\kappa \in \mathcal{S}(\theta)$ for all $\theta \in \mathcal{F}$. Since $\theta^* \in \mathcal{F}$, $\kappa \in \mathcal{S}(\theta^*)$. Contradiction.

Remark 1

Unfortunately, it is difficult to guarantee conservativeness when using suboptimal demonstrations (solving Problem 3), as the relationship between cost suboptimality and KKT violation is generally unknown. However, we note that in practice the sets recovered using suboptimal demonstrations still tend to be conservative (see Sec. VI-B).

Theorem 2 (Conservativeness of Problem 7)

Suppose the constraint parameterization $g_{\neg k}(\cdot, \theta)$ is known exactly. Then, extracting $\mathcal{G}_s$ and $\mathcal{G}_{\neg s}$ (as defined in (8) and (9), respectively) from the $\Theta$-projected feasible set of Problem 7 (denoted $\hat{\mathcal{F}}$) returns $\mathcal{G}_s(\hat{\mathcal{F}}) \subseteq \mathcal{S}(\theta^*)$ and $\mathcal{G}_{\neg s}(\hat{\mathcal{F}}) \subseteq \mathcal{A}(\theta^*)$.

Denote the $\Theta$-projected feasible set of the original unrelaxed problem (i.e. where the $\hat\lambda$ variables are not introduced and the bilinear terms between $\boldsymbol{\lambda}$ and $\theta$ remain) as $\bar{\mathcal{F}}$ and the $\Theta$-projected feasible set of Problem 7 as $\hat{\mathcal{F}}$. Using the logic in Theorem 1, extracting $\mathcal{G}_s$ and $\mathcal{G}_{\neg s}$ from $\bar{\mathcal{F}}$ yields $\mathcal{G}_s(\bar{\mathcal{F}}) \subseteq \mathcal{S}(\theta^*)$ and $\mathcal{G}_{\neg s}(\bar{\mathcal{F}}) \subseteq \mathcal{A}(\theta^*)$ (since $\bar{\mathcal{F}}$ is the true feasible set, as assumed in Theorem 1). Furthermore, $\bar{\mathcal{F}} \subseteq \hat{\mathcal{F}}$, since relaxing the bilinear terms to linear terms in Problem 7 expands the feasible set compared to the unrelaxed problem. Via Lemma 1, $\mathcal{G}_{\neg s}(\hat{\mathcal{F}}) \subseteq \mathcal{G}_{\neg s}(\bar{\mathcal{F}})$ and $\mathcal{G}_s(\hat{\mathcal{F}}) \subseteq \mathcal{G}_s(\bar{\mathcal{F}})$. Hence, $\mathcal{G}_s(\hat{\mathcal{F}}) \subseteq \mathcal{S}(\theta^*)$ and $\mathcal{G}_{\neg s}(\hat{\mathcal{F}}) \subseteq \mathcal{A}(\theta^*)$.

Remark 2

For brevity, we omit conditions on $M_{\text{big}}$ for conservativeness; it is well-known that conservativeness is achieved by choosing the big-M constants to be sufficiently large [bertsimas].

V-B Global vs. local learnability

Definition 2 (Local learnability)

A state $\kappa$ is locally learnable if there exists a set of $N_s$ locally-optimal demonstrations, where $N_s$ may be infinite, such that $\kappa \in \mathcal{G}_{\neg s}(\mathcal{F}^{\text{loc}})$, where $\mathcal{F}^{\text{loc}}$ is the $\Theta$-projected feasible set of Problem 2. We also define the locally learnable set of unsafe states, $\mathcal{A}^{\text{loc}}$, as the union of all locally learnable states.

Definition 3 (Global learnability)

A state $\kappa$ is globally learnable if there exists a set of $N_s$ globally-optimal demonstrations and $N_{\neg s}$ sampled strictly lower-cost (and hence unsafe) trajectories, where $N_s$ and $N_{\neg s}$ may be infinite, such that $\kappa \in \mathcal{G}_{\neg s}(\mathcal{F}^{\text{glo}})$, where $\mathcal{F}^{\text{glo}}$ is the feasible set of Problem 2 in [corl] (which recovers a constraint consistent with the demonstrations and sampled unsafe trajectories). Accordingly, we define the globally learnable set of unsafe states, $\mathcal{A}^{\text{glo}}$, as the union of all globally learnable states.

Note that a safe state $\kappa$ can always be learned guaranteed safe, as there always exists a safe globally-optimal or locally-optimal demonstration passing through $\kappa$. Armed with these definitions, we show the following:

Theorem 3 (Global vs local)

Suppose the initial constraint parameter set $\Theta$ is identical for both the local and global problems. Then, $\mathcal{A}^{\text{loc}} \subseteq \mathcal{A}^{\text{glo}}$.

Any globally-optimal demonstration must also satisfy the KKT conditions, as it is also locally-optimal, and further conditions (in the form of lower-cost trajectories being infeasible) must be imposed on a constraint parameter for it to be consistent with global optimality. Hence, $\mathcal{F}^{\text{glo}} \subseteq \mathcal{F}^{\text{loc}}$. By Lemma 1, $\mathcal{G}_{\neg s}(\mathcal{F}^{\text{loc}}) \subseteq \mathcal{G}_{\neg s}(\mathcal{F}^{\text{glo}})$, and thus $\mathcal{A}^{\text{loc}} \subseteq \mathcal{A}^{\text{glo}}$.

Note that Theorem 3 holds in the limit of having sampled all unsafe trajectories. In practice, the sampling is nowhere near complete, especially for nonlinear dynamics. We see in these cases (Sec. VI-C) that our KKT-based method learns more compared to sampling-based techniques. Finally, we note that cost function uncertainty can only decrease learnability, as it enlarges the feasible set of Problem 2.

VI Results

We evaluate our method, first on 2D examples (Sec. VI-A) for intuition, and then on high-dimensional 7-DOF arm (Sec. VI-B) and quadrotor (Sec. VI-C) constraint-learning problems (see the accompanying video for experiment visualizations). All computation times are recorded on a laptop with a 3.1 GHz Intel Core i7 processor and 16 GB RAM.

VI-A 2D examples

Global vs. local: Assuming global demonstration optimality can enlarge the learned guaranteed unsafe set compared to assuming local optimality (Theorem 3). In this example, we show some common differences in the learned constraints when assuming global/local optimality. Consider a 2D kinematic system $x_{t+1} = x_t + u_t$, with bounded controls, avoiding the pink obstacle in Fig. 1. We use an axis-aligned box constraint parameterization. In Fig. 1 (left), by assuming the demonstrations (cyan, green) are globally-optimal and sampling lower-cost trajectories (the middle state on each trajectory is plotted in red), the hatched area is implied guaranteed unsafe, as any axis-aligned box containing the sampled unsafe states (in red) must also contain the hatched area. In contrast, assuming local optimality yields zero volume learned guaranteed safe/unsafe, as a measure-zero horizontal line obstacle (orange, dashed) suffices to make the demonstrations locally-optimal: since the line supports the middle state on each demonstration, the cost cannot be locally improved. In Fig. 1 (center), we show a case where there is no gap in learnability: without assuming a parameterization, the demonstrations can be explained by two horizontal line obstacles, but together with the box parameterization, we recover the same guaranteed safe/unsafe sets as in the global case. Fig. 1 (right) shows that assuming global optimality may result in non-conservative constraint recovery (e.g. if the dotted red line were a sampled unsafe trajectory), while a horizontal line obstacle (orange dashed line) can explain the local optimality of the demonstration, yielding conservative constraint recovery.

Fig. 1: Left: local learns less than global. Center: local learns the same as global. Right: global recovers non-conservative solution. Red: sampled unsafe trajectories. Pink: true constraint. Green/cyan: demonstrations.

Effects of cost uncertainty: We show that learnability under cost uncertainty is related more to the range of behaviors that a cost uncertainty set can represent than to the size of the cost parameter space. For the demonstrations/constraint in the center plot of Fig. 1, consider two cost uncertainty sets: A) a path-length cost with two weights $\gamma$ allowed to take negative values, and B) a path-length cost with 20 per-timestep weights, each constrained to be nonnegative. While Set A has a much smaller parameter space compared to Set B (2 vs. 20 parameters), allowing $\gamma$ to take negative values enables the case where the demonstrator wants to maximize path length (i.e. set $\gamma < 0$). For fixed start/goal states and control constraints, the observed demonstrations are actually locally-optimal with respect to a cost function which maximizes path length in an environment with no box state space constraint. Hence, for Set A, our method returns $\mathcal{G}_s = \mathcal{G}_{\neg s} = \emptyset$. In contrast, while Set B has a much larger parameter space, the range of allowable behaviors is small (all cost terms must penalize path length). Thus, despite the large cost parameter space, the learned guaranteed safe/unsafe sets match those obtained with a known cost.


Fig. 2: Nonlinear constraint. Blue: true constraint boundary. Red/green states: learned guaranteed unsafe/safe ($\mathcal{G}_{\neg s}$/$\mathcal{G}_s$). Purple/orange: two demonstrations.

Nonlinear constraint: We emphasize that while our method requires an affine parameterization in the constraint parameters, constraints that are nonlinear in the state can still be learned. Consider a parameterization