I Introduction
Proper modeling of a stochastic system of interest is a key step towards successful control and decision making under uncertainty. In particular, accurate characterization of the underlying probability distribution is crucial, as it encodes how we expect the system to behave under uncertainty over time. However, such a modeling process can pose significant challenges in real-world problems. On the one hand, we may have only limited knowledge of the underlying system, which would force us to use an erroneous model. On the other hand, even if we can perfectly model a complicated stochastic phenomenon, such as a complex multimodal distribution, it may still not be appropriate for real-time control or planning. Indeed, many model-based stochastic control methods require a Gaussian noise assumption, and many of the others need computationally intensive sampling. The present work addresses this problem via distributionally robust control, wherein a potential distributional mismatch is considered between a baseline Gaussian process noise model and the true, unknown model within a certain Kullback-Leibler (KL) divergence bound. The use of the Gaussian distribution is advantageous for retaining computational tractability without the need for sampling in the state space. Our contribution is a novel model predictive control (MPC) method for nonlinear, non-Gaussian systems with nonconvex costs. This controller would be useful, for example, to safely navigate a robot among human pedestrians when the stochastic transition model for the humans is not perfect. It is important to note that our contribution is built on the mathematical equivalence between distributionally robust control and risk-sensitive optimal control [22]. Unlike conventional stochastic optimal control, which is concerned with the expected cost, risk-sensitive optimal control seeks to optimize the following entropic risk measure [18]:
(1) $\mathcal{R}_{p,\theta}(J) \triangleq \frac{1}{\theta} \log \mathbb{E}_p\left[\exp(\theta J)\right]$
where $p$ is a probability distribution characterizing any source of randomness in the system, $\theta > 0$ is a user-defined scalar parameter called the risk-sensitivity parameter, and $J$ is an optimal control cost. The risk-sensitivity parameter $\theta$ determines a relative weight between the expected cost and other higher-order moments such as the variance [36]. Loosely speaking, the larger $\theta$ becomes, the more the objective cares about the variance and is thus more risk-sensitive. Our distributionally robust control algorithm can alternatively be viewed as an algorithm for automatic online tuning of the risk-sensitivity parameter when applying risk-sensitive control. Risk-sensitive optimal control has been shown to be effective and successful in many robotics applications [19, 20, 2, 21]. However, in prior work the user has to specify a fixed risk-sensitivity parameter offline. This would require an extensive trial-and-error process until a desired robot behavior is observed. Furthermore, a risk-sensitivity parameter that works in a certain state can be infeasible in another state, as we will see in Section IV. Ideally, the risk-sensitivity should be adapted online depending on the situation to obtain a specific desired robot behavior [20, 21], yet this is highly nontrivial as no simple general relationship is known between the risk-sensitivity parameter and the performance of the robot. Our algorithm addresses this challenge as a secondary contribution. Due to the fundamental equivalence between distributionally robust control and risk-sensitive control, it serves as a nonlinear risk-sensitive controller that can dynamically adjust the risk-sensitivity parameter depending on the state of the robot as well as the surrounding environment. The rest of the paper is organized as follows. Section II reviews the related work in the controls and robotics literature. Section III summarizes the theoretical results originally presented in [22] that connect distributionally robust control to risk-sensitive optimal control. Section IV develops this theory into an algorithm that provides a locally optimal solution for general nonlinear systems with nonconvex cost functions, which is a novel contribution of this paper.
In Section V, we test our method in a collision avoidance scenario wherein the predictive distribution of pedestrian motion is erroneous. We further show its benefits as a risk-sensitive optimal controller that can automatically adjust its risk-sensitivity parameter in this section. The paper concludes in Section VI with potential future research directions.
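As an illustrative aside (not part of the original development), the entropic risk measure in (1) can be estimated from cost samples. The snippet below uses synthetic Gaussian cost samples, computed stably with the log-sum-exp trick, to show how the risk-sensitivity parameter trades off the mean against the variance:

```python
import numpy as np

def entropic_risk(costs, theta):
    """Estimate (1/theta) * log E[exp(theta * J)] from cost samples,
    computed stably via the log-sum-exp trick."""
    theta_j = theta * np.asarray(costs, dtype=float)
    m = theta_j.max()
    log_mean_exp = m + np.log(np.mean(np.exp(theta_j - m)))
    return log_mean_exp / theta

rng = np.random.default_rng(0)
low_var = rng.normal(loc=1.0, scale=0.1, size=100_000)   # costs: mean 1, small variance
high_var = rng.normal(loc=1.0, scale=1.0, size=100_000)  # costs: mean 1, large variance

# For small theta the risk is close to the mean for both distributions;
# for larger theta the high-variance cost distribution is penalized more.
for theta in (0.01, 1.0):
    print(theta, entropic_risk(low_var, theta), entropic_risk(high_var, theta))
```

For a Gaussian cost $J \sim \mathcal{N}(\mu, \sigma^2)$ the risk is exactly $\mu + \theta\sigma^2/2$, which matches the second-order interpretation of $\theta$ as a weight on the variance.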
II Related Work
II-A Distributional Robustness and Risk-Sensitivity
Distributionally robust control seeks to optimize control actions against a worst-case distribution within a given set of probability distributions, often called the ambiguity set [32, 26]. There exist various formulations to account for distributional robustness in optimal control. Some works are concerned with minimizing the worst-case expectation of a cost objective [26, 28], while others enforce risk-based or chance constraint satisfaction under a worst-case distribution [10, 32]. The present work belongs to the former class. Existing methods also differ in the formulation of the ambiguity set. Moment-based ambiguity sets require knowledge of moments of the ground-truth distribution up to a finite order [32, 26], which is often overly conservative [10]. Statistical distance-based ambiguity sets are also gaining attention. The authors of [10] use a Wasserstein metric to define the ambiguity set for motion planning with collision avoidance, but their MPC formulation is not suited for nonlinear systems. Divergence-based ambiguity sets, built on families of divergences to which the KL divergence belongs, are employed in [28], similar to the present work. However, the ambiguity set considered in [28] is restricted to categorical distributions, while our work requires no assumption on the class of the ground-truth distributions. Furthermore, we make use of risk-sensitive optimal control to obtain planned robot trajectories with feedback, unlike the sampling-based implementation of [28]. Optimization of the entropic risk measure has been an active research topic in the economics and controls literature since the 1970s [14, 35, 36, 8]. The concept of risk-sensitive optimal control has been successfully applied to robotics in various domains, including haptic assistance [19, 20]
, model-based reinforcement learning (RL) [2], and safe robot navigation [21, 34], to name a few. In all these works, the risk-sensitivity parameter is introduced as a user-specified constant, and is found to significantly affect the behavior of the robot. For instance, our prior work on safe robot navigation in human crowds [21] reveals that a robot with higher risk-sensitivity tends to yield more to oncoming human pedestrians. However, how to find a desirable risk-sensitivity parameter still remains an open research question; in the robot navigation problem, the robot simply freezes if it is too risk-sensitive when the scene is crowded. As the authors of [20] point out, the robot should adapt its risk-sensitivity level depending on the situation, yet there still does not exist an effective algorithmic framework to automate this due to the issues discussed in Section I. In this work, we provide such an algorithm for nonlinear, non-Gaussian stochastic systems. As mentioned earlier, our approach is built on previously established theoretical results that link risk-sensitive and distributionally robust control [22].

II-B Approximate Methods for Optimal Feedback Control
The theory of optimal control lets us derive an optimal feedback control law via dynamic programming (DP) [4]. For linear systems with additive Gaussian white noise and quadratic cost functions, the exact DP solution is tractable and is known as Linear-Quadratic-Gaussian (LQG) [1] or Linear-Exponential-Quadratic-Gaussian (LEQG) [35]. The two differ in that LQG optimizes the expected cost while LEQG optimizes the entropic risk measure, although both DP recursions are quite similar. However, solving general optimal control problems for nonlinear systems remains a challenge due to the lack of analytical tractability. Hence, approximate local optimization methods have been developed, including Differential Dynamic Programming (DDP) [13], the iterative Linear-Quadratic Regulator (iLQR) [17], and the iterative Linear-Quadratic-Gaussian (iLQG) [30, 29]. While both DDP and iLQR are designed for deterministic systems with quadratic cost functions, iLQG can locally optimize the expected cost objective for Gaussian stochastic systems with nonconvex cost functions. Similarly, the iterative Linear-Exponential-Quadratic-Gaussian (iLEQG) has recently been proposed to locally optimize the entropic risk for Gaussian systems with nonconvex costs [9, 34, 23]. Note, however, that these methods are not designed to be robust to the model mismatches that we consider in this paper. In fact, it is known that even LQG does not possess guaranteed robustness [7].

III Problem Statement
III-A Distributionally Robust Optimal Control
Consider the following stochastic nonlinear system:
(2) $x_{k+1} = f(x_k, u_k) + g(x_k, u_k) w_k$
where $x_k \in \mathbb{R}^n$ denotes the state, $u_k \in \mathbb{R}^m$ the control, and $w_k \in \mathbb{R}^r$ the noise input to the system at time $k$. For some finite horizon $N$, let $w \triangleq (w_0, \dots, w_{N-1})$ denote the joint noise vector with probability distribution $p$. This distribution is assumed to be a known Gaussian white noise process, i.e. $w_k$ is independent of $w_l$ for all $k \neq l$, and we call (2) the reference system. Ideally, we would like the model distribution $p$ to perfectly characterize the noise in the dynamical system. However, in reality the noise may come from a different, more complex distribution which we may not know exactly. Let $w' \triangleq (w'_0, \dots, w'_{N-1})$ denote a perturbed noise vector that is distributed according to $q$. We define the following perturbed system that characterizes the true but unknown dynamics:
(3) $x_{k+1} = f(x_k, u_k) + g(x_k, u_k) w'_k$
Note that we make no assumptions on the Gaussianity or whiteness of $w'$. One could also attribute it to potentially unmodeled dynamics. The true, unknown probability distribution $q$ is contained in the set $\mathcal{P}$ of all probability distributions on the support of the noise. We assume that $q$ is not “too different” from $p$. This is encoded as the following constraint on the KL divergence between $q$ and $p$:
(4) $D_{\mathrm{KL}}(q \,\|\, p) \leq d$
where $D_{\mathrm{KL}}(q \,\|\, p)$ is the KL divergence and $d \geq 0$ is a given constant. Note that $D_{\mathrm{KL}}(q \,\|\, p) \geq 0$ always holds, with equality if and only if $q = p$. The set of all possible probability distributions satisfying (4) is denoted by $\Xi \triangleq \{q \in \mathcal{P} : D_{\mathrm{KL}}(q \,\|\, p) \leq d\}$, which we define as our ambiguity set. Note that $\Xi$ is a convex subset of $\mathcal{P}$ for a fixed $p$. We are interested in controlling the perturbed system (3) with a state feedback controller of the form $u_k = \pi_k(x_k)$. The operator $\pi_k$ defines a mapping from $\mathbb{R}^n$ into $\mathbb{R}^m$. The class of all such controllers is denoted by $\Pi$. The cost functions considered in this paper are given by
(5) $J(x_{0:N}, u_{0:N-1}) \triangleq \sum_{k=0}^{N-1} c_k(x_k, u_k) + c_N(x_N)$
We assume that the above objective satisfies the following non-negativity assumption.
Assumption 1 (Assumption 3.1, [22])
The functions $c_k$ and $c_N$ satisfy $c_k(x, u) \geq 0$ and $c_N(x) \geq 0$ for all $x$, $u$, and $k$.
Under the dynamics model (3), the cost model (5), and the KL divergence constraint (4) on $q$, we are interested in finding an admissible controller $\pi \in \Pi$ that minimizes the worst-case expected value of the cost objective (5). In other words, we are concerned with the following distributionally robust optimal control problem:
(6) $\min_{\pi \in \Pi} \max_{q \in \Xi} \mathbb{E}_q\left[J(x_{0:N}, u_{0:N-1})\right]$
where $\mathbb{E}_q$ indicates that the expectation is taken with respect to the true, unknown distribution $q$. In this formulation, the robustness arises from the ability of the controller to plan against a worst-case distribution in the ambiguity set $\Xi$.
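To build intuition for the inner maximization in (6), the sketch below (an illustration on a toy categorical model, not the algorithm of [22]) tilts a model distribution toward high-cost outcomes: the tilted distribution raises the expected cost at the price of a positive KL divergence from the model, which is exactly the trade-off the ambiguity set constrains.

```python
import numpy as np

def kl(q, p):
    """KL divergence D(q || p) for categorical distributions."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def tilted(p, costs, tau):
    """Exponentially tilted distribution q proportional to p * exp(J / tau)."""
    w = np.asarray(p, float) * np.exp(np.asarray(costs, float) / tau)
    return w / w.sum()

p = np.array([0.7, 0.2, 0.1])        # model distribution over three outcomes
costs = np.array([0.0, 1.0, 5.0])    # cost of each outcome

q = tilted(p, costs, tau=2.0)
# q shifts mass toward the expensive outcome, so E_q[J] > E_p[J],
# while D(q || p) measures how far the adversary had to move from p.
print(costs @ p, costs @ q, kl(q, p))
```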
Remark 1
If the KL divergence bound $d$ is zero, then $q = p$ is necessary. In this degenerate case, (6) reduces to the standard stochastic optimal control problem:
(7) $\min_{\pi \in \Pi} \mathbb{E}_p\left[J(x_{0:N}, u_{0:N-1})\right]$
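As a side note on the constraint (4): when both distributions are Gaussian the KL divergence has a closed form, which is convenient for building intuition about the size of the ambiguity set. The sketch below assumes diagonal covariances; the mixture-versus-Gaussian divergence used later in Section V has no closed form and must be estimated, e.g. by Monte Carlo integration.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """D(q || p) for Gaussians with diagonal covariances (elementwise variances)."""
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Identical distributions give D = 0; a mean shift of one standard
# deviation costs 0.5 nats per dimension.
print(kl_diag_gaussians([0.0], [1.0], [0.0], [1.0]))
print(kl_diag_gaussians([1.0], [1.0], [0.0], [1.0]))
```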
III-B Equivalent Risk-Sensitive Optimal Control
Unfortunately, the distributionally robust optimal control problem (6) is intractable as it involves maximization with respect to a probability distribution $q$. To circumvent this, [22] proves that problem (6) is equivalent to a bilevel optimization problem involving risk-sensitive optimal control with respect to the model distribution $p$. We refer the reader to [22] for the derivation and only restate the main results in this section to keep the paper self-contained. Before doing so, we impose an additional assumption on the worst-case expected cost.
Assumption 2 (Assumption 3.2, [22])
For any admissible controller $\pi \in \Pi$, the resulting closed-loop system satisfies
(8) $\sup_{q \in \mathcal{P}} \mathbb{E}_q\left[J(x_{0:N}, u_{0:N-1})\right] = \infty$
This assumption states that, without the KL divergence constraint, some adversarially chosen noise could make the expected cost objective arbitrarily large in the worst case. It amounts to a controllability-type assumption with respect to the noise input and an observability-type assumption with respect to the cost objective [22]. Under Assumptions 1 and 2, the following theorem holds.
Theorem 1
Suppose Assumptions 1 and 2 hold. Then the distributionally robust optimal control problem (6) is equivalent to
(9) $\min_{\pi \in \Pi} \inf_{\tau > 0} \left\{ \tau \log \mathbb{E}_p\left[\exp(J/\tau)\right] + \tau d \right\}$
where $\tau$ is a scalar dual variable. Moreover, for a fixed controller the inner worst-case expectation is attained by the exponentially tilted distribution
(10) $dq^*(w) \propto \exp(J/\tau^*)\, dp(w)$
with $\tau^*$ the optimal dual variable.
Proof: See Theorems 3.1 and 3.2 in [22].
Remark 2
Notice that the first term on the right-hand side of (9) is the entropic risk measure $\mathcal{R}_{p,1/\tau}(J)$, where the risk is computed with respect to the model distribution $p$ and $\tau$ serves as the inverse of the risk-sensitivity parameter. Rewriting the equation in terms of the risk-sensitivity parameter $\theta \triangleq 1/\tau$, we see that the right-hand side of (9) is equivalent to
(11) $\min_{\theta \in \Theta} \left\{ \min_{\pi \in \Pi} \mathcal{R}_{p,\theta}(J) + \frac{d}{\theta} \right\}$
where $\Theta$ denotes the feasible set of risk-sensitivity parameters $\theta > 0$ for which the entropic risk is finite. Theorem 1 shows that the original distributionally robust optimal control problem (6) is mathematically equivalent to a bilevel optimization problem (11) involving risk-sensitive optimal control. Note that the new problem does not involve any optimization with respect to the true distribution $q$.
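The equivalence in Theorem 1 can be checked numerically on a toy problem: by weak duality, the dual objective $\tau \log \mathbb{E}_p[\exp(J/\tau)] + \tau d$ upper-bounds $\mathbb{E}_q[J]$ for every $q$ in the KL ball, and the bound is tight at the optimum. A hedged sketch (synthetic categorical model, grid search over $\tau$, rejection sampling over the simplex):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])       # model distribution (categorical)
costs = np.array([0.0, 2.0, 6.0])   # cost of each outcome
d = 0.1                             # KL divergence bound

def dual(tau):
    # tau * log E_p[exp(J / tau)] + tau * d   (right-hand side of (9))
    return tau * np.log(p @ np.exp(costs / tau)) + tau * d

def kl(q):
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

# Minimize the dual over a grid of tau > 0.
taus = np.linspace(0.1, 50.0, 2000)
dual_value = min(dual(t) for t in taus)

# Sample many feasible q from the KL ball and keep the worst found.
rng = np.random.default_rng(0)
worst = p @ costs
for _ in range(20000):
    q = rng.dirichlet(np.ones(3))
    if kl(q) <= d:
        worst = max(worst, q @ costs)

# Every feasible q stays below the dual value, and the best one nearly attains it.
print(worst, dual_value)
```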
IV RAT iLQR Algorithm
Even though the mathematical equivalence shown in [22] and summarized in Section III-B is general, it does not immediately lead to a tractable method to efficiently solve (11) for general nonlinear systems. There are two major challenges to be addressed. First, exact optimization of the entropic risk with a state feedback control law is intractable, except for linear systems with quadratic costs. Second, the optimal risk-sensitivity parameter $\theta^*$ has to be searched efficiently over the feasible space $\Theta$, which not only is unknown but also varies depending on the initial state. A novel contribution of this paper is a tractable algorithm that approximately solves both of these problems for general nonlinear systems with nonconvex cost functions. In what follows, we detail how we solve both the inner and the outer loop of (11) to develop a distributionally robust, risk-sensitive MPC.
IV-A Iterative Linear-Exponential-Quadratic-Gaussian
Let us first consider the inner minimization of (11):
(12) $\min_{\pi \in \Pi} \mathcal{R}_{p,\theta}(J)$
where we omitted the extra term $d/\theta$ as it is constant with respect to the controller $\pi$. This amounts to solving a risk-sensitive optimal control problem for a nonlinear Gaussian system. Recently, a computationally efficient, local optimization method called iterative Linear-Exponential-Quadratic-Gaussian (iLEQG) has been proposed for both continuous-time systems [9] and the discrete-time counterpart [23, 34]. Both versions locally optimize the entropic risk measure with respect to a receding-horizon, affine feedback control law for general nonlinear systems with nonconvex costs. We adopt a variant of the discrete-time iLEQG algorithm [34] to obtain a locally optimal solution to (12). In what follows, we assume that the noise coefficient function $g$ in (2) is the identity mapping for simplicity, but it is straightforward to handle nonlinear functions in a similar manner as discussed in [30]. The algorithm starts by applying a given nominal control sequence $\bar{u} \triangleq (\bar{u}_0, \dots, \bar{u}_{N-1})$ to the noiseless dynamics $\bar{x}_{k+1} = f(\bar{x}_k, \bar{u}_k)$ to obtain the corresponding nominal state trajectory $\bar{x} \triangleq (\bar{x}_0, \dots, \bar{x}_N)$. In each iteration, the algorithm maintains and updates a locally optimal controller of the form:
(13) $u_k = \bar{u}_k + K_k (x_k - \bar{x}_k)$
where $K_k$ denotes the feedback gain matrix. Each iteration of our iLEQG implementation consists of the following four steps:


Local Approximation: Given the nominal trajectory $(\bar{x}, \bar{u})$, we compute the following linear approximation of the dynamics as well as the quadratic approximation of the cost functions:
(14) $A_k = D_x f(\bar{x}_k, \bar{u}_k)$
(15) $B_k = D_u f(\bar{x}_k, \bar{u}_k)$
(16) $q_k = c_k(\bar{x}_k, \bar{u}_k)$
(17) $q^x_k = D_x c_k(\bar{x}_k, \bar{u}_k)$
(18) $Q_k = D_{xx} c_k(\bar{x}_k, \bar{u}_k)$
(19) $r_k = D_u c_k(\bar{x}_k, \bar{u}_k)$
(20) $R_k = D_{uu} c_k(\bar{x}_k, \bar{u}_k)$
(21) $P_k = D_{ux} c_k(\bar{x}_k, \bar{u}_k)$
for $k = 0$ to $N-1$, where $D$ is the differentiation operator. We also let $q_N = c_N(\bar{x}_N)$, $q^x_N = D_x c_N(\bar{x}_N)$, and $Q_N = D_{xx} c_N(\bar{x}_N)$.

Backward Pass: We perform approximate DP using the current feedback gain matrices as well as the approximated model obtained in the previous step. Suppose that the noise vector $w_k$ is Gaussian-distributed according to $\mathcal{N}(0, W_k)$ with $W_k \succ 0$. Let $s_N \triangleq q_N$, $s^x_N \triangleq q^x_N$, and $S_N \triangleq Q_N$. Given these terminal conditions, we recursively compute the following quantities:
(22) $M_k = W_k^{-1} - \theta S_{k+1}$
(23) $g_k = r_k + B_k^\top \tilde{s}^x_{k+1}$
(24) $G_k = P_k + B_k^\top \tilde{S}_{k+1} A_k$
(25) $H_k = R_k + B_k^\top \tilde{S}_{k+1} B_k$
where $\tilde{S}_{k+1} \triangleq S_{k+1} + \theta S_{k+1} M_k^{-1} S_{k+1}$ and $\tilde{s}^x_{k+1} \triangleq s^x_{k+1} + \theta S_{k+1} M_k^{-1} s^x_{k+1}$, and
(26) $S_k = Q_k + A_k^\top \tilde{S}_{k+1} A_k + K_k^\top H_k K_k + K_k^\top G_k + G_k^\top K_k$
(27) $s^x_k = q^x_k + A_k^\top \tilde{s}^x_{k+1} + K_k^\top g_k$
(28) $s_k = q_k + s_{k+1} + \frac{\theta}{2} (s^x_{k+1})^\top M_k^{-1} s^x_{k+1} - \frac{1}{2\theta} \log\det(I - \theta W_k S_{k+1})$
from $k = N-1$ down to $k = 0$. Note that $M_k \succ 0$ is necessary so that it is invertible, which may not hold if $\theta$ is too large. This failure is called “neurotic breakdown”: the optimizer is so pessimistic that the cost-to-go approximation becomes infinite [36]. Otherwise, the approximated cost-to-go for this optimal control problem (under the controller (13)) is given by $s_0$.

Regularization and Control Computation: Having derived the DP solution, we compute new control gains and offset updates as follows:
(29) $K_k = -(H_k + \mu I)^{-1} G_k$
(30) $l_k = -(H_k + \mu I)^{-1} g_k$
where $\mu \geq 0$ is a regularization parameter to prevent $H_k + \mu I$ from having negative eigenvalues. We adaptively change $\mu$ across multiple iterations as suggested in [29], so the algorithm enjoys fast convergence near a local minimum while ensuring the positive-definiteness of $H_k + \mu I$ at all times.
Line Search for Ensuring Convergence: It is known that the update could lead to increased cost or even divergence if a new trajectory strays too far from the region where the local approximation is valid [29]. Thus, the new nominal control trajectory is computed by backtracking line search with line search parameter $\epsilon \in (0, 1]$. Initially $\epsilon = 1$, and we derive a new candidate nominal trajectory $(\hat{x}, \hat{u})$ as follows:
(31) $\hat{u}_k = \bar{u}_k + \epsilon l_k + K_k(\hat{x}_k - \bar{x}_k)$
(32) $\hat{x}_{k+1} = f(\hat{x}_k, \hat{u}_k)$
with $\hat{x}_0 = \bar{x}_0$. If this candidate trajectory results in a lower cost-to-go than the current nominal trajectory, then the candidate trajectory is accepted and returned as the new nominal trajectory. Otherwise, the trajectory is rejected and re-derived with $\epsilon \leftarrow \epsilon / 2$ until it is accepted. More details on this line search can be found in [31].
The above procedure is iterated until the nominal control does not change beyond some threshold in a norm. Once converged, the algorithm returns the nominal trajectory $(\bar{x}, \bar{u})$ as well as the feedback gains $\{K_k\}$ and the approximate cost-to-go $s_0$.
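To make the role of $\theta$ in the backward pass concrete, the following scalar toy sketch (a risk-sensitive LQR recursion on a one-dimensional system, not our full iLEQG implementation) runs the backup and reports when the invertibility condition on $M_k$ fails, i.e. neurotic breakdown:

```python
import numpy as np

def leqg_gain_sequence(a, b, W, Q, R, Qf, theta, N):
    """Scalar risk-sensitive LQR backward pass.

    Dynamics x' = a*x + b*u + w with w ~ N(0, W); stage cost
    0.5*(Q*x^2 + R*u^2), terminal cost 0.5*Qf*x^2. Returns the feedback
    gains, or None when 1/W - theta*S becomes non-positive ("neurotic
    breakdown": the cost-to-go blows up, so theta is infeasible).
    """
    S = Qf
    gains = []
    for _ in range(N):
        if 1.0 / W - theta * S <= 0.0:
            return None                      # breakdown: theta too large
        S_tilde = S / (1.0 - theta * W * S)  # risk-adjusted value curvature
        K = -(a * b * S_tilde) / (R + b * b * S_tilde)
        S = Q + a * a * S_tilde + (a * b * S_tilde) * K
        gains.append(K)
    return gains[::-1]

# Small theta: well defined, close to the LQG gains.
# Large theta: breakdown, so the feasible set of theta is bounded.
print(leqg_gain_sequence(1.0, 1.0, 0.5, 1.0, 1.0, 1.0, theta=0.1, N=20) is not None)
print(leqg_gain_sequence(1.0, 1.0, 0.5, 1.0, 1.0, 1.0, theta=5.0, N=20) is None)
```

This bounded feasible set is exactly why the outer loop described next must search over $\theta$ carefully.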
IV-B Cross Entropy Method
Having implemented the iLEQG algorithm for the inner-loop optimization of (11), it remains to solve the outer-loop optimization for the optimal risk-sensitivity parameter $\theta^*$. This is a one-dimensional optimization problem in which the function evaluation is done by solving the corresponding risk-sensitive optimal control problem (12). In this work we choose to adapt the cross entropy method [24, 15] to derive an approximately optimal value for $\theta$. This method is favorable for online optimization due to its anytime nature and the high parallelizability of the Monte Carlo sampling, but it is possible to use other methods as well. The cross entropy method is a stochastic method that maintains an explicit probability distribution over the design space. At each step, a set of $m$ Monte Carlo samples is drawn from the distribution, out of which a subset of $m_{\mathrm{elite}} < m$ “elite samples” that achieve the best performance is retained. The parameters of the distribution are then updated according to the maximum likelihood estimate on the elite samples. The algorithm stops after a desired number of steps. In our implementation we model the distribution as a univariate Gaussian $\mathcal{N}(\mu, \sigma^2)$. A remaining issue is that iLEQG may return a cost-to-go of infinity if a sampled $\theta$ is too large, due to neurotic breakdown. Since our search space is limited to the set $\Theta$ where $\theta$ yields a finite cost-to-go, we have to ensure that each iteration has enough samples in $\Theta$. To address this problem, we augment the cross entropy method with rejection and resampling. Out of the $m$ samples drawn from the univariate Gaussian, we first discard all non-positive samples. For each of the remaining samples, we evaluate the objective (11) by a call to iLEQG, and then count the number of samples that obtained a finite cost-to-go. Let $m_v$ be the number of such valid samples. If $m_v \geq m_{\mathrm{elite}}$, we proceed and fit the distribution. Otherwise, we redo the sampling procedure, as there are not sufficiently many valid samples to choose the elites from. In practice, resampling is not likely to occur after the first iteration of the cross entropy method. At the same time, we empirically found that the first iteration has a risk of resampling multiple times, hence degrading efficiency. We therefore also perform an adaptive initialization of the Gaussian parameters $\mu$ and $\sigma$ in the first iteration as follows. If the first iteration results in resampling, we not only resample but also halve $\mu$ and $\sigma$. If all of the samples are valid, on the other hand, we accept them but double $\mu$ and $\sigma$, since this implies that the initial set of samples may not be widespread enough to cover the whole feasible set $\Theta$. The parameters $\mu$ and $\sigma$ are stored internally in the cross entropy solver and carried over to the next call to the algorithm.

IV-C RAT iLQR as MPC
We name the proposed bilevel optimization algorithm RAT iLQR. The pseudocode is given in Algorithm 1. At run time, it is executed as an MPC in a receding-horizon fashion; the control is recomputed after executing the first control input and transitioning to a new state. A previously computed control trajectory is reused as the initial nominal control trajectory at the next time step to warm-start the computation.
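A compact sketch of the outer loop described in Sections IV-A and IV-B: the cross entropy method over the scalar $\theta$ with rejection and resampling. The quadratic-plus-penalty `objective` below is a purely illustrative stand-in; in RAT iLQR each evaluation is an iLEQG solve returning the cost-to-go plus $d/\theta$, or infinity on neurotic breakdown.

```python
import numpy as np

def objective(theta):
    """Stand-in for an iLEQG evaluation; infinite past a breakdown point."""
    if theta >= 2.0:          # mimic neurotic breakdown for too-large theta
        return np.inf
    return (theta - 1.2) ** 2 + 0.5 / theta

def cross_entropy_theta(mu=1.0, sigma=1.0, m=20, m_elite=5, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    for it in range(n_iter):
        while True:
            raw = rng.normal(mu, sigma, size=m)
            samples = raw[raw > 0.0]                 # discard non-positive theta
            values = np.array([objective(t) for t in samples])
            valid = np.isfinite(values)
            if valid.sum() >= m_elite:
                break                                # enough valid samples for the elites
            if it == 0:
                mu, sigma = mu / 2.0, sigma / 2.0    # first iteration: shrink, then resample
        order = np.argsort(values[valid])
        elites = samples[valid][order][:m_elite]
        mu, sigma = elites.mean(), max(elites.std(), 1e-3)
        if it == 0 and valid.all() and len(samples) == m:
            mu, sigma = 2.0 * mu, 2.0 * sigma        # widen: initial spread may be too narrow
    return mu

theta_star = cross_entropy_theta()
print(theta_star)
```

In the MPC setting, `mu` and `sigma` would persist between calls, warm-starting the search at the next time step.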
V Results
This section presents qualitative and quantitative results of the simulation study that we conducted to show the effectiveness of the RAT iLQR algorithm. We provide the problem setup as well as implementation details in Section V-A. The goals of this study are twofold. First, we demonstrate that the robot controlled by RAT iLQR can successfully accomplish its task in the presence of stochastic disturbances, without access to the ground-truth distribution but with knowledge of the KL divergence bound. This is presented in Section V-B with comparisons to (non-robust) iLQG and a model-based MPC with sampling from the true generative model. Second, Section V-C focuses on the nonlinear risk-sensitive optimal control aspect of RAT iLQR to show its value as an algorithm that can optimally adjust the risk-sensitivity parameter online, which itself is a novel contribution.
V-A Problem Setup
We consider a dynamic collision avoidance problem where a unicycle robot has to avoid a pedestrian on a collision course, as illustrated in Figure 2. Collision avoidance problems are often modeled by stochastic optimal control in the autonomous systems literature [16] and the human-robot interaction literature [27, 12], including our prior work [21]. The state of the robot consists of its position [m], velocity [m/s], and heading angle [rad]. The robot's control input consists of the acceleration [m/s²] and the angular velocity [rad/s]. The pedestrian is modeled as a single integrator, whose state is its position [m]. We assume a constant nominal velocity [m/s] for the pedestrian. The dynamics of the joint robot-pedestrian system are propagated by Euler integration with a fixed time interval [s] and additive noise on the joint state. The model distribution for the noise is a zero-mean Gaussian. The ground-truth distribution for the robot's noise is the same Gaussian as in the model, but the pedestrian's distribution is a mixture of Gaussians that is independent of the robot's noise. Both the model and the true distributions for the pedestrian are illustrated in Figure 1. Gaussian mixtures are favored by many recent papers in machine learning to account for multimodality in human decision making [5, 33, 25]. RAT iLQR requires an upper bound on the KL divergence between the model and the true distributions. For the sake of this paper we assume that there is a separate module that provides an estimate. In this specific simulation study, we performed Monte Carlo integration with samples drawn from the true distribution offline. During the simulation, however, we did not reveal any information on the true distribution to RAT iLQR other than the estimated KL value of 32.02. This offline computation was possible due to our time-invariant assumption on the Gaussian mixture. If one is to use more realistic data-driven prediction instead, it is necessary to estimate the KL divergence online, since the predictive distribution may change over time as the human-robot interaction evolves. Note that efficient and accurate estimation of information measures (including the KL divergence) is an active area of research in information theory and machine learning [3, 11], and is one of our future research directions. The cost functions for this problem are given by
(33)
where the first term denotes a quadratic cost that penalizes the deviation from a given target robot trajectory, the second a collision penalty that incurs a high cost when the robot is too close to the pedestrian, and the third a small quadratic cost on the control input. Mirroring the formulation in [34], we used the following collision cost:
(34)
RAT iLQR was implemented in Julia, and the Monte Carlo sampling of the cross entropy method was distributed across multiple CPU cores. Our implementation yielded an average computation time of 0.27 [s], which is 2.7 times slower than real time. We expect to achieve improved efficiency with further parameter tuning as well as more careful parallelization.
V-B Comparison with Baseline MPC Algorithms
We compared the performance of RAT iLQR against two baseline MPC algorithms, iLQG [30] and PETS [6]. iLQG corresponds to RAT iLQR with a KL bound of zero, i.e. no distributional robustness is considered. In exchange, it is more computationally efficient than RAT iLQR, taking only 0.01 [s]. PETS is a state-of-the-art, model-based stochastic MPC algorithm with sampling, originally proposed in a model-based reinforcement learning (RL) context. We chose PETS as our baseline since it also relies on the cross entropy method for online control optimization and is not limited to Gaussian distributions, similar to RAT iLQR. However, there are three major differences between PETS and RAT iLQR. First, PETS performs the cross entropy optimization directly in the high-dimensional control sequence space, which is far less sample efficient than RAT iLQR, which uses the cross entropy method only to optimize the scalar risk-sensitivity parameter. Second, PETS does not consider feedback during planning, as opposed to RAT iLQR. Third, PETS requires access to the exact ground-truth Gaussian mixture distribution to perform sampling, while RAT iLQR only relies on the KL divergence bound and the Gaussian distribution that we have modeled. We let PETS perform multiple iterations of the cross entropy optimization, each with a set of control sequence samples coupled with sampled joint state trajectory predictions, which resulted in an average computation time of 0.67 [s].
Method  Min. Sep. Dist. [m]  Total Collision Count
RAT iLQR (Ours)  —  0
iLQG [30]  —  1
PETS [6]  —  4

TABLE I: RAT iLQR achieved the largest average value for the minimum separation distance with the smallest standard deviation, which contributed to safe robot navigation without a single collision. Note that PETS had multiple collisions despite its access to the true Gaussian mixture distribution.
We performed 30 runs of the simulation for each algorithm, with randomized pedestrian start positions and stochastic transitions. To measure the performance, we computed the minimum separation distance between the robot and the pedestrian in each run, assuming that both agents are circular with the same diameter. The histogram plots presented in Figure 3 clearly indicate the failure of iLQG and PETS, as well as RAT iLQR's capability to maintain a sufficient safety margin for collision avoidance despite the distributional model mismatch. As summarized in Table I, RAT iLQR achieved the largest minimum separation distance on average with the smallest standard deviation, which contributed to safe robot navigation. Note that even iLQG had one collision under this large model mismatch. Figure 2 provides a qualitative explanation of this failure; the planned trajectories by iLQG tend to be much closer to the passing pedestrian than those by the risk-sensitive RAT iLQR. This difference is congruous with our earlier observations in prior work [21], where risk-sensitive optimal control is shown to affect the global behavior of the robot.
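The evaluation metric above can be computed as follows (a sketch; the trajectories and the agents' diameter below are synthetic placeholders, not data from our experiments):

```python
import numpy as np

def min_separation(robot_xy, ped_xy, diameter):
    """Minimum surface-to-surface separation distance over a run.

    robot_xy, ped_xy: (T, 2) arrays of positions. Both agents are circles
    with the same diameter, so centers must stay `diameter` apart to avoid
    collision; a negative return value means the circles overlapped.
    """
    center_dist = np.linalg.norm(np.asarray(robot_xy) - np.asarray(ped_xy), axis=1)
    return float(center_dist.min() - diameter)

# Toy example: robot moving right, pedestrian moving left on an offset line.
t = np.linspace(0.0, 1.0, 50)
robot = np.stack([5.0 * t, np.zeros_like(t)], axis=1)
ped = np.stack([5.0 - 5.0 * t, 0.8 * np.ones_like(t)], axis=1)
print(min_separation(robot, ped, diameter=0.5))
```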
V-C Benefits of Risk-Sensitivity Parameter Optimization
TABLE II

KL Bound: —
Method  Total Collision Count  Tracking Error [m]
RAT iLQR (Ours)  0  —
iLEQG with $\theta = \bar{\theta}$  0  —

KL Bound: —
Method  Total Collision Count  Tracking Error [m]
RAT iLQR (Ours)  0  —
iLEQG with $\theta = \bar{\theta}$  0  —

KL Bound: —
Method  Total Collision Count  Tracking Error [m]
RAT iLQR (Ours)  0  —
iLEQG with $\theta = \bar{\theta}$  0  —
Aside from the baseline comparison, we also performed two additional sets of 30 simulations for RAT iLQR, with different ground-truth distributions that have smaller KL divergences from the model. This is to study how the KL divergence bound given to RAT iLQR affects the resulting risk-sensitivity parameter $\theta^*$. The results are shown in Figure 4. As the KL bound increases from 1.34 to 32.02, the ratio of the optimal $\theta^*$ found by RAT iLQR to the maximum feasible $\bar{\theta}$ also increases. This matches our intuition that the larger the model mismatch, the more risk-sensitive the robot becomes. However, we also note that $\theta^*$ does not saturate at $\bar{\theta}$ all the time, even under the largest KL bound of 32.02. This raises a fundamental question on the benefits of RAT iLQR as a risk-sensitive optimal control algorithm: how favorable is RAT iLQR with the optimal $\theta^*$ compared to iLEQG with the highest risk-sensitivity? To answer this question, we performed a comparative study between RAT iLQR with $\theta = \theta^*$ and iLEQG with $\theta = \bar{\theta}$ (i.e. the maximum feasible value found during the cross entropy sampling of RAT iLQR) under the same simulation setup as before. The results are reported in Table II. In terms of collision avoidance, both algorithms were equally safe, with a collision count of 0. However, RAT iLQR achieved significantly more efficient robot navigation than iLEQG with $\theta = \bar{\theta}$, reducing the average tracking error for each of the three KL values considered. The efficiency and the safety of robot navigation are often in conflict in dynamic collision avoidance, and prior work [21] struggles to find the right balance by manually tuning a fixed $\theta$. With RAT iLQR, such need for manual tuning is eliminated, since the algorithm dynamically adjusts $\theta$ so that it best handles the potential model mismatch specified by the KL bound.
VI Conclusions
In this work we propose RAT iLQR, a novel nonlinear MPC algorithm for distributionally robust control under a KL divergence bound. Our method is based on the mathematical equivalence between distributionally robust control and risk-sensitive optimal control. A locally optimal solution to the resulting bilevel optimization problem is derived with iLEQG and the cross entropy method. The simulation study shows that RAT iLQR successfully accounts for the distributional mismatch during collision avoidance. It also shows the effectiveness of dynamic adjustment of the risk-sensitivity parameter by RAT iLQR, which overcomes a limitation of conventional risk-sensitive optimal control methods. Future work will focus on accurate online estimation of the KL divergence from a stream of data. We are also interested in exploring applications of RAT iLQR, including control of learned dynamical systems and perception-aware control.
References
 [1] (1970) Introduction to stochastic control theory. Academic Press. Cited by: §II-B.
 [2] (2020) Curious iLQR: resolving uncertainty in model-based RL. In Conference on Robot Learning, pp. 162–171. Cited by: §I, §II-A.
 [3] (2018) MINE: mutual information neural estimation. arXiv preprint arXiv:1801.04062. Cited by: §V-A.
 [4] (1976) Dynamic programming and stochastic control. Academic Press, Inc., USA. External Links: ISBN 0120932504 Cited by: §II-B.
 [5] (2019) MultiPath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. In Conference on Robot Learning, pp. 86–99. Cited by: §V-A.
 [6] (2018) Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems, pp. 4754–4765. Cited by: §V-B.
 [7] (1978) Guaranteed margins for LQG regulators. IEEE Transactions on Automatic Control 23 (4), pp. 756–757. Cited by: §II-B.
 [8] (2016) Game-theoretic and risk-sensitive stochastic optimal control via forward and backward stochastic differential equations. In 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 6154–6160. Cited by: §II-A.
 [9] (2015) Risk sensitive, nonlinear optimal control: iterative linear exponential-quadratic optimal control with Gaussian noise. arXiv preprint arXiv:1512.07173. Cited by: §II-B, §IV-A.
 [10] (2020) Wasserstein distributionally robust motion planning and control with safety constraints using conditional value-at-risk. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 490–496. Cited by: §II-A.
 [11] (2019) Estimation of information measures and its applications in machine learning. Ph.D. Thesis, University of Michigan. Cited by: §V-A.
 [12] (2020) MATS: an interpretable trajectory forecasting representation for planning and control. arXiv preprint arXiv:2009.07517. Cited by: §V-A.
 [13] (1970) Differential dynamic programming. Cited by: §II-B.
 [14] (1973) Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic Control 18 (2), pp. 124–131. Cited by: §II-A.
 [15] (2019) Algorithms for optimization. MIT Press. Cited by: §IV-B.
 [16] (2019) Informed information theoretic model predictive control. In 2019 International Conference on Robotics and Automation (ICRA), pp. 2047–2053. Cited by: §V-A.
 [17] (2004) Iterative linear quadratic regulator design for nonlinear biological movement systems. In ICINCO (1), pp. 222–229. Cited by: §II-B.
 [18] (2020) How should a robot assess risk? Towards an axiomatic theory of risk in robotics. In Robotics Research, pp. 75–84. Cited by: §I.
 [19] (2012) Risk-sensitive optimal feedback control for haptic assistance. In 2012 IEEE International Conference on Robotics and Automation, pp. 1025–1031. Cited by: §I, §II-A.
 [20] (2012) Disagreement-aware physical assistance through risk-sensitive optimal feedback control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3639–3645. Cited by: §I, §II-A.
 [21] (2020) Risk-sensitive sequential action control with multimodal human trajectory forecasting for safe crowd-robot interaction. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Cited by: §I, §II-A, §V-A, §V-B, §V-C.
 [22] (2000) Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Transactions on Automatic Control 45 (3), pp. 398–412. Cited by: §I, §I, §II-A, §III-B, §III-B, §III-B, §IV, Assumption 1, Assumption 2.
 [23] (2020) On the convergence of the iterative linear exponential quadratic Gaussian algorithm to stationary points. In 2020 American Control Conference (ACC), pp. 132–137. Cited by: §II-B, §IV-A.
 [24] (2013) The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media. Cited by: §IV-B.
 [25] (2020) Trajectron++: dynamically-feasible trajectory forecasting with heterogeneous data. In European Conf. on Computer Vision. Cited by: §V-A.
 [26] (2017) Data-driven distributionally robust control of energy storage to manage wind power fluctuations. In 2017 IEEE Conference on Control Technology and Applications (CCTA), pp. 199–204. Cited by: §II-A.
 [27] (2018) Multimodal probabilistic model-based planning for human-robot interaction. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–9. Cited by: §V-A.
 [28] (2020) FormulaZero: distributionally robust online adaptation via offline population synthesis. arXiv preprint arXiv:2003.03900. Cited by: §II-A.
 [29] (2012) Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4906–4913. Cited by: §II-B, item 3, item 4.
 [30] (2005) A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American Control Conference, pp. 300–306. Cited by: §II-B, §IV-A, §V-B.
 [31] (2012) Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research 31 (11), pp. 1263–1278. Cited by: item 4.
 [32] (2016) Distributionally robust control of constrained stochastic systems. IEEE Transactions on Automatic Control 61 (2), pp. 430–442. Cited by: §II-A.
 [33] (2020) Fast risk assessment for autonomous vehicles using learned models of agent futures. In Robotics: Science and Systems 2020. Cited by: §V-A.
 [34] (2020) Game-theoretic planning for risk-aware interactive agents. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Cited by: §II-A, §II-B, §IV-A, §V-A.
 [35] (1981) Risk-sensitive linear/quadratic/Gaussian control. Advances in Applied Probability 13 (4), pp. 764–777. Cited by: §II-A, §II-B.
 [36] (2002) Risk sensitivity, a strangely pervasive concept. Macroeconomic Dynamics 6 (1), pp. 5–18. Cited by: §I, §II-A, item 2.