
Data-Driven Safety Verification for Legged Robots

02/24/2022
by   Junhyeok Ahn, et al.
The University of Texas at Austin

Planning safe motions for legged robots requires sophisticated safety verification tools. However, designing such tools is challenging due to the nonlinear and high-dimensional nature of these systems' dynamics. In this letter, we present a probabilistic verification framework for legged systems, which evaluates the safety of planned trajectories by learning an assessment function from trajectories collected from a closed-loop system. Our approach does not require an analytic expression of the closed-loop dynamics, thus enabling safety verification of systems with complex models and controllers. Our framework consists of an offline stage that initializes a safety assessment function by simulating a nominal model and an online stage that adapts the function to address the sim-to-real gap. The performance of the proposed approach is demonstrated using a quadruped balancing task and a humanoid reaching task. The results demonstrate that our framework accurately predicts the systems' safety both at the planning phase, to generate robust trajectories, and at the execution phase, to detect unexpected external disturbances.


I Introduction

Safe motion planning is essential for legged systems to prevent falls and collisions with obstacles. The main challenge is to design safety verification tools that accurately evaluate, without being overly conservative, whether a system will satisfy safety constraints while a given feedback controller stabilizes it along desired trajectories.

In this letter, we propose a framework that learns a safety assessment function that can provide probabilistic verification for motion planning. Our framework trains this function using trajectory data. We roll out a number of trajectories using a nominal model and embed them, together with their safety properties, into a low-dimensional space in which we define their safety probabilities. During the execution phase, upcoming desired trajectories are mapped to this low-dimensional space, and the safety probability is estimated before execution. Note that since the safety probability is computed based on the nominal model, there is a reality gap. To reduce this gap, we perform an online adaptation process as we collect trajectories during execution.

Related Work: Recent work on robust motion planning has considered safety verification methods that characterize funnels around planned trajectories. The authors in [1] employed a linear feedback controller and estimated regions of attraction of the closed-loop system by searching for Lyapunov functions, and [2, 3] showed robust motion planning on aerial robots. Similar approaches, based on Hamilton-Jacobi reachability analysis [4] and contraction theory [5], proposed an offline characterization of tracking error bounds around trajectories. However, these techniques are computationally intensive and limited to a small class of systems, which makes them difficult to deploy on legged robots, which are generally modeled as high-dimensional hybrid systems with sophisticated feedback controllers.

Model predictive control (MPC) has been shown to be a promising tool for dynamic constrained trajectory optimization. In particular, tube-based MPC uses a simple ancillary feedback controller to bound output trajectories around a nominal path and verifies safety satisfaction for all realizations of uncertainties [6, 7]. The authors in [8] applied this technique to bipedal walking assuming a linear pendulum model and a simple controller. However, computing invariant tubes for highly non-linear and hybrid systems with sophisticated feedback controllers is challenging. The work in [9] proposed to learn distributions of output trajectories in a data-driven manner, which can then be used for safety verification, but the data-efficiency and sim-to-real gap issues have not been addressed for robot deployment.

The studies in [10, 11] considered a Bayesian optimization technique that evaluates planned trajectories executed with a closed-loop controller and uses them to find planner parameters. The authors in [12, 13] trained policies using closed-loop systems to generate swing foot trajectories for walking motion. These frameworks make it possible to optimize planner parameters and to design trajectories such that the resulting closed-loop behaviors satisfy safety constraints. However, these verification methods evaluate trajectory safety only at the planning phase, making it difficult to detect unsafe states arising during execution, for instance, due to unexpected disturbances.

Fig. 1: The safety assessment module evaluates the probability of safety of the closed-loop system by taking into account information from the trajectory planner and from the feedback controller.

The idea of embedding system safety information into a low-dimensional space is not new and has been previously presented in [14]. In this work, the authors proposed a framework that learns a low-dimensional representation of regions of attraction of a closed-loop autonomous system. In our work, we extend this idea and learn a safety assessment function for a closed-loop trajectory tracking system. For closed-loop autonomous systems, the initial states on their own determine the evolution of the systems and therefore, their safety characteristics. On the contrary, closed-loop trajectory tracking systems have external inputs (e.g., desired trajectories), which affect the evolution of the system and, thus, require a special safety treatment. For instance, we have to properly measure which specific pieces of a desired trajectory could result in future failure. To this end, we re-evaluate the computation methods described in [14] and extend them for safety verification for executing planned trajectories, while preserving algorithmic benefits.

Contributions: Our key contributions are the following:

  1. We propose a framework that learns a safety assessment function that evaluates whether desired trajectories are safe before and during execution. In particular, we investigate a data structure, data generation pipeline, and safety-related properties needed for training.

  2. Our framework incorporates numerous algorithmic advantages, in particular:

    1. It does not require an analytic expression of the closed-loop system to train a safety assessment function, which allows us to reason about safety for complicated systems.

    2. It is data-efficient and is able to address the sim-to-real gap, which is crucial for real system implementation.

    3. Our safety assessment function can provide safety predictions for the trajectories both at the planning phase, to generate robust plans, and during execution, to detect unexpected external disturbances.

  3. We deploy our framework in a quadruped balancing task and a humanoid reaching task and show that our framework can open up a number of interesting possibilities for algorithm development. In the quadruped balancing task, we integrate a back-up recovery step planner that is triggered based on safety predictions, and in the humanoid reaching task, we provide a robot self-assessment capability to estimate the likelihood of safe task completion for human-robot interaction.

II Problem Statement

Fig. 2: The safety assessment function is initialized through an offline process using trajectory data from the nominal system, and then updated through the online adaptation process using trajectory data from the real system to reduce the sim-to-real gap.

Consider a discretized system given by

(1)

where , , are the system state, input, and disturbances.

is the output vector that can be measured from system state (e.g., end-effector positions in task space). We further assume to have a planner that computes a desired trajectory

, where represents a planning horizon, and denotes a desired output. Given a tracking controller , the closed-loop system dynamics is denoted as

(2)

Then, the solution trajectory of the closed-loop system can be recursively computed from the starting state and the upcoming desired trajectory with the expression

(3)

As illustrated in Fig. 1, our goal is to make a receding horizon prediction about the safety of the closed-loop system with the current state measurement and upcoming desired trajectory. To be more specific, at current time index , we want to predict the probability of all future states being safe,

(4)

using the information of and . is the user-specified safe set that could be defined with a tracking error or conservative capture region to avoid falling. Note that is the safety assessment horizon during which we look ahead and can be different from the planning horizon . is a task-dependent parameter and is chosen to contain primarily safety information. For a cyclic walking task, for example, does not need to be the trajectory duration for multiple steps, but rather just for one stepping cycle. For convenience, we concatenate the state measurement and upcoming desired trajectory and define a safety assessment input:

(5)

Using this nomenclature, our goal can be summarized to define a safety assessment function that predicts the safety probability (4) of a closed-loop system.
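The quantities above can be illustrated with a minimal sketch, assuming a hypothetical scalar toy system rather than a legged robot. The function names, the proportional controller, the additive Gaussian disturbance, and the tracking-error safe set are all illustrative assumptions. The sketch builds a safety assessment input by concatenation and estimates the safety probability empirically by rolling out the closed loop many times.

```python
import random

def make_assessment_input(state, desired_traj):
    """Concatenate the current state with the upcoming desired outputs.
    Assumes scalar desired outputs; vector outputs would need flattening."""
    return tuple(state) + tuple(desired_traj)

def rollout_is_safe(state, desired_traj, noise_std, bound, rng):
    """Roll out a toy 1-D closed loop and check every future state."""
    x = state[0]
    for y_des in desired_traj:
        u = 0.5 * (y_des - x)                  # toy proportional tracking controller (assumption)
        x = x + u + rng.gauss(0.0, noise_std)  # toy dynamics with additive disturbance (assumption)
        if abs(x - y_des) > bound:             # safe set given by a tracking-error bound (assumption)
            return False
    return True

def empirical_safety_probability(state, desired_traj, n_rollouts=500,
                                 noise_std=0.05, bound=1.0, seed=0):
    """Fraction of rollouts in which all states over the horizon stay safe."""
    rng = random.Random(seed)
    safe = sum(rollout_is_safe(state, desired_traj, noise_std, bound, rng)
               for _ in range(n_rollouts))
    return safe / n_rollouts
```

Such brute-force rollouts are too slow to run online for a real robot, which is one motivation for learning a function that maps the assessment input directly to a safety estimate.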

We consider a scenario where the real dynamical system is not perfectly known, but we assume the nominal system is available and can be simulated over time. Since the dynamics of legged systems are non-linear, high-dimensional, and hybrid, and the controllers are often formulated as numerical optimization problems, we do not have access to the analytic expressions of the closed-loop solution trajectories of either the nominal or real systems. Therefore, we propose to learn the safety assessment function in a data-driven manner. Throughout the paper, we use a tilde, , and an overline, , to represent variables related to the nominal system and the real system, respectively.

III Framework Overview

Our framework aims to find a low-dimensional embedding of safety assessment inputs where the low-dimensional space can be discretized into a finite number of grid cells. Then, we assign each cell a belief mass using belief function theory [15] to evaluate the safety probability of the inputs. The assignment of belief masses is denoted as basic belief assignment (BBA) and the BBA for the grid index is expressed as . Here, is the belief mass of the probability of the closed-loop system being safe when it evolves with safety assessment inputs that are mapped to and belong to the grid index . is the belief mass of the complementary event and is the uncertainty on the safety estimation. Note that it holds , and , , and are in the interval . After the BBAs for the grid cells are computed, we define a safety assessment function , where the safety assessment input is embedded in the grid cell .
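A basic belief assignment as described above can be sketched as a small data structure. This is an illustrative sketch only: the class name `BBA`, the dictionary-based grid, and the choice to represent an unknown cell as a fully uncertain assignment are assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBA:
    """Belief masses for 'safe', 'unsafe', and the remaining uncertainty."""
    safe: float
    unsafe: float
    uncertainty: float

    def __post_init__(self):
        masses = (self.safe, self.unsafe, self.uncertainty)
        if any(m < 0.0 or m > 1.0 for m in masses):
            raise ValueError("each belief mass must lie in [0, 1]")
        if abs(sum(masses) - 1.0) > 1e-9:
            raise ValueError("belief masses must sum to 1")

# Assumption: a cell without any estimate is treated as fully uncertain.
EMPTY_BBA = BBA(0.0, 0.0, 1.0)

def assess(cell_index, grid):
    """Look up the BBA of the grid cell into which an input was embedded."""
    return grid.get(cell_index, EMPTY_BBA)
```

Keeping the uncertainty as an explicit third mass lets a cell report "no information" separately from "equally likely safe or unsafe".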

To compute BBAs for grid cells, we first simulate a sufficient number of trajectories using a nominal model. We collect safety assessment inputs from the trajectories and label them according to whether they yield safe behaviors or not. For each safety assessment input pair, we evaluate a distance metric to measure their similarity in terms of safety. For instance, the distance between a pair is small if they share a similar safety property (e.g., if they are both safe or both unsafe) but large otherwise. Using the computed distances, we embed the safety assessment inputs into a low-dimensional space using the t-Distributed Stochastic Neighbor Embedding (t-SNE) technique [16]. As a result, we obtain two clusters separated in a low-dimensional space: one is the collection of safety assessment inputs that result in safe behaviors and the other one is the collection of safety assessment inputs that yield unsafe behavior. Then, we discretize the low-dimensional space into grid cells and make a prior estimate of BBA for each cell with the expression .

Simulating the nominal system is usually a cheap and efficient way to initialize the low-dimensional representation of the trajectories and the safety assessment function, but the result is inaccurate. Therefore, an online adaptation process follows to reduce the gap between the real and the nominal system and update the safety assessment function. As we collect trajectory data from the real system, we compare it with the behavior from the nominal closed-loop system and train a discrepancy function that reveals how reliable the training data from the nominal system was. Using the discrepancy function, we update the prior estimates of BBAs in the grid cells. At the same time, we compute a feedback estimate of the BBA for each cell using the real system’s trajectory data, which is defined as . Finally, we combine the prior and the feedback estimates of BBAs and update the safety assessment function. The overall framework including offline initialization and online adaptation is illustrated in Fig. 2.

IV Offline Initialization of Safety Assessment Function

IV-A Data Generation and Low-dimensional Embedding

As illustrated in Fig. 3, a planner designs a desired trajectory () using a randomly sampled planner parameter. Employing a feedback tracking controller, we simulate a nominal closed-loop system and roll out a trajectory (). We determine the trajectory to be safe if all of its states are contained in the safe region. We terminate the episode when the system reaches unsafe regions and determine the trajectory to be unsafe. We split the simulated trajectories into segments spanning a duration of , the safety assessment horizon, and create a training data set with each segment’s initial state, desired trajectory, and unsafety score. The collection of training data is denoted as , where

(6)

and is the number of training data, corresponding to the number of trajectory segments. and represent the starting state and the desired trajectory of the th trajectory segment – note that we zero the beginning time index for each segment – forming the th safety assessment input. is the unsafety score and is computed by the following rule:

(7)

where is a discount factor and is a function that takes a segment index and returns the remaining time steps from the beginning of the segment to the termination of the episode to which the segment belongs. Note that the tilde conveys that the unsafety score is evaluated using the simulated trajectory from the nominal closed-loop system. The unsafety score represents how much the segment contributes to the system’s unsafe behavior. Because of the discount factor, the segments that are near the episode termination are scored with higher values.
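One way to realize a score with these properties is sketched below. The exact rule of Eq. (7) is not reproduced here; this sketch assumes the score is the discount factor raised to the number of remaining steps when the episode ended unsafely, and zero otherwise. The function name and arguments are hypothetical.

```python
def unsafety_scores(episode_length, horizon, ended_unsafe, gamma=0.99):
    """Score each segment of one episode (assumed rule, see lead-in).

    Segments start every `horizon` steps. A segment that starts closer to
    an unsafe termination gets a score closer to 1.
    """
    scores = []
    for start in range(0, episode_length, horizon):
        remaining = episode_length - start  # steps from segment start to termination
        scores.append(gamma ** remaining if ended_unsafe else 0.0)
    return scores
```

With a discount factor below one, the last segment before a failure receives the highest score, which matches the behavior described in the text.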

Fig. 3: Detailed view of the offline initialization process.

For each pair of training data, we measure their similarity based on their error and safety properties. First, we measure the dynamic time warping for the error signals between the th and th training data using the formula , where is the trajectory error and is the dynamic time warping operator. While a dynamic time warping measurement might reflect similarity of the safety property in general, it is still possible that safe and unsafe segments share similar trajectories. To obtain more accurate similarity measures in terms of safety, we propose a distance metric considering the dynamic time warping measurements and unsafety scores at the same time as

(8)

where denotes the maximum value among the dynamic time warping measurements and is a weighting constant multiplying the unsafety score difference. As a result, the trajectory segments which show similar error sequences and are alike in terms of safety are considered to be close.
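The combination of dynamic time warping and unsafety scores can be sketched as follows. This is an assumed form, not the paper's Eq. (8): it normalizes the dynamic time warping cost by its maximum over all pairs and adds a weighted absolute difference of unsafety scores. Error signals are taken to be scalar sequences (for example, error norms), and all names are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping cost between two scalar sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def safety_distances(errors, scores, weight=1.0):
    """Pairwise distance: normalized DTW plus weighted score difference."""
    n = len(errors)
    raw = [[dtw(errors[i], errors[j]) for j in range(n)] for i in range(n)]
    max_dtw = max(max(row) for row in raw) or 1.0  # avoid division by zero
    return [[raw[i][j] / max_dtw + weight * abs(scores[i] - scores[j])
             for j in range(n)] for i in range(n)]
```

In this sketch, two segments with identical error signals but different unsafety scores are still pushed apart, which is the situation the proposed metric is meant to handle.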

Using this computed distance, we apply t-SNE on the training data to obtain a realization of the low-dimensional space . Based upon this embedding, we train a mapping function

, using a deep neural network by minimizing the cost function

, where is the low-dimensional embedding of the th training data, . The neural network is trained to reproduce the low-dimensional embedding constructed by t-SNE.

IV-B Prior Estimate of BBAs on Grid Cells

We discretize the low-dimensional space into grid cells and compute a prior estimate of BBA for each cell as illustrated in Fig. 3. For convenience, we define a locating function which takes a safety assessment input and returns an index of a grid cell in which the input is embedded in the low-dimensional space. First, we define the belief assignment for each embedded training data point, , based on its unsafety score by introducing the expression , where

(9)

Here, is the belief mass of the probability of the closed-loop system’s behavior being safe when it starts at the state with the upcoming desired trajectory and is the belief mass of its complementary event. represents the confidence level on the nominal system model and is set to user-specified parameter, .

We take the belief assignments on the training data into account and further designate a belief assignment for each grid cell. Let us define, for each index , a set of BBAs , which contains the BBAs for grid cell . Then, the prior estimate of the BBA for the grid cell can be computed as

(10)

where is the number of BBAs in , is the minimum number of data for the estimate. When there is not sufficient training data in the grid cell (i.e., ), we estimate by an empty BBA , which indicates that no safety estimate can be made. is a fusion operator among the set , which is borrowed from [14] as

(11)

Finally, the safety assessment function is initialized with the prior estimate of the BBAs for grid cells.
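The grid bookkeeping in this subsection can be sketched as below. BBAs are plain `(safe, unsafe, uncertainty)` tuples, and the helper names are hypothetical. The fusion step uses a simple average as a stand-in; it is not the fusion operator of Eq. (11) borrowed from [14]. The empty BBA is assumed to be fully uncertain.

```python
import math
from collections import defaultdict

EMPTY = (0.0, 0.0, 1.0)  # assumption: "no estimate" means full uncertainty

def locate(z, cell_length):
    """Locating function: index of the grid cell containing embedded point z."""
    return tuple(math.floor(c / cell_length) for c in z)

def average_fusion(bbas):
    """Stand-in fusion: component-wise mean (NOT the operator from [14])."""
    n = len(bbas)
    return tuple(sum(b[k] for b in bbas) / n for k in range(3))

def prior_estimates(embedded_points, bbas, cell_length, min_count):
    """Group per-sample BBAs by grid cell and fuse cells with enough data."""
    cells = defaultdict(list)
    for z, b in zip(embedded_points, bbas):
        cells[locate(z, cell_length)].append(b)
    return {idx: average_fusion(bs) if len(bs) >= min_count else EMPTY
            for idx, bs in cells.items()}
```

Cells with fewer samples than `min_count` fall back to the empty BBA, mirroring the rule that no safety estimate is made without sufficient training data.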

V Online Adaptation of Safety Assessment Function

V-A Discrepancy Function

Although the prior estimate of the BBA provides a rough safety prediction, we update the safety assessment function online as we collect trajectory data from the real system as depicted in Fig. 4. When we roll out a trajectory using the real system, we simulate a trajectory using the nominal closed-loop system with the same initial state and the same desired trajectory. With the trajectories from the real and nominal systems, we construct a collection of feedback data with sets, where

(12)

Similar to the training data, and represent the starting state and the desired trajectory of the th trajectory segment with the re-ordered time index. and are the unsafety scores of the th segment of the trajectories of the real and the nominal system, respectively, computed by Eq. (7). If there is a discrepancy in terms of safety between the nominal and the real system due to the reality gap, can be different from .

Fig. 4: Detailed view of the online adaptation process.

Now, we define a discrepancy function that quantifies the level of reality gap. We approximate this function with a Gaussian process regression (GPR) model, which is trained with the input set and the output set .

With the trained GPR model, we predict the reliability of the training data and update the prior estimate of BBA

. Let us denote the predicted mean and standard deviation of

by and . Based on the level of reality gap predicted by the trained GPR model, we update the belief assignment on the training data with the new uncertainty

(13)

where is a user-specified parameter set to be smaller than . As more feedback data is collected and the standard deviation on the prediction goes below a certain threshold (i.e., ), we update the uncertainty of the belief assignment using the mean prediction . With the new , we update the belief mass, and , by following Eq. (9). Finally, we improve the prior estimate of BBAs for grid cells with Eq. (10) to take the reality gap into account.

V-B Feedback Estimate of BBAs on Grid Cells

We update the feedback estimate of BBAs on grid cells using . We, again, first compute the belief assignment for each embedded feedback data with the expression , where , , and . Note that is set to have zero uncertainty since it comes from the real system. With this, we compute the feedback estimate of BBA for the grid index as

(14)

where contains the BBAs in grid , and is the number of BBAs in the set . If no feedback data is collected yet for the index (i.e., ), we set the estimate to an empty BBA. is another fusion operator among the set and is defined as

(15)

Here, parameters and are the initial value and the decay rate of the uncertainty , respectively, and the uncertainty converges to zero as the number of data goes to infinity (i.e., ). and are computed with the average operator.
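The behavior described for the feedback fusion can be sketched as follows. The exact form of Eq. (15) is not reproduced; this sketch assumes an exponential decay of the uncertainty with the number of feedback samples and distributes the remaining mass according to the averaged safe and unsafe masses. The function name, parameter names, and default values are hypothetical.

```python
import math

def feedback_fusion(bbas, u0=0.3, decay=0.1):
    """Fuse (safe, unsafe, uncertainty) tuples from real-system feedback.

    Assumed form: uncertainty = u0 * exp(-decay * n), which tends to zero
    as the number of samples n grows; safe/unsafe masses follow the
    sample averages, rescaled so that the three masses sum to one.
    """
    n = len(bbas)
    if n == 0:
        return (0.0, 0.0, 1.0)  # no feedback yet: empty (fully uncertain) BBA
    u = u0 * math.exp(-decay * n)
    mean_safe = sum(b[0] for b in bbas) / n
    mean_unsafe = sum(b[1] for b in bbas) / n
    total = mean_safe + mean_unsafe
    if total == 0.0:
        return (0.0, 0.0, 1.0)
    scale = (1.0 - u) / total
    return (mean_safe * scale, mean_unsafe * scale, u)
```

Under this assumed form, a cell that has seen many real-system samples is dominated by their average outcome, while a cell with few samples keeps a sizable uncertainty mass.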

Finally, we combine and and compute the BBA for each index vector as

(16)

If the feedback estimate for the grid index is available, we fuse the prior and feedback estimates of BBAs through the fusion operator in Eq. (11); otherwise, we just use the prior estimate. It has been shown that the approaches as the number of feedback data, , approaches infinity [14]. This means that the prior estimate has an effect when there is insufficient data from the real system, but has less of an effect on the safety estimates as more feedback data is collected. We finally update the safety assessment function as . For computational efficiency, the online adaptation process is performed once every sets of feedback data are obtained, where the value of is a task-dependent parameter.

VI Experimental Results

Fig. 5: Offline initialization (left) and online adaptation (right) of safety assessment function during Laikago’s balancing task. The online adaptation process occurs once every feedback data sets are collected. For the online adaptation phase, only the first and the fourth iterations are illustrated.
Fig. 6: (top) Snapshots of Laikago balancing (a)-(d) and taking a recovery step (e). (bottom) Receding horizon safety prediction over time and throughout the low-dimensional space. The robot is initialized at and is perturbed by the balls twice (at and ).

In this study, we consider two different scenarios: a quadruped balancing task and a humanoid reaching task. We then address the following questions: Does the offline initialization phase find a proper low-dimensional representation of trajectory data and compute ? Does the online adaptation phase incorporate feedback data and properly address the sim-to-real gap? Can the safety assessment function make a receding horizon prediction so that it can evaluate trajectories’ safety both at the planning phase and at the execution phase? How does our safety assessment function compare to baseline verification tools, and how accurate are its predictions? How can our framework be incorporated into a back-up planner or controller to prevent unsafe behaviors?

Fig. 7: Snapshots of Atlas reaching (a) the blue box and (b) the red can. In these human-robot interaction scenarios, the human tells the robot which object to reach.
Fig. 8: (a) Reachable regions of Atlas’ left hand computed by our safety assessment function (orange boundary) and by a simple inverse kinematics-based reachability method (blue boundary) from nominal pose shown in Fig. 7. (b) Safety assessment function predictions based on randomly sampled targets and the resulting closed-loop behaviors.

VI-A Laikago Balancing

We consider a balancing task using the Laikago quadruped from Unitree Robotics. The robot’s state consists of its floating base and joint configurations, and the output vector

is the base position. At every episode, the robot is initialized with a randomly sampled state and our planner generates an interpolated trajectory between the initial and desired base position. Then, our feedback controller computes joint position commands by solving inverse kinematics to follow the trajectory. For this task, we define the safe set

to be the supporting polygon and a specified height range. Thus, we check that the projection of the base onto the ground remains inside this safe region and that the base height remains within its corresponding bounds. We consider random disturbances while balancing and aim to make a receding horizon safety prediction on the motions using the safety assessment function. If a strong disturbance causing the closed-loop system to become unsafe is properly detected by the safety prediction module, we initiate a recovery step plan [17] to avoid falling. Table I summarizes parameters used in the safety assessment function training.

We simulate episodes with the nominal closed-loop system and segment the data to construct the training data (see footnote 1). We measure the distance between the training data and use it to embed the data into a two-dimensional space (i.e., ) that is discretized into a by square grid with a cell length of . The low-dimensional embedding of the training data and the prior estimate of BBAs for grid cells are illustrated in Fig. 5.

Footnote 1: We intentionally make a reality gap by reducing the link’s mass by and removing the joint frictions and observation noises to simulate the nominal system. We also add a random offset to the initial state to simulate the disturbances.

The online adaptation process is performed once every feedback data are collected from the real system (i.e., ). We train the discrepancy function with the GPR model and update for each grid cell. For instance, the grid cell highlighted with the pink circle in Fig. 5 was originally assigned of safety probability in the offline initialization phase but is updated to after the first update iteration due to the feedback data that shows a large sim-to-real gap. This makes the discrepancy prediction around the pink-circle region high, which results in an increase in the uncertainty and a decrease in the safety probability . At the same time, we update and fuse it with to adapt the safety assessment function.

After the safety assessment module converges, we show that our framework can make a receding horizon safety prediction on the balancing trajectories and trigger the recovery step when it is needed to avoid falling. Fig. 6 shows snapshots of Laikago balancing and taking a recovery step. The robot is perturbed with balls in simulation: one which generates a small disturbance (Fig. 6(b)) and another one which generates a large disturbance (Fig. 6(d)). The robot stabilizes and tracks the desired trajectory until the safety assessment function predicts future unsafety. When it predicts a safety probability below the threshold , set to , it triggers the recovery step planner to avoid falling.
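The triggering logic described above reduces to a threshold check on the receding-horizon prediction. A minimal sketch follows; the function name and the example threshold are hypothetical, and the sequence of probabilities stands in for successive outputs of the safety assessment function.

```python
def first_recovery_trigger(safety_probabilities, threshold):
    """Return the first time index whose predicted safety probability falls
    below the threshold, or None if recovery is never triggered."""
    for k, p in enumerate(safety_probabilities):
        if p < threshold:
            return k
    return None

# Hypothetical example: a small disturbance at k=1 and a large one at k=3.
predictions = [0.95, 0.80, 0.90, 0.20, 0.10]
trigger_index = first_recovery_trigger(predictions, threshold=0.5)
```

In this example the small dip at index 1 stays above the threshold and is tolerated, while the drop at index 3 would hand control to the recovery step planner.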

VI-B Atlas Reaching

We consider an object reaching task using Boston Dynamics’ humanoid Atlas. The robot’s state consists of its floating base and joint configurations, and the output vector consists of the reaching hand position. At every episode, the robot is initialized with a randomly sampled state and the planner generates an interpolated trajectory between the initial and the target hand position. Our feedback controller computes joint torque commands by using an optimization-based whole-body controller [18]. We define the safe set such that the projected base position is inside the supporting polygon, the end-effectors do not collide with the obstacles, and the joint positions remain within their limits. We train the safety assessment function for the hand reaching trajectories and use it to predict whether the robot can reach the commanded target safely (see footnote 2). This training is done only for one arm since the same mapping function can be used for both left and right arms. The parameters used in the training are identical to the ones used in the Laikago balancing task except for the prediction horizon, which is .

Footnote 2: When we roll out trajectories using the nominal system, we do not sample an offset and do not add it to the initial state since we do not consider disturbances here.

0.99 10 0.01 0.3 5 0.3 0.1 0.4 0.3
TABLE I: Parameters

When a human end-user commands a humanoid, it is not trivial to evaluate whether the command is safe to execute. We demonstrate that our safety assessment function enables a robot to estimate the likelihood that it will accomplish the given task safely. Fig. 7(a) illustrates a scenario where Atlas is told to reach the blue box on the bookshelf. After ensuring this task can be accomplished safely, the robot executes the command. Fig. 7(b) illustrates the scenario where the robot is initially told to reach the red can. Based on the safety prediction, the robot rejects the task so that the human instructor can provide a different description to accomplish the task.

In Fig. 8(a), we compare the reachable regions on the bookshelf computed by our safety assessment function against those obtained by a simple inverse kinematics based reachability method. Our safety assessment function considers joint limits violation, collision, and falling down while manipulating to be unsafe, and it results in more conservative reachable regions than those considering only kinematic constraints. Fig. 8(b) summarizes the evaluation on the prediction accuracy of our safety assessment function. Among episodes with randomly sampled target positions, the safety assessment function predicts of safe targets to be safe and of unsafe targets to be unsafe.
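The two accuracy figures reported above are per-class rates: the fraction of truly safe targets predicted safe, and the fraction of truly unsafe targets predicted unsafe. A minimal sketch of that computation follows, with a hypothetical function name and boolean lists standing in for the evaluation episodes.

```python
def per_class_accuracy(predicted_safe, actually_safe):
    """Return (rate of safe targets predicted safe,
               rate of unsafe targets predicted unsafe).
    A rate is None when its class has no samples."""
    safe_total = sum(1 for a in actually_safe if a)
    unsafe_total = len(actually_safe) - safe_total
    safe_hits = sum(1 for p, a in zip(predicted_safe, actually_safe) if p and a)
    unsafe_hits = sum(1 for p, a in zip(predicted_safe, actually_safe)
                      if not p and not a)
    safe_rate = safe_hits / safe_total if safe_total else None
    unsafe_rate = unsafe_hits / unsafe_total if unsafe_total else None
    return safe_rate, unsafe_rate
```

Reporting the two rates separately matters here because a missed unsafe target (a potential fall or collision) is costlier than a rejected safe one.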

VII Conclusions

In this letter, we propose a probabilistic safety verification tool for legged systems when desired motions are given. We leverage a low-dimensional embedding of the current state measurement and upcoming desired trajectories based on the proposed distance metric for safety prediction. For data-efficiency, we initialize our safety assessment function by simulating trajectories with a nominal system and perform online adaptation using trajectories from the real system to account for the reality gap. We have demonstrated our framework’s efficiency and accuracy with a quadruped balancing task and a humanoid reaching task.

As future work, we would like to integrate our safety verification tool in hierarchical reinforcement learning frameworks such as [19] and train a high-level motion policy with a safety consideration. We would also like to deploy our safety verification tool in a human-robot interaction scenario such as [20] and provide self-assessment capabilities to our new Draco humanoid, a successor of the Draco biped [21].

Acknowledgment

The authors would like to thank the members of the Human Centered Robotics Laboratory at The University of Texas at Austin for their great help and support.

References

  • [1] R. Tedrake, I. R. Manchester, M. Tobenkin, and J. W. Roberts, “LQR-trees: Feedback motion planning via sums-of-squares verification,” The International Journal of Robotics Research, vol. 29, no. 8, pp. 1038–1052, 2010. [Online]. Available: https://doi.org/10.1177/0278364910369189
  • [2] A. Majumdar and R. Tedrake, “Funnel libraries for real-time robust feedback motion planning,” The International Journal of Robotics Research, vol. 36, no. 8, pp. 947–982, 2017. [Online]. Available: https://doi.org/10.1177/0278364917712421
  • [3] Z. Manchester and S. Kuindersma, “Robust direct trajectory optimization using approximate invariant funnels,” Autonomous Robots, vol. 43, no. 2, pp. 375–387, 2019. [Online]. Available: https://doi.org/10.1007/s10514-018-9779-5
  • [4] M. Chen, S. L. Herbert, H. Hu, Y. Pu, J. F. Fisac, S. Bansal, S. Han, and C. J. Tomlin, “Fastrack: a modular framework for real-time motion planning and guaranteed safe tracking,” IEEE Transactions on Automatic Control, vol. 66, no. 12, pp. 5861–5876, 2021.
  • [5] S. Singh, H. Tsukamoto, B. T. Lopez, S.-J. Chung, and J.-J. Slotine, “Safe motion planning with tubes and contraction metrics,” in 2021 60th IEEE Conference on Decision and Control (CDC), Dec 2021, pp. 2943–2948.
  • [6] W. Langson, I. Chryssochoos, S. Raković, and D. Mayne, “Robust model predictive control using tubes,” Automatica, vol. 40, no. 1, pp. 125–133, 2004. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0005109803002838
  • [7] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration,” in 2018 IEEE Conference on Decision and Control (CDC), Dec 2018, pp. 6059–6066.
  • [8] A. Gazar, M. Khadiv, A. D. Prete, and L. Righetti, “Stochastic and robust mpc for bipedal locomotion: A comparative study on robustness and performance,” in 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), July 2021, pp. 61–68.
  • [9] D. Fan, A. Agha, and E. Theodorou, “Deep Learning Tubes for Tube MPC,” in Proceedings of Robotics: Science and Systems, Corvallis, Oregon, USA, July 2020.
  • [10] A. Rai, R. Antonova, S. Song, W. Martin, H. Geyer, and C. Atkeson, “Bayesian optimization using domain knowledge on the atrias biped,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1771–1778.
  • [11] M. H. Yeganegi, M. Khadiv, A. D. Prete, S. A. A. Moosavian, and L. Righetti, “Robust walking based on mpc with viability guarantees,” IEEE Transactions on Robotics, pp. 1–16, 2021.
  • [12] A. Iscen, K. Caluwaerts, J. Tan, T. Zhang, E. Coumans, V. Sindhwani, and V. Vanhoucke, “Policies modulating trajectory generators,” in Proceedings of The 2nd Conference on Robot Learning, ser. Proceedings of Machine Learning Research, A. Billard, A. Dragan, J. Peters, and J. Morimoto, Eds., vol. 87. PMLR, 29–31 Oct 2018, pp. 916–926. [Online]. Available: https://proceedings.mlr.press/v87/iscen18a.html
  • [13] J. Ahn, J. Lee, and L. Sentis, “Data-efficient and safe learning for humanoid locomotion aided by a dynamic balancing model,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4376–4383, July 2020.
  • [14] Z. Zhou, O. S. Oguz, M. Leibold, and M. Buss, “Learning a low-dimensional representation of a safe region for safe reinforcement learning on dynamical systems,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2021.
  • [15] G. Shafer, A Mathematical Theory of Evidence.   Princeton: Princeton University Press, 1976.
  • [16] L. van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008. [Online]. Available: http://jmlr.org/papers/v9/vandermaaten08a.html
  • [17] M. H. Raibert, Legged Robots That Balance.   USA: Massachusetts Institute of Technology, 1986.
  • [18] J. Ahn, S. J. Jorgensen, S. H. Bang, and L. Sentis, “Versatile locomotion planning and control for humanoid robots,” Frontiers in Robotics and AI, vol. 8, 2021. [Online]. Available: https://www.frontiersin.org/article/10.3389/frobt.2021.712239
  • [19] R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1–2, pp. 181–211, Aug 1999. [Online]. Available: https://doi.org/10.1016/S0004-3702(99)00052-1
  • [20] T. Frasca, E. Krause, R. Thielstrom, and M. Scheutz, “‘Can you do this?’ self-assessment dialogues with autonomous robots before, during, and after a mission,” 2020.
  • [21] J. Ahn, D. Kim, S. Bang, N. Paine, and L. Sentis, “Control of a high performance bipedal robot using viscoelastic liquid cooled actuators,” in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019, pp. 146–153.