I Introduction
Inverse reinforcement learning (IRL) allows an agent to infer the goals driving someone’s behavior and learn to complete the same task simply by observing. This is a powerful paradigm for learning new behaviors from scratch, but doesn’t encompass all of the useful information we may extract from observations. For example, consider the case where the agent already has a good policy for some task, but notices that an expert demonstrator is deviating from the optimal behavior. A reasonable explanation would be that the demonstrator is acting under new environmental constraints that the agent is unaware of. As a concrete example, we can imagine a scenario where an autonomous vehicle (the agent) is following a car that suddenly swerves (the demonstrator). Since both vehicles have the same policy of avoiding collisions and following the road, the agent can infer that an obstacle suddenly appeared in the road and take evasive action even before it can detect the obstacle directly. Taking cues from other agents is an important aspect of intelligent behavior that can help compensate for problems such as sensor failure or perceptual error.
The process of detecting constraints that help explain the behavior of a demonstrator is called constraint inference. Scobee and Sastry [17] applied the maximum entropy IRL framework to this problem, resulting in an algorithm that can identify the most likely constraints from a hypothesis set. However, this approach is limited to systems with tabular state-action spaces. This precludes its use in many real systems of interest whose dynamics are inherently continuous. In this paper, we describe a procedure for creating a tabular approximation of an arbitrary continuous system, and show that maximum likelihood constraint inference (MLCI) can be used to infer constraints on the approximated system that transfer well to the original continuous system. We analyze the effects of various approximation hyperparameters on the accuracy of constraint inference on an example 2-dimensional pendulum system. Although this analysis does not necessarily generalize to other sets of dynamics, following a similar procedure on a system of interest can indicate whether the approximation is sufficient for meaningful constraint inference. We also present a technique for estimating confidence in the inferred constraint in the form of a Bayesian probability update.
In addition to applications where the agent wishes to use learned constraints to improve its own policy, this extension of MLCI allows us to perform constraint inference directly from observed human movements. One exciting potential application of this work is in individuals with non-specific low-back pain, that is, pain that is not immediately attributable to a specific pathology. This affects 84% of people in their lifetime, with around 12% of people being disabled by this pain [1]. Our proposed constraint inference approach would enable an explanatory biomechanical tool to infer joint-level limitations from a series of full-body movements, where traditional biomechanical methods have seen limited success [14]. This motivates the telescoping inverted pendulum model (section VI), which has been used to model different standing patterns in clinical populations [13].
We first present related work in section II, before briefly introducing MLCI in section III and our method for translating continuous dynamics into a tabular Markov Decision Process that can be used with MLCI in section IV. We then perform experimental analysis in section V, and finally show an example with clinically motivated 4D telescoping inverted pendulum dynamics in section VI.
II Related Work
Previous work on constraint inference can be split into two categories: approaches that infer the most likely constraints but require a tabular state-action space, and those that admit continuous dynamics but drop the maximum likelihood feature. In the former category, in addition to Scobee and Sastry [17], Vazquez-Chanlatte et al. [19] learn task specifications, which can be thought of as a generalization of state-space constraints to include complex multi-step behaviors. Unfortunately, neither of these approaches can be applied directly to many real-world systems that are inherently continuous.
There are many proposed methods for identifying constraints in systems that cannot be tabulated. Some use heuristics such as assuming that constrained behaviors will have high intra-demonstration variance and low inter-demonstration variance, or that maintaining an end effector in the same orientation throughout a demonstration suggests a constraint [12, 5]. [7] presents a kinematics-based approach for learning constraints that affect how a nominal policy is executed in different environments, but doesn’t assume an objective function and therefore requires demonstrations to cover much of the state space for the inference to be well-defined. [10] is specialized for online constraint inference in the context of shared autonomy, where misidentified constraints can be corrected by the user. [2] provides a flexible approach for learning state-space constraints by sampling from possible trajectories with lower costs than the demonstrations.
Although the present work introduces error by estimating continuous dynamics with a finite state-action space, it provides two key advantages over previous methods that work with continuous dynamics. First, using the maximum entropy framework allows us to model the demonstrators as soft-optimal with respect to a reward function, which may be especially appropriate for human demonstrators. Second, we are able to estimate and rank the most likely constraints even in situations where demonstrations cover only a small portion of the state space and do not provide enough information to fully resolve ambiguity in possible constraints.
III Markov Decision Processes and Maximum Likelihood Constraint Inference
To perform maximum likelihood constraint inference (MLCI), we adapt the approach developed by [17]. In this section, we present a brief overview of the MLCI algorithm.
III-A Markov Decision Dynamics
MLCI is formulated as an operation on a tabular Markov Decision Process (MDP). The MDP is a tuple of four elements:

A state space $\mathcal{S}$ to navigate. $\mathcal{S}$ is a finite set of discrete state values: $\mathcal{S} = \{s^1, \dots, s^{|\mathcal{S}|}\}$.

A set of actions $\mathcal{A}$ to decide between. $\mathcal{A}$ is a finite set of discrete input values: $\mathcal{A} = \{a^1, \dots, a^{|\mathcal{A}|}\}$.

A transition kernel $P(s_{t+1} \mid s_t, a_t)$ that determines the influence of $a_t$ on $s_{t+1}$. The repeated action of this transition kernel generates a sequence of states $s_{0:T}$ over a time horizon $T$ given a sequence of action choices $a_{0:T-1}$ up to the horizon. The couple of state sequence and action sequence is the trajectory $\xi = (s_{0:T}, a_{0:T-1})$ and the space of all possible trajectories is $\Xi$.

An objective metric $R(\xi)$ that measures the quality of trajectories.
This work focuses on deterministic dynamics, so the transition kernel will be singleton distributions with zero probability of all next states except the deterministic successor $\delta(s_t, a_t)$. That is, we focus on MDPs with transitions of the form:
$$P(s_{t+1} \mid s_t, a_t) = \mathbb{1}\left[s_{t+1} = \delta(s_t, a_t)\right] \tag{1}$$
MLCI requires that $\mathcal{S}$ and $\mathcal{A}$ be finite sets, and we refer to an MDP which satisfies this property as tabular.
III-B Maximum Entropy Likelihood on Trajectories
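This work leverages the maximum entropy likelihood distribution advanced in [21] and extended to constraint inference in [17]. This distribution’s randomness reflects epistemological uncertainty in the estimated reward function of the demonstrator. Under this distribution, the likelihood of a trajectory $\xi$ with objective value $R(\xi)$ is defined on the deterministic MDP as:
$$P(\xi) = \frac{e^{R(\xi)}}{Z} \tag{2}$$
where $Z$ is the normalizing constant over the space $\Xi$ of all possible trajectories:
$$Z = \sum_{\xi \in \Xi} e^{R(\xi)}$$
This work investigates how dynamic agents avoid certain sets of states $C$. These constrained states further refine the choice distribution by zeroing out illegal choices:
$$P(\xi \mid C) = \frac{e^{R(\xi)}}{Z_C}\, \mathbb{1}\left[\xi \in \Xi_C\right] \tag{3}$$
where the partition constant decreases from $Z$ to $Z_C$ for this new distribution that constrains out much of the previous support. Let $\Xi_C$ be the subset of trajectories that don’t violate the constraint $C$:
$$\Xi_C = \{\xi \in \Xi : s_t \notin C \ \text{for all } t\}$$
so that $Z_C$ may be simply defined as:
$$Z_C = \sum_{\xi \in \Xi_C} e^{R(\xi)}$$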
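As an illustration of the constrained partition constant, the following minimal sketch computes it on a hypothetical four-state deterministic MDP, once by a backward recursion and once by brute-force enumeration of trajectories. The toy dynamics (`step`), the per-step reward (`reward`), and all constants are assumptions made up for this example and are not taken from the paper.

```python
import itertools
import math

# Hypothetical toy MDP: 4 states on a line, actions move left / stay / right.
N_STATES, ACTIONS, HORIZON = 4, (-1, 0, 1), 3

def step(s, a):
    """Deterministic successor, clipped to the valid state range."""
    return min(max(s + a, 0), N_STATES - 1)

def reward(s, a):
    """Assumed per-step reward: penalize effort and distance from state 3."""
    return -abs(a) - 0.5 * abs(3 - s)

def partition(s0, constrained=frozenset()):
    """Sum of exp(total reward) over all trajectories from s0 that never
    enter a constrained state, computed by a backward recursion."""
    # z[s] holds the mass of all feasible remaining trajectory suffixes from s.
    z = [0.0 if s in constrained else 1.0 for s in range(N_STATES)]
    for _ in range(HORIZON):
        z = [0.0 if s in constrained else
             sum(math.exp(reward(s, a)) * z[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return z[s0]

def partition_brute(s0, constrained=frozenset()):
    """Same quantity by enumerating every action sequence (definition check)."""
    total = 0.0
    for seq in itertools.product(ACTIONS, repeat=HORIZON):
        s, ret, feasible = s0, 0.0, s0 not in constrained
        for a in seq:
            ret += reward(s, a)
            s = step(s, a)
            feasible = feasible and s not in constrained
        if feasible:
            total += math.exp(ret)
    return total

z_free = partition(0)
z_constrained = partition(0, frozenset({2}))
print(z_free, z_constrained)
```

Removing trajectories can only remove positive terms from the sum, so the constrained value is never larger than the unconstrained one.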
III-C Constraint Inference
The distribution in equation (3) describes the likelihood of observing any demonstrated trajectory given a constraint set $C$. Given a set of $N$ independent and identically distributed sample trajectories $D = \{\xi_1, \dots, \xi_N\}$, the likelihood of observing this dataset is:
$$P(D \mid C) = \prod_{i=1}^{N} P(\xi_i \mid C) \tag{4}$$
$$= \prod_{i=1}^{N} \frac{e^{R(\xi_i)}}{Z_C}\, \mathbb{1}\left[\xi_i \in \Xi_C\right] \tag{5}$$
$$= \frac{1}{Z_C^N} \prod_{i=1}^{N} e^{R(\xi_i)}\, \mathbb{1}\left[\xi_i \in \Xi_C\right] \tag{6}$$
Adding a constraint to the model that helps explain the demonstrations will increase this likelihood. Therefore, the most likely constraint $C^*$ is the one that maximizes $P(D \mid C)$. Note two properties that will aid in finding $C^*$:
Remark 1.
The optimal constraint set $C^*$ must have all $\xi_i \in D$ inside of its corresponding $\Xi_{C^*}$. Otherwise its likelihood would be 0, a lower likelihood even than having no constraints at all. This would contradict its being the optimum.
Therefore for any feasible candidate constraints, the indicator will always evaluate to 1. With the zero case ruled out, the likelihood can be straightforwardly characterized by factoring out the remaining $C$-dependent component:
$$P(D \mid C) = \frac{1}{Z_C^N} \prod_{i=1}^{N} e^{R(\xi_i)}$$
Remark 2.
When comparing the likelihood amongst feasible constraint sets, they are only rescalings of the same dataset-determined constant $\prod_{i} e^{R(\xi_i)}$ by $1 / Z_C^N$. So the maximum likelihood constraint set is simply whichever set $C$, amongst the feasible constraint sets, has the smallest $Z_C$.
For every hypothesized constraint set $C$, $Z_C$ can be computed by a Bellman backup or by forward simulation. The latter approach is favored by [17] as it makes a direct parallel to the seminal Maximum Entropy IRL work [21]. Let $C_0$ be some baseline set of known constraints (e.g. the empty set for the unconstrained case). The forward simulation relies on the fact that $Z_C$ is proportional to the probability mass that the baseline distribution assigns to trajectories that do not violate $C$.
It can be calculated by forward simulating the state distribution under $C_0$’s maximum entropy distribution and observing the probability that trajectories violate the constraint up to time $T$. Call that quantity $F_C$, then:
$$Z_C = Z_{C_0} \left(1 - F_C\right)$$
Therefore, the most likely constraint minimizes $Z_C$, or equivalently, maximizes $F_C$. The quantity $F_C$ will be useful in some of our subsequent analysis.
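The selection rule above can be sketched on a small example. The snippet below is only an illustration under stated assumptions: it uses a hypothetical 3-by-3 gridworld, an invented objective (`total_reward`), and brute-force enumeration of all trajectories in place of the forward simulation described in the text. It scores each feasible single-cell constraint by the probability that an unconstrained maximum entropy trajectory enters it, then picks the largest.

```python
import itertools
import math

# Hypothetical 3x3 gridworld; every name and constant here is an assumption.
SIZE, HORIZON = 3, 4
START, GOAL = (0, 0), (2, 2)
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def rollout(actions):
    """States visited from START under deterministic, wall-clipped moves."""
    states = [START]
    for dx, dy in actions:
        x, y = states[-1]
        states.append((min(max(x + dx, 0), SIZE - 1),
                       min(max(y + dy, 0), SIZE - 1)))
    return states

def total_reward(states, actions):
    """Assumed objective: small effort cost plus a large terminal goal penalty."""
    effort = sum(a != (0, 0) for a in actions)
    miss = abs(GOAL[0] - states[-1][0]) + abs(GOAL[1] - states[-1][1])
    return -0.1 * effort - 10.0 * miss

# Maximum entropy weight exp(reward) of every trajectory in the baseline model.
trajectories = []
for actions in itertools.product(ACTIONS, repeat=HORIZON):
    states = rollout(actions)
    trajectories.append((set(states), math.exp(total_reward(states, actions))))
z_base = sum(weight for _, weight in trajectories)

def violation_probability(cell):
    """Probability that a baseline trajectory enters `cell` at any time."""
    return sum(w for visited, w in trajectories if cell in visited) / z_base

# One demonstration that walks along the bottom and right edges of the grid.
demo_states = set(rollout([(1, 0), (1, 0), (0, 1), (0, 1)]))
candidates = [(x, y) for x in range(SIZE) for y in range(SIZE)]
feasible = [c for c in candidates if c not in demo_states]  # Remark 1
scores = {c: violation_probability(c) for c in feasible}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy setup the center cell is selected, since most goal-reaching baseline trajectories pass through it while the demonstration avoids it.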
IV Formulation of Approximate MDP
Given an arbitrary set of continuous dynamics of the form $\dot{x} = f(x, u)$, we wish to generate an appropriate tabular state-action space that can be used with the MLCI algorithm described in section III. We will illustrate this process with a pendulum model that we return to for experimental analysis in section V.
IV-A Running Example: Pendulum System
The pendulum model consists of a 2-dimensional state space (angle and angular velocity). The 1-dimensional control input is the normalized torque applied at the base of the pendulum:
(7) 
where the gravitational constant and the length of the pendulum are both assumed to be 1 for simplicity. The constraint hypothesis set is an evenly spaced 10-by-10 grid of non-overlapping cells that cover the state space, for a total of 100 possible constraints. (Note that any set of state space regions is acceptable as the constraint hypothesis set, including overlapping regions or ones that do not cover the whole state space, but it is typically appropriate for them to be equally sized. This is because a larger constraint region is able to “explain away” more demonstrator suboptimality and is therefore likely to have a larger violation probability, making it difficult to directly compare constraint regions of different sizes when choosing the most likely one.) The demonstrator wants to arrive at a particular goal state $x_{\text{goal}}$ at the end of a $T$ = 5 s period while minimizing the total squared torque and avoiding the true constraint region, $C_{\text{true}}$:
$$\min_{u(\cdot)} \int_0^T u(\tau)^2 \, d\tau \quad \text{s.t.} \quad x(T) = x_{\text{goal}}, \quad x(\tau) \notin C_{\text{true}} \ \ \forall \tau \in [0, T] \tag{8}$$
where we use $\tau$ to refer to continuous time and $t$ to refer to discrete time steps.
IV-B Forming The Tabular State-Action Space
First, we choose appropriate bounds for each dimension of the state space and control input, which can come from domain knowledge or from observing the range of values in the demonstrations. For the pendulum system, it is natural to bound the angle $\theta$ to one full revolution ($2\pi$ rad), we select a velocity bound spanning 12 rad/s, and the control input bound is chosen by observing that the controls used by continuous trajectories optimizing the objective in equation (8) rarely exceed this range. We then grid up the continuous state space by dividing it into disjoint cells that completely cover the bounded area. A reasonable default is to use equally sized boxes. For example, we can divide the pendulum state space into 100 cells, 10 along each dimension, each encompassing a $\pi/5$ rad angle width and a 1.2 rad/s angular velocity range. The set of these cells is $\mathcal{S}$. Similarly, the range of possible control inputs is divided into discrete points to give $\mathcal{A}$. We use $s$ and $a$ to label the discrete states and actions, respectively. $x_s$ is the value of the continuous state at the center point of state cell $s$, while $u_a$ is the value of the control input associated with discrete action choice $a$.
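The gridding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the symmetric bounds (angle in [-pi, pi], angular velocity in [-6, 6] rad/s) are assumptions chosen to match the cell widths quoted above, and `U_MAX` is a hypothetical control bound.

```python
import numpy as np

# Assumed symmetric bounds; they reproduce the 10 x 10 grid with
# pi/5 rad by 1.2 rad/s cells described in the text.
LOW = np.array([-np.pi, -6.0])
HIGH = np.array([np.pi, 6.0])
CELLS = np.array([10, 10])
WIDTH = (HIGH - LOW) / CELLS

U_MAX = 2.0  # hypothetical control bound, not taken from the paper
CONTROLS = np.linspace(-U_MAX, U_MAX, 9)  # discrete control values

def cell_index(x):
    """Map a continuous state to its per-dimension cell indices.
    States outside the bounds are clipped to the nearest edge cell."""
    idx = np.floor((np.asarray(x, dtype=float) - LOW) / WIDTH).astype(int)
    idx = np.clip(idx, 0, CELLS - 1)
    return tuple(int(i) for i in idx)

def cell_center(idx):
    """Continuous state at the center of a cell."""
    return LOW + (np.asarray(idx) + 0.5) * WIDTH

print(WIDTH)                  # cell widths: about 0.628 rad and 1.2 rad/s
print(cell_index([0.1, 0.1]))
print(cell_center((0, 0)))
```

Clipping out-of-range states to the edge cells is one possible design choice; rejecting such states outright would be an equally reasonable alternative.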
IV-C Tabular MDP Transition Kernel and Objective
To complete the tabular MDP representation, we need to determine the transition and reward associated with each state-action pair $(s, a)$. For “gridworld” environments frequently used in inverse reinforcement learning, the agent is allowed to transition to any adjacent cell. However, for arbitrary continuous dynamics, this behavior may result in trajectories that bear little resemblance to what is possible under the true dynamics. For example, consider that in the pendulum system, allowing a transition to an adjacent cell with a larger angle is nonsensical if the current velocity is a large negative value, regardless of the control input.
To resolve this problem, we select a constant time interval $\Delta t$ that represents the amount of time that passes between state transitions in the tabular model. For each discrete $(s, a)$, we use an ODE solver to determine the trajectory that would result from starting at the center of state cell $s$, $x_s$, and applying a constant control input of $u_a$ for $\Delta t$ time. We can then determine which state cell the agent would land in at the end of this trajectory segment, which becomes the successor state $s'$. While this is sufficient for determining appropriate discrete transitions, the start and successor cells alone do not tell us which state-based constraints may have been violated while taking a particular transition. Therefore, we also keep track of which hypothesized constraints would be violated while executing the continuous trajectory underlying the discrete transition. This ensures that the agent isn’t allowed to “warp through” constraints even when $s$ and $s'$ are not adjacent cells.
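The transition construction can be sketched as below. Several pieces are assumptions made for illustration: the pendulum sign convention in `dynamics`, the bounds and control values, the interval `DT`, and the use of a hand-written fixed-step Runge-Kutta integrator as a stand-in for a general ODE solver. The sketch records, for every discrete state-action pair, both the successor cell and every cell crossed along the way.

```python
import numpy as np

LOW, HIGH = np.array([-np.pi, -6.0]), np.array([np.pi, 6.0])  # assumed bounds
CELLS = np.array([10, 10])
WIDTH = (HIGH - LOW) / CELLS
CONTROLS = np.linspace(-1.0, 1.0, 5)  # hypothetical discrete torques
DT, SUBSTEPS = 0.5, 20                # hypothetical interval and resolution

def dynamics(x, u):
    """Assumed pendulum model with g = l = 1; the sign convention is a guess."""
    theta, omega = x
    return np.array([omega, -np.sin(theta) + u])

def rk4_step(x, u, h):
    """One classical Runge-Kutta step under a constant control input."""
    k1 = dynamics(x, u)
    k2 = dynamics(x + 0.5 * h * k1, u)
    k3 = dynamics(x + 0.5 * h * k2, u)
    k4 = dynamics(x + h * k3, u)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def cell_index(x):
    """Wrap the angle into [-pi, pi), then map the state to a grid cell."""
    theta = (x[0] + np.pi) % (2 * np.pi) - np.pi
    idx = np.floor((np.array([theta, x[1]]) - LOW) / WIDTH).astype(int)
    idx = np.clip(idx, 0, CELLS - 1)
    return int(idx[0]), int(idx[1])

def build_transitions():
    """For each (cell, action): successor cell and all cells crossed en route."""
    successor, crossed = {}, {}
    for i in range(CELLS[0]):
        for j in range(CELLS[1]):
            center = LOW + (np.array([i, j]) + 0.5) * WIDTH
            for k, u in enumerate(CONTROLS):
                x, visited = center.copy(), {(i, j)}
                for _ in range(SUBSTEPS):
                    x = rk4_step(x, u, DT / SUBSTEPS)
                    visited.add(cell_index(x))
                successor[(i, j), k] = cell_index(x)
                crossed[(i, j), k] = visited
    return successor, crossed

successor, crossed = build_transitions()
print(successor[(5, 9), 2], sorted(crossed[(5, 9), 2]))
```

A transition is then disallowed under a hypothesized constraint whenever its `crossed` set intersects the constrained cells, which is what prevents fast-moving states from skipping over a constraint region.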
Finally, we assume that the ground-truth reward for an entire trajectory can be expressed as $\int_0^T r(x(\tau), u(\tau)) \, d\tau$ for some function $r$ of the continuous state and control input. We estimate the tabular reward function as $\sum_t r(x_{s_t}, u_{a_t}) \, \Delta t$, where $(s_t, a_t)$ is the sequence of discrete state-action pairs over the course of a trajectory on the tabular MDP.
It is worth noting that trajectories allowable under the tabular MDP described above are not necessarily feasible or safe under the true continuous dynamics. For example, starting from different points within the same cell might result in slightly different constraint violations, while we only track violations that result from starting in the center of each cell. This is acceptable for our application because we are trying to obtain estimates of general behavior that enable reasonable likelihood-based constraint inference. Similarly, there is no well-defined mapping from a particular continuous trajectory to a feasible discrete state-action sequence under the approximate tabular dynamics. Since we only handle state-based constraints, it is sufficient to determine which possible constraints a demonstration violates without trying to construct a discrete version of the trajectory. This can be done by sampling points along the trajectory to determine which constraint regions it passes through.
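The sampling check described above can be sketched in a few lines. The function name `violated_cells`, the grid bounds, and the example trajectory are all hypothetical; the only idea taken from the text is that a demonstration is reduced to the set of hypothesis cells its sampled points fall into.

```python
import numpy as np

def violated_cells(trajectory, low, high, cells):
    """Return the set of grid cells containing at least one sampled point
    of a continuous trajectory (an array of shape [num_samples, dim])."""
    low, high, cells = (np.asarray(v) for v in (low, high, cells))
    width = (high - low) / cells
    idx = np.floor((np.asarray(trajectory, dtype=float) - low) / width).astype(int)
    idx = np.clip(idx, 0, cells - 1)
    return {tuple(int(i) for i in row) for row in idx}

# Hypothetical demonstration: a straight line sampled at 50 points
# on a 4 x 4 grid covering [-2, 2] x [-2, 2].
samples = np.column_stack([np.linspace(-1.0, 1.0, 50), np.full(50, 0.1)])
cells_hit = violated_cells(samples, [-2.0, -2.0], [2.0, 2.0], [4, 4])
print(sorted(cells_hit))
```

The sampling must be dense enough that consecutive points cannot jump over a whole cell; otherwise a crossed constraint region could be missed.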
The primary hyperparameters that determine the final tabular MDP are the number of cells to use for each state dimension, the number of actions, and the transition time step $\Delta t$. These parameters can be tuned using domain knowledge or by running simulated experiments with known constraints to determine which model obtains the best performance. An example of these experiments and the resulting constraint inference performance for the pendulum system is described in the following section. Once an appropriate model has been selected, it can be used with any combination of reward function and demonstration set. Additionally, if the objective of the demonstrators is known in advance, MLCI can be performed on the appropriately initialized discrete MDP as a precomputation step, and constraints can be inferred online with very little additional computation.
V Analysis On Pendulum System
After following the procedure outlined in section IV for the pendulum system, we now have a tabular MDP representation that can be used with MLCI as described in section III. We next turn to analyzing the behavior of this approximate MDP. For our experiments, we tested two possible ground-truth constraints: prohibits while , and prohibits while . Both ground-truth constraints are aligned with the constraint hypothesis set. The constraint hypothesis space is illustrated in Fig. 2. For each ground-truth constraint, we randomly sampled 100 pairs of start and end states from the safe set (defined in equation (10) below) for agents to satisfy while optimizing the objective in equation (8). Some of these start-end state pairs were ill-posed since the pendulum could not reach across them in the fixed 5 second time horizon provided. After removing these configurations, the set of demonstrations was reduced to trajectories.
V-A Accuracy Of Tabular MDP Dynamics
We first examine how accurately the tabular MDP recovers the true continuous dynamics under goal-directed behavior induced by the objective function. For each ground-truth constraint and random start-goal pair, we initialized the MDP while incorporating the true constraint into the MDP dynamics (i.e., actions that would result in violating the true constraint were not allowed). We then performed a Bellman backup to determine the distribution of soft-optimal policies on the tabular MDP. Intuitively, if the MDP perfectly describes the true continuous dynamics, we expect that running a simulation with the ground-truth dynamics while taking the sequence of actions determined by one of these policies will cause the agent to land exactly at the goal state. Following this intuition, we sampled and executed a random policy from each MDP and measured the normalized Euclidean distance between the final state and goal state. As shown in Fig. 3, increasing the number of state cells reduces the “round-off” error associated with each discrete state transition and results in a final state that is closer to the intended goal. Since the objective function specifies a fixed time horizon, increasing $\Delta t$ decreases the number of transitions over the course of a trajectory and therefore reduces final state error as well.
V-B Generating Simulated Expert Demonstrations
To understand the accuracy of constraint inference with the tabular MDP, we first need expert demonstrations that follow the ground-truth dynamics. For each ground-truth constraint, 100 random pairs of states were sampled to serve as the start and goal points for independent demonstrations. These expert continuous demonstrations were synthesized using a second-order descent method with simulation time step (), much finer than the $\Delta t$ used in the tabular MDP. The demonstrations are optimized using a Gauss-Newton-style descent method known as the Iterative Linear-Quadratic Regulator (iLQR) [6]. The optimization is halted after ten iterations. For each start-goal pair, the best of three optimizations is picked (each with randomly sampled initial controls) to reject optimizations that get stuck in local minima. The optimizations that could not succeed in reaching their goal were filtered out from the dataset, reducing the dataset size to .
The state constraints are blocked out as rectangular polytope constraints in the continuous state space. They are enforced using an interior-point method that supersedes any controls (as in [3]) that would reach the constrained states. This backwards-reachable set that forms the barrier certificate [15] is computed via a Hamilton-Jacobi-Isaacs partial differential equation [11]. For continuous dynamics $\dot{x} = f(x, u)$, the robust backwards reachable set of the constraint region can be computed as the sub-zero level set of:
(9)
where is initialized to the signed distance from :
Let be the complement of this backwards reachable set:
(10) 
As the complement of the reachable set, this is the set from which there is a way to avoid the keep-out set. Since there exists an avoidant strategy, this is a control-invariant set. So long as the system is initialized within this safe set it is possible to remain safe. Furthermore, any controls can be taken up to crossing the border from the safe set into the backwards reachable set. At this point, the maximally safe action must be taken. This is the safety strategy advanced in [3].
This safety strategy ensures the system will stay on the interior of the feasible region. Due to intervening only when absolutely necessary (i.e. when crossing into the backwards reachable set), this intervention is also the least restrictive. It will not eliminate any trajectories that weren’t already infeasible. Therefore the set of feasible solutions remains unchanged after instituting these dynamics. The optimal trajectory of the non-intervened dynamics will be the same as the optimal trajectory on the intervened dynamics.
This constraint-enforcing switching control is non-differentiable, so derivative-based optimizations on the controls cannot be used. Fortunately, new relaxations of switched dynamics [20] can substitute a relaxed problem whose solutions will converge to the true unrelaxed solution as the relaxation is tightened.
V-C Constraint Inference Performance
Accurate constraint inference relies on a close match between expert demonstrations and soft-optimal trajectories on the tabular MDP that incorporates the ground-truth constraint. For the purposes of constraint inference, two trajectories are equivalent if they violate the same constraints in the constraint hypothesis set. Therefore, we next examine the difference between the expected constraint violation under the tabular MDP and the actual constraints violated by independent continuous demonstrations. Over all of the possible constraints, this difference can be expressed as
(11) 
where the indicator term records whether a given demonstration violates a given constraint. If the approximate MDP perfectly tracks the true constraint violation distribution and demonstrations are distributed according to soft optimality, we expect the quantity in equation (11) to go asymptotically to 0 as the number of demonstrations increases. Results for different model hyperparameters and a single demonstration (averaged over 65 trials and the two alternative ground-truth constraints) are shown in Fig. 4. Increasing the number of state space grid cells from 100 to 400 lowers constraint violation error, but increasing the number of states in the tabular MDP beyond this point does not have much effect. This suggests that the greater accuracy of the approximate MDP for larger numbers of discrete states does not necessarily translate into improved constraint inference. Error is stable across different values of the discrete time interval $\Delta t$.
We see a very similar trend when examining the performance of constraint inference across MDPs generated with different hyperparameters, as can be seen in Fig. 5. After choosing an appropriate $\Delta t$, tabular MDPs with 100 to 1600 states are able to successfully identify the true constraint as one of the top-5 likeliest constraints after 9 demonstrations. Increasing the number of states to at least 400 stabilizes performance across different choices of $\Delta t$. Even though the approximate MDPs do not capture the true continuous dynamics with high fidelity, especially for the coarsest state-space grid, constraint inference still works well and is robust to a range of hyperparameters.
In addition to these average trends, we can qualitatively examine the approximation quality by sampling a trajectory from the discrete MDP and comparing it to the original continuous demonstration. An example of this for a single trial is shown in Fig. 6.
For all of the analyses described above, we also varied the number of discrete actions in the approximate MDP but found that this made little difference to any of the measures we examined. A larger number of actions allows the discrete agent more possible routes to the goal, but it may be that these routes do not change constraint violation behavior in expectation across the soft-optimal policy distribution. Fig. 3 through Fig. 6 show results using 9 actions evenly spaced across the control input range.
V-D Confidence In Found Constraints
In addition to identifying the most likely constraints influencing agent behavior, it is desirable to calculate the probability of there being a constraint at all. First, consider the simple case where we assume that there is at most one constraint, and if there is one, it is the most likely one identified via MLCI. Let $C$ be the event that this is truly a constraint, and $D$ be the event that $N$ independent trajectories do not violate this constraint. We would like to calculate $P(C \mid D)$. We know that $P(D \mid C) = 1$ since no demonstrations may violate a constraint, and that $P(D \mid \neg C) = (1 - F_C)^N$ (i.e. the probability of the demonstrations not violating this constraint by coincidence, even if the agent isn’t really subject to it), where $F_C$ is the probability that a single unconstrained trajectory violates the constraint, which we obtain from the MLCI algorithm. We can therefore use Bayes’ Rule to obtain the following formula:
$$P(C \mid D) = \frac{p}{p + (1 - p)\,(1 - F_C)^N} \tag{12}$$
where $p = P(C)$ is a prior on the probability of the constraint being present. This simple formula introduces no additional approximation error beyond what is already present in the model under the assumptions described above, and presents an important advantage of the MLCI approach to constraint inference over previous approaches that cannot provide confidence estimates of found constraints. Unfortunately, relaxing the assumptions on possible constraints and calculating probabilities of all possible constraints quickly becomes computationally intractable. Providing estimates of these probabilities is left for future work.
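The Bayesian update described in this section can be sketched as a short function. It assumes, as in the text, that the demonstrations never violate a true constraint and that they are independent; the function and argument names are hypothetical, and `violation_prob` stands for the single-trajectory violation probability obtained from MLCI.

```python
def constraint_posterior(prior, violation_prob, n_demos):
    """Posterior probability that the candidate is a real constraint, given
    that none of `n_demos` independent demonstrations violated it.

    prior          -- prior probability that the constraint is present
    violation_prob -- probability that one unconstrained trajectory would
                      violate the candidate constraint
    """
    # Likelihood of seeing no violations by coincidence (no constraint present).
    no_violation_by_chance = (1.0 - violation_prob) ** n_demos
    # Likelihood of seeing no violations when the constraint is real is 1.
    return prior / (prior + (1.0 - prior) * no_violation_by_chance)

# With an uninformative prior, confidence grows with each consistent demonstration.
for n in (0, 1, 5, 9):
    print(n, constraint_posterior(0.5, 0.5, n))
```

With zero demonstrations the function returns the prior unchanged, and the posterior approaches 1 as consistent demonstrations accumulate, faster for constraints that unconstrained trajectories would often violate.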
VI Potential Application: Sit-to-Stand and Lower Back Pain
The robustness to hyperparameter selection, low number of required demonstrations, and ability to provide a confidence estimate on the identified constraints support the use of the MLCI approach to identify patient-specific impairments from observed motion. One potential application is in the analysis of individuals with low back pain (LBP).
LBP affects 70-90% of adults during their lifetime and can be extremely debilitating [18]. However, it is often difficult to determine the source of the pain and therefore prescribe an appropriate treatment. Disorders of the lower spine, hip, and pelvic region can all cause LBP [16]. Treating the wrong problem may result in an unnecessary surgery that doesn’t resolve the patient’s LBP. When a treatment plan that addresses the physical cause of the pain can’t be identified, patients may be prescribed opioids for chronic pain management, even though these are ineffective and can lead to abuse and addiction [9]. Therefore, there is a pressing clinical need to develop better methods for understanding the source of LBP.
There is a recent body of literature suggesting that LBP may be linked to irregularities in movement patterns. For example, inappropriate amounts of pelvic movement during various motions appear to contribute to LBP [8]. This pelvic movement may be compensatory for a limited range of motion in other joints, in other words, for constraints on the achievable joint angles. We would expect the resulting movement patterns to avoid regions of the biomechanical state space associated with pain. Identifying both physical and pain-related constraints on movement could therefore lead us to a better understanding of the underlying cause of the LBP. For this reason, we would like to infer the most likely constraints a person is acting under when observing their movements. A particularly promising movement pattern for demonstrations is completing a sit-to-stand trajectory, which exerts significant strain on several joints implicated in LBP [4]. A telescoping inverted pendulum system has been used to model this movement, which reduces the problem to 4 dimensions while allowing for clinically relevant discovery [13].
VI-A Constraint Inference on Telescoping Inverted Pendulum
Following the above motivation, we next demonstrate successful constraint inference on a telescoping inverted pendulum (TIP) model. The dynamics for this model are as follows:
These dynamics omit the cross-coupling term between angular acceleration and linear velocity for simplicity. For this experiment, we chose the goal set as the set of all states within a certain range of pendulum length and angle, leaving velocity as a free parameter. The objective is to reach the goal set at $T$ = 5 s while minimizing . The constraint hypothesis space is a 10x10 evenly spaced grid along the angle and length dimensions, so that if a particular (angle, length) combination is constrained, the agent is not allowed to enter that combination at any velocity. We generated 5 demonstrations with random start and goal states following the same procedure as in section V-B. We then formulated a tabular MDP with 2500 states (10 cells each for angle and length, and 5 cells each for angular and linear velocity) and 15 actions (5 discrete torque options and 3 discrete linear force options) and performed constraint inference on the demonstrations. The ground truth constraint and top 2 likeliest inferred constraints are shown in Fig. 7. Despite the coarseness of the tabular state-action space and a mismatch between the constraint hypothesis space and the true constraint region, MLCI correctly identifies the ground truth constraint and takes about 5 minutes with no optimization effort on a single CPU core. If the start and goal states of demonstrations are known in advance, as is likely to be the case in a clinical test, this computation can be done ahead of time and inferring constraints after observing the actual demonstration trajectories is virtually instantaneous.
VII Conclusion
We have presented a methodology for forming a tabular MDP approximation of continuous dynamics which can be used for maximum likelihood constraint inference. Although the approximation introduces some error into the estimation, constraint inference works well with pendulum dynamics over a range of hyperparameters, including a small discrete state space. The present approach allows for ranking possible constraints by their likelihood, which is especially useful in applications with significant uncertainty, and uses the maximum entropy framework, which may be an especially good fit for human demonstrators, who tend to act suboptimally. Future work should characterize the kinds of dynamics for which this approach works well and whether techniques such as variable grid size may allow for higher accuracy and increased computational efficiency.
References
[1] (2012-02) Non-specific low back pain. The Lancet 379 (9814), pp. 482–491. ISSN 0140-6736.
[2] (2020-04) Learning Constraints From Locally-Optimal Demonstrations Under Cost Function Uncertainty. IEEE Robotics and Automation Letters 5 (2), pp. 3682–3690. ISSN 2377-3766.
[3] (2008-12) Decentralized cooperative collision avoidance for acceleration constrained vehicles. In 2008 47th IEEE Conference on Decision and Control, pp. 4357–4363. ISSN 0191-2216.
[4] (1994-05) Chair rise strategies in the elderly. Clinical Biomechanics 9 (3), pp. 187–192. ISSN 0268-0033.
[5] (2017) Learning Object Orientation Constraints and Guiding Constraints for Narrow Passages from One Demonstration. In 2016 International Symposium on Experimental Robotics, D. Kulić, Y. Nakamura, O. Khatib, and G. Venture (Eds.), Springer Proceedings in Advanced Robotics, Cham, pp. 197–210. ISBN 9783319501154.
[6] (2004) Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems. In Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, Setúbal, Portugal, pp. 222–229. ISBN 9789728865122.
[7] (2015-05) Learning null space projections. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2613–2619. ISSN 1050-4729.
[8] (2015-10) Correlation between Hip Rotation Range-of-Motion Impairment and Low Back Pain. A Literature Review. Ortopedia, Traumatologia, Rehabilitacja 17 (5), pp. 455–462. ISSN 1509-3492, 2084-4336.
[9] (2007-01) Systematic Review: Opioid Treatment for Chronic Back Pain: Prevalence, Efficacy, and Association with Addiction. Annals of Internal Medicine 146 (2), pp. 116. ISSN 0003-4819.
[10] (2016-12) Inferring and assisting with constraints in shared autonomy. In 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 6689–6696.
[11] (2007) A Toolbox of Level Set Methods. UBC Department of Computer Science Technical Report TR-2007-11, pp. 31.
[12] (2013) Learning Robot Skills Through Motion Segmentation and Constraints Extraction. HRI Workshop on Collaborative Manipulation, pp. 5.
[13] (1999-11) A telescopic inverted-pendulum model of the musculo-skeletal system and its use for the analysis of the sit-to-stand motor task. Journal of Biomechanics 32 (11), pp. 1205–1212. ISSN 0021-9290.
[14] (2018-06) Is there evidence to use kinematic/kinetic measures clinically in low back pain patients? A systematic review. Clinical Biomechanics 55, pp. 53–64. ISSN 0268-0033.
[15] (2004) Safety Verification of Hybrid Systems Using Barrier Certificates. In Hybrid Systems: Computation and Control, R. Alur and G. J. Pappas (Eds.), Lecture Notes in Computer Science, Berlin, Heidelberg, pp. 477–492. ISBN 9783540247432.
[16] (2019) Links between the Hip and the Lumbar Spine (Hip Spine Syndrome) as they Relate to Clinical Decision Making for Patients with Lumbopelvic Pain. PM&R 11 (S1), pp. S64–S72. ISSN 1934-1563. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/pmrj.12187
[17] (2019-09) Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning. arXiv:1909.05477 [cs, eess, stat].
[18] (2014-07) Risk Factors for Serious Underlying Pathology in Adult Emergency Department Nontraumatic Low Back Pain Patients. The Journal of Emergency Medicine 47 (1), pp. 1–11. ISSN 0736-4679.
[19] (2018) Learning Task Specifications from Demonstrations. Advances in Neural Information Processing Systems 31, pp. 5367–5377.
[20] (2018-12) A New Solution Concept and Family of Relaxations for Hybrid Dynamical Systems. In 2018 IEEE Conference on Decision and Control (CDC), pp. 743–750. ISSN 2576-2370.
[21] (2010) Modeling Interaction via the Principle of Maximum Causal Entropy. In International Conference on Machine Learning, pp. 8.