In shared autonomy [aigner1997human, dragan2013blending, reddy2018shared, losey2018review, goertz1963manipulators, farraj2018hapticshared, losey2020latent, li2011dynamic, erdogan2017effect, brown2014balancing, crandall2017human], robots help human operators achieve their objectives more effectively. Here, rather than directly executing the human’s control input, a typical framework has the robot estimate the human’s intent and execute controls that help achieve it [dragan2013blending, javdani2015shared, muelling2017autonomy, perez2015fast, reddy2018shared].
These methods succeed when the robot knows the set of possible human intents a priori, e.g. the objects the human might want to reach, or the buttons they might want to push [dragan2013blending, javdani2015shared]. But realistically, users of these systems will inevitably want to perform tasks outside the repertoire of known intents – they might want to reach for a goal unknown to the robot, or perform a new task like pouring a cup of water into a sink. This presents a three-fold challenge for shared autonomy. First, the robot will be unable to recognize and help with something unknown. Second, and perhaps more importantly, it will attempt to assist with whatever wrong intent it infers, interfering with what the user is trying to do and hindering their performance. This happens when the robot plans in expectation [javdani2015shared], and, as our experiments will demonstrate, it happens even when the robot arbitrates the amount of assistance based on its confidence in the most likely goal [dragan2013blending]. Third, the new task remains just as difficult as the first time even after arbitrarily many attempts.
Our key idea is that the robot should detect that the user is trying something new and give them control. This then presents an opportunity for the robot to observe the new executed trajectory, learn the underlying intent that explains it, and add it to its repertoire so that it can infer and assist for this intent in the future.
To achieve this, we need two ingredients: 1) a way for the robot to detect that its repertoire of intents is insufficient, and 2) a representation of intents that enables learning new tasks throughout its lifetime, adding them to its repertoire, and performing inference over them in a unified way with the initial known intents. For the latter, we use cost functions to unify goals and general skills like pouring into the same representation. This then enables the former: when the human acts too suboptimally for any of the known cost functions, it suggests the robot lacks the correct set of costs.
Our approach takes inspiration from recent work on hypothesis misspecification, where the robot recognizes when its cost function features are insufficient to explain human demonstrations and corrections [bobu2020quantifying] and updates the cost in proportion to the situational confidence in these features’ ability to explain the input. We extend the detection of hypothesis misspecification to the context of shared autonomy, in which there are multiple intents, represented as cost functions, and the robot seeks to recognize whether any of the known intents explain the human input sufficiently. The robot can then arbitrate its assistance based on its confidence in the most likely intent being what the human wanted.
Our approach, which we call Confidence-Aware Shared Autonomy (CASA), allows the robot to ascertain whether the human inputs are associated with a known or new task. By arbitrating the user’s input based on the confidence in the most likely intent, CASA follows a standard policy blending assistance approach if the task is known, and otherwise gives the user full control. Additionally, CASA allows the user to provide a few demonstrations of the new intent, which the robot uses to learn a cost function via Inverse Reinforcement Learning (IRL) [finn2016guided] and add it to its set of intents. This enables lifelong shared autonomy, where the robot helps when it is confident in what the user wants and learns new intents when it detects that the human is doing something novel, so that it can assist with that intent in the future.
We test our approach in an expert case study and a user study with a simulated 7-DoF JACO assistive robot arm. Our results suggest that CASA significantly outperforms prior approaches when assisting for unknown intents, maintains high performance in the case of known ones, and successfully learns new intents for better lifelong shared autonomy.
II Confidence-Aware Shared Autonomy
We consider a human teleoperating a dexterous robotic manipulator to perform everyday manipulation tasks. The robot’s goal is to assist the person in accomplishing their desired skill by augmenting or changing their input. While the robot possesses a predefined set of possible intents, the human’s desired motion might not be captured by any of them. We propose that since the robot might not understand the person’s intentions, it should reason about how confident it is in its predictions to avoid assisting for the wrong intent.
Formally, let $x$ be the continuous robot state (e.g. joint angles), and $u$ the continuous robot action (e.g. joint velocity). The user controls their desired robot configuration by providing continuous inputs $a$ via an interface (e.g. GUI, joystick, keyboard commands, etc). These inputs are mapped to robot actions through a direct teleoperation function $\mathcal{T}$, so that $u = \mathcal{T}(a)$. Define a person’s trajectory up until time $t$ as the sequence $\xi_{0 \to t} = (x^0, a^0, \ldots, x^t, a^t)$.
The robot is equipped with a set of known intents $\Theta$, one of which may represent the user’s desired motion. Each intent $\theta \in \Theta$ is parameterized by a cost function $C_\theta$, which may be hand-engineered or learned from demonstrations via IRL [maxent, Ng2000inverse]. For example, if the intent represents moving to a goal $g$, the cost function can be distance to the goal: $C_\theta(\xi) = \sum_{x \in \xi} \lVert x - g \rVert_2$. If the intent is pouring a cup, the cost can be a neural network $C_\phi$ with parameters $\phi$. Our shared autonomy system does not know the intent a priori, but infers it from the human’s inputs. Given the user’s trajectory so far, $\xi_{0 \to t}$, a common strategy is to predict the user’s intent $\theta^*$, compute the optimal action for moving accordingly, then augment the user’s original input with it [dragan2013blending].
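As a minimal sketch of how goals can be expressed as cost functions over a trajectory, the snippet below builds a distance-to-goal cost and evaluates it for two candidate intents. The intent names, the 2-D states, and the `make_goal_cost` helper are illustrative assumptions, not part of the described system.

```python
import math
from typing import Callable, Dict, Sequence

State = Sequence[float]
Trajectory = Sequence[State]


def make_goal_cost(goal: State) -> Callable[[Trajectory], float]:
    """Return a cost that sums the Euclidean distance to `goal` over all states."""
    def cost(trajectory: Trajectory) -> float:
        return sum(math.dist(x, goal) for x in trajectory)
    return cost


# Hypothetical intent library: intent name -> cost function.
intents: Dict[str, Callable[[Trajectory], float]] = {
    "green_goal": make_goal_cost((1.0, 0.0)),
    "purple_goal": make_goal_cost((0.0, 1.0)),
}

# A toy trajectory that heads straight for the green goal.
trajectory = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
costs = {name: cost(trajectory) for name, cost in intents.items()}
```

Because every intent exposes the same callable interface, a learned cost (for example a neural network evaluated on the trajectory) could be stored in the same dictionary alongside the hand-engineered ones.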
However, what if none of the intents match the human’s input, i.e., the person is trying to do something the robot does not know about? We introduce a shared autonomy formalism where the robot reasons about its confidence in its current set of intents’ ability to explain the person’s input, and uses that confidence for robust assistance. This confidence serves a dual purpose, as the robot can also use it to ask the human to demonstrate the missing intent.
II-B Intent Inference
To assist the person, the robot has to first predict which of its known tasks the person is trying to carry out, if any. To do that, the robot needs a model of how people teleoperate it to achieve a desired motion. We assume the Boltzmann noisily-rational decision model [baker2007goal, von1945theory]:
\[ P(\xi \mid \theta, \beta) = \frac{e^{-\beta C_\theta(\xi)}}{\int_{\bar{\xi}} e^{-\beta C_\theta(\bar{\xi})} \, d\bar{\xi}}, \tag{1} \]
where the person chooses the trajectory $\xi$ with probability proportional to its exponentiated negative cost $e^{-\beta C_\theta(\xi)}$. The parameter $\beta \geq 0$ controls how much the robot expects to observe human input consistent with the intent $\theta$. Typically, $\beta$ is fixed, recovering the Maximum Entropy IRL observation model [maxent], which is what most inference-based shared autonomy methods use [dragan2013blending, javdani2015shared]. Inspired by work on confidence-aware human-robot interaction [fridovich-keil2019confidence, fisac2018probabilistically, bobu2020quantifying], we instead reinterpret $\beta$ as a measure of the robot’s situational confidence in its ability to explain human data, given the known intents $\Theta$, and we show how the robot can estimate it in Sec. II-C.
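Given Eq. (1), if the cost $C_\theta$ of intent $\theta$ is additive along the trajectory $\xi$, we have that:
\[ P(\xi_{0 \to t} \mid \theta, \beta) = e^{-\beta C_\theta(\xi_{0 \to t})} \, \frac{\int_{\bar{\xi}_{t \to T}} e^{-\beta C_\theta(\bar{\xi}_{t \to T})} \, d\bar{\xi}_{t \to T}}{\int_{\bar{\xi}_{0 \to T}} e^{-\beta C_\theta(\bar{\xi}_{0 \to T})} \, d\bar{\xi}_{0 \to T}}, \tag{2} \]
where $T$ is the duration of the episode. In high-dimensional manipulation spaces, evaluating these integrals is intractable. We follow [dragan2013blending] and approximate them via Laplace’s method:
\[ P(\xi_{0 \to t} \mid \theta, \beta) \approx e^{-\beta \left( C_\theta(\xi_{0 \to t}) + C_\theta(\xi^*_{t \to T}) - C_\theta(\xi^*_{0 \to T}) \right)} \sqrt{\frac{|H_{0 \to T}|}{|H_{t \to T}|}} \left( \frac{\beta}{2\pi} \right)^{tk/2}, \tag{3} \]
where $k$ is the action dimensionality, $H_{0 \to T}$ and $H_{t \to T}$ are the Hessians of the cost around the optimal trajectories, and the trajectories $\xi^*_{0 \to T}$ and $\xi^*_{t \to T}$ are optimal with respect to $C_\theta$ and can be computed with any off-the-shelf trajectory optimizer (we use TrajOpt [trajopt], based on sequential quadratic programming).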
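Now, given a tractable way to compute the likelihood of the human input, the robot can obtain a posterior over intents:
\[ P(\theta \mid \xi_{0 \to t}, \beta) = \frac{P(\xi_{0 \to t} \mid \theta, \beta)}{\sum_{\bar{\theta} \in \Theta} P(\xi_{0 \to t} \mid \bar{\theta}, \beta)}, \tag{4} \]
assuming $P(\theta \mid \beta) = P(\theta)$ and a uniform prior over intents.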
Prior inference-based shared autonomy work [dragan2013blending, javdani2015shared] typically assumes a fixed confidence parameter $\beta$. We show that the robot should not be restricted by such an assumption and that it in fact benefits from estimating $\beta$ and reinterpreting it as a confidence.
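The inference step can be sketched in a few lines, assuming the Laplace-approximated likelihood takes the form $e^{-\beta \cdot \text{suboptimality}} (\beta / 2\pi)^{tk/2}$ and that the Hessian ratio can be ignored when comparing intents. The function names, the numeric costs, and the omission of the Hessian term are illustrative assumptions; in practice the optimal costs would come from a trajectory optimizer.

```python
import math
from typing import Dict


def log_likelihood(beta: float, cost_so_far: float, opt_cost_to_go: float,
                   opt_cost_full: float, t: int, k: int) -> float:
    """Approximate log P(xi_{0->t} | theta, beta), with the Hessian ratio omitted."""
    suboptimality = cost_so_far + opt_cost_to_go - opt_cost_full
    return -beta * suboptimality + 0.5 * t * k * math.log(beta / (2.0 * math.pi))


def intent_posterior(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-intent log-likelihoods into a posterior under a uniform prior."""
    shift = max(log_likelihoods.values())  # subtract the max for numerical stability
    weights = {name: math.exp(ll - shift) for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical costs after t = 10 steps with k = 2 action dimensions.
lls = {
    "green_goal": log_likelihood(1.0, cost_so_far=2.0, opt_cost_to_go=3.0,
                                 opt_cost_full=4.5, t=10, k=2),
    "purple_goal": log_likelihood(1.0, cost_so_far=2.5, opt_cost_to_go=4.0,
                                  opt_cost_full=4.5, t=10, k=2),
}
posterior = intent_posterior(lls)
```

With a shared $\beta$, the $(\beta / 2\pi)^{tk/2}$ factor cancels in the normalization, so the posterior depends only on the difference in suboptimality between intents.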
II-C Confidence Estimation
In the Boltzmann model in Eq. (1), we see that $\beta$ determines the variance of the distribution over human trajectories. When $\beta$ is high, the distribution is peaked around those trajectories with the lowest cost $C_\theta$; in contrast, a low $\beta$ makes all trajectories equally likely. We can thus reinterpret $\beta$ to take a useful meaning in shared autonomy: given an intent, $\beta$ controls how well that intent’s cost explains the user’s input. A high $\beta$ for an intent indicates that the intent’s cost explains the input well and is a good candidate for assistance. A low $\beta$ on all intents suggests that the robot’s intent set is insufficient for explaining the person’s trajectory.
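We can thus estimate $\beta$ and use it for assistance. Using the likelihood function in Eq. (3), we write the posterior
\[ P(\beta \mid \xi_{0 \to t}, \theta) = \frac{P(\xi_{0 \to t} \mid \theta, \beta) \, P(\beta)}{\int_{\bar{\beta}} P(\xi_{0 \to t} \mid \theta, \bar{\beta}) \, P(\bar{\beta}) \, d\bar{\beta}}. \tag{5} \]
If we assume a uniform prior $P(\beta)$, we may compute an estimate of the confidence parameter $\beta$ per intent $\theta$ via a maximum likelihood estimate:
\[ \hat{\beta}_\theta = \arg\max_{\beta} \left[ -\beta \left( C_\theta(\xi_{0 \to t}) + C_\theta(\xi^*_{t \to T}) - C_\theta(\xi^*_{0 \to T}) \right) + \frac{tk}{2} \log \frac{\beta}{2\pi} \right], \tag{6} \]
where we drop the Hessians since they don’t depend on $\beta$. Setting the derivative of the objective in Eq. (6) to zero and solving for $\beta$ yields the following estimate:
\[ \hat{\beta}_\theta = \frac{tk}{2 \left( C_\theta(\xi_{0 \to t}) + C_\theta(\xi^*_{t \to T}) - C_\theta(\xi^*_{0 \to T}) \right)}. \tag{7} \]
Alternatively, we chose to add an exponential prior with parameter $\lambda$, $P(\beta) = \lambda e^{-\lambda \beta}$, on $\beta$ to obtain a MAP estimate
\[ \hat{\beta}_\theta = \frac{tk}{2 \left( C_\theta(\xi_{0 \to t}) + C_\theta(\xi^*_{t \to T}) - C_\theta(\xi^*_{0 \to T}) + \lambda \right)}. \tag{8} \]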
The denominators in Eqs. (7) and (8) can be interpreted as the “suboptimality” of the observed partial trajectory $\xi_{0 \to t}$ compared to the cost of the optimal trajectory for the particular $\theta$, $\xi^*_{0 \to T}$. Note that the confidence estimate $\hat{\beta}_\theta$ is inversely proportional to the suboptimality divided by the number of time steps $t$ that have passed. Intuitively, the user has more chances to be a suboptimal teleoperator as time goes on, so dividing by $t$ corrects for the natural increase in suboptimality over time.
If this normalized suboptimality is low for an intent $\theta$, then the person is close to a good trajectory for that intent and $\hat{\beta}_\theta$ will be high. Thus, a high $\hat{\beta}_\theta$ means that the person’s input is well-explained by that intent. On the other hand, high suboptimality per time step means the person is far from good trajectories, so $\theta$’s cost model does not explain the person’s trajectory and $\hat{\beta}_\theta$ will be low.
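The closed-form estimate described above can be sketched as a single function, assuming the confidence equals $tk$ divided by twice the suboptimality (plus the exponential prior rate for the MAP variant). The argument names, the `prior_rate` parameter, and the guard for a non-positive denominator are illustrative choices rather than details given in the text.

```python
def estimate_confidence(cost_so_far: float, opt_cost_to_go: float,
                        opt_cost_full: float, t: int, k: int,
                        prior_rate: float = 0.0) -> float:
    """Closed-form confidence estimate for one intent.

    prior_rate = 0 gives the maximum likelihood estimate; prior_rate > 0 gives
    the MAP estimate under an exponential prior with that rate.
    """
    suboptimality = cost_so_far + opt_cost_to_go - opt_cost_full
    denominator = 2.0 * (suboptimality + prior_rate)
    if denominator <= 0.0:
        # Numerically optimal input: treat the confidence as unbounded.
        return float("inf")
    return t * k / denominator


# Hypothetical costs after t = 10 steps with k = 2 action dimensions.
near = estimate_confidence(2.0, 3.0, 4.5, t=10, k=2)    # suboptimality 0.5
far = estimate_confidence(12.0, 12.5, 4.5, t=10, k=2)   # suboptimality 20.0
near_map = estimate_confidence(2.0, 3.0, 4.5, t=10, k=2, prior_rate=0.5)
```

A positive prior rate also keeps the estimate finite when the observed input is almost exactly optimal, which is one practical reason to prefer the MAP form.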
II-D Confidence-based Arbitration
Armed with a confidence estimate $\hat{\beta}_\theta$ for every intent $\theta \in \Theta$, the robot can predict the most likely one using Eq. (4). From here, one natural style of assistance is “policy blending” [dragan2013blending]. First, the robot computes an optimal trajectory under the most likely intent, $\theta^*$, the first action of which is $u^*$. Then the robot combines $u^*$ and the human’s teleoperated action $\mathcal{T}(a)$ using a blending parameter $\alpha \in [0, 1]$, resulting in the robot action $u = \alpha \, \mathcal{T}(a) + (1 - \alpha) \, u^*$. We also refer to $\alpha$ as the human’s control authority.
Prior work proposes different ways to arbitrate between the robot and human actions by choosing the blending parameter $\alpha$ proportional to the robot’s distance to the goal or to the probability of the most likely goal [dragan2013blending]. However, when using the probability $P(\theta^* \mid \xi_{0 \to t})$, the most likely intent $\theta^*$ might look much better than the other intents without actually explaining the input well, resulting in the robot wrongly assisting for $\theta^*$. Distance-based arbitration ignores the full history of the user’s input and can only accommodate simple intents.
Instead, we propose that the robot should use its confidence in the most likely intent, $\hat{\beta}_{\theta^*}$, estimated according to Sec. II-C, to control the strength of its arbitration, with the human’s control authority $\alpha$ decreasing as $\hat{\beta}_{\theta^*}$ grows.
When $\hat{\beta}_{\theta^*}$ is high, i.e. the robot is confident that the predicted intent can explain the person’s input, $\alpha$ is low, giving the robot more influence through its action $u^*$. When $\hat{\beta}_{\theta^*}$ is low, i.e. not even the most likely intent explains the person’s input, $\alpha$ increases, giving the person’s action more authority.
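A minimal sketch of confidence-based blending follows. The specific rule $\alpha = \min(1, 1/\hat{\beta})$ is an assumption chosen only because it has the behavior described above (full human control at low confidence, growing robot influence as confidence rises); the exact arbitration function is not recoverable from the text. The function names are likewise hypothetical.

```python
from typing import List, Sequence


def control_authority(beta_hat: float) -> float:
    """Assumed arbitration rule: alpha = min(1, 1 / beta_hat)."""
    if beta_hat <= 0.0:
        return 1.0  # no confidence at all: the human keeps full control
    return min(1.0, 1.0 / beta_hat)


def blend(human_action: Sequence[float], robot_action: Sequence[float],
          alpha: float) -> List[float]:
    """Linear policy blending: alpha weights the human, (1 - alpha) the robot."""
    return [alpha * h + (1.0 - alpha) * r
            for h, r in zip(human_action, robot_action)]


low_conf_alpha = control_authority(0.5)    # human keeps full authority
high_conf_alpha = control_authority(4.0)   # robot takes most of the authority
blended = blend([1.0, 0.0], [0.0, 1.0], high_conf_alpha)
```

Under this assumed rule, any confidence at or below 1 leaves the human input untouched, which matches the idea that the robot should not interfere when none of its intents explains the input.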
II-E Using Confidence for Lifelong Learning
Estimating the confidence also lets the robot detect misspecification in its intent set $\Theta$: if all estimated confidences $\hat{\beta}_\theta$ for $\theta \in \Theta$ are below a threshold $\epsilon$, the robot is missing the person’s intent.
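Once the robot has identified that its intent set is misspecified, it should ask the person to teach it. We represent the missing intent as a neural network cost $C_\phi$ parameterized by $\phi$ and learn it via deep maximum entropy IRL [finn2016guided]. The gradient of the IRL objective $\mathcal{L}$ with respect to the cost parameters $\phi$ can be estimated by:
\[ \nabla_\phi \mathcal{L} \approx \frac{1}{|\mathcal{D}^*|} \sum_{\xi \in \mathcal{D}^*} \nabla_\phi C_\phi(\xi) - \frac{1}{|\mathcal{D}^\phi|} \sum_{\xi \in \mathcal{D}^\phi} \nabla_\phi C_\phi(\xi), \]
written here for the negative log-likelihood of the demonstrations, where $\mathcal{D}^*$ are (noisy) demonstrations of the person executing the desired missing intent via direct teleoperation, and $\mathcal{D}^\phi$ are trajectories sampled from the near-optimal policy induced by $C_\phi$.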
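To make the sample-based gradient concrete, the sketch below swaps the neural network for a cost that is linear in hand-picked state features, so that the gradient of a trajectory’s cost is simply its summed feature vector. This linear stand-in, the feature function, and the toy trajectories are all assumptions for illustration; the deep variant would backpropagate through the network instead.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Trajectory = Sequence[State]


def feature_counts(trajectory: Trajectory,
                   features: Callable[[State], List[float]]) -> List[float]:
    """Sum the feature vector over every state of a non-empty trajectory."""
    per_state = [features(x) for x in trajectory]
    return [sum(column) for column in zip(*per_state)]


def irl_gradient(demos: Sequence[Trajectory], samples: Sequence[Trajectory],
                 features: Callable[[State], List[float]]) -> List[float]:
    """Gradient of the negative log-likelihood for a linear cost w . f(x):
    mean demo feature counts minus mean sampled feature counts."""
    def mean_counts(trajectories: Sequence[Trajectory]) -> List[float]:
        counts = [feature_counts(traj, features) for traj in trajectories]
        return [sum(column) / len(counts) for column in zip(*counts)]

    demo_mean = mean_counts(demos)
    sample_mean = mean_counts(samples)
    return [d - s for d, s in zip(demo_mean, sample_mean)]


def identity_features(x: State) -> List[float]:
    """Toy feature map that uses the raw 2-D state as features."""
    return [float(x[0]), float(x[1])]


demos = [[(1, 0), (1, 0)]]
samples = [[(0, 1), (0, 1)], [(1, 1), (1, 1)]]
grad = irl_gradient(demos, samples, identity_features)
```

A gradient descent step on the weights with this gradient lowers the cost of features that the demonstrations accumulate more than the samples do, and raises the cost of features the samples over-represent.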
Once we have a new intent $\theta_\phi$, the robot updates its intent set $\Theta \leftarrow \Theta \cup \{\theta_\phi\}$. The next time the person needs assistance, the robot can perform confidence estimation, goal inference, and arbitration as before, using the new library of intents. While the complexity scales linearly with the number of intents $|\Theta|$, planning can be parallelized across intents.
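The detection and update steps can be sketched as follows, assuming confidences are stored per intent name and that a learned cost can be added to the library under a new name. The threshold value, the names, and the placeholder cost functions are hypothetical.

```python
from typing import Callable, Dict


def needs_new_intent(confidences: Dict[str, float], threshold: float) -> bool:
    """True when every known intent's confidence falls below the threshold."""
    return all(beta < threshold for beta in confidences.values())


def add_intent(intents: Dict[str, Callable], name: str,
               cost_fn: Callable) -> Dict[str, Callable]:
    """Return a new intent library that also contains the learned cost."""
    updated = dict(intents)
    updated[name] = cost_fn
    return updated


# Hypothetical confidence estimates while the user reaches for an unknown goal.
confidences = {"green_goal": 0.4, "purple_goal": 0.1}
known_intents = {"green_goal": lambda traj: 0.0, "purple_goal": lambda traj: 0.0}

if needs_new_intent(confidences, threshold=1.0):
    # Placeholder for a cost learned from demonstrations via IRL.
    learned_cost = lambda traj: 0.0
    known_intents = add_intent(known_intents, "red_goal", learned_cost)
```

Since each intent’s confidence depends only on its own cost, the per-intent estimates can be computed independently, which is what makes parallel planning across intents straightforward.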
Learned rewards fit naturally into our framework, allowing for a simple way to compare against the known intents. However, one could imagine adapting our method to the many other ways to learn an intent, from imitation learning [ho2016GAIL, reddy2020sqil] to dynamic movement primitives [paraschos2013PMP]. For instance, if we parameterize intents via policies, we can derive a similar confidence metric based on probabilities of observed human actions under a stochastic policy, rather than costs.
III Expert Case Study
In this section, we introduce three manipulation tasks and use expert data to analyze confidence estimation and assistance. We later put CASA’s assistive capacity to test with non-experts in a user study in Sec. IV.
III-A Experimental Setting
We conduct our experiments on the simulated 7-DoF JACO arm shown in Fig. 2. We use the pybullet interface [coumans2019] and teleoperate the robot via keypresses. We map 6 keys to bi-directional movements of the robot’s end-effector, and 2 keys for rotating it in both directions. We performed inference and confidence estimation twice per second.
We test CASA on 3 different tasks. In the Known Goal task, we control for the well-specified setting: the robot must assist the user to move to the known green goal location in Fig. 2. In the other tasks, we test CASA’s efficacy in the case of misspecification, where the user’s desired intent is initially missing from the robot’s known set $\Theta$. In the second task, Unknown Goal, the person teleoperates the robot to the red goal, which is unknown to the robot. Finally, in the third and most complicated task, Unknown Skill, the person tries to pour the cup contents at an unknown goal location.
For the Unknown Goal and Unknown Skill tasks, we first run CASA before the robot has been exposed to the new intent (CASA before learning). Detecting low confidence, the robot then asks for demonstrations and learns the missing intents via deep maximum entropy IRL as discussed in Sec. II-E. We then run teleoperation with CASA after learning to assess the quality of robot assistance once the new intent is known.
III-B Arbitration Method Comparison
We compare CASA to a policy blending assistance (PBA) baseline [dragan2013blending] that assumes a fixed $\beta$ for all intents. PBA arbitrates with the distance to the predicted goal, with some threshold distance past which the robot does not assist. More sophisticated arbitration schemes use the belief in the most likely intent, $P(\theta^* \mid \xi_{0 \to t})$, or the full distribution $P(\theta \mid \xi_{0 \to t})$, but they are much less robust to task misspecification. This is because when the user teleoperates for an unknown intent, the likelihood $P(\xi_{0 \to t} \mid \theta)$ will be low for all known $\theta \in \Theta$; however, forming the posterior $P(\theta \mid \xi_{0 \to t})$ requires normalizing over all known intents, after which $P(\theta^* \mid \xi_{0 \to t})$ can still be high unless the user happened to operate in a way that appears equally unlikely under all known intents.
We analyzed this phenomenon by tracking a reference trajectory for the Unknown Goal task which moves optimally towards the unknown goal (see Fig. 2 for the task layout). We compared the performances of the distance and confidence arbitration methods, as well as a belief-based method which sets the human’s control authority $\alpha$ as a function of the belief in the most likely goal. In Fig. 3, the confidence in each goal stays low enough that the robot would have left the user in full control; meanwhile, the relatively higher likelihood of one goal causes the belief to quickly go to 1 and thus drives the user’s control authority toward 0 under the belief-based arbitration scheme.
We examined one belief-based arbitration method here, but since the belief rapidly goes to 1, any other arbitration that is a function of the belief would similarly try to assist for the wrong goal, motivating our choice of the simpler but more robust distance-based arbitration baseline.
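The normalization effect discussed in this section can be reproduced with two made-up log-likelihoods: both known goals explain the input very poorly, yet the normalized belief in the slightly better one is close to 1. The numbers and goal names below are hypothetical and chosen only to illustrate the effect.

```python
import math
from typing import Dict


def normalize(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Turn per-intent log-likelihoods into a belief under a uniform prior."""
    shift = max(log_likelihoods.values())  # subtract the max for numerical stability
    weights = {name: math.exp(ll - shift) for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical log-likelihoods while the user heads for an unknown goal:
# both known goals are very unlikely explanations of the input.
log_liks = {"green_goal": -40.0, "purple_goal": -46.0}
belief = normalize(log_liks)

absolute_likelihood = math.exp(log_liks["green_goal"])  # tiny in absolute terms
belief_in_green = belief["green_goal"]                  # yet close to 1 after normalizing
```

Only the gap between the log-likelihoods survives normalization, which is why a belief-based rule cannot tell a well-explained input from one that is merely less badly explained by one intent than by the others.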
III-C Well-specified Tasks
Fig. 2 (top) showcases the results of our experiment for the Known Goal task. Looking at the confidence plot, we see that the confidence estimate $\hat{\beta}$ increases with time for the correct green goal, while it remains low for the alternate known purple goal. In the arbitration plot, as $\hat{\beta}$ increases, the human’s control authority $\alpha$ gradually decreases, reflecting that the robot takes more control authority only as it becomes more confident that the person’s intent is indeed $\theta^*$. Similarly, since there is no misspecification, PBA arbitration steadily decreases the human’s contribution to the final control. Both methods result in smooth trajectories which go to the correct goal location.
III-D Misspecified Tasks
Our approach distinguishes itself in how it handles misspecified tasks. During the Unknown Goal task, in Fig. 2 (middle), CASA before learning estimates a low confidence $\hat{\beta}$ for both goals, since neither goal explains the person’s motion towards the red goal. The estimated $\hat{\beta}$ is slightly higher for the green goal than for the purple one because it is closer to the user’s input; however, neither is high enough to warrant an arbitration $\alpha$ below 1, and thus the robot receives no control. In Fig. 2 (bottom), we observe almost identical behavior before learning for the Unknown Skill task: the known intents do not match the user’s behavior, and thus the user is given full control authority and completes the task.
This contrasts with PBA, which, for both Unknown Goal and Unknown Skill, predicts the green goal as the intent. Since in both cases the user’s desired trajectory passes near the green goal, PBA erroneously takes control and moves the user towards it, requiring the human to counteract the robot’s controls to try to accomplish the task.
In the middle plots for each of the misspecified tasks, we observe that, for CASA after learning, the newly-learned intents receive confidence estimates which increase as the robot observes more of the user’s input, and thus CASA contributes more to the control as it becomes confident.
IV User Study
We now present the results of our user study, testing how well our method can assist non-expert users.
IV-A Experimental Design
Due to the COVID-19 pandemic, we were unable to perform an in-person user study with a physical robot. Instead, as described in Sec. III, we replicated our lab set-up in a pybullet simulator [coumans2019] in which users can teleoperate a 7-DoF JACO robotic arm using keyboard inputs (Fig. 2).
We split the study into four phases: (1) familiarization, (2) no misspecification, (3) misspecification before learning, and (4) misspecification after learning. First, we introduced the user to the simulation interface by asking them to perform a familiarization task. In the next phase, we tested the Known Goal task. In the third phase, we tested the two misspecified tasks, Unknown Goal and Unknown Skill, then asked participants to provide 5 demonstrations for each intent. Finally, in the fourth phase, we retested the misspecified tasks using cost functions learned from the demonstrations.
Independent Variables: For each experiment, we manipulate the assistance method with three levels: no assistance (NA), policy blending assistance (PBA) [dragan2013blending], and Confidence-Aware Shared Autonomy (CASA). For Unknown Goal and Unknown Skill, we compared our method before and after learning new intents against the NA and PBA baselines.
Dependent Measures: Before each task, we displayed an exemplary reference trajectory to help participants understand their objective. As such, for our objective metrics, we measured Error as the sum of squared differences between the intended and executed trajectories, Efficiency Cost as the sum of squared velocities across the executed trajectory, and Effort as the number of keys pressed. To assess the users’ interaction experience, we administered a subjective 7-point Likert scale survey, asking the participants three questions: (1) if they felt the robot understood how they wanted the task done, (2) if the robot made the interaction more effortless, and (3) if the assistance provided was useful.
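The three objective metrics can be sketched as below, under the assumptions that the intended and executed trajectories are sampled at the same time steps, that velocities are obtained by finite differences with a fixed step `dt`, and that effort is the length of a logged key-press list. None of these implementation details are stated in the text, and the function names are hypothetical.

```python
from typing import Sequence

State = Sequence[float]
Trajectory = Sequence[State]


def trajectory_error(intended: Trajectory, executed: Trajectory) -> float:
    """Sum of squared differences between time-aligned trajectories."""
    return sum((a - b) ** 2
               for p, q in zip(intended, executed)
               for a, b in zip(p, q))


def efficiency_cost(executed: Trajectory, dt: float) -> float:
    """Sum of squared finite-difference velocities along the executed trajectory."""
    total = 0.0
    for prev, cur in zip(executed, executed[1:]):
        total += sum(((c - p) / dt) ** 2 for p, c in zip(prev, cur))
    return total


def effort(key_presses: Sequence[str]) -> int:
    """Number of key presses logged during the task."""
    return len(key_presses)


intended = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
executed = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
error = trajectory_error(intended, executed)
efficiency = efficiency_cost(executed, dt=1.0)
presses = effort(["w", "w", "a"])
```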
Participants: We used a within-subjects design and counterbalanced the order of the assistance methods. We recruited 11 users (10 male, aged 20-30) from the campus community, most of whom had a technical background.
H1: If there is no misspecification, assisting with CASA is not inferior to assisting with PBA, and is superior to NA.
H2: If there is misspecification, assisting with CASA before learning is more accurate, efficient, and effortless than with PBA and not inferior to NA.
H3: If there is misspecification, assisting with CASA after learning is more accurate, efficient, and effortless than NA.
H4: If there is misspecification, participants will believe the robot understood what they want, feel less interaction effort, and find the assistance more useful with CASA after learning than with any other baseline.
Objective. Fig. 4 summarizes our main findings. For Known Goal, which is well-specified, CASA does no worse than PBA and better than NA in terms of relative effort and error. We confirmed this by running an ANOVA, finding a significant main effect for the method on both effort and error. In post-hoc testing, a Tukey HSD test revealed that CASA is significantly better than NA on both effort and error. We also performed a non-inferiority test [lesaffre2008noninferiority], and obtained that CASA is non-inferior to PBA within metric-specific margins for effort, efficiency, and error. These findings are in line with H1 and were expected, since the robot should have no problem handling known intents.
For the two misspecified tasks, we first ran an ANOVA with the method (CASA before learning, NA, and PBA) as a factor and the task as a covariate, and found a significant main effect for both effort and error. A Tukey HSD revealed that CASA is significantly better than PBA on both effort and error. We also ran a non-inferiority test, and obtained that CASA is non-inferior to NA within metric-specific margins for effort, efficiency, and error on both Unknown Goal and Unknown Skill. For both unknown tasks, CASA before learning is essentially indistinguishable from NA, since a low confidence estimate $\hat{\beta}$ would make the robot rely on direct teleoperation. Both the figure and our statistical tests confirm H2, which speaks for the consequences of confidently assisting for the wrong intent.
For efficiency cost, we did not find an effect, possibly because Fig. 4 shows that PBA is more efficient for the Unknown Skill task than other methods. Anecdotally, PBA forced users to an incorrect goal, thus preventing them from pouring, which explains the lower efficiency cost. By having a high arbitration for the wrong intent, PBA can produce a smooth trajectory, since it lowers the control authority of the possibly-noisy human inputs. However, this trajectory does not accomplish the task. When running an ANOVA for each of the tasks separately, we found a significant main effect for the method for Unknown Goal, and a post-hoc Tukey HSD revealed that CASA is significantly better than PBA, further confirming H2.
Lastly, we looked at the performance with CASA after learning the new intents. For Unknown Goal, a simple task, the figure shows that CASA after learning doesn’t improve efficiency and error, but it does reduce relative effort when compared to NA. For Unknown Skill, a more complex task, CASA after learning outperforms NA. This is confirmed by an ANOVA with the method (NA, CASA after learning) as the factor, where we found a significant main effect for both effort and efficiency cost, supporting H3.
Subjective. We show the average Likert survey scores for each task in Fig. 5. In line with H1, for the Known Goal task, users thought the robot under both PBA and CASA had a good understanding of how they wanted the task to be done, made the interaction more effortless, and provided useful assistance. The results are in stark contrast to NA, which scores low on all those metrics. For Unknown Goal and Unknown Skill, all methods fare poorly on all questions except for CASA after learning, supporting H4.
In this paper, we formalized a confidence-aware shared autonomy process where the robot can adjust its assistance based on how confident it is in its prediction of the human intent. We introduced an approximate solution for estimating this confidence, and demonstrated its effectiveness in adjusting arbitration when the robot’s intent set is misspecified and enabling continual learning of new intents.
While our confidence estimates tolerated some degree of suboptimal user control, an extremely noisy operator attempting a known intent might instead appear to be performing a novel intent. Moreover, due to COVID, we ran our experiments in a simulator, which does not replicate the difficulty inherent in teleoperating a real manipulator via a joystick interface. Despite these limitations, we are encouraged to see robots have a more principled and robust way to arbitrate shared autonomy, as well as decide when they need to learn more to be better teammates. We look forward to applications of our confidence-based ideas beyond manipulation robots, to semi-autonomous vehicles, quadcopter control, or any other shared autonomy scenarios.