When humans work in collaboration with others, they can accomplish things that would be difficult to do alone and will often achieve their goals more efficiently. With the recent development of artificial intelligence technology, the human-agent team, in which humans and AI agents work together, has become an increasingly important topic. The role of agents in this problem is to collaborate with humans to achieve a set task.
One type of intuitive collaborative agent is the supportive agent, which helps a human by predicting the human’s objective and planning an action that would best help achieve it. In recent years, there have been agents that can plan effectively by inferring human subgoals for a partitioned problem based on the Bayesian Theory of Mind (Wu et al., 2021). Other agents perform biased behavior for generic cooperation, such as communicating or hiding their intentions (Strouse et al., 2018), maximizing the human’s controllability (Du et al., 2020), and so on. However, these agents cannot actually modify the human’s plan, which means the ultimate success or failure of the collaborative task depends on the human’s ability to plan. In other words, if the human sets the wrong plan, the performance will suffer.
In general, humans cannot plan optimal actions for difficult problems due to limitations in their cognitive and computational abilities. Figure 1 shows an example of a misleading human-agent team task as one such difficult problem. The task is a kind of pursuit-evasion problem. The human and the agent aim to capture one of the characters (shown as a face) by cooperating together and approaching the character from both sides. In Fig. 1(a), there are two characters, (1) and (2), on the upper and lower roads, respectively. Since (1) is farther away, (2) seems to be a more appropriate target. However, (2) cannot be captured because it can escape via the lower bypath. On the other side, in Fig. 1(b), the agent and the human can successfully capture (2) because it is slightly farther to the left than in Fig. 1(a). Thus, the best target might change due to a small difference in a task, and this can be difficult for humans to judge.
|(a): Best target is (1)||(b): Best target is (2)|
|(a): Explicit guidance||(b): Implicit guidance|
One naive approach to solving such a problem is for the agent to guide the human action toward the optimal plan. Since agents generally do not have cognitive or computational limitations, they can make an optimal plan more easily than humans. After an agent makes an optimal plan, it can explicitly guide the human behavior to aim for a target. For example, there is an agent that performs extra actions to convey information to others with additional cost (Kamar et al., 2009), where it first judges whether it should help others by paying that cost. As such guidance is explicitly observable by humans, we call it explicit guidance in this paper. However, if an agent abuses explicit guidance, the human may lose their sense of control regarding their decision-making in achieving a collaborative task—in other words, their autonomy. As a result, the human may feel they are being controlled by the agent. For example, in Human-Robot Teaming, if the robot decides who will perform each task, the situational awareness of the human will decrease. (Gombolay et al., 2017)
To reduce that risk, agents should guide humans while enabling them to maintain autonomy. We focus on implicit guidance offered through behavior. Implicit guidance is based on the cognitive knowledge that humans can infer the intentions of others on the basis of their behaviors (Baker et al., 2009) (Baker et al., 2017). The agent will expect the human to infer its intentions and to discard any plans that do not match what they infer the agent to be planning. Under this expectation, the agent acts in a way that makes it easy for the human to find the best (or at least better) plan for optimum performance on a collaborative task. Implicit guidance of this nature should help humans maintain autonomy, since the discarding of plans is a proactive action taken by the human.
Figure 2(a) is an example of explicit guidance for the problem in Figure 1. The agent guides the human to the best target directly and expects the human to follow. Figure 2(b) is an example of implicit guidance. When the agent moves upward, the human can infer that the agent is aiming for the upper target by observing the agent’s movement. Although this is technically the same thing as the agent showing the target character explicitly, we feel that in this case humans would feel as though they were able to maintain autonomy by inferring the agent’s target voluntarily.
In this work, we investigate the advantages of implicit guidance. First, we model three types of collaborative agent: a supportive agent, an explicit guidance agent, and an implicit guidance agent. Our approach for planning the agents is based on partially observed Markov decision process (POMDP) planning, where the unobservable state is the target that the human should aim for and a human’s behavior model is included in the transition environment. Our approach is simple, in contrast to the more complex approaches such as interactive POMDP (I-POMDP)(Gmytrasiewicz and Doshi, 2005), which can model bi-directional recursive intention inference infinitely. However, as there are studies indicating that humans have cognitive limitations (De Weerd et al., 2013) (de Weerd et al., 2017) regarding recursive intention inference, such a model might be too complex for the representation and not intuitive enough.
For our implicit guidance agent, we add to the POMDP formulation the factor that the human infers the agent’s target and changes their own target. This function is based on a cognitive science concept known as the Theory of Mind. Integrating a human’s cognitive model into the state transition function is not uncommon: it has been seen in assertive robots (Taha et al., 2011) and in sidekicks for games (Macindoe et al., 2012). Examples that are closer to our approach include a study on collaborative planning (Dragan, 2017) using legible action (Dragan et al., 2013) and another on human-aware planning (Chakraborti et al., 2018). These are similar to our concept in that an agent expects a human to infer its intentions or behavioral model. However, these approaches assume that a human does not change their goal and they do not guide the human’s goal to something more preferable. In terms of more practical behavior models, there have been studies on integrating a model learned from a human behavior log (Jaques et al., 2018; Carroll et al., 2019). However, this approach requires a huge number of interaction logs for the human who is the partner in the collaborative task. Our approach has the advantage of “ad-hoc” collaboration (Stone et al., 2010), which is collaboration without opponent information held in advance. We also adopt the Bayesian Theory of Mind. In the field of cognitive science, several studies have investigated how humans teach others their knowledge, and the Bayesian approach is often used for this purpose. For example, researchers have used a Bayesian approach to model how humans teach the concept of an item by showing the item to learners (Shafto et al., 2014). In another study, a Bayesian approach was used to model how humans teach their own behavioral preferences by giving a demonstration (Ho et al., 2016). Extensive evidence of this sort has led to many variations of the human cognitive model based on the Bayesian Theory of Mind, such as those for ego-centric agents (Nakahashi and Yamada, 2018; Pöppel and Kopp, 2019) and irrational agents (Zhi-Xuan et al., 2020), and we can use it too, for extending our algorithm. Of course, there are other theories of the Theory of Mind. For example, the Analogical Theory of Mind (Rabkina and Forbus, 2019) tries to model the Theory of Mind through the learning of structural knowledge. One advantage of the Bayesian Theory of Mind is that it is easy to calculate behaviors that people can guess simply by developing a straightforward Bayesian formula. That works to our advantage when it comes to efficient “ad-hoc” collaboration.
To evaluate the advantages of implicit guidance, we designed a simple task for a human-agent team and used it to carry out a participant experiment. The task is a pursuit-evasion problem similar to the example in Fig. 1. There are objects that move around in a maze to avoid capture, and the participant tries to capture one of the objects through collaboration with an autonomous agent. We implemented the three types of collaborative agent discussed above for the problem, and participants executed small tasks through collaboration with these agents. The results demonstrated that the implicit guidance agent was able to guide the participants to capture the best object while allowing them to feel as though they maintained autonomy.
2.1 Computational Model
2.1.1 Collaborative Task
We model the collaborative task as a decentralized partially observable Markov decision process (Dec-POMDP) (Goldman and Zilberstein, 2004). This is an extension of the partially observable Markov decision process (POMDP) framework for multi-agent setups that deals with a specific case in which all agents share the same reward function of a partially observable stochastic game (POSG) (Kuhn and Tucker, 1953).
Dec-POMDP is defined in the format , where is a set of agents, is a set of states, is a set of actions, and is a set of observations. is a transition function. is a reward function.
is observation emission probabilities.
In our setting, consists of an agent and a human , so consists of a human action and an agent action; thus, it can be represented as . Inspired by MOMDP (Ong et al., 2010), we factorize into observable factor as the position of the agent and the human, and the unobservable factor is the target, which formally defines the human’s goal for the task as . As a result, observations become equal to the observable factors of states, formally, , and can be factorized into observable state part and unobservable state part .
2.1.2 Agent Planning
We formalize the planning problem to calculate the actions of an agent for the collaborative task problem described above. In this formalization, the action space focuses only on the agent’s action, and the target can be changed only by the human through the observation of the actions of the agent. Furthermore, we integrate a human policy for deciding human action into the transition function. As a result, the agent planning problem is reduced to POMDP (Kaelbling et al., 1998), which is defined in the format . Here, we set and . In addition, we define the human policy function . We assume that humans will change their target by observing the actions of agents and then decide their own actions. Thus, the policy function requires agent actions as input.
We assume that the human policy is based on Boltzmann rationality:
where is a rational parameter and is the action value function of the problem given . is the only unobservable factor for the states, so the problem reduces into MDP given . Thus, we can calculate by general MDP planning such as value iteration.
On the basis of this formulation, we formulate a planning algorithm for the three collaborative agents. The difference is the human policy function, which represents an agent’s assumption toward human behavior. This difference is what makes the difference in the collaborative strategy.
The supportive agent assumes that humans do not change their target regardless of the agent’s action. That is, . is an indicator function.
Explicit Guidance Agent
The explicit guidance agent guides the human toward the best target; thus, it assumes that the human knows what the best target is. We represent the best target as , that is, . The best target is calculated as , where is the initial observable state and is the state value function of the problem given .
Implicit Guidance Agent
The implicit guidance agent assumes that humans change their target by observing the agent’s actions. We assume that humans infer the target of the agent on the basis of Boltzmann rationality, as suggested in earlier Theory of Mind studies (Baker et al., 2009) (Baker et al., 2017).
is also based on Boltzmann rationality:
where is a rational parameter and is the state value function of the problem regarding the state after given .
2.1.3 Decide Agent Actions
By solving POMDP as shown in 2.1.2
using a general POMDP planning algorithm, we can obtain an alpha-vector set conditioned with the observable factor of the current state regarding each action. We represent this as, and the agent takes the most valuable action . Formally,
where is the current belief of an unobservable factor. is updated on each action of a human and an agent as follows for each unobservable factor of belief :
The initial belief is for the supportive and implicit guidance agents and for the explicit guidance agent.
We conducted a participant experiment to investigate the advantages of the implicit guidance agent. This experiment was approved by the ethics committee of the National Institute of Informatics.
2.2.1 Collaborative Task Setting
The collaborative task setting for our experiment was a pursuit-evasion problem (Schenato et al., 2005), which is a typical type of problem used for human-agent collaboration (Vidal et al., 2002; Sincák, 2009; Gupta et al., 2017). This problem covers the basic factors of collaborative problems, that is, that the human and agent move in parallel and need to communicate to achieve the task. This is why we felt it would be a good base for understanding human cognition.
Figure 3 shows an example of our experimental scenario. There are multiple types of object in a maze. The yellow square object labeled “P” is an object that the participant can move, the red square object labeled “A” is an object that the agent can move, and the blue circle objects are target objects that the participant has to capture. When participants move their object, the target objects move, and the agent moves. Target objects move to avoid being captured, and the participant and the agent know that. However, the specific algorithm of the target objects is known only by the agent. Since both the participants and the target objects have the same opportunities for movement, participants cannot capture any target objects by themselves. This means they have to approach the target objects from both sides through collaboration with the agent, and the participant and the agent cannot move to points through which they have already passed. For this collaboration, the human and the agent should share with each other early on which object they want to capture. In the experiment, there are two target objects located in different passages. The number of steps needed to capture each object is different, but this is hard for humans to judge. Thus, the task will be more successful if the agent shows the participant which target object is the best. In the example in the figure, the lower passage is shorter than the upper one, but it has a path for escape. Whether the lower object can reach the path before the agent can capture it is the key information for judging which object should be aimed for. This is difficult for humans to determine instantly but easy for agents. There are three potential paths to take from the start point of the agent. The center one is the shortest for each object, and the others are detours for implicit guidance. Also, to enhance the effect of the guidance, participants and agents are prohibited from going backward.
We modeled the task as a collaborative task formulation. The action space corresponds to an action for the agent, and the observable state corresponds to the positions of the participant, agent, and target objects. The reward parameter is conditioned on the target object that the human aims for. The space of the parameter corresponds to the number of target objects, that is, . The reward for capturing a correct / wrong object for is 100, –100, and the cost of a one-step action is –1. Since a go-back action is forbidden, we can compress multiple steps into one action for a human or an agent to reach any junction. Thus, the final action space is the compressed action sequence and the cost is –1 the number of compressed steps. Furthermore, to prohibit invalid actions such as head to a wall, we assign such action a –1000 reward. We modeled three types of collaborative agent, as discussed above: a supportive agent, an explicit guidance agent, and an implicit guidance agent. We set the rational parameters as , and discount rate is 0.99.
The purpose of this experiment was to determine whether implicit guidance can guide humans while allowing them to maintain autonomy. Thus, we tested the following two hypotheses.
(H1) Implicit guidance can guide humans’ decisions toward better collaboration.
(H2) Implicit guidance can help humans maintain autonomy more than explicit guidance can.
We prepared five tasks. Two of these tasks, as listed below, were tricks to make it hard for humans to judge which would be the best target. All tasks are shown in the supplementary material.
(A) There were two winding passages with different but similar lengths. There were three tasks for this type.
(B) As shown in Fig. 3, there was a long passage and a short one with a path to escape. There were two tasks for this type.
We recruited participants for this study from Yahoo! Crowdsourcing. The participants were 100 adults located in Japan (70 male, 24 female, 6 unknown). The mean age of participants who answered the questionnaire we administered was 45 years.
Procedure of Experiment
Our experiment was based on a within-subject design and conducted on the Web using a browser application we created. Participants were instructed on the rules of the agent behavior and then underwent a confirmation test to determine their degree of understanding. Participants who were judged to not have understood the rules were given the instructions again. After passing this test, participants entered the actual experiment phase. In this phase, participants were shown the environment and asked “Where do you want to go?” After inputting their desired action, both the agent and the target objects moved forward one step. This process was continued until the participants either reached a target object or input a certain number of steps. When each task was finished, participants moved on to the next one. In total, participants were shown 17 tasks, which consisted of 15 regular tasks and two dummy tasks to check whether they understood the instructions. Regular tasks consisted of three task sets (corresponding to the three collaborative agents) that included five tasks each (corresponding to the variations of tasks). The order of the sets and the order of the tasks within each set were randomized for each participant. After participants finished each set, we gave them a survey on perceived interaction with the agent (algorithm) using a 7-point Likert scale.
The survey consisted of the questions listed below.
Was it easy to collaborate with this agent?
Did you feel that you had the initiative when working with this agent?
Could you find the target object of this agent easily?
Did you feel that this agent inferred your intention?
Item 2 is the main question, as it relates to the perceived autonomy we want to confirm. The additional items are to prevent biased answers and relate to other important variables for human-agent(robot) interaction. Item 1 relates to perceived ease of collaboration, namely, the fluency of the collaboration, which has become an important qualitative variable in the research on Human–Robot Interaction in recent years (Hoffman, 2019). Item 3 relates to the perceived inference of the agent’s intentions by the human. It is one of the variables focused on the transparency of the agent, which plays a key role in constructing human trust in an agent (Lewis et al., 2021). From the concrete algorithm perspective, a higher score is expected for guidance agents (especially explicit guidance agents) than for supportive agents. Item 4 relates to the perceived inference of the human’s intentions by the agent. This is a key element of the perceived working alliance (Hoffman, 2019), and when it is functioning smoothly, it increases the perceived adaptivity in human-agent interaction. Perceived adaptivity has a positive effect on perceived usefulness and perceived enjoyment (Shin and Choo, 2011). From the concrete algorithm perspective, a higher score is expected for agents without implicit guidance (especially supportive agents) than for implicit guidance agents.
Before analyzing the results, we excluded any data of participants who were invalidated. We used dummy tasks for this purpose, which were simple tasks that had only one valid target object. We then filtered out the results of participants (a total of three) who failed these dummy tasks.
3.1 Results of Collaborative Task
shows the rate at which participants captured the best object that the agent knew. In other words, it is the success rate of the guidance of the agent based on any of the given guidance. We tested the data according to the standard process for paired testing. The results of repeated measures analysis of variance (ANOVA) showed that there was a statistically significant difference between the agent types for the overall tasks (), task type (A) (), and task type (B) (
). We then performed repeated measures t-tests with a Bonferroni correction to determine which two agents had a statistically significant difference. “**” in the figure means there were significant differences between the two scores (). The results show the average rate for the overall tasks, task type (A), and task type (B). All of the results were similar, which demonstrates that the performances were independent of the task type. The collaboration task with the supportive agent clearly had a low rate. This indicates that the task was difficult enough that participants found it hard to judge which object was best, and the guidance from the agent was valuable for improving the performance on this task. These results are strong evidence in support of hypothesis H1. As another interesting point, there was no significant difference in the rate between implicit guidance and explicit guidance. Although we did not explain implicit guidance to the participants, they inferred the agent’s intention anyway and used it as guidance. Of course, the probable reason for this is that the task was so simple participants could easily infer the agent’s intentions. However, the fact that implicit guidance is almost as effective as explicit guidance in such simple tasks is quite impressive.
3.2 Results for Perceived Interaction with the Agent
Figure 5 shows the results of the survey on the effect of the agent on cognition. The results of repeated measures ANOVA showed that there was a statistically significant difference between the agent types for perceived ease of collaboration (), perceived autonomy (), and perceived inference of the human’s intentions (). In contrast, there was no statistically significant difference for survey item perceived inference of agent’s intentions (). We then performed repeated measures t-tests with a Bonferroni correction to determine which two agents had a statistically significant difference regarding variables that has a significant difference. “**” in the figure means there were significant differences between the two scores (). The most important result here is the score of perceived autonomy. From this result, we can see that participants felt they had more autonomy during the tasks when collaborating with the implicit guidance agent than with the explicit guidance one. These results are strong evidence in support of hypothesis H2.
Although the other results do not directly concern our hypothesis, we discuss their analysis briefly. Regarding the perceived inference of the human’s intentions, the results were basically as expected, but for the perceived inference of the agent’s intentions, the fact that there were no significant differences among all agents was unexpected. One hypothesis that explains this is that humans do not recognize the guidance information as the agent’s intention. As for the perceived ease of collaboration, the results showed that explicit guidance had adverse effects on it. Implicit guidance agents and supportive agents use exactly the same interface, though the algorithms are different, but explicit guidance agents use a slightly different interface to convey the guidance, which increases the amount of information on the interface a little. We think that the burden to understand such additional visible information might be responsible for the negative effect on the perceived ease of collaboration.
As far as we know, this is the first study to demonstrate that implicit guidance has advantages in terms of both task performance and the effect of the agent on the perceived autonomy of a human in human-agent teams. In this section, we discuss how our results relate to other studies, current limitations, and future directions.
4.1 Discussion of for the Results
The results in 3.1 show that both implicit and explicit guidance increase the success rate of a collaborative task. We feel one reason for this is that the quality of information in the guidance is appropriate. A previous study on the relationship between information type and collaborative task performance Butchibabu et al. (2016) showed that “implicit coordination” improves the performance of a task more than “explicit coordination” in a cooperative task. The word “implicit” refers to coordination that “relies on anticipation of the information and resource needs of the other team members”. This definition is different from ours, as “implicit coordination” is included in explicit guidance in our context. This study further divided “implicit coordination" into “deliberative communication,” which involves communicating objectives, and “reactive communication," which involves communicating situations, and argued that high-performance teams are more likely to use the former type of communication than the latter. We feel that quality of information in implicit guidance in our context is the same as this deliberative communication in that it conveys the desired target, which is one of the reasons our guidance can deliver a good performance.
The main concern of human-agent teams is how to improve the performance on tasks. However, as there have not been many studies that focus on the effect of the agent on cognition, the results in 3.2 should make a good contribution to the research on human-agent teams. One of the few studies that have been done investigated task performance and people’s preference for the task assignment of a cooperative task involving a human and an AI agent (Gombolay et al., 2015). In that study, the authors mentioned the risk that a worker with a robot collaborator may perform less well due to their loss of autonomy, which is something we also examined in our work. They found that a semi-autonomous setting, in which a human first decides which tasks they want to perform and the agent then decides the rest of the task assignments, is more satisfying than the manual control and autonomous control settings in which the human and the robot fully assign tasks. In cooperation with the implicit guidance agent and the supportive agent in our study, the human selects the desired character by him or herself. This can be regarded as a kind of semi-autonomous setting. Thus, our results are consistent with these ones in that the participants felt strongly that cooperation was easier than with explicit guidance agents. Furthermore, that study also mentioned that task efficiency has a positive effect on human satisfaction, which is also consistent with our results.
4.2 Limitation and Future Direction
Our current work has limitations in that the experimental environment was small and simple, the intention model was a small discrete set of target objectives, and the action space of the agent was a small discrete set. In a real-world environment, there is a wide variety of human intentions, such as target priorities and action preferences. The results in this paper do not show whether our approach is sufficiently scalable for problems with such a complex intention structure. In addition, the agent’s action space was a small discrete set that can be distinguished by humans, which made it easier for the human to infer the agent’s intention. This strengthens the advantage of implicit guidance, so our results do not necessarily guarantee the same advantage for environments with continuous action spaces. Extending the intention model to a more flexible structure would be the most important direction for our future study. One of the most promising approaches is integration with studies on inverse reinforcement learning(Ng et al., 2000)
. Inverse reinforcement learning is the problem of estimating the reward function, which is the basis of behavior, from the behavior of others. Intention and purpose estimation based on the Bayesian Theory of Mind can also be regarded as a kind of inverse reinforcement learning(Jara-Ettinger, 2019). Inverse reinforcement learning has been investigated for various reward models (Abbeel and Ng, 2004; Levine et al., 2011; Choi and Kim, 2014; Wulfmeier et al., 2015) and has also been proposed to handle uncertainty in information on a particular reward (Hadfield-Menell et al., 2017). Finally, regarding the simplicity of our experimental environment, using environments that are designed according to an objective complexity factor (Wood, 1986) and then analyzing the relationship between the effectiveness of implicit guidance and the complexity of the environment would be an interesting direction for future work.
Another limitation is the assumption that all humans have the same fixed cognitive model. As mentioned earlier, a fixed cognitive model is beneficial for ad-hoc collaboration, but for more accurate collaboration, fitting to individual cognitive models is important. The first approach would be to parameterize human cognition with respect to specific cognitive abilities (rationality, K-level reasoning (Nagel, 1995), working memory capacity (Daneman and Carpenter, 1980), etc.) and to fit the parameters online. This would enable the personalization of cognitive models with a small number of samples. One such approach is human-robot mutual adaptation for “shared autonomy,” in which control of a robot is shared between the human and the robot (Nikolaidis et al., 2017). In that approach, the robot learns “adaptability," which is the degree to which humans change their policies to accommodate a robot’s control.
Finally, the survey items we used to determine the effect of the agent on perceived autonomy were general and subjective. For a more specific and consistent analysis of the effect on perceived autonomy, we need to develop more sophisticated survey items and additional objective variables. Consistent multiple questions to determine human autonomy in shared autonomy have been used before (Du et al., 2020). As for measuring an objective variable, analysis of the trajectories in the collaborative task would be the first choice. A good clue for the perceived autonomy in the trajectories is “shuffles”. Originally, shuffles referred to any action that negates the previous action, such as moving left and then right, and it can also be an objective variable for human confusion. If we combine shuffles with goal estimation, we can design a “shuffles for goal” variable. A larger number of variables means that a human’s goal is not consistent, which would thus imply that he or she is affected by others and has low autonomy. In addition, reaction time and biometric information such as gaze might also be good candidates for objective variables.
Another limitation of this study is that we assumed humans regard the agent rationally and act only to achieve their own goals. The former is problematic because, in reality, humans may not trust the agent. One approach to solving this is to use the Bayesian Theory of Mind model for irrational agents (Zhi-Xuan et al., 2020). As for the latter, in a more practical situation, humans may take an action to give the agent information, similar to implicit guidance. Assistance game/cooperative inverse reinforcement learning (CIRL) (Hadfield-Menell et al., 2016) has been proposed as a planning problem for this kind of human behavior. In this problem, only the human knows the reward function, and the agent assumes that the human expects it to infer this function and take action to maximize the reward. The agent implicitly assumes that the human will give information for effective cooperative planning. Generally, CIRL is computationally expensive, but it can be solved by slightly modifying the POMDP algorithm (Malik et al., 2018), which means we could combine it with our approach. This would also enable us to consider a more realistic and ideal human-agent team in which humans and agents provide each other with implicit guidance.
In this work, we demonstrated that a collaborative agent based on “implicit guidance” is effective at providing a balance between improving a human’s plans and maintaining the human’s autonomy. Implicit guidance can guide human behavior toward better strategies and improve the performance in collaborative tasks. Furthermore, our approach makes humans feel as though they have autonomy during tasks, more so than when an agent guides them explicitly. We implemented agents based on implicit guidance by integrating the Bayesian Theory of Mind model into the existing POMDP planning and ran a behavioral experiment in which humans performed simple tasks with autonomous agents. Our results demonstrated that there were many limitations, such as a poor agent information model and trivial experimental environment. Even so, we believe our findings could lead to better research on more practical and human-friendly human-agent collaboration.
Wu et al. 
Sarah A Wu, Rose E Wang, James A Evans, Joshua B Tenenbaum, David C Parkes, and
Too many cooks: Bayesian inference for coordinating multi-agent collaboration.Topics in Cognitive Science, 13(2):414–432, 2021.
- Strouse et al.  DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, and David Schwab. Learning to share and hide intentions using information regularization. arXiv preprint arXiv:1808.02093, 2018.
- Du et al.  Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, and Anca Dragan. Ave: Assistance via empowerment. Advances in Neural Information Processing Systems, 33, 2020.
- Kamar et al.  Ece Kamar, Ya’akov Gal, and Barbara J Grosz. Incorporating helpful behavior into collaborative planning. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Springer Verlag, 2009.
- Gombolay et al.  Matthew Gombolay, Anna Bair, Cindy Huang, and Julie Shah. Computational design of mixed-initiative human–robot teaming that considers human factors: situational awareness, workload, and workflow preferences. The International journal of robotics research, 36(5-7):597–617, 2017.
- Baker et al.  Chris L Baker, Rebecca Saxe, and Joshua B Tenenbaum. Action understanding as inverse planning. Cognition, 113(3):329–349, 2009.
- Baker et al.  Chris L Baker, Julian Jara-Ettinger, Rebecca Saxe, and Joshua B Tenenbaum. Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4):1–10, 2017.
- Gmytrasiewicz and Doshi  Piotr J Gmytrasiewicz and Prashant Doshi. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24:49–79, 2005.
- De Weerd et al.  Harmen De Weerd, Rineke Verbrugge, and Bart Verheij. How much does it help to know what she knows you know? an agent-based simulation study. Artificial Intelligence, 199:67–92, 2013.
- de Weerd et al.  Harmen de Weerd, Rineke Verbrugge, and Bart Verheij. Negotiating with other minds: the role of recursive theory of mind in negotiation with incomplete information. Autonomous Agents and Multi-Agent Systems, 31(2):250–287, 2017.
- Taha et al.  Tarek Taha, Jaime Valls Miró, and Gamini Dissanayake. A pomdp framework for modelling human interaction with assistive robots. In 2011 IEEE International Conference on Robotics and Automation, pages 544–549. IEEE, 2011.
- Macindoe et al.  Owen Macindoe, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Pomcop: Belief space planning for sidekicks in cooperative games. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference, 2012.
- Dragan  Anca D Dragan. Robot planning with mathematical models of human state and action. arXiv preprint arXiv:1705.04226, 2017.
- Dragan et al.  Anca D Dragan, Kenton CT Lee, and Siddhartha S Srinivasa. Legibility and predictability of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 301–308. IEEE, 2013.
- Chakraborti et al.  Tathagata Chakraborti, Sarath Sreedharan, and Subbarao Kambhampati. Human-aware planning revisited: A tale of three models. In Proc. of the IJCAI/ECAI 2018 Workshop on EXplainable Artificial Intelligence (XAI). That paper was also published in the Proc. of the ICAPS 2018 Workshop on EXplainable AI Planning (XAIP), pages 18–25, 2018.
- Jaques et al.  Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Çaglar Gülçehre, Pedro A Ortega, DJ Strouse, Joel Z Leibo, and Nando de Freitas. Intrinsic social motivation via causal influence in multi-agent rl. corr abs/1810.08647 (2018). arXiv preprint arXiv:1810.08647, 2018.
- Carroll et al.  Micah Carroll, Rohin Shah, Mark K Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, and Anca Dragan. On the utility of learning about humans for human-ai coordination. Advances in Neural Information Processing Systems, 32:5174–5185, 2019.
- Stone et al.  Peter Stone, Gal A Kaminka, Sarit Kraus, and Jeffrey S Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
- Shafto et al.  Patrick Shafto, Noah D Goodman, and Thomas L Griffiths. A rational account of pedagogical reasoning: Teaching by, and learning from, examples. Cognitive psychology, 71:55–89, 2014.
- Ho et al.  Mark K Ho, Michael Littman, James MacGlashan, Fiery Cushman, and Joseph L Austerweil. Showing versus doing: Teaching by demonstration. Advances in neural information processing systems, 29:3027–3035, 2016.
- Nakahashi and Yamada  Ryo Nakahashi and Seiji Yamada. Modeling human inference of others’ intentions in complex situations with plan predictability bias. arXiv preprint arXiv:1805.06248, 2018.
- Pöppel and Kopp  Jan Pöppel and Stefan Kopp. Egocentric tendencies in theory of mind reasoning: An empirical and computational analysis. In CogSci, pages 2585–2591, 2019.
- Zhi-Xuan et al.  Tan Zhi-Xuan, Jordyn L Mann, Tom Silver, Joshua B Tenenbaum, and Vikash K Mansinghka. Online bayesian goal inference for boundedly-rational planning agents. arXiv preprint arXiv:2006.07532, 2020.
- Rabkina and Forbus  Irina Rabkina and Kenneth D Forbus. Analogical reasoning for intent recognition and action prediction in multi-agent systems. In Proceedings of the Seventh Annual Conference on Advances in Cognitive Systems, pages 504–517, 2019.
- Goldman and Zilberstein  Claudia V Goldman and Shlomo Zilberstein. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of artificial intelligence research, 22:143–174, 2004.
- Kuhn and Tucker  Harold William Kuhn and Albert William Tucker. Contributions to the Theory of Games, volume 2. Princeton University Press, 1953.
- Ong et al.  Sylvie CW Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research, 29(8):1053–1068, 2010.
- Kaelbling et al.  Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101(1-2):99–134, 1998.
- Schenato et al.  Luca Schenato, Songhwai Oh, Shankar Sastry, and Prasanta Bose. Swarm coordination for pursuit evasion games using sensor networks. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 2493–2498. IEEE, 2005.
Vidal et al. 
Rene Vidal, Omid Shakernia, H Jin Kim, David Hyunchul Shim, and Shankar Sastry.
Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation.IEEE transactions on robotics and automation, 18(5):662–669, 2002.
- Sincák  DHV Sincák. Multi–robot control system for pursuit–evasion problem. Journal of electrical engineering, 60(3):143–148, 2009.
- Gupta et al.  Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems, pages 66–83. Springer, 2017.
- Hoffman  Guy Hoffman. Evaluating fluency in human–robot collaboration. IEEE Transactions on Human-Machine Systems, 49(3):209–218, 2019.
- Lewis et al.  Michael Lewis, Huao Li, and Katia Sycara. Deep learning, transparency, and trust in human robot teamwork. In Trust in Human-Robot Interaction, pages 321–352. Elsevier, 2021.
- Shin and Choo  Dong-Hee Shin and Hyungseung Choo. Modeling the acceptance of socially interactive robotics: Social presence in human–robot interaction. Interaction Studies, 12(3):430–460, 2011.
- Butchibabu et al.  Abhizna Butchibabu, Christopher Sparano-Huiban, Liz Sonenberg, and Julie Shah. Implicit coordination strategies for effective team communication. Human factors, 58(4):595–610, 2016.
- Gombolay et al.  Matthew C Gombolay, Reymundo A Gutierrez, Shanelle G Clarke, Giancarlo F Sturla, and Julie A Shah. Decision-making authority, team efficiency and human worker satisfaction in mixed human–robot teams. Autonomous Robots, 39(3):293–312, 2015.
- Ng et al.  Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In Icml, volume 1, page 2, 2000.
- Jara-Ettinger  Julian Jara-Ettinger. Theory of mind as inverse reinforcement learning. Current Opinion in Behavioral Sciences, 29:105–110, 2019.
Abbeel and Ng 
Pieter Abbeel and Andrew Y Ng.
Apprenticeship learning via inverse reinforcement learning.
Proceedings of the twenty-first international conference on Machine learning, page 1, 2004.
- Levine et al.  Sergey Levine, Zoran Popovic, and Vladlen Koltun. Nonlinear inverse reinforcement learning with gaussian processes. Advances in neural information processing systems, 24:19–27, 2011.
- Choi and Kim  Jaedeug Choi and Kee-Eung Kim. Hierarchical bayesian inverse reinforcement learning. IEEE transactions on cybernetics, 45(4):793–805, 2014.
- Wulfmeier et al.  Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888, 2015.
- Hadfield-Menell et al.  Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, and Anca Dragan. Inverse reward design. arXiv preprint arXiv:1711.02827, 2017.
- Wood  Robert E Wood. Task complexity: Definition of the construct. Organizational behavior and human decision processes, 37(1):60–82, 1986.
- Nagel  Rosemarie Nagel. Unraveling in guessing games: An experimental study. The American Economic Review, 85(5):1313–1326, 1995.
- Daneman and Carpenter  Meredyth Daneman and Patricia A Carpenter. Individual differences in working memory and reading. Journal of verbal learning and verbal behavior, 19(4):450–466, 1980.
- Nikolaidis et al.  Stefanos Nikolaidis, Swaprava Nath, Ariel D Procaccia, and Siddhartha Srinivasa. Game-theoretic modeling of human adaptation in human-robot collaboration. In Proceedings of the 2017 ACM/IEEE international conference on human-robot interaction, pages 323–331, 2017.
- Hadfield-Menell et al.  Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning. Advances in neural information processing systems, 29:3909–3917, 2016.
- Malik et al.  Dhruv Malik, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, and Anca Dragan. An efficient, generalized bellman update for cooperative inverse reinforcement learning. In International Conference on Machine Learning, pages 3394–3402. PMLR, 2018.