Mixed-Initiative variable autonomy for remotely operated mobile robots

11/12/2019
by   Manolis Chiou, et al.

This paper presents an expert-guided Mixed-Initiative (MI) variable-autonomy controller for remotely operated mobile robots. The controller enables switching between different Levels of Autonomy (LOA) during task execution, initiated by either the human operator or the robot. The controller is evaluated in two Search and Rescue (SAR) inspired experiments, one with a simulated robot and test arena and one with a real robot in a realistic environment. Our hypothesis is that timely switching between different LOAs will enable the system to overcome various performance-degrading factors and thus achieve superior performance compared to systems in which LOAs cannot switch on-the-fly. Statistically validated analyses from the two experiments provide evidence that: a) Human-Initiative (HI) systems outperform purely teleoperated or autonomous systems in navigation tasks; b) MI systems provide improved performance in navigation tasks, improved operator performance in cognitively demanding secondary tasks, and reduced operator workload. Results also reinforce previous Human-Robot Interaction (HRI) evidence regarding the importance of the operator's personality traits and their trust in the autonomous system. Lastly, the paper provides empirical evidence identifying two major challenges for MI control: a) the design of context-aware MI controllers; and b) the conflict for control between the robot and the operator.


I Introduction

In the aftermath of the 9/11 World Trade Center (WTC) terrorist attack, several robots were used in an Urban Search and Rescue (USAR) operation. The robots’ tasks were to inspect areas beneath debris and rubble, or to enter confined spaces (e.g. dangerous voids) that human rescuers or rescue dogs could not enter. Robot operators were working under stressful and performance-degrading conditions such as cognitive fatigue, sleep deprivation, and minimal sensor information (e.g. caused by dust or poor lighting), which led to mistakes in robot operation [1].

On March 11th 2011, a tsunami hit the Fukushima Daiichi nuclear power plant, causing three nuclear meltdowns and the release of radioactive material. In an effort to initially assess and contain the situation, military-grade robots were used to inspect the buildings and the reactors for any damage, radiation, and other potentially hazardous conditions. These robots had to be operated by the nuclear power plant workers, after only brief training sessions. A typical scenario involved those newly trained operators remotely controlling the robots for many hours while wearing a hazmat suit, making operation a very challenging task. Additionally, these operators had to face a number of psychologically challenging and stressful situations (e.g. dosimeter alarms going off), as famously reported by one of the robot operators in his blog [2].

Several years after these disasters, many of the initial challenges and shortcomings of robot use in disaster response remain. Despite the need to make the use of robots in safety- and time-critical applications (e.g. USAR, nuclear decommissioning, and hazardous environment inspection) easier, the robots used are still predominantly teleoperated, with little or no autonomy used to assist the human operator. There are three main reasons why autonomous systems are not used in these settings. First, these tasks often involve environments that are highly unstructured and changing, e.g. a partially collapsed building. Second, the nature of these tasks requires specific human abilities such as critical decision making based on incomplete information (e.g. determining if a victim is dead or alive, and risk assessment of certain actions); or communication with victims in Search and Rescue (SAR) [3]. Third, high consequence industries tend to be conservative and therefore do not trust autonomous systems. For example, in the Fukushima Daiichi nuclear accident, such lack of trust led to the robots being deployed via pure teleoperation, despite having various autonomous capabilities [4]. For these reasons, current deployment of such robots always requires a human in the loop [5].

Traditionally, research on improving teleoperation has mostly focused on two major areas: interfaces and telepresence. Interfaces have been studied as a way to improve robot teleoperation and deal with its intrinsic disadvantages by providing the operator with improved situational awareness (SA) [6]. However, studies have shown that humans tend to rely too much on visual feedback and cues [7, 8], something that can overload their visual modality, making the interface design problem non-trivial. Telepresence systems are challenging to design and often require specialized equipment. This is because human senses (e.g. the tactile senses) are very difficult for a robot to capture and to transfer to the operator as an experience with a high degree of fidelity [9, 10].

Equipping robots with autonomous capabilities can potentially tackle some of the intrinsic difficulties of teleoperation. Several field studies have pointed out the need for robots that actively assist operators (e.g. in the 9/11 World trade center [1]; the DARPA robotics challenge [11]; and other studies e.g. [12]). What is required is a human-robot team that benefits from the capabilities of both agents, whilst counteracting the weaknesses of each. Such systems offer the potential to assist a human operator who may be struggling to cope with issues such as high workload, intermittent communications, operator multitasking, fatigue, and sleep deprivation. For example, a human operator might need to concentrate on a secondary task while temporarily delegating control to the robot to navigate autonomously. This is something very common as robot operators have to convey situational awareness information, e.g. to SAR task force team mates [12, 13].

Our research addresses the use of variable autonomy as an approach to blending the capabilities of humans and robots. A variable autonomy system is one in which control can be traded between a human operator and a robot by switching between different Levels of Autonomy [14]. Levels of Autonomy (LOAs) refer to the degree to which the robot, or any artificial agent, takes its own decisions and acts autonomously [15]. In the case of robotics, LOAs can vary from the level of pure teleoperation (human has complete control of the robot), to the other extreme which is full autonomy (robot has control of every capability), within a single robot.

This article addresses the problem of dynamically changing LOA on-the-fly (i.e. during task execution) using either Human-Initiative (HI) or Mixed-Initiative (MI) control. HI refers to a system where the human operator is solely responsible for switching LOA based on their judgement. MI refers to a system where both the human operator and the robot have authority to initiate actions and to change the LOA. How best to dynamically switch LOAs in order to improve system performance is a challenging and open problem that remains largely unexplored in the literature.

The hypothesis explored in this article is that timely switching between different LOAs during task execution (e.g. during navigation) can improve task performance and will enable the system to overcome various performance-degrading factors. This is compared to robotic systems in which LOAs cannot switch on-the-fly. More specifically we investigate the capacity of the human operator (based on their judgement) and the robot (based on an online performance metric) to use HI and MI control in order to improve performance compared to pure teleoperation or autonomy. In the rest of this article, when we use the term “robot”, we use it interchangeably for both the physical robot hardware and the (potentially autonomous) software system that controls this hardware. If a distinction between these elements is important, we make it explicit.

I-A Scope of current research

Our initial work [16] demonstrated that conducting variable autonomy experiments is challenging due to the high number of intrinsic confounding factors that can lead to large variances in the results. Such confounding factors include individual differences in personality traits; experience in operating robots or playing video games; and map exploration strategies. In subsequent work we contributed a systematic variable autonomy experimental framework to avoid these issues [14]. In [14], human operators used an Operator Control Unit (OCU) and HI control to navigate a remotely controlled mobile robot in a maze-like test arena. The OCU was a laptop, joystick, mouse, and a screen showing the control interface (i.e. map, robot status, and video feedback from the robot’s camera). The HI controller allowed operators to switch between two different LOAs on the fly: an autonomy LOA, in which the operator could give a navigational goal by clicking on the interface, causing the robot to navigate autonomously towards that point; and a teleoperation LOA, in which the human operator manually controlled the robot by using the joystick. The results showed that HI variable autonomy can overcome various performance degrading factors and outperform both pure teleoperation and pure autonomy in various circumstances. Additionally, in [17] we explored the HRI aspects of the human operator interacting with and exploiting the HI controller. That work provided evidence that operators’ interactions with the variable autonomy controller (e.g. frequency of LOA switching; preferred LOA) are not necessarily influenced only by task performance.

In this paper we advance variable autonomy research by moving from HI to MI robotic systems. We also provide a complete picture of research regarding robots which dynamically switch LOA and make the case that HI or MI control provides advantages over single modes of operation (e.g. pure teleoperation or pure autonomy). This is done through the design and evaluation of an MI controller.

I-B Contributions

This article makes the following contributions. It presents:

  • statistically-validated empirical evidence that HI systems outperform purely teleoperated or autonomous systems in navigation tasks, based on real robot USAR-inspired experiments;

  • a framework and guidance for designing robotic mixed initiative (MI) control systems; more specifically, this paper proposes the design of expert-guided MI controllers;

  • a novel MI control system in which both the operator (based on judgement) and the robot (based on an online performance metric) are able to switch between different LOAs. This controller is novel for the following reasons: a) it is the first MI controller that is capable of LOA switching during task execution; b) the first to be practically implemented and experimentally evaluated. Additionally, we took a novel approach in designing the MI controller as it makes use of expert knowledge; an online performance metric; and simplified contextual knowledge to infer if a LOA switch is needed.

We also provide, for the first time, a rigorous evaluation plus statistically-validated empirical evidence regarding the advantages of the MI control in various circumstances, as compared to HI and teleoperation. These advantages are: a) improved performance in navigation tasks; b) improved operator performance in cognitively demanding secondary tasks such as the mental rotation of 3D objects; c) a reduction in operator workload. In addition this article contributes an analysis of the interaction of a human operator with an MI system. This is the first study that has quantitatively reported on metrics from an MI system such as time spent in each LOA; frequency of LOA switches; perceived workload; and their correlation with system performance. Finally, we identify and provide empirical evidence and insights into two major challenges for MI control: the design of context-aware MI controllers; and the conflict for control between the operator and the robot.

II Background and related work

In this section, we discuss research on strategies for switching LOAs. More specifically, we identify gaps in the literature regarding three key aspects: a) conducting rigorous experiments on LOA switching; b) LOA switching initiated by the human operator (namely HI); and c) LOA switching initiated by the AI or the operator (namely MI). The focus of this paper is MI control and thus the literature relevant to MI robotic systems will be reviewed in detail. Literature on conducting variable autonomy experiments and investigating HI control is briefly discussed for completeness. For further information on experimental frameworks and HI please refer to our previous work [16, 14, 17].

II-A Conducting variable autonomy experiments

Perhaps surprisingly, our previous work [16, 14] identified that research on variable autonomy, especially MI, in mobile robots lacks the rigorous experimental framework necessary to draw reliable conclusions. We define rigorous in this context to mean an experimental framework which consists of: a) appropriate statistical analysis; b) clarity on assumptions and hypotheses; c) precise and detailed descriptions of the experimental protocols followed; d) a formalized and repeatable experimental paradigm.

A common deficiency is that many previous studies do not control for possible confounding factors such as uncontrolled test environments (e.g. [18]); the absence of standardized training for the human operators (e.g. [19, 20, 21]); the robot having different speed limits in the different conditions tested [20]; or the different navigation strategies of human operators, similar to the ones observed in our work [16] or in [22].

There are excellent examples of related work which do provide a rigorous protocol, statistical analysis and detailed description. However, they are either in the domain of shared control [23, 24], or they investigate tasks different from ours at a different level of abstraction (e.g. task allocation in an industrial assembly scenario [25]).

II-B Human-Initiative variable autonomy

A Human-Initiative (HI) variable autonomy system is a system in which the human operator can dynamically switch between different LOAs (e.g. between teleoperation and autonomy) [14]. In such systems only the human operator has the authority to initiate LOA switches based on their judgement.

Many of the systems found in the literature, such as [26], are restricted to an initial LOA choice as they cannot change LOA on the fly. Other systems (e.g. [27]) aid the operator’s judgement by suggesting potential changes in the LOA but, contrary to our work, they are not validated experimentally, similar to other variable autonomy studies (e.g. [28]). Other SAR-inspired studies (e.g. [18]), in contrast to our work, are not focused on evaluating the overall task performance when LOAs can be dynamically switched. Additionally, those studies did not incorporate methods for degrading the robot’s or the operator’s performance in a systematic manner.

II-C Human-Robot Interaction with LOA switching robots

Human-Robot Interaction (HRI) in a LOA switching system remains mainly unexplored in the prior literature, with the exception of our work in [17]. Previous variable autonomy studies investigated the human operator’s interaction with a robotic system, but were restricted to exploring a single LOA [19, 29]. Other studies [27, 28, 18] did not present any data on the operator’s interaction with the LOA switching controller, unlike our work.

Our previous work [17] reported a systematic analysis of the ways in which human operators interact with, and exploit the capabilities of, a HI robotic system with dynamic LOA capabilities. In this article we extend this work by reporting on the HRI between the operator and a MI controller.

II-D Definition and taxonomy of Mixed-Initiative control

In [30] Jiang and Arkin define MI control in the context of human-robot teams as:

“A collaboration strategy for human-robot teams where humans and robots opportunistically seize (relinquish) initiative from (to) each other as a mission is being executed, where initiative is an element of the mission that can range from low-level motion control of the robot to high-level specification of mission goals, and the initiative is mixed only when each member is authorized to intervene and seize control of it.”

They also present the first taxonomy for MI robotic systems. Their taxonomy has three dimensions.

The first dimension is span-of-mixed-initiative which characterizes the control elements (initiatives) in which both agents are capable of initiating actions. The system we present in Section III-C is mostly-joint (a term from this dimension in  [30]) since both agents have initiative over two of the control elements: navigation execution and LOA switch. The second taxonomy dimension is initiative reasoning capacity which characterizes the ability of an agent to reason about taking the initiative. Our system is deliberative since it has the ability to reason about initiating actions deliberately based on an online performance metric and simplified context awareness. The final dimension of the taxonomy is initiative hand-off coordination which characterizes the strategies used by the system when shifting initiative from one agent to the other. Our system is characterized as explicitly-coordinated in this dimension.

II-E Mixed-Initiative control systems

Our literature survey found several systems characterized by their authors as MI. However, given the comprehensive taxonomy in [30], we believe that many of them cannot be characterized as truly MI control systems. This is due to the fact that the ability to trigger actions or LOA switches is not accessible to all members of the human-robot team in these systems. For example, in [31] only the operator is able to initiate actions, although these are based on the system’s suggestions. In this survey we only present systems that have some form of true MI control (i.e. both the human and the robot can initiate actions or LOA switches).

Shared control is a term often used generally to describe systems in which the human cooperates at some level with the robot; or used specifically to describe research and systems in which some form of input mixing or blending between the robot’s and/or the operator’s commands is used. In this paper, to avoid confusion, we will use this term to explicitly refer to research involving some form of input mixing. Shared control from our perspective is a LOA that can fall under the banner of MI control. Any mixed initiative behaviour is restricted within a shared control LOA. This means that the robot can only take the initiative to blend its navigation control input with the one from the operator, in order to improve the control output. Similarly, in safeguard teleoperation (another form of shared control), the robot reactively takes the initiative to prevent collisions. In both cases no LOA switching takes place. Also in most cases the human operator does not have any initiative in choosing how to blend the control inputs. In Section III-D further discussion can be found on how our system relates to shared control and input blending.

Nielsen et al. [22] conduct experiments using multiple LOAs. However, the LOA is chosen during the initialization of the system and cannot change on the fly. Moreover, similar to shared control, the robot has only reactive initiative inside a specific LOA to prevent collisions. Lastly, initiative is not coordinated by any hand-off strategy. Multiple LOAs (teleoperation, safe mode, shared mode, autonomy) are tested in [19]. In shared mode the robot drives autonomously while accepting interventions from the operator. In safe mode the robot takes initiative only to prevent collisions. However, these LOAs cannot change on the fly and the robot’s initiative is limited in safe mode. [20] present a control mode in which the operator gives directional commands using a joystick to adjust the robot’s navigation. The system offers limited initiative depending on the frequency of the operator’s interaction.

Gombolay et al. [25, 32] use an industrial assembly scenario to investigate how varying degrees of task allocation authority between the human and the robot affect the human-robot team’s task performance, SA, and task work-flow. Their system has three different LOAs: a) the human decides how to allocate the tasks; b) the human decides which tasks they will perform, while the robot allocates the remaining tasks to itself and to other humans; c) the robot has full authority over task allocation. Contrary to our work, these LOAs cannot change on the fly, hence the agents lack the initiative to override each other’s actions. Also, the nature of the scenario is fundamentally different from our work because it includes more than two agents, is at a higher level of abstraction, and lastly the tasks require physical collaboration between the humans and the robot.

Research on MI systems that are able to switch LOA dynamically, or that have initiative capabilities not restricted to a single LOA, is fairly limited. Moreover, the MI systems proposed are either purely theoretical or not experimentally evaluated. Bruemmer et al. [33] present a theoretical, multiple LOA, MI system. This system is based on theories of robot behavior (human understanding of the robot) and human behavior (robot understanding of the human). The latter proposes gathering readily available non-intrusive workload cues from the operator as an indication of poor performance. More specifically, it proposes the use of the frequency of human input and the number and kind of dangerous commands issued by the operator as performance indicators. This provides the robot with the capacity to initiate switches between the different LOAs. However, it can be argued that input frequency is not necessarily an indication of poor performance; rather it reflects different operators’ driving styles. Adams et al. [34] propose an MI robot control architecture which relies on the detection of the operator’s emotional state. Initiative is mixed at all levels of the system, i.e. in setting goals and constraints, planning, and execution. Changes in control are initiated based on the operator’s sensed state (e.g. boredom, stress, drowsiness, engagement). This requires user-specific models that can be challenging to create. Lastly, in contrast to our work, [34] does not propose any hand-off coordination strategies and the system is not experimentally evaluated.

For multi-robot systems, variable autonomy often lies at a higher level of abstraction compared to our work on single robots. Manikonda et al. [35] describe a multi-robot MI controller and testbed for human-robot teams in tactical operations. The agents in the system share information via a common world model. Based on this information they are able to initiate modifications to their goals and associated roles in the team. In [36] a MI approach is proposed in a multi-robot search task. Robots are equipped with the ability to initiate changes in their respective search areas (e.g. size of search area). These changes are reactively triggered by specific events, e.g. the human operator has identified an item of interest.

In summary, MI robotic systems found in the literature offer limited initiative inside a predefined LOA. In the case of multi-robot systems, the MI lies at a higher level of abstraction, making assumptions about other layers, e.g. navigation. Moreover, contrary to our work, initiative actions from the robot are not based on task performance metrics. The robot controller instead takes initiative by reacting to sensor input (e.g. obstacles).

To the best of our knowledge, the work presented in this article is the first to use an online task performance metric to address the problem of switching LOA during task execution using an MI controller. We are also the first to show, in a systematic way, the benefits of a robotic system that initiates dynamic LOA switches for the human operator’s cognitive workload and for performance in different tasks (e.g. navigation, spatial awareness tasks, etc.).

III An expert-guided framework for designing Mixed-Initiative controllers

The fundamental problem in MI control is how to allow LOA switching by either the operator or the robot in order to improve the human-robot system’s performance. Without loss of generality, in the rest of this paper we assume a human-robot system which has two LOAs: one in which the human has most of the control (i.e. some form of teleoperation), and one in which the robot has most of the control (i.e. some form of autonomous control). However, the descriptions that follow can be extended to the case where other LOAs exist, e.g. alternate autonomous or shared control LOAs.

In MI control the robot, meaning the hardware, can be seen as a resource with two different agents having control rights: the human operator and the robot’s autonomous control system. At any given moment, the most capable agent should take control. Hence, of particular importance is the ability for each agent to diagnose the need for a LOA change, and to take or relinquish control successfully.

An operator’s LOA switches are based on judgement. As demonstrated by the HI results in [14], given sufficient understanding of the system and the situation, humans are able to determine when they need to change LOA. More specifically, analysis of the HI results from our previous work [17] revealed that operators switch LOA based on three factors: a) preferred LOA; b) context; and c) performance degradation. The preferred LOA is the LOA the participants tend to instinctively return to. Their preference may be based on a number of task- or system-specific factors (e.g. trust in the control software) and personal traits (e.g. preference to be in control). In context-sensitive LOA switches, operators are able to evaluate the current context and infer if a LOA switch is needed. For example, they can change control preemptively as they predict performance degradation in a given situation. An example is a situation in which noise starts to appear in the sensors, so they change to teleoperation as they predict degradation in the autonomous controller’s performance.

In order to enable the robot controller to automatically take control when it is needed, and to relinquish control when under-performing, it is necessary to provide it with a mechanism to detect when the performance of the human-robot system is degraded. For example, the robot controller could initiate a switch to a LOA that offered increased autonomous capabilities in situations where the human operator is too preoccupied with the primary cause of their performance degradation to voluntarily switch control to the robot.

To this end we propose an expert-guided approach to the design of MI controllers. For a given task, we assume the existence of a task expert that can provide the expected task performance for the human-robot system in the absence of performance degrading or other unexpected factors. This expert is needed in order to provide a reference point for behavior and performance, which might be different from the actual runtime performance and behavior of the system. Based on this assumption, the core idea is that the expert-guided MI controller should compare the current performance of the system with the expert performance. This comparison yields an online task effectiveness metric, expressing how effectively the system is performing its given task relative to the performance of the expert. If this metric indicates that the current task performance is worse than the expert expects, then the robot should initiate a LOA switch. Such experts can be created in many ways, e.g. through simulation, assuming no noise, learning by demonstration, etc. Depending on the application and the implementation, such experts can represent an optimal, close-to-optimal, heuristic, or optimistic performance reference point for the actual system performance to compare against at runtime.

The application of our expert-guided MI approach to a given human-robot team and task relies on the following assumptions: a) the human operator is willing to be handed control and to hand over control based on the initiative of the robot; b) the agent to which the control will be handed (i.e. either human or robot) is capable of correcting the task effectiveness degradation [16]; and c) the system is equipped with an expert controller providing information on how (given the assumptions made in the design of the expert) the system should be performing in a given task.

III-A An expert-guided Mixed-Initiative controller for navigation

Many of the applications for remotely operated robots involve navigation. Therefore for the remainder of this article we focus on the problem of MI control for navigation, specifically the task of moving the robot to a goal location as quickly and as safely as possible.

Creating an expert-guided MI navigation controller requires an online task effectiveness measure for navigation. After some experimentation with measures based on the number of collisions and the location of the robot at given points in time, we have chosen to use the goal-directed motion error (referred to as “error” for the rest of the article). This is the difference between the robot’s current motion and the motion of the robot required to achieve its goal (reach a target location).

Such a motion error can be trivially computed by taking the difference between the robot’s current direction of travel and the direction directly towards the location of the navigation target; however, this does not account for environment structure. To provide more environment context to the controller we extract an expert performance measure from a concurrently active navigation system which is given an idealised (unmapped obstacle- and noise-free) view of the robot’s world. In a real scenario (e.g. the robot has to inspect a building for damage) such a view or map can be provided to the system as prior knowledge, e.g. taken from the building’s plan. This expert uses the navigation planner from the Robot Operating System (ROS) navigation stack [37]. More specifically, the global shortest feasible path is calculated based on Dijkstra’s algorithm [38], while the local path and optimal velocities are calculated using the dynamic window approach [39]. The input to the navigation planner is the robot’s location (from its localisation system), its target location, plus only the static map of the environment. Using this, the expert continually reports the velocity with which the robot should be moving towards its goal. By omitting dynamic obstacle maps from the planning process (i.e. additional obstacles detected by sensing elements of the environment not present in the map) the velocity suggested by the expert can be seen as an upper bound on the performance of the system. In essence, the expert provides an optimistic model of possible robot behavior.

From the expert-reported velocity we extract the speed along the axis denoting the forward/backward motion of the robot (the x axis of the robot frame). This denotes the speed at which the robot should be moving along the optimal path towards the goal. To create an error measure the expert’s speed is constantly compared with the current speed of the robot. The difference between these values is the goal-directed motion error that we use in the controllers presented in the remainder of this article. Please refer to Figure 1 for the block diagram of the system.

Fig. 1: The block diagram of the MI control system. Given a navigation goal, the current pose of the robot, and the map (or a known surrounding area), the expert planner yields a close-to-optimal suggested speed. This speed denotes how fast the robot should be moving towards achieving the navigation goal. This speed is then compared with the current speed of the robot towards that goal to calculate the raw performance error. The MI controller decides on switching LOA based on the filtered error.

In addition to the general assumptions described in the previous section, our navigation-based expert MI approach assumes that: a) the region surrounding the robot is mapped; b) the robot’s control system has access to a target location for navigation; and c) the system is equipped with a navigation expert planner capable of reliably computing both a close-to-optimal path and velocities towards a navigation goal. Assumptions a and b represent the knowledge of the task, its environment, and what is to be achieved. This knowledge is required by the expert planner in order to infer the close-to-optimal path and velocities. In a real scenario the robot operator can have such prior knowledge, which they can use as input to the robot. For example, the operator can provide a map to the robot with the floor plan (e.g. of the building to be explored, taken from the authorities or mapped by a UAV) and the coordinates of a point of interest (e.g. the location of a pipe suspected to be leaking or the potential location of victims).

III-B Threshold Mixed-Initiative controller

Using the above approach we created two different MI controllers. Our first approach was to create a threshold controller using the goal directed motion error metric. This controller observes the system’s performance over a period of time. If performance drops below a given threshold then a switch in the LOA is initiated regardless of which agent is in control. To implement this approach, the goal-directed motion error must satisfy the following requirements: a) frequent, brief changes in error should be treated as noise; and b) the final error signal to be used should express the accumulated error over time. The second requirement comes from our model since it assumes that the performance drop is observed over a short period of time before any initiative takes place.

The exponential moving average (EMA) algorithm (commonly known as Brown’s simple exponential smoothing [40]), applied (see equation 2) to the raw error (equation 1), satisfies both of these requirements. It acts as a smoothing filter for high frequency noise and it also accumulates the error over a period of time. It reflects error trends more quickly than a simple moving average as it does not have a phase shift. It is also easy to implement and computationally cheap. The raw error equation follows:

$e_t = v^{expert}_t - v^{robot}_t$    (1)

The raw error $e_t$ is the difference between the close-to-optimal speed $v^{expert}_t$ suggested by the expert planner and the current linear speed $v^{robot}_t$ of the robot moving towards the goal. The equation for the smoothed error using the EMA follows:

$\bar{e}_t = \alpha \, e_t + (1 - \alpha) \, \bar{e}_{t-1}$    (2)

The term $\bar{e}_t$ refers to the final accumulated and smoothed error. The term $e_t$ refers to the current error observation. The smoothing factor $\alpha \in (0, 1]$ controls the weight of the most recent (i.e. current) observation and thus the time window over which the error is accumulated. The larger $\alpha$ is, the more the current error observation is taken into account and thus the smaller the accumulated error time window is (i.e. past error observations contribute less to the calculation).
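As an illustration, the following minimal Python sketch shows how the raw error of equation 1 and the EMA-filtered error of equation 2 could be computed at each control cycle. The class and variable names are our own and the update rate is left to the caller; this is a sketch, not the implementation used in the paper.

```python
class GoalDirectedErrorFilter:
    """Smoothed goal-directed motion error (equations 1 and 2)."""

    def __init__(self, alpha):
        self.alpha = alpha           # smoothing factor, 0 < alpha <= 1
        self.smoothed_error = 0.0    # accumulated/smoothed error (e_bar)

    def update(self, expert_speed, robot_speed):
        """expert_speed: close-to-optimal speed suggested by the expert planner.
        robot_speed: current linear speed of the robot towards the goal.
        Returns the filtered error used by the MI controllers."""
        raw_error = expert_speed - robot_speed                               # equation (1)
        self.smoothed_error = (self.alpha * raw_error
                               + (1.0 - self.alpha) * self.smoothed_error)  # equation (2)
        return self.smoothed_error
```

For example, calling update() once per control cycle with the expert's suggested speed and the speed reported by the robot's odometry yields the filtered error on which the threshold and fuzzy controllers below operate.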

The threshold controller uses the filtered error $\bar{e}_t$ to initiate LOA switches. When this error exceeds a given threshold, the controller infers that the current agent in control is under-performing. Thus, it initiates a change in the LOA. To calculate the error threshold and the smoothing factor $\alpha$ we used the following procedure. We replayed the robot motion for all operator trials from the HI experiment in [14] through the controller, with the threshold and $\alpha$ both selected via grid search. For each trial this yielded a set of LOA changes proposed by the controller. We compared this set of LOA switches to the LOA switches initiated by the operators in the previous HI data. We then used the cost function in Equation 3 to calculate the cost of a particular parametrisation of the controller for each individual trial. The final parameters chosen were the parameters that minimized the sum of costs for all participants. The cost function for each individual trial was:

$C = \sum_{i} \left| t^{pred}_i - t^{op}_i \right| + n \cdot p$    (3)

Put simply, equation 3 expresses the LOA switch prediction error of the controller on the HI data, with the addition of a penalty term for every prediction that does not match (i.e. false positives). Here $t^{pred}_i$ is the time-stamp of a predicted LOA switch and $t^{op}_i$ is the time-stamp of the operator’s corresponding actual switch, with the subscript $i$ denoting a specific time-stamp. The term $n$ is the number of predictions that do not match any operator switch, and $p$ is a cost penalty. A non-matching prediction is defined as a prediction that falls outside of a small time window around any of the operator’s LOA switches. The assumption is that these predictions do not correspond to any actual operator LOA switches and thus they get penalized.
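To make the grid search concrete, the sketch below replays a trial through a threshold controller and scores it with a cost of the form of equation 3. The matching window, data layout, and parameter grids are illustrative assumptions only; they are not the values or code used in the paper.

```python
def predict_switches(samples, alpha, threshold):
    """Replay (time, expert_speed, robot_speed) samples through the EMA filter and
    return the times of upward threshold crossings (simplified: one predicted LOA
    switch per crossing)."""
    smoothed, above, predictions = 0.0, False, []
    for t, v_expert, v_robot in samples:
        smoothed = alpha * (v_expert - v_robot) + (1.0 - alpha) * smoothed
        if smoothed > threshold and not above:
            predictions.append(t)
        above = smoothed > threshold
    return predictions

def trial_cost(predicted, operator_switches, penalty, match_window=2.0):
    """Cost of one trial in the spirit of equation 3: timing error for matched
    predictions plus a penalty for each unmatched prediction (false positive)."""
    cost, unmatched = 0.0, 0
    for t_pred in predicted:
        diffs = [abs(t_pred - t_op) for t_op in operator_switches]
        if diffs and min(diffs) <= match_window:
            cost += min(diffs)
        else:
            unmatched += 1
    return cost + unmatched * penalty

def grid_search(trials, alphas, thresholds, penalty):
    """Return the (alpha, threshold) pair minimizing the summed cost over all trials.
    Each trial is a dict with replayed 'samples' and the operator's 'switches'."""
    best = None
    for alpha in alphas:
        for threshold in thresholds:
            total = sum(trial_cost(predict_switches(tr["samples"], alpha, threshold),
                                   tr["switches"], penalty)
                        for tr in trials)
            if best is None or total < best[2]:
                best = (alpha, threshold, total)
    return best
```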

The grid search yielded values for the smoothing factor $\alpha$ and the error threshold. One could imagine two scenarios in which the value of $\alpha$ could be different: a) the system needs to react immediately to the current errors, and hence $\alpha$ needs to be large; or b) the system needs to be cautious and react only when errors have accumulated over time, and hence $\alpha$ needs to be relatively small. However, in practice, these two scenarios are both included in realistic navigation tasks similar to the ones in our experiments, as they can both occur during task execution. The $\alpha$ calculated here represents a trade-off between these two extremes. It offers fast reactions to large current errors, and at the same time it filters noise (i.e. sudden instantaneous errors). Hence, the value of $\alpha$ does not need to be re-derived for every scenario as long as the scenario is of the same nature and not a specialized case of navigation. For these reasons, the sensitivity of the controller to react and switch LOA should be governed by how the error threshold is managed.

We conducted a pilot experiment with a small number of participants, using these parameters with the threshold controller. It rapidly became clear that this controller did not work well. Poor performance was characterized by excessive LOA switching (i.e. the controller was oversensitive in detecting performance drops). As a result the controller was intrusive and impractical. This led us to design a new expert knowledge controller based on fuzzy logic, as described in the next section.

III-C Fuzzy Mixed-Initiative controller

A bang-bang Mamdani-type fuzzy controller [41, 42] was designed to address the limitations of the threshold controller. The fuzzylite [43] C++ library was used for the task. The repository containing the ROS code for the MI controller, and any code necessary to replicate the experiments described in this paper, is provided under an MIT license at [44].

In control theory a bang-bang controller (also known as on-off controller), is a controller that switches between two states. In this case, as explained later in this section, the controller’s two states were: a) switch LOA; b) do not switch LOA. The Mamdani type refers to the type of the fuzzy inference process. In Mamdani inference [41], both the antecedent and the consequent parts of the fuzzy control rules are linguistic, allowing for expert knowledge to be easily incorporated into the system. Hence, the output after the linguistic fuzzy rule aggregation is a fuzzy set and needs defuzzification. Fuzzy rule aggregation is the process by which the fuzzy sets denoting the activation of each rule are combined to produce a single output fuzzy set.

This controller goes beyond the threshold controller by utilizing expert knowledge in three ways: a) by defining fuzzy sets for the goal-directed motion error, instead of a fixed threshold; b) by incorporating very simple context information; and c) by making informed decisions on when to initiate a LOA switch based on a fuzzy logic rule base. The hypothesis is that the fuzzy logic MI controller will yield smoother transitions between LOA switch decisions, and thus will not be as intrusive (i.e. reactive) in switching LOA as the simple threshold controller.

In fuzzy logic, a linguistic variable associates a linguistic concept, and the fuzzy set representing that concept, to a numerical value. For example, the linguistic variable “error”, when its value is “small”, associates the numerical value with the fuzzy set representing the concept of “small”, i.e. with the membership function of the “small” fuzzy set. Our fuzzy controller has two input variables. The first linguistic input variable is “error”, which denotes the filtered error $\bar{e}_t$ from the exponential moving average (EMA), as in the previous threshold controller. The smoothing factor $\alpha$ used in the EMA filter was the one found by the parameter search for the threshold controller. The second input variable is “speed”, which denotes the speed at which the robot is currently moving.

Fig. 2: (a) Membership functions for the linguistic input variable “error”. (b) Membership functions for the linguistic input variable “speed”.

For the linguistic input variable “error” the universe of discourse (i.e. the range of all possible values for an input) spans from zero up to a maximum error value. A value of zero means that no goal-directed motion error exists, i.e. the robot is making progress towards the goal without any performance degradation. The maximum of the range corresponds to the largest possible error, meaning that the robot is not progressing towards the goal at all (e.g. the robot is idle) despite the expert suggesting the robot’s maximum speed. Due to physical and mechanical constraints (e.g. acceleration limits), the new speed commanded to the robot at any time cannot differ from the current speed by more than a fixed amount. For the linguistic input variable “speed” the universe of discourse spans from the maximum reverse speed of the robot to its maximum forward speed.

The input variable “error” (i.e. $\bar{e}_t$) is mapped into three linguistic values (i.e. three fuzzy set membership functions, see Figure 2(a)): “small”, “medium”, and “large”. The error threshold calculated for our previous threshold controller was used to define the fuzzy linguistic value “large”. In essence, what operators consider a large enough error to justify a LOA switch is encoded into the fuzzy controller. This knowledge was extracted by using a grid search algorithm on HI data, as described in Section III-B. The values and membership functions for “small” and “medium” were heuristically chosen in order to smoothly overlap throughout the universe of discourse (see Table II). This is a common practice when designing fuzzy controllers.

The input variable “speed” is mapped into three linguistic values (see Figure 2(b)): “reverse”, “zero”, and “forward”. The value “reverse” denotes that the robot’s speed is negative, which means the robot is reversing. The value “zero” denotes that the robot is idle and not moving. The value “forward” denotes that the robot is moving forward (see Table II).

The fuzzification process transforms the crisp values of the inputs into fuzzy values. This is achieved using the fuzzy membership functions described above and in Table II. Then, a set of fuzzy rules is applied to the fuzzy inputs (see Table I). The following standard (i.e. commonly used) operators are used in the fuzzy inference process: for conjunction (i.e. “and”) the minimum operator is used; for disjunction (i.e. “or”) the maximum operator is used; for rule activation the minimum operator is used; and for fuzzy rule aggregation the maximum operator is used. The fuzzy rules (see Table I) used in the controller were constructed using expert knowledge from the HI data. In essence, the fuzzy rules dictate that the controller will initiate a LOA switch only when the “error” is “large” and the robot is not reversing. If the “error” is “large” and the robot is reversing, the assumption is that the agent in control is trying to extricate the robot from the error situation. Hence, by taking this simplified contextual knowledge into account, the controller will not switch LOA.

Fig. 3: Output membership functions.

Similar to the work of Nagi et al. [42], we follow the fuzzy bang-bang relay controller (FBBRC) approach. This means that the Largest of Maxima (LOM) defuzzification method is used. The LOM method has the advantage of directly producing a two-level state output, which in our case is mapped into “change” (i.e. switch) LOA and “no change” (see Figure 3). This allows the antecedent part of the fuzzy rules to be freely chosen while the consequent part has only two linguistic values (i.e. “change” or “no change” LOA). The output’s universe of discourse represents the bang-bang output: the value at one extreme means that no LOA switching initiative will take place, while the value at the other extreme means that the controller will initiate a LOA switch.
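For illustration, the sketch below implements a hand-rolled bang-bang Mamdani-style decision with the rule base of Table I, min/max operators, and largest-of-maxima defuzzification over a two-level output. The trapezoidal breakpoints are illustrative placeholders only; the actual membership function parameters (Figure 2, Table II) are not reproduced here, and the real controller was built with the fuzzylite library [43].

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: 0 below a, rises on [a, b], 1 on [b, c],
    falls on [c, d], 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def loa_switch_decision(error, speed):
    """Return True if a LOA switch should be initiated.
    error: filtered goal-directed motion error (here assumed normalized to [0, 1]).
    speed: current robot speed in m/s (negative when reversing).
    Breakpoints below are illustrative placeholders, not the paper's values."""
    # fuzzification of "error" into small / medium / large
    err_small   = trapezoid(error, -0.10, 0.00, 0.15, 0.30)
    err_medium  = trapezoid(error, 0.15, 0.30, 0.45, 0.60)
    err_large   = trapezoid(error, 0.45, 0.60, 1.00, 1.10)
    # fuzzification of "speed": membership of the "reverse" set
    spd_reverse = trapezoid(speed, -10.0, -9.0, -0.10, -0.02)

    # rule base (Table I); AND = min, OR = max, NOT = 1 - membership
    r1_no_change = max(err_small, err_medium)         # rule 1
    r2_change    = min(err_large, 1.0 - spd_reverse)  # rule 2
    r3_no_change = min(spd_reverse, err_large)        # rule 3

    # aggregation (max) of the activations attached to each output value
    act_no_change = max(r1_no_change, r3_no_change)
    act_change    = r2_change

    # largest-of-maxima defuzzification: the "change" set occupies the upper end
    # of the output range, so it is selected whenever its activation is at least
    # as strong as "no change" and non-zero
    return act_change > 0.0 and act_change >= act_no_change
```

With these placeholder breakpoints, loa_switch_decision(0.8, 0.2) returns True (large error while driving forward), whereas loa_switch_decision(0.8, -0.3) returns False (large error while reversing), mirroring the rule base of Table I.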

The LOA switching problem under investigation is complex, ill-defined, and its underlying dynamics are not precisely known. Thus, a fuzzy controller offers several advantages. Primarily it allows the efficient management of the real world fuzziness through the use of expert knowledge coming from human operators. This can be easily achieved by extending the fuzzy sets and fuzzy rules base to include new linguistic input variables, metrics, heuristics etc. Additionally, the controller can be extended to have more output states in order to facilitate a more complex system. This is based on the fact that fuzzy logic allows for smoother transition between states. Lastly, fuzzy logic enables more transparency and better understanding of how the controller works, something important given the current state of the research.

No. Rules
1 IF error is small OR error is medium THEN LOA is no change
2 IF error is large AND speed is not reverse THEN LOA is change
3 IF speed is reverse AND error is large THEN LOA is no change
TABLE I: The fuzzy rule base.
TABLE II: The fuzzy membership functions for the linguistic values of the input variables “error” ({small, medium, large}) and “speed” ({reverse, zero, forward}); see Figure 2. For “error”, trapezoid membership functions have been used. For “speed”, two trapezoid membership functions (“reverse” and “forward”) and one triangular membership function (“zero”) were used. The membership functions were heuristically chosen in order to smoothly overlap throughout the universe of discourse, a common practice when designing fuzzy controllers. The membership function for “error large” was chosen based on the error threshold calculated in Section III-B.

III-D Relevance of the Mixed-Initiative controller to shared control

As explained in Section II-E, we consider shared control to be a specific LOA that only loosely falls under the banner of MI control. This is despite the similarities that shared control and input blending systems have with MI control. This is because in most cases of shared control only the robot has the initiative to adjust the operator’s control input, e.g. to blend robot commands with the operator’s joystick navigation commands. In this section we discuss how our MI control approach relates to shared control by using the formalism proposed by Dragan and Srinivasa [24]. In their work they interpret shared control as policy blending. More specifically, “as an arbitration of two policies, namely, the user’s input and the robot’s prediction of the user’s intent. At any instant, given the input and the prediction, the robot combines them using a state-dependent arbitration function”.

Let us assume a navigation task similar to our case. Let us also assume that the navigation goal, or a correct prediction of that goal, is known. Then, according to the above formalism, the arbitration function decides how the robot’s control input will be blended with the operator’s control commands to contribute towards navigating to the goal. Simply put, the more the operator’s commands are in accordance with the goal or its prediction, the less the robot will blend in its own commands. If the operator’s input is not in accordance with the prediction, then the robot will heavily adjust it so that the output commands contribute to navigation towards the goal. In this example two extreme cases exist: a) the operator’s input does not need any correction and hence is used as the output command directly; or b) the robot fully blends/changes the operator’s input and hence the robot’s commands are used as output. These two extreme cases roughly correspond to our MI approach, with the MI controller playing the role of the arbitration function. Case a) corresponds to the teleoperation LOA (operator fully in control); and case b) corresponds to the autonomy LOA (robot fully in control).

However, there are fundamental differences between the above formalism and our MI controller. In shared control the arbitration function decides how the operator’s input will be blended with the robot commands in a continuous range from teleoperation to autonomy, and only in a very specific control element. This control element usually lies in low level control, e.g. velocity navigation commands. Also the initiative lies within the arbitration function. In contrast, the MI controller proposed here allows switches between different discrete LOAs with both the operator and the robot having the authority to initiate or override each other’s actions.
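To make the contrast concrete, the following minimal sketch juxtaposes continuous policy blending with the discrete LOA switching used here. The function names and the scalar arbitration value are illustrative assumptions, not the formalism's exact notation or either system's implementation.

```python
def blended_command(user_cmd, robot_cmd, arbitration):
    """Shared control as policy blending: a state-dependent arbitration value in
    [0, 1] continuously mixes the user's input with the command derived from the
    robot's prediction of the user's intent (illustrative, after [24])."""
    return (1.0 - arbitration) * user_cmd + arbitration * robot_cmd

def mi_command(user_cmd, robot_cmd, loa):
    """MI control as used in this work: a discrete LOA determines which agent's
    command is executed; either agent may have initiated the switch that set loa."""
    return user_cmd if loa == "teleoperation" else robot_cmd
```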

The framework and the controller presented in this paper (see Figure 1) are applicable to, and can be extended to include, multiple LOAs without changing their fundamental principles. For example, a third LOA such as shared control can be added. One way to do this is by extending the fuzzy controller’s rule base and the output membership function. A rule can be added to switch to a shared control LOA when the “error” is “medium”. The practical interpretation of such a LOA switch is that the operator needs some assistance, but not necessarily in the form of the robot taking full control. Similarly, additional LOAs can be added depending on the application, e.g. a LOA in which the robot performs a certain function (e.g. navigating autonomously) while the operator performs some other function concurrently (e.g. controlling a pan-tilt camera or the robot’s arm). However, because the problem of MI switching between two LOAs is already challenging, we decided to perform our investigation by implementing the two LOAs corresponding to the extreme cases: teleoperation and autonomous control.

IV Experiment 1: evaluation using a simulated robot and test environment

To allow a direct comparison between MI and HI control we first evaluate our expert-guided MI controller using the same framework as in our previous work [14]. To be useful, the MI algorithm should provide the same level of performance or better, in terms of primary task completion time or score, when compared to the HI system. The reason that the MI controller can be useful despite potentially having the same level of performance as HI is that there are situations in which a human-initiated LOA switch may not be possible (e.g. loss of communication or an incapacitated operator); the MI controller therefore has the ability to provide additional system redundancy, which is not evaluated in these experiments.

An experimental evaluation of the fuzzy logic MI controller described in Section III-C was conducted. The aim was to make an initial evaluation of the MI controller and compare its performance with that of the HI controller. If performance on the tasks proves to be at a similar level to, or better than, the HI controller, then the MI controller has a positive and meaningful impact upon the system. This will be especially true compared to using only teleoperation or autonomy. More specifically, the experiment described in this section evaluates: a) the human’s and the robot’s ability to switch LOA between teleoperation and autonomy in order to overcome circumstances in which an MI system is under-performing; b) how the MI controller performs compared to the HI controller of [14]; c) the unfolding Human-Robot Interaction (HRI) of MI control.

IV-A Experimental setup - test arena, tasks and tested conditions


Fig. 4: (a) The simulated arena and the robot model used in the experiment. (b) Laser-derived SLAM map created in the simulation environment. The primary task was to drive from point A to B and back again to A. The yellow shaded region is where artificial sensor noise was introduced. The blue shaded region is where the secondary task was presented to the operator.

In previous work [14] we carried out experiments in a high fidelity simulated arena (see Figure 4) using Human-Initiative (HI) control. The robot was controlled by an OCU, composed of a laptop, a joystick, a mouse, and a screen showing the control interface. We used a simulated Pioneer-3DX mobile robot fitted with a camera and a laser scanner. We conducted the experiment described in this section using an identical setup to [14], i.e. identical robot, arena, system, experimental paradigm, and procedures. This is in order to facilitate a direct comparison between the controllers.

Operators were given the primary task of navigating from point A in Figure 4(b) (the beginning of the arena) to point B (the end of the arena) and then back again to point A. During each experimental trial, two different kinds of performance degrading factors were introduced, one for each agent. At certain times, artificially generated sensor noise was used to degrade the performance of autonomous navigation. At other times, a cognitively intensive secondary task was used to degrade the performance of the human operator. Each of these performance degrading situations occurred twice per experimental trial, once on the way from point A to point B, and once on the way back from B to A. These degrading factors occurred separately from each other, as shown in Figure 4(b).

More specifically, autonomous navigation was degraded by adding Gaussian noise to the laser scanner range measurements, thereby degrading the robot’s localization and obstacle avoidance abilities. To ensure experimental repeatability, this additional noise was instantiated whenever the robot entered a predefined area of the arena, and was deactivated when the robot exited that area. Note that such region-specific noise can in fact happen in real-world applications, e.g. when a robot travels through a highly radioactive region during nuclear decommissioning or the exploration and remediation of nuclear disaster sites such as Fukushima.
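A minimal sketch of such region-triggered sensor degradation is shown below. The rectangular region bounds, noise level, and data layout are illustrative assumptions and not the parameters used in the experiment.

```python
import random

def degrade_laser_scan(ranges, robot_xy, noise_region, sigma=0.3):
    """Add zero-mean Gaussian noise to laser range readings while the robot is
    inside a predefined rectangular region of the arena; outside the region the
    scan is passed through unchanged (illustrative values only)."""
    x_min, y_min, x_max, y_max = noise_region
    x, y = robot_xy
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        return list(ranges)
    return [r + random.gauss(0.0, sigma) for r in ranges]
```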

To degrade the performance of the human, a secondary task of mentally rotating 3D objects was used [45]. Whenever the robot entered a predefined area in the arena, the operator was presented with a series of 10 cards, each showing images of two 3D objects (see Figure 5(a)). These cards were placed by the experimenter on the OCU desk, on the right hand side of the operator. On five cards, the objects were identical but rotated, and on the other five cards the objects were mirror images with opposite chiralities. The operator was required to state whether or not the two objects were identical.

Fig. 5: (a) A typical example of a rotated 3D objects card. (b) The control interface as presented to the operator. Left: video feed from the camera, the control mode in use, and the status of the robot. Right: the map showing the position of the robot, the current goal (blue arrow), the AI planned path (green line), the obstacles’ laser reflections (red), and the walls (black).

In [14], three different control modes were tested for each human test subject: a) teleoperation mode, in which the human operator was restricted to using only direct joystick control to drive the robot; b) autonomy mode, in which the operator was only allowed to guide the robot by clicking desired destinations on the robot’s laser-generated 2D map; c) HI mode, in which the operator was allowed to switch LOA at any time (using a button on the joypad) according to their judgement, in order to maximize performance.

In the experiment presented here, each operator was tested in Robot-Initiative (RI) and MI control. The MI controller, as described in Section III-C, gives both the robot and the operator the capacity and authority to switch dynamically (i.e. during task execution) between teleoperation and autonomy. The fuzzy robot controller initiates LOA switches based on goal-directed motion performance (i.e. effectiveness) and simplified (i.e. limited) context. The operator switches LOA using a button on the joypad, according to their judgement. The RI controller uses mechanisms identical to the MI controller for the robot's LOA switching behavior; however, it removes the operator's ability to trigger LOA switches and thus restricts them to the LOA dictated by the robot. In both controllers, when a LOA switch occurs, the robot alerts the operator in three different ways: a) an alarm sound identical to the one denoting “engine failure” in airplanes; b) synthetic speech expressing the LOA the robot has switched to; c) a GUI notification.
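As a rough illustration of the arbitration logic described above (and not the actual fuzzy controller of Section III-C), the following sketch shows how an operator-initiated switch request could be honoured only in MI, whereas a robot-initiated request is honoured in both MI and RI, with the three operator alerts issued on every switch; all class, method, and parameter names are hypothetical.

```python
# Illustrative sketch of LOA-switching arbitration for MI and RI control.
from enum import Enum

class LOA(Enum):
    TELEOPERATION = 0
    AUTONOMY = 1

class VariableAutonomyArbiter:
    def __init__(self, mixed_initiative=True):
        # mixed_initiative=True -> MI control; False -> RI control
        self.mixed_initiative = mixed_initiative
        self.loa = LOA.TELEOPERATION

    def on_operator_request(self, requested_loa):
        """Operator pressed the joypad LOA button. Honoured only in MI."""
        if self.mixed_initiative and requested_loa != self.loa:
            self._switch(requested_loa, initiator="operator")

    def on_robot_request(self, requested_loa):
        """Fuzzy controller inferred that a LOA switch would improve
        goal-directed motion performance. Honoured in both MI and RI."""
        if requested_loa != self.loa:
            self._switch(requested_loa, initiator="robot")

    def _switch(self, new_loa, initiator):
        self.loa = new_loa
        # Alert the operator in three ways, as in the experiment:
        self._play_alarm_sound()                              # alarm sound
        self._speak(f"Switched to {new_loa.name.lower()}")    # synthetic speech
        self._notify_gui(new_loa, initiator)                  # GUI notification

    # Placeholder alert hooks (implementation-specific in the real system).
    def _play_alarm_sound(self): pass
    def _speak(self, text): pass
    def _notify_gui(self, loa, initiator): pass
```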

IV-B Participants and procedure

The 24 participants of our previous HI study were asked to participate in this experiment; 16 of them were available at the time and participated. This, along with the identical setup, allowed us to directly compare the MI and RI results with those of the same participants from the previous experiment. We used a within-groups experimental design, with every participant performing two trials: a) one using the MI controller; b) one using the RI controller.

Each participant underwent extensive training before the experiment, similar to our previous work in [14], but adapted to the new controllers. This ensured that all participants had adequate understanding of the new system and had attained a common minimum skill level. Counterbalancing was used in the experimental trials (i.e. the order of the tested controllers was rotated for different participants) in order to prevent both learning and fatigue effects from biasing the results.

Participants were asked to perform the primary task (robot navigation) as quickly as possible while minimizing collisions. Participants were also instructed that, when presented with the secondary task, they should complete it as quickly and as accurately as possible. They were explicitly told to prioritize the secondary task over the primary task, and only to perform the primary task if the workload allowed. This was to prevent different operators from adopting different priorities (e.g. some focusing on driving while others focused on the secondary task), thus minimizing a potential confounding factor. Additionally, as explained in our previous work [14], when people are instructed to do both tasks in parallel to the best of their abilities, they either a) ignore the secondary task, or b) choose random answers for the secondary task to relieve themselves of the secondary workload, so that they can continue focusing on the primary task of driving the robot.

The operators could only acquire SA information via the OCU, which displayed a real-time video feed from the robot’s front-facing camera and the estimated robot location on the 2D SLAM map (for the interface see Figure 5(b)). All participants were given an identical and complete 2D map, generated by SLAM prior to the trials.

At the end of each experimental run participants completed a NASA Task Load Index (TLX) questionnaire [46]. NASA-TLX is a questionnaire system which enables the perceived workload and difficulty of using a system to be numerically quantified.

IV-C Results: task performance

TABLE III: ANOVA results (control mode effect) and descriptive statistics for the metrics used in the experiment: primary task completion time; collisions; primary task score; secondary task completion time (with a baseline condition); secondary task errors (with a baseline condition); NASA-TLX scores; percentage of time spent in autonomy; and number of LOA switches. Descriptive statistics are reported per control mode (HI, MI, RI).

A repeated measures one-way ANOVA was used to compare RI, MI and HI. An ANOVA with Greenhouse-Geisser correction was used in cases where the sphericity assumption (i.e. that the variances of the differences between trials are equal) was violated. For HI, the subset of data from [14] corresponding to the 16 participants was used. Although a large period of time (7 months) had passed since their initial participation, we identified learning as a factor that might have affected performance (see Section IV-E). Fisher’s least significant difference (LSD) test was used for pairwise comparisons. Post-hoc adjustments such as Bonferroni were not used in this paper given: a) the clear hypothesis; b) the predefined post-hoc comparisons; c) the small number of comparisons; d) the early stage of research in this domain (i.e. to avoid missing important findings due to inflated type II error) [47, 48, 49].

We consider a result to be significant when its p value falls below the chosen significance threshold. Lastly, we report the statistical power of the results and the effect size. The detailed statistical calculations are reported in Table III. In all graphs throughout the paper the error bars indicate the standard error.
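For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of a one-way repeated measures ANOVA computed from first principles for a three-condition within-subjects design; the data, variable names, and the use of SciPy's F distribution are illustrative assumptions rather than the authors' analysis code.

```python
# Minimal sketch of a one-way repeated-measures ANOVA (within-subjects),
# as used to compare the HI, MI and RI conditions. Illustrative only.
import numpy as np
from scipy import stats

# scores[i, j] = metric value for participant i under condition j
# (hypothetical data; rows = 16 participants, columns = HI, MI, RI)
rng = np.random.default_rng(0)
scores = rng.normal(loc=[300, 260, 275], scale=30, size=(16, 3))

n_subj, n_cond = scores.shape
grand_mean = scores.mean()

# Partition the total sum of squares into condition, subject and error terms.
ss_cond = n_subj * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
ss_subj = n_cond * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
ss_total = np.sum((scores - grand_mean) ** 2)
ss_error = ss_total - ss_cond - ss_subj

df_cond = n_cond - 1
df_error = (n_cond - 1) * (n_subj - 1)

f_value = (ss_cond / df_cond) / (ss_error / df_error)
p_value = stats.f.sf(f_value, df_cond, df_error)    # main effect of control mode
partial_eta_sq = ss_cond / (ss_cond + ss_error)     # effect size

print(f"F({df_cond},{df_error}) = {f_value:.2f}, p = {p_value:.4f}, "
      f"partial eta^2 = {partial_eta_sq:.3f}")
```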

ANOVA for primary task completion time (see Figure 6(a)) showed overall significantly different means between HI variable autonomy, RI and MI. Pairwise comparison revealed that HI performed significantly worse (i.e. slower completion time) than the other two control modes. MI variable autonomy also performed significantly better than RI.

Fig. 6: (a) Primary task results: average time to completion (blue) and score (green) combining time and collision penalty. The MI controller significantly outperformed both HI and RI. (b) Secondary task time-to-completion. When using MI or RI control, participants performed the secondary task as well as in the baseline condition. In all graphs the error bars indicate the standard error.

The effect of control mode on the number of collisions was not significant between RI, MI and HI variable autonomy mode.

We used a primary task score in order to capture the speed-accuracy trade-off that different operators might have (i.e. how fast an operator drives the robot vs. how carefully). The primary task score (see Figure 6(a)) is calculated by adding a fixed time penalty for every collision onto the primary task completion time for each participant. This is inspired by the performance scores used in the RoboCup competitions [50]. ANOVA analysis showed that control mode had a significant effect on primary task score. The LSD test suggests that HI variable autonomy is significantly worse than both the MI mode and the RI mode. Lastly, MI significantly outperformed RI with regard to primary task score.
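As a concrete illustration of this scoring scheme, a short sketch follows; the per-collision penalty value is a placeholder, since the exact penalty is specified by the experimental protocol.

```python
# Illustrative computation of the primary task score:
# completion time plus a fixed time penalty per collision (lower is better).
COLLISION_PENALTY_S = 10.0  # hypothetical penalty in seconds (placeholder value)

def primary_task_score(completion_time_s, n_collisions,
                       penalty_s=COLLISION_PENALTY_S):
    """Combine speed (completion time) and accuracy (collisions) in one score."""
    return completion_time_s + penalty_s * n_collisions

# Example: a 250 s run with 2 collisions
print(primary_task_score(250.0, 2))  # -> 270.0
```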

Secondary task completion time (see Figure 6(b)) denotes the average time per trial that participants took to complete one series of the 3D object rotations. ANOVA showed a significant difference between the means of the secondary task completion times. The HI variable autonomy mode performed significantly worse (i.e. took more time to complete) than the other modes. Performance in the baseline trial, the MI mode and the RI mode did not differ significantly (i.e. they were at the same level of performance).

The number of secondary task errors (see Figure 7(a)) is the average number of mistakes/errors per trial that participants made during one series of the 3D object rotations. ANOVA showed no significant differences in the number of secondary task errors between the different control modes.

Fig. 7: (a) Secondary task average number of mistakes/errors. (b) NASA-TLX score showing the overall trial difficulty/workload as perceived by the operators. RI was perceived as the easiest control mode and HI as the hardest.

Control mode had a significant effect on NASA-TLX scores (see Figure 7(b)) as shown by ANOVA. Pairwise comparisons showed that RI was perceived by participants as having the lowest difficulty, compared to both the HI variable autonomy mode and the MI mode. HI variable autonomy was perceived as more difficult than MI.

IV-D Results: Human-Robot Interaction

Similar to our previous work [17], we analyzed the average percentage of time spent in autonomy for each controller. Due to corrupted data for one participant, data from 15 participants was used for every result reported on this metric, in contrast to the rest of the analysis, in which data from all 16 participants was used. ANOVA showed no effect of control mode on the average percentage of time spent in autonomy.

The number of LOA switches (see Figure 8(a)) performed in each trial denotes the frequency with which operators made use of the variable autonomy capabilities in HI; the frequency with which operators and the AI used them in MI; and the frequency with which the AI used them in RI. ANOVA showed that control mode had a significant effect on the number of LOA switches. Pairwise comparisons showed that RI had significantly fewer LOA switches than the HI and MI modes. The numbers of LOA switches in the HI and MI control modes were at the same level (i.e. no statistical difference). A portion of the MI LOA switches was initiated by the fuzzy MI controller. For completeness we present histograms of the number of LOA switches in MI and HI (see Figure 8(b) and Figure 9).

Fig. 8: (a) The average number of LOA switches per control mode. (b) Histogram showing the number of human operators who chose to make various numbers of LOA switches during HI.
Fig. 9: (a) Histogram showing the number of human operators who chose to make various numbers of LOA switches during MI. (b) Histogram showing the number of human operators against the number of LOA switches initiated by the fuzzy MI controller.

We performed a correlation analysis using a two-tailed Pearson’s correlation coefficient to investigate relationships between the different metrics and other variables. The number of LOA switches in HI and the number of LOA switches in MI are highly correlated. As in HI (see [17]), there was no correlation between the number of LOA switches and performance in the primary task score or the secondary task completion time in RI or in MI.
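For illustration, such a correlation analysis can be carried out as in the following sketch; the data and variable names are hypothetical.

```python
# Minimal sketch of the two-tailed Pearson correlation analysis
# (illustrative data; not the authors' analysis code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
loa_switches_hi = rng.integers(0, 20, size=16)                   # hypothetical counts
loa_switches_mi = loa_switches_hi + rng.integers(-3, 4, size=16)

r, p = stats.pearsonr(loa_switches_hi, loa_switches_mi)          # two-tailed by default
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```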

No correlation was found between the percentage of time spent in autonomy in HI and in MI, nor between HI and RI, nor between MI and RI.

The percentage of time spent in the autonomy LOA is positively correlated with the primary task score in RI, and the same metrics are positively correlated in MI. As expected given the results in [17], no correlation was found between the primary task score and the percentage of time spent in autonomy in HI.

Lastly, no correlation was found between time spent in autonomy and secondary task completion time for any of the 3 control modes, HI (as found in [17]), MI, and RI.

IV-E Discussion

Mixed-Initiative (MI) and Robot-Initiative (RI) outperformed Human-Initiative (HI) in terms of primary task performance. In the context of this experiment, this primarily shows that the fuzzy robot controller (in both MI and RI) is capable of successfully measuring human-robot system performance, inferring whether a switch in LOA is needed, and initiating a LOA switch. This is particularly true given that RI performed better than HI, meaning that the fuzzy robot controller’s initiative is at least as good as the operator’s judgement in switching LOA (i.e. compared to the HI of [14]). The fact that MI outperformed RI possibly indicates learning effects in the primary task and in LOA switching. If that is the case, it is because we used the same participants in an identical experimental setup; however, this was necessary for a systematic initial evaluation of the MI controller.

In terms of secondary task performance, MI and RI outperform HI. The secondary task completion time for MI and RI is faster than for HI and at the same level of performance as the baseline condition (i.e. the secondary task conducted in isolation from the primary task). This can be explained by two possible causes: a) secondary task learning effects; b) the fact that operators felt confident enough to neglect the robot and focus on the secondary task, given the robot’s capability to take initiative and alert the operator. The latter is reinforced by anecdotal evidence, as an informal chat with the participants revealed that most of them trusted the robot to take control and progress towards the navigation goal if needed. As one of the participants put it, “even if I completely neglect the robot, at least it will do something meaningful”.

NASA-TLX showed that RI was perceived as marginally easier (i.e. less workload) than MI, and that both MI and RI were perceived as significantly easier than HI. This can be an indication of the extra cognitive overhead operators incur when they need to switch LOA based on their own judgement (e.g. in HI). It is also an indication that knowing the robot will help when needed makes the task seem easier. This is particularly true for RI, as reflected in the NASA-TLX scores: operators were not able to switch LOA themselves and thus were not concerned with LOA switching; they simply complied with the AI’s initiative. This is in contrast to MI, in which operators still had to do some thinking about switching LOA.

There was no statistical difference in the average time spent in autonomy between HI, MI, and RI. Despite that fact, the time spent in autonomy for MI and RI was correlated with the primary task performance, contrary to HI. This possibly has to do with the fact that in MI and RI the time spent in each LOA (i.e. teleoperation and autonomy) was almost equally split. It is an indication that both LOAs can equally contribute towards performance depending on the circumstances and the degradation factors.

Regarding the number of LOA switches in MI and HI, not only were they at the same level, but they were also highly and positively correlated. The positive correlation means that participants followed similar patterns in the number of LOA switches in both HI and MI. For example, participants that switched LOA very frequently in HI also switched LOA very frequently in MI, and vice versa for low frequencies. This reinforces the findings of [17] that operators not only switch LOA for reasons beyond performance, but also that personality traits potentially play a big role. Furthermore, the idea that some of the operators’ LOA switching is redundant is further reinforced by the fact that LOA switches in RI were much less frequent than in HI and MI.

Lastly, a share of the LOA switches in MI was initiated by the robot. This means that, despite operators’ proven LOA switching capabilities and learning effects, there are circumstances in which the robot successfully contributes by taking initiative. However, for some participants (4 out of the 16), the robot did not contribute any LOA switch. This suggests that the fuzzy MI controller is successful at choosing not to switch LOA when a switch is not needed. Given the variety of operator styles observed (i.e. high-frequency and low-frequency switchers), this is not trivial, and it is evidence that the controller is able to cope with different driving styles using the same parameter set.

IV-F Summary of main findings

This experiment offered several contributions and important insights into MI control. First, our expert-guided MI controller, in both the MI and RI conditions, contributed positively towards performance and towards overcoming the degrading factors. This is because the controller contributed towards timely LOA switching in both conditions (see Section IV-D). This is especially true for the RI condition, in which the operator was not allowed to switch LOA; hence, any learning effects regarding timely LOA switching could not have affected the results. The small number of LOA switches in RI, and the small standard deviation compared to the rest of the conditions, is evidence that the controller’s LOA switching performance was consistent throughout the RI condition.

Second, the experiment provided evidence that when the robot is able to take initiative (i.e. MI and RI conditions), operators experience reduced workload. This is supported by NASA-TLX findings and also by the anecdotal evidence as discussed in Section IV-E. Improved performance in the secondary task compared to HI can be partially explained by the lower workload, but without excluding the potential contribution of learning effects.

A human-factors explanation for the improved performance in MI and RI, in both primary and secondary tasks, as well as for the reduced workload, may come from the operator attentional model developed by Johnson et al. [51]. They model attention, workload, and performance across automation mode transitions in aviation and aerospace systems. In the HI condition, operators have the extra cognitive overhead of having to switch LOA based on their own judgement; the MI and RI conditions reduce this overhead. As a result, the operator has more free cognitive resources which can, in turn, be re-allocated more efficiently via attention to the task at hand after a LOA switch. This improves performance, as predicted by the model of Johnson et al. [51].

Lastly, the positive correlation in the number of LOA switches between MI and HI is an important finding suggesting that operators follow the same style in controlling the robot regardless of the fact that the robot can take initiative. This means that the operators have not changed their strategy in order to cope with the new system.

V Experiment 2: evaluation using real robot and test environment

The experiment presented in this section was designed to investigate how well the fuzzy MI controller generalizes. Generalization in this context means that the controller should function well (i.e. deliver performance similar to the previous evaluation) in a different setting and under different conditions from the ones that led to its design (e.g. the previous simulated environment). This is achieved by conducting an experiment with a real robot in a less controlled setting. More specifically, the experiment aims to: a) provide evidence that the MI controller can generalize; b) factor out any learning effects that might have affected the results of the previous experiment; c) shift our experimental paradigm towards more complex, realistic environments and tasks.

V-A Experimental setup - apparatus, robot test arena, and control modes

In the experiment described here, we used identical software, variable autonomy controllers, interface (see Figure 13(a)), and OCU (see Figure 10(b)) to those of our previous experiment in simulation (see Section IV). The robot used was a Pioneer-3DX equipped with a laser range finder and a camera (see Figure 10(a)). Operators controlled the robot remotely, from a separate location, via the control interface. Any Situation Awareness (SA) was gained solely from the control interface. The communications link between the robot and the OCU was provided by WiFi.

Fig. 10: (a) The Pioneer-3DX robot used in the experiment. (b) The Operator Control Unit (OCU), composed of a laptop, a joystick, a mouse and a screen showing the control interface. The same OCU was used in all variable autonomy experiments. Note the floor plan in front of the screen, used for the secondary task.

Our system offers the same two LOAs as in our previous experiment: teleoperation and autonomy. Three different control modes were tested in the experiment described here: 1) pure teleoperation, in which the operator was restricted to using only teleoperation LOA; 2) Human-Initiative (HI), in which the operator could dynamically switch between the teleoperation and autonomy LOA using a button press; 3) Mixed-Initiative (MI), in which both the operator and the robot had the ability and authority to dynamically switch between autonomy and teleoperation LOA.

Fig. 11: The first floor of the School of Computer Science building, University of Birmingham, was used as the arena for the USAR experiment. (a) The long corridor that connected the search areas (i.e. offices). (b) One of the offices used as a search area.

Part of the first floor of the School of Computer Science building, University of Birmingham, was used as the arena for this experiment (see Figure 13(b) and Figure 12(a)). More specifically, a long corridor (see Figure 11(a)), two offices (see Figure 11(b)), and an open space were used. The experiment took place on weekends and out of hours, in order to prevent human activity in the building from being a confounding factor.

Fig. 12: (a) The floor plan, as kept in the university records, of the area where the experiment took place. This floor plan was printed and given to participants for the secondary task. (b) The floor plan of the experiment area annotated by an operator during the secondary task.
Fig. 13: (a) The control interface as presented to the operator. Left: video feed from the camera, the control mode in use and the status of the navigation goal. Right: the map showing the position of the robot (blue footprint and red arrow), the current goal (blue arrow), the AI-planned path (green line), the obstacles’ laser reflections (red) and the walls (black). (b) The SLAM map of the arena as displayed to the human operator on the interface. Operators had to navigate in turn from point A to B, to C, to D, and then back again to point A.

V-B Tasks and performance degradation factors

The overall theme of the experiment was an Urban Search and Rescue (USAR) scenario. In this scenario the robot operator had to remotely control the robot in the search zone (i.e. the building) and identify the positions and statuses of victims and potential hazards. As is often the case in real operations, we assume that some prior knowledge of the building is given to the robot operator by the authorities or a relevant organization. This is represented in the experiment by the SLAM map that appears on the interface (see Figure 13(b)) and by the floor plan used in the secondary task, as kept in the university’s records (see Figure 12(a)).

The primary task was to navigate the robot between the different areas of the map in a predefined order: from point A to point B, C, D, and then back to point A (see Figure 13(b)). In each of those areas/points (excluding A), one victim and one hazard sign were placed. The victims were represented by stuffed animals: a meerkat represented a victim who was alive and a teddy bear a victim who was dead (see Figure 15(a)). Hazards were represented by three commonly used hazard signs for flammable materials, radiation, and bio-hazard (see Figure 14 and Figure 15(b)). Inside the search areas the robot had to stop in the center of the room. The operator then had to identify and memorize the position and status of the victim and of the sign. Both the victim and the sign were visible from the center of the area, anywhere around the robot (i.e. they could be at any point around the central position, but visible); no further exploration was needed.

Fig. 14: Two of the hazard signs used in order to denote bio-hazard and radiation risk.
Fig. 15: (a) The stuffed animals representing the victims of the USAR scenario. A meerkat represented a victim who was alive and a teddy bear a victim who was dead. (b) The hazard sign used to denote flammable materials risk.

In addition to the primary task, the operator had to perform a secondary task every time the robot exited one of the search areas. This secondary task was designed to induce additional workload on the operator and degrade performance on the primary task. A pen and a paper copy of the floor plan were placed in front of the operator (see Figure 10(b)). When asked by the experimenter to perform the secondary task, the operator had to sketch on the floor plan (see Figure 12(b)): a) the position of the victim, denoted by a small x; b) the status of the victim, denoted by a letter (A for alive, D for dead); c) the position of the hazard, denoted by a small o; d) the status of the hazard, denoted by a letter (R for radiation, F for flammable, and B for bio-hazard); e) the path that the robot had followed from the previously visited point/area. The pen and the floor plan were placed centrally in front of the operator (see Figure 10(b)), so that individual differences in handedness (i.e. left- vs right-handed participants) did not bias the results. Tasks similar to the one described here are typically required of robot operators in real-world disaster response [5], as they are asked to sketch similar information for the SAR team. The speed and ability of the operators to annotate hazards, victims and paths, along with their correct statuses and respective positions, were the measures of secondary task performance.

The robot’s performance was degraded by two different factors. The first was a box placed in one of the offices. This box was not part of the prior knowledge (i.e. it was not in the SLAM map) and it narrowed the entrance to the office. For these two reasons navigation performance would degrade (i.e. the robot would move very slowly or get stuck), as the robot tried to plan a new path through the very tight passage rather than through the obstacle. The second performance degradation factor was naturally occurring noise in the laser sensor in certain parts of the arena. This noise was due to shiny surfaces deflecting the laser beams and was not controlled by the experimenter. However, due to the high and semi-systematic frequency of its appearance in a specific area of the map, it was adopted as part of the experimental design. Both of these factors, naturally occurring laser noise and unknown obstacles, often occur in real scenarios as the environments are dynamic. This adds to the realism of the experiment.

Lastly, in one of the offices the WiFi signal was weak. As a result, delays in the control commands and in SA updates in the interface (e.g. location in map, video feedback etc) occurred. This was systematic throughout all of the participants and trials. Hence, it did not constitute a confounding factor, while contributing further to the realism of the experiment.

V-C Participants and experimental design

A total of 12 volunteers participated in a within-groups experimental design, with every participant performing one trial for each of the 3 control modes (i.e. teleoperation, MI, and HI). The order of the three trials was rotated between all the different permutations across participants. This counterbalancing technique was used to prevent learning and fatigue effects from introducing confounding factors into the results, since every participant performed all three trials. Additionally, for the secondary task, the signs, the victims, and their positions were randomized in every trial, again in order to eliminate any learning effects. A prior-experience questionnaire showed, as in our previous experiments, that the majority of the participants were experienced in playing video games. Moreover, 8 of the 12 participants had participated in our previous experiment.

Participants underwent extensive standardized training, similar to our previous experiment. Due to space constraints a sub-region of the experiment’s area was used for the training. In order for the participants to proceed with the experiment they had first to demonstrate their abilities by completing three standardized test trials, one for every control mode. These trials mimicked the actual experimental trials (i.e. same primary and secondary tasks). The training and the test trials ensured that all participants had attained a common minimum skill level in operating the robot.

Participants were instructed to perform the primary task (controlling the robot to search the areas) as quickly and as safely (i.e. avoiding collisions) as possible. Additionally, they were told that when instructed to perform the secondary task (i.e. annotating information regarding victims, hazards, and paths), they should do it as quickly and as accurately as possible. They were explicitly told to give priority to the secondary task over the primary task, and only to perform the primary task if the workload allowed. Lastly, participants were told that the best performing individuals in each trial would be rewarded with an extra gift voucher. However, they were not informed how the best performing participants would be determined, as this would have biased them towards specific factors. The purpose of the extra gift voucher was to provide an incentive for participants to achieve the best possible performance on both the primary and secondary tasks.

At the end of each trial, participants had to complete an online NASA Task Load Index (NASA-TLX) questionnaire. As in our previous experiments, NASA-TLX was used to rate the level of difficulty and workload the participants experienced during each trial.

V-D Results

A repeated measures one-way ANOVA was used. The independent variable was the control mode, with three levels: teleoperation, HI, and MI. In cases where the sphericity assumption was violated, a Greenhouse-Geisser correction was applied to the ANOVA. For pairwise comparisons after a significant ANOVA result, Fisher’s least significant difference (LSD) test was used to determine which conditions differed. As in the rest of the paper, we consider a result to be significant when its p value falls below the chosen significance threshold. We also report the statistical power of the results and the effect size. The detailed statistical calculations are reported in Table IV.

TABLE IV: ANOVA results (control mode effect) and descriptive statistics for all the metrics: primary task completion time; collisions; primary task score; secondary task completion time; secondary task errors; NASA-TLX scores; and number of LOA switches. Descriptive statistics are reported per control mode (HI, MI, teleoperation), with the LOA switches initiated by the controller in MI reported separately. For the number of LOA switches, the t-test result is reported.

ANOVA for primary task completion time (see Figure 16(a)) showed overall significantly different means between HI variable autonomy, MI and teleoperation. Pairwise comparison reveals that HI performed significantly better (i.e. lower mean completion time) than teleoperation. The MI primary task completion time was statistically at the same level as teleoperation and HI. The effect of control mode on the number of collisions was not significant.

Fig. 16: (a) Primary task mean time-to-completion (green) and score combining time and collision penalty (blue). HI significantly outperformed teleoperation in the primary task. (b) Mean secondary task completion time. Performance in the secondary task was at the same level for all three control modes. In all graphs the error bars indicate the standard error.

Similar to our previous experiments, the primary task score (see Figure 16(a)) was calculated in order to capture any speed-accuracy trade-offs in the primary task. The primary task score was calculated by adding a fixed time penalty for every collision onto the primary task completion time for each participant. ANOVA analysis showed that control mode had a significant effect on the primary task score. As expected, due to the small number of collisions, LSD pairwise tests showed very similar results to the primary task completion time. The HI controller significantly outperformed pure teleoperation. The primary task score for the MI controller was at the same level (i.e. no statistical difference found) as pure teleoperation and HI.

Secondary task completion time (see Figure 16(b)) refers to the total time per trial that participants took to complete the full annotation on the floor plan. A full annotation is defined as a sketch with annotated positions, statuses and paths for all three search areas. ANOVA did not suggest a significant difference between the mean secondary task completion times for the different controllers.

Fig. 17: (a) NASA-TLX score showing the overall trial difficulty/workload as perceived by the operators. Teleoperation was perceived as harder than HI and MI. (b) Secondary task total number of errors for each trial. No significant differences were observed between the different control modes according to ANOVA.

The secondary task number of errors was also measured. A position error was defined as an annotation that did not represent the true position of a victim or hazard with reasonable accuracy. A status error was defined as an annotation that did not represent the correct status, e.g. a victim annotated as alive (i.e. meerkat) when it was in fact dead (i.e. teddy bear). A path error was defined as: a) a path that was not annotated; b) a path that heavily collided with the walls on the floor plan; c) a path that was not accurately depicted. In a real situation the rescue team should be able to find the victims who are alive and be aware of any hazards and their nature, by following the annotated paths and positions and by using common sense regarding spatial orientation. This is represented in our experiment by the secondary task errors. No significant differences were observed between the different control modes with respect to the number of secondary task errors (see Figure 17(b)) according to ANOVA.

Control mode had a significant effect on NASA-TLX scores (see Figure 17(a)) as suggested by ANOVA. Pairwise comparisons showed that teleoperation was perceived as harder (i.e. more workload) than HI, and marginally harder than MI. HI variable autonomy was perceived as having the same difficulty as MI.

The mean numbers of LOA switches in HI and in MI were compared using a paired samples t-test. The reason for using a paired samples t-test is that the experiment had a within-groups design, i.e. we expect some correlation in the results given that the same participants performed both conditions. The mean number of LOA switches in MI was significantly higher than in HI, as shown by the t-test. Using Pearson’s correlation, no correlation was found in the number of LOA switches between HI and MI. Given that the correlation assumption behind the paired samples t-test did not hold, we used an independent samples t-test to validate the result further; again, the means of HI and MI were significantly different. No correlation was found between the primary task completion time, the secondary task completion time, or the NASA-TLX score and the number of LOA switches in MI.
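The two t-tests described above can be computed as in the following sketch; the data and variable names are hypothetical.

```python
# Minimal sketch of the paired and independent samples t-tests used to compare
# the number of LOA switches in HI vs MI (hypothetical data; illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
loa_switches_hi = rng.integers(2, 10, size=12).astype(float)
loa_switches_mi = loa_switches_hi + rng.integers(2, 8, size=12)  # MI tends higher

# Paired test: appropriate for a within-groups design (same participants).
t_rel, p_rel = stats.ttest_rel(loa_switches_mi, loa_switches_hi)

# Independent test: used here as a further check when the paired test's
# correlation assumption does not hold.
t_ind, p_ind = stats.ttest_ind(loa_switches_mi, loa_switches_hi)

print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.4f}")
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
```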

The mean number of LOA switches in MI initiated by the robot was also computed; this number is positively correlated with the primary task completion time. No correlation was found between the secondary task completion time or the NASA-TLX score and the number of LOA switches in MI due to the robot’s initiative. Histograms illustrating the number of LOA switches in MI and HI can be seen in Figure 18 and Figure 19.

Fig. 18: Histogram showing the number of human operators who chose to make various different numbers of LOA switches during HI.
Fig. 19: (a) Histogram showing the number of human operators who chose to make various numbers of LOA switches during MI. (b) Histogram showing the number of human operators against the number of LOA switches initiated by the fuzzy MI controller.

V-E Discussion

Regarding performance in the primary task, the HI controller outperformed teleoperation both in terms of score and time-to-completion. This evidence further reinforces our prior findings [14] that HI variable autonomy outperforms individual LOAs such as teleoperation and autonomy. Of particular importance is that this evidence came from conducting a real world scenario in which we aimed for realism and did not restrict potential degradation factors (e.g. naturally occurring noise or communication issues), while at the same time having a controlled experiment.

In contrast to our previous experiment (see Section IV), the results from the MI controller are more difficult to interpret. A trend can be seen in which MI performed better than teleoperation in the primary task; however, the result was not statistically significant. The comparison between HI and MI does not lead to safe conclusions, as no statistical difference was found. We believe this is due to a conflict of authority between the operator’s and the robot’s initiative regarding LOA switching. This conflict arose from a restriction of the MI controller: it is designed on the assumption that the agent in control (e.g. the robot or the operator) follows relatively closely the path yielded by the expert planner. Our experiment was designed to control for exploration strategies by restricting operators to visiting only the center point of the offices/areas, as victims and hazards were visible from that point. However, there were occasions on which operators decided to engage in some exploration or to follow a less restricted path (i.e. compared to the one yielded by the expert planner); for example, operators who decided to move closer to a hazard sign in order to see the letters more clearly or to improve the lighting conditions on the sign, or operators who struggled to pass through the narrow passage created by the unseen obstacle. In such cases the MI controller inferred a performance drop or a deviation from the navigation goal, and as a result the robot’s initiative switched the LOA to autonomy. At the same time, if operators had not yet finished their action, they would switch back to taking control (i.e. teleoperation). This is further reinforced by participants’ feedback. Many of them noted that they generally trusted the MI controller, as there were cases in which the controller switched LOA in a meaningful way; however, they felt restricted from driving freely and also felt that the controller was intrusive at times. Others noted that they felt the MI capabilities were redundant, given that the HI controller was easy enough to use.

Every time a LOA switch takes place, the fuzzy MI controller re-initializes the error exponential moving average for a fixed time period, during which the controller cannot initiate another LOA switch. In essence, this period acts as a minimum time between robot-initiated switches. Despite this period, the conflict for control was not avoided, suggesting that it happens on a larger time scale (e.g. several seconds). Instead, this conflict for control between the robot and the operator can be avoided to a large extent by MI controllers that are context-aware. Imagine the following two situations and an MI controller that is not context-aware. In the first situation the robot is idle (i.e. making no progress towards the goal) because the operator is neglecting it in order to perform a secondary task; the performance error measured by the controller is at its maximum. In the second situation the robot is stuck in a corner and the operator is reversing in order to escape the enclosure; again, the performance error is at its maximum. An MI controller that does not take context into account would initiate a LOA switch in both cases. In the first case the switch to autonomy would be beneficial, as the robot was idle. In the second case it could potentially lead to a collision, as the operator is in the middle of a maneuver; it could also lead to a control conflict, as the operator may try to take control back in order to complete the maneuver. A context-aware controller would be able to recognize that, although in both cases the performance error is large, the situations are different, and hence would not switch LOA in the second case. Our MI controller, in this specific example, is aware that a large error while the robot is reversing potentially means maneuvering to escape an entrapment (see the fuzzy rule base in Section III-C). However, with cases such as the ones discussed in the previous paragraph (e.g. the operator performing exploration), our controller is not able to cope. We identify context awareness as a major challenge for robotic MI systems. We believe that fuzzy controllers can help in this direction, as they can be expanded with expert knowledge regarding context. Lastly, system transparency is another factor that might contribute positively towards tackling LOA switching conflicts.
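To illustrate the kind of context-aware gating discussed here, the following is a simplified, crisp (non-fuzzy) sketch rather than the paper's fuzzy rule base; the thresholds, timing value, and function names are assumptions made for illustration.

```python
# Illustrative sketch of context-aware LOA switching: a large goal-directed
# performance error triggers a switch to autonomy only when the context does
# not explain the error (e.g. the robot is not reversing to escape an entrapment).
ERROR_THRESHOLD = 0.8      # normalised performance error above which a switch is considered
MIN_SWITCH_INTERVAL = 5.0  # seconds since the last switch (placeholder value)

def should_switch_to_autonomy(perf_error, reversing, time_since_last_switch):
    """Return True if the robot should take the initiative and switch to autonomy."""
    if time_since_last_switch < MIN_SWITCH_INTERVAL:
        return False  # respect the minimum time between robot-initiated switches
    if perf_error < ERROR_THRESHOLD:
        return False  # performance is acceptable; no switch needed
    if reversing:
        return False  # context: operator likely maneuvering out of an entrapment
    return True       # context: robot is idle/off-goal; autonomy can help

# Example: idle robot (high error, not reversing) -> switch; reversing robot -> no switch
print(should_switch_to_autonomy(0.95, reversing=False, time_since_last_switch=12.0))  # True
print(should_switch_to_autonomy(0.95, reversing=True,  time_since_last_switch=12.0))  # False
```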

The above evidence suggesting a control conflict is in accordance with the fact that MI had significantly more LOA switches than HI. In contrast to the results of Section IV, in which the LOA switches of MI and HI were highly correlated, no correlation was found in the experiment reported here. This further reinforces the notion that these extra LOA switches possibly stem from the robot-operator conflict for control, with control switching back and forth. Similarly, a number of the LOA switches initiated by the robot might be due to this control conflict. Possible evidence comes from the positive correlation found between the number of robot-initiated LOA switches in MI and the primary task completion time: the more LOA switches the controller initiated, the more time participants took to complete the primary task. This is either an indication of a conflict for control degrading performance, or it means that the MI controller is more active in switching LOA for under-performing participants (i.e. those who need more help from the robot).

Further HRI studies are required in order to investigate the phenomenon better as this conflict is a major challenge to overcome for MI systems. Lastly, the fact that the number of LOA switches is relatively high for both HI and MI, is in accordance with the findings of our previous two experiments (see [17] and Section IV). These findings suggested that the high number of LOA switches was due to reasons beyond performance (e.g. personality traits).

Secondary task performance (i.e. time-to-completion and number of errors) was at the same level for all three control modes. This evidence suggests that LOA switching capabilities did not have any effect on secondary task performance. This is similar to the evidence of our previous work [14], in which teleoperation and HI performed equally on the secondary task. It also reinforces the possibility that the improved secondary task performance in our initial MI controller evaluation (compared to HI; see Section IV) was due to learning effects. However, because a) the secondary task in the previous experiment is different from our current task, and b) the statistical power (i.e. the probability that a significant difference will be found if it exists) of our number-of-errors calculations is low, a safe conclusion cannot be reached.

Regarding the difficulty/workload of the trials, NASA-TLX showed that teleoperation was perceived as the most difficult control mode compared to HI and MI. This suggests, similar to the findings of [14], that the use of variable autonomy can relieve operators of part of the burden of control. Perceived difficulty for MI and HI was at the same level according to the pairwise comparisons. This is in contrast to the findings of Section IV, in which MI was found to be easier than HI; however, due to the low statistical power, the possibility that MI and HI differ is not excluded.

Lastly, a possible explanation for performance and workload not being fully in agreement with the first experiment can be given by the model of Johnson et al. [51] (similar to the discussion in Section IV-E). At times, operators were allocating much of their cognitive resources to dealing with the conflict for control. Hence, part of the performance degradation and workload arises from not being able to re-allocate their attention efficiently to the tasks.

VI Limitations, insights and future work

The main limitation of the expert-guided MI controller presented here is that it is not able to cope with the conflict for control. Two of the assumptions on which the controller is based are: a) the human operator is willing to be handed, or to hand over, control based on the robot’s initiative; and b) the agent to which control is handed is capable of correcting the degradation in task effectiveness. A conflict for control arises when one or both of these assumptions are violated. Given the evidence presented in detail in the previous sections, we consider this conflict for control to be the main factor affecting the performance of MI in the second experiment compared to the first, simulated experiment.

In our first, simulated experiment, both the primary and secondary tasks were more restricted than the tasks in the second, real-world experiment. Hence, the operators’ actions remained within the controller’s working envelope (i.e. relatively simple navigation). This resulted in the controller being able to generate a timely LOA switch when needed and, most importantly, not to switch LOA when that would be unnecessary or intrusive. This in turn allowed the operators to trust the controller, and hence the two assumptions above held true. In our second, real-world experiment, operators in many cases deviated from the navigational goal they had given to the robot in order to explore further or move freely. The controller lacked knowledge of the new goal (e.g. next to the hazard sign which the operator wanted to inspect, instead of the center of the office), or of the context of the operator’s commands. Due to this lack of knowledge, the controller would initiate a LOA switch believing that the operator was not moving towards the goal, when in fact the navigation goal had changed. The operators found this intrusive and hence switched LOA back, leading to the violation of both assumptions and to performance degradation.

In essence, the conflict for control arises because the robot and the operator have the authority to aggressively override each other’s actions, even in cases where they are not necessarily correct. When the robot incorrectly overrides the operator’s actions, it is due to: a) an incorrect inference about the operator’s intention (e.g. lack of knowledge of the current goal); or b) a lack of contextual knowledge regarding the task and the operator’s action (i.e. the context in which a specific action is taking place). When the operator overrides the robot’s actions in the problematic cases, it is mostly because they perceive the robot’s initiative as intrusive and aggressive, and hence do not want to hand over control. This is similar to the findings of Dragan and Srinivasa [24] in shared control: they found that when the robot is aggressive (i.e. intrusive) and wrong (i.e. takes a wrong action), this negatively affects both performance and user preference. However, in shared control any potential conflict for control is of a different nature, as the operator cannot directly switch LOA and take over.

Regarding the expert-guided MI control approach we propose in this paper, further investigation is needed to tackle some of the confounding factors. More specifically a new experiment could be conducted in a real environment focused on comparing HI with RI and MI without a teleoperation condition. The experimental design should: a) use a higher number of participants; b) use a bigger robot arena; c) adjust the primary and secondary tasks to make them more difficult. Particular attention should be given to how to minimize the conflict for control either via the experimental design or via extending the MI controller.

An important direction for future work is making MI controllers more aware of context. The evidence presented in this paper suggests that this would positively impact the conflict for control. For example, a controller capable of predicting the operator’s intention (e.g. a new navigational goal) would be less likely to initiate an intrusive or erroneous LOA switch.

Lastly, a mathematical formalism for MI control in the context of LOA switching, similar to the one of Dragan and Srinivasa for shared control [24] is needed. This, in conjunction with the MI control taxonomy of [30] would give a solid foundation and a common understanding for future research.

VII Conclusion and impact

This paper presented an expert-guided approach to designing MI controllers. It also presented a novel MI controller and its experimental evaluation, both in a high-fidelity simulator and with a real robot in a realistic scenario.

The proposed controller uses an online performance metric that represents the effectiveness of goal directed motion, with parameters determined by comparing the controller to human performance on a prior experiment. The proposed controller was able to measure human-robot team performance, infer if a LOA switch was needed, and switch LOA.

Evidence from our initial evaluation showed the potential advantages of MI control. The MI controller was found to outperform HI control in both primary and secondary task performance. This in turn means that MI outperforms control modes that lack any LOA switching capabilities such as teleoperation and autonomy.

The second evaluation extended our experimental framework towards a less controlled and more realistic setting. The MI controller was used on a real robot performing in a USAR scenario. Results regarding the performance advantages of MI over HI were less conclusive compared to the initial evaluation. However, the experiment yielded significant new insight into challenges and problems which need to be overcome in the design of MI systems. The difficulty in interpreting the results was partially due to variance and poor statistical power in some of the statistical calculations. However, the major confounding factor in the results was the conflict for control that arose between the operator and the AI. This led both the operator and the AI to confusion about what they should do, as the LOA switched back and forth between autonomy (i.e. robot in control) and teleoperation (i.e. human in control). We believe this control conflict is one of the major challenges for future MI control research, and that the addition of further context-awareness (of task, environment, workload, progress, etc.) is one way to address this problem. More generally, however, MI control has its merits, since it provides redundancy in cases where an operator might not be able to switch LOA when needed, e.g. during loss of communication with the robot, or during a sudden event impairing the operator.

Lastly, the USAR experiment provided important real-world evidence that variable autonomy control in the form of HI significantly outperforms teleoperation in a navigation task. This is in accordance with our previous findings in [14] and shows that human operators successfully use HI capabilities to overcome various performance-degrading factors and situations. Additionally, further evidence was provided to support the hypothesis that operators switch LOA for reasons other than performance.

Overall, we believe that this paper has made a number of significant contributions to MI research on mobile robots: a) it proposed a framework for designing MI controllers; b) it proposed a novel MI system; c) it provided evidence of the benefits of MI control while identifying shortcomings that constitute open challenges for the research field.

Acknowledgment

This work was funded by the British Ministry of Defence via the Defence Science and Technology Laboratory (Dstl), under their PhD bursary scheme, contract no. DSTLX-1000074621. It was also supported by the UK’s Engineering and Physical Sciences Research Council (EPSRC) under grants: the National Centre for Nuclear Robotics, EP/R02572X/1; and related grants EP/M026477/1, EP/P017487/1, and EP/P01366X/1. Rustam Stolkin was partly funded by a Royal Society Industry Fellowship.

References

  • [1] J. L. Casper and R. R. Murphy, “Human-Robot Interactions During the Robot-Assisted Urban Search and Rescue Response at the World Trade Center,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 3, pp. 367–385, 2003.
  • [2] Spectrum, “Fukushima robot operator writes tell-all blog,” http://spectrum.ieee.org/automaton/robotics/industrial-robots/fukushima-robot-operator-diaries, 2011, accessed: 2016-12-22.
  • [3] L. D. Dole, D. M. Sirkin, R. R. Murphy, and C. I. Nass, “Robots Need Humans in the Loop to Improve the Hopefulness of Disaster Survivors,” in IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN), 2015, pp. 707–714.
  • [4] K. Nagatani, S. Kiribayashi, Y. Okada, K. Otake, K. Yoshida, T. Nishimura, T. Yoshida, E. Koyanagi, S. Tadokoro, M. Fukushima, and S. Kawatsuma, “Emergency Response to the Nuclear Accident at the Fukushima Daiichi Nuclear Power Plants using Mobile Rescue Robots,” Journal of Field Robotics, vol. 30, no. 1, pp. 44–63, 2013.
  • [5] R. R. Murphy, “Human-Robot Interaction in Rescue Robotics,” IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 34, no. 2, pp. 138–153, may 2004.
  • [6] J. Y. C. Chen, E. C. Haas, and M. J. Barnes, “Human Performance Issues and User Interface Design for Teleoperated Robots,” Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 6, pp. 1231–1245, nov 2007.
  • [7] H. Yanco, M. Baker, R. Casey, and B. Keyes, “Analysis of human-robot interaction for urban search and rescue,” in IEEE International Workshop on Safety, Security and Rescue Robotics., 2006, pp. 22–24.
  • [8] M. Baker, R. Casey, B. Keyes, and H. A. Yanco, “Improved interfaces for human-robot interaction in urban search and rescue,” in IEEE International Conference on Systems, Man and Cybernetics (SMC), vol. 3, 2004, pp. 2960–2965.
  • [9] D. G. Caldwell, K. Reddy, O. Kocak, and A. Wardle, “Sensory requirements and performance assessment of tele-presence controlled robots,” in IEEE International Conference on Robotics and Automation (ICRA), 1996, pp. 1375–1380.
  • [10] D. G. Caldwell, A. Wardle, and M. Goodwin, “Tele-presence: visual, audio and tactile feedback and control of a twin armed mobile robot,” in IEEE International Conference on Robotics and Automation (ICRA), 1994, pp. 244–249.
  • [11] H. A. Yanco, A. Norton, W. Ober, D. Shane, A. Skinner, and J. Vice, “Analysis of Human-robot Interaction at the DARPA Robotics Challenge Trials,” Journal of Field Robotics, vol. 32, no. 3, pp. 420–444, 2015.
  • [12] R. R. Murphy and J. L. Burke, “Up from the Rubble: Lessons Learned about HRI from Search and Rescue,” Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, vol. 49, no. 3, pp. 437–441, 2005.
  • [13] J. L. Burke, R. R. Murphy, M. D. Coovert, and D. L. Riddle, “Moonlight in Miami: Field study of human-robot interaction in the context of an urban search and rescue disaster response training exercise,” Human-Computer Interaction, vol. 19, no. 1-2, pp. 85–116, 2004.
  • [14] M. Chiou, G. Bieksaite, R. Stolkin, N. Hawes, K. L. Shapiro, and T. S. Harrison, “Experimental analysis of a variable autonomy framework for controlling a remotely operating mobile robot,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 3581–3588.
  • [15] T. B. Sheridan and W. L. Verplank, “Human and computer control of undersea teleoperators,” MIT Man-Machine Systems Laboratory, 1978.
  • [16] M. Chiou, N. Hawes, R. Stolkin, K. L. Shapiro, J. R. Kerlin, and A. Clouter, “Towards the Principled Study of Variable Autonomy in Mobile Robots,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2015, pp. 1053–1059.
  • [17] M. Chiou, G. Bieksaite, N. Hawes, and R. Stolkin, “Human-Initiative Variable Autonomy: An Experimental Analysis of the Interactions Between a Human Operator and a Remotely Operated Mobile Robot which also Possesses Autonomous Capabilities,” in AAAI Fall Symposium Series: Shared Autonomy in Research and Practice, 2016, pp. 304–310.
  • [18] J. L. Marble, D. J. Bruemmer, D. A. Few, and D. D. Dudenhoeffer, “Evaluation of supervisory vs. peer-peer interaction with human-robot teams,” in Proceedings of the 37th Hawaii International Conference on System Sciences, 2004, pp. 1–9.
  • [19] D. J. Bruemmer, D. A. Few, R. L. Boring, J. L. Marble, M. C. Walton, and C. W. Nielsen, “Shared Understanding for Collaborative Control,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 35, no. 4, pp. 494–504, 2005.
  • [20] D. A. Few, D. J. Bruemmer, and M. Walton, “Improved Human-Robot Teaming through Facilitated Initiative,” in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2006, pp. 171–176.
  • [21] D. J. Bruemmer, R. L. Boring, D. A. Few, J. L. Marble, and M. C. Walton, ““I Call Shotgun!”: An Evaluation of Mixed-Initiative Control for Novice Users of a Search and Rescue Robot,” in IEEE International Conference on Systems, Man and Cybernetics (SMC), 2004, pp. 2847–2852.
  • [22] C. W. Nielsen, D. A. Few, and D. S. Athey, “Using mixed-initiative human-robot interaction to bound performance in a search task,” in IEEE International Conference on Intelligent Sensors, Sensor Networks and Information Processing, 2008, pp. 195–200.
  • [23] T. Carlson, R. Leeb, R. Chavarriaga, and J. D. R. Millán, “Online modulation of the level of assistance in shared control systems,” in IEEE International Conference on Systems, Man and Cybernetics (SMC), 2012, pp. 3339–3344.
  • [24] A. D. Dragan and S. S. Srinivasa, “A policy-blending formalism for shared control,” International Journal of Robotics Research, vol. 32, no. 7, pp. 790–805, 2013.
  • [25] M. C. Gombolay, R. A. Gutierrez, S. G. Clarke, G. F. Sturla, and J. A. Shah, “Decision-making authority, team efficiency and human worker satisfaction in mixed human–robot teams,” Autonomous Robots, vol. 39, no. 3, pp. 293–312, 2015.
  • [26] M. A. Goodrich, D. R. Olsen Jr., J. W. Crandall, and T. J. Palmer, “Experiments in Adjustable Autonomy,” in IJCAI Workshop on Autonomy, Delegation, and Control: Interacting with Autonomous Agents, 2001, pp. 1624–1629.
  • [27] M. Baker and H. A. Yanco, “Autonomy mode suggestions for improving human-robot interaction,” in IEEE International Conference on Systems, Man and Cybernetics (SMC), vol. 3, 2004, pp. 2948–2953.
  • [28] J. Shen, J. Ibanez-Guzman, T. C. Ng, and B. Seng Chew, “A collaborative-shared control system with safe obstacle avoidance capability,” in IEEE Conference on Robotics, Automation and Mechatronics, vol. 1, 2004, pp. 119–123.
  • [29] T. Carlson and Y. Demiris, “Collaborative Control in Human Wheelchair Interaction Reduces the Need for Dexterity in Precise Manoeuvres,” in Robotic Helpers: User Interaction, Interfaces and Companions in Assistive and Therapy Robotics, a Workshop at ACM/IEEE HRI 2008, 2008, pp. 59–66.
  • [30] S. Jiang and R. C. Arkin, “Mixed-Initiative Human-Robot Interaction: Definition, Taxonomy, and Survey,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2015, pp. 954–961.
  • [31] A. Finzi and A. Orlandini, “Human-Robot Interaction through Mixed-Initiative Planning for Rescue and Search Rovers,” in Congress of the Italian Association for Artificial Intelligence, 2005, pp. 483–494.
  • [32] M. Gombolay, A. Bair, C. Huang, and J. Shah, “Computational design of mixed-initiative human–robot teaming that considers human factors: situational awareness, workload, and workflow preferences,” International Journal of Robotics Research, vol. 36, no. 5-7, pp. 598–617, 2017.
  • [33] D. J. Bruemmer, J. L. Marble, D. D. Dudenhoeffer, M. O. Anderson, and M. D. Mckay, “Mixed-initiative control for remote characterization of hazardous environments,” in 36th Annual Hawaii International Conference on System Sciences, 2003, 9 pp.
  • [34] J. A. Adams, P. Rani, and N. Sarkar, “Mixed initiative interaction and robotic systems,” in AAAI Workshop on Supervisory Control of Learning and Adaptive Systems, Tech. Rep. WS-04-08, 2004, pp. 6–13.
  • [35] V. Manikonda, P. Ranjan, and Z. Kulis, “A Mixed Initiative Controller and Testbed for Human Robot Teams in Tactical Operations,” in AAAI Fall Symposium, 2007, pp. 92–99.
  • [36] B. Hardin and M. A. Goodrich, “On Using Mixed-Initiative Control: A Perspective for Managing Large-Scale Robotic Teams,” in 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2009, pp. 165–172.
  • [37] E. Marder-Eppstein, E. Berger, T. Foote, B. Gerkey, and K. Konolige, “The office marathon: Robust navigation in an indoor office environment,” in IEEE International Conference on Robotics and Automation (ICRA), 2010, pp. 300–307.
  • [38] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, no. 1, pp. 269–271, 1959.
  • [39] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 22–33, 1997.
  • [40] R. G. Brown, Smoothing, Forecasting and Prediction of Discrete Time Series. Courier Corporation, 1963.
  • [41] E. Mamdani and S. Assilian, “An experiment in linguistic synthesis with a fuzzy logic controller,” International Journal of Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.
  • [42] F. Nagi, L. Perumal, and J. Nagi, “A new integrated fuzzy bang-bang relay control system,” Mechatronics, vol. 19, no. 5, pp. 748–760, 2009.
  • [43] J. Rada-Vilela, “fuzzylite: a fuzzy logic control library in C++,” 2013.
  • [44] M. Chiou, “Expert-guided mixed-initiative fuzzy controller code,” https://github.com/ManolisCh/fuzzy_mi_controller_repo, 2019.
  • [45] G. Ganis and R. Kievit, “A New Set of Three-Dimensional Shapes for Investigating Mental Rotation Processes: Validation Data and Stimulus Set,” Journal of Open Psychology Data, vol. 3, no. 1, 2015.
  • [46] D. Sharek, “A Useable, Online NASA-TLX Tool,” Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 55, no. 1, pp. 1375–1379, 2011.
  • [47] D. L. Streiner and G. R. Norman, “Correction for Multiple Testing: Is There a Resolution?” CHEST Journal, vol. 140, no. 1, p. 16, 2011.
  • [48] K. J. Rothman, “No Adjustments Are Needed for Multiple Comparisons,” Epidemiology, vol. 1, no. 1, pp. 43–46, 1990.
  • [49] T. V. Perneger, “What’s wrong with Bonferroni adjustments.” BMJ (Clinical research ed.), vol. 316, no. 7139, pp. 1236–8, 1998.
  • [50] A. Jacoff, E. Messina, B. Weiss, S. Tadokoro, and Y. Nakagawa, “Test arenas and performance metrics for urban search and rescue robots,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, 2003, pp. 3396–3403.
  • [51] A. W. Johnson, K. R. Duda, T. B. Sheridan, and C. M. Oman, “A closed-loop model of operator visual attention, situation awareness, and performance across automation mode transitions,” Human Factors, vol. 59, no. 2, pp. 229–241, 2017.