Put the Bear on the Chair! Intelligent Robot Interaction with Previously Unseen Objects via Robot Imagination

08/12/2021 ∙ by Hongtao Wu, et al. ∙ National University of Singapore ∙ Johns Hopkins University

In this letter, we study the problem of autonomously placing a teddy bear on a previously unseen chair for sitting. To achieve this goal, we present a novel method for robots to imagine the sitting pose of the bear by physically simulating a virtual humanoid agent sitting on the chair. We also develop a robotic system which leverages motion planning to plan SE(2) motions for a humanoid robot to walk to the chair and whole-body motions to put the bear on it. Furthermore, to cope with cases where the chair is not in an accessible pose for placing the bear, a human-robot interaction (HRI) framework is introduced in which a human follows language instructions given by the robot to rotate the chair and help make it accessible. We implement our method with a robot arm and a humanoid robot. We calibrate the proposed system with 3 chairs and extensively test it on 12 previously unseen chairs in both accessible and inaccessible poses. Results show that our method enables the robot to autonomously put the teddy bear on the 12 unseen chairs with a very high success rate. The HRI framework is also shown to be very effective in changing the accessibility of the chair. Source code will be available. Video demos are available at https://chirikjianlab.github.io/putbearonchair/.


I Introduction

Object affordances describe potential interactions with an object [gibson1979ecological]. They play an important role in human perception of real-world objects with tremendous variance in appearance [dicarlo2012does]. Object affordances are closely related to the functionality of an object. For example, a chair affords the functionality of sitting; thus, it possesses the sitting affordance. For robots operating in unstructured environments and interacting with previously unseen objects, e.g., personal robots, an understanding of object affordances is highly desirable. It allows robots to reason about unseen objects and understand how to interact with them efficiently and intelligently. This further opens the possibility of more natural and intelligent human-robot interactions. Reasoning about object affordances can also help robots discover the potential of an object to afford a functionality, despite it not being a typical instance of the object class to which the functionality is related (e.g., the improvised chair assembled from books and boxes in Fig. 5(d)). Independently identifying affordances represents a higher level of robot intelligence.

Fig. 1: Overview. (a) The robot imagines the sitting pose for a previously unseen chair and seats the bear on the chair. A robot arm with a mounted depth camera scans the chair and tracks both the chair and the humanoid robot. The frame at the lower left corner indicates the base frame of the robot arm, which is used as the world frame throughout the paper. (b) Result of putting the bear on the chair. (c) The humanoid robot retracts its hands and uprights its body after putting the bear on the chair.

To address the problem of affordance reasoning in robotics, our previous work [wu2020chair] introduced the interaction-based definition (IBD), which defines objects from the perspective of interactions instead of appearances. In particular, the IBD defines a chair as “an object which can be stably placed on a flat horizontal surface in such a way that a typical human is able to sit (to adopt or rest in a posture in which the body is supported on the buttocks and thighs and the torso is more or less upright) stably above the ground”.

Fig. 2: Pipeline. The chair is randomly placed in front of the robot arm for 3D scanning, which reconstructs the 3D model of the chair. The three snapshots in the sitting imagination are the results of three sitting trials. Only the left one with a check is classified as a correct sitting. The instruction given by the robot in the human-robot interaction is “Please rotate the chair about the vertical axis counterclockwise for 90 degrees!”

In this paper, we go beyond our previous work. We propose a novel method for robots to imagine how to sit on an unseen chair, i.e., to imagine the sitting pose for the chair. We also develop a robotic system, consisting of a Franka Emika Panda robot arm and a NAO humanoid robot, to actualize the understanding of the sitting affordance by autonomously putting a teddy bear on the chair (Fig. 1). To “put the bear on the chair”, the robot needs to carry the bear to the chair and seat the bear on it. To accomplish this task, the robot first needs to understand the sitting affordance of the chair, i.e., how the chair can be sat on. The robot also needs to plan a whole-body motion to put the bear on the chair while satisfying several constraints on kinematics, stability, and collision. Moreover, if the chair is in an inaccessible pose (e.g., when the chair is facing a wall), the robot should be able to reason about its accessibility and understand how to make it accessible. In this paper, we introduce a human-robot interaction (HRI) framework with closed-loop visual feedback which enables the robot to instruct a human to rotate the chair and make it accessible for placing the bear. Fig. 2 shows the pipeline of our method; details can be found in Sec. IV. Results show that our method enables the robot to autonomously put the bear on 12 previously unseen chairs with diverse shapes and appearances in 72 trials, including both accessible and inaccessible chair poses, with a success rate of 94.4%. Our HRI framework is also shown to be very effective: in the 36 trials with inaccessible poses, it successfully changes the accessibility of the chair for placing the bear in every trial in which the sitting imagination finds a sitting pose. To our knowledge, this work is the first to physically seat an articulated agent on an unseen chair in the real world. The aim of this work is to examine how well robots can accomplish this challenging task by leveraging different components of robotics.
We envision promising future applications of our method for robots operating in household environments and interacting with humans.

This work differs from our previous work [wu2020chair] in several ways. Rather than classifying whether an unseen object is a chair, in this work we assume the given object is known to be a chair a priori, since this can be determined with our previous method. The robot then imagines how to sit on the chair, i.e., the sitting pose. Also, unlike [wu2020chair], which focuses on perceiving the object affordance without real-robot experiments, this work goes further by actualizing the understanding of affordances with physical experiments. The main contributions of the paper are as follows:

  • A method for robots to imagine the sitting pose for a previously unseen chair.

  • A human-robot interaction framework with closed-loop visual feedback which enables the robot to reason about the accessibility of a chair and interact with a human to change the accessibility if necessary.

  • A robotic system which is able to autonomously put a teddy bear on a previously unseen chair for sitting.

II Related Work

Object Affordance of Chairs. The class of chairs is an important object class in human life, and the exploration of the sitting affordance has become popular in recent decades [hinkle2013predicting, grabner2011makes, seib2016detecting, wu2020chair, bar2006functional]. Hinkle and Olson [hinkle2013predicting] simulate dropping spheres onto objects and classify objects into containers, chairs, and tables based on the final configurations of the spheres. Grabner et al. [grabner2011makes] fit a humanoid mesh onto an object mesh and exploit the vertex distance and triangle intersection to evaluate an object's affordance as a chair for chair classification. Instead of object classification, we use object affordances to enable robots to understand how to interact with a chair and showcase this understanding with real-robot experiments.

Affordance Reasoning. There is a growing interest in reasoning about object affordances in the fields of computer vision [myers2015affordance, sawatzky2017weakly, roy2016multi, ruiz2020geometric, zhu2014reasoning, ho1987representing, desai2013predicting] and robotics [stoytchev2005behavior, do2018affordancenet, chu2019learning, wu2020can, abelha2017learning, piyathilaka2015affordance]. Stoytchev [stoytchev2005behavior] introduces an approach to ground tool affordances by dynamically applying different behaviors from a behavioral repertoire. [do2018affordancenet, chu2019learning, sawatzky2017weakly, roy2016multi, desai2013predicting] use convolutional neural networks (CNNs) to detect regions of affordance in an image. Ruiz and Mayol-Cuevas [ruiz2020geometric] predict affordance candidate locations in environments via the interaction tensor. In contrast to detecting different types of affordances, we focus on understanding the sitting affordance and use it for real-robot experiments.

Physics Reasoning. Physics reasoning has also been introduced to the study of object affordances [wu2020chair, wu2020can, battaglia2013simulation, zhu2015understanding, zhu2016inferring, kunze2017envisioning]. Battaglia et al. [battaglia2013simulation] introduce the concept of an intuitive physics engine which simulates physics to explain human physical perception of the world. Zhu et al. [zhu2016inferring] employ physical simulations to infer the forces exerted on humans and learn human utilities from videos. The digital twin [boschert2016digital] makes use of physical simulations to optimize operations and predict failures in manufacturing. In contrast, we do not infer physics or use it to predict outcomes; instead, we exploit physics to imagine potential interactions with objects and thereby perceive object affordances.

III Methods

III-A Problem Formulation

Throughout the paper, we assume the given previously unseen chair is upright and that the back of the agent (or the bear) is supported while sitting. The virtual agent in the imagination is an articulated humanoid body (Fig. 3(a)). To put the agent (the bear) on the chair, we want to find 1) the sitting pose (p, R) ∈ SE(3), where p ∈ ℝ³ and R ∈ SO(3) specify the position and rotation of the agent's (the bear's) base link in the world frame, and 2) the joint angles of the agent (the bear). However, the joints of the bear can be considered fixed (the teddy bear is a plush toy; its joints are not rigidly fixed but have large damping coefficients) and the joint angles are already close to those of a sitting configuration (Fig. 1(b)). Thus, in this paper, we simplify the problem to finding only the sitting pose (p, R).

According to the interaction-based definition (IBD) of chairs (Sec. I), the torso is more or less upright when sitting. Thus, we further simplify the problem by restricting R = R_z(θ)R_0, where R_z(θ) denotes the rotation about the z-axis of the world frame, θ is the yaw angle of the base link, and R_0 denotes the initial rotation which sets the agent to an upright sitting configuration with its face pointing towards the x-axis of the world frame. That is, given an unseen chair, the problem becomes that of finding the position p and the yaw angle θ of the base link in the world frame for sitting. We denote the direction indicated by θ in the xy-plane as the sitting direction. The base link of the agent is its pelvis link.
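The yaw-only restriction can be sketched in a few lines; the identity used for R_0 below is a placeholder for the actual upright-sitting rotation, which depends on the agent model:

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the world z-axis by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# R0 is the initial rotation that puts the agent in an upright sitting
# configuration facing the world x-axis; the identity is a placeholder.
R0 = np.eye(3)

def sitting_rotation(theta):
    """Restricting R = Rz(theta) @ R0 reduces the search to one yaw angle."""
    return rot_z(theta) @ R0
```

With this restriction, the search space for the sitting pose collapses from six dimensions to four: the position p and the single yaw angle θ.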

III-B Revisiting the Sitting Affordance Model (SAM)

In the sitting imagination (Sec. III-C), we simulate a virtual humanoid agent sitting on the chair in physical simulation. The resultant configuration of the agent at the end of the simulation is denoted as q_end. The sitting affordance model (SAM) [wu2020chair] evaluates whether q_end is a correct sitting by comparing it with a key configuration q*. We briefly review SAM here; further details can be found in [wu2020chair].

The evaluation is based on four criteria. 1) Joint Angle Score. The joint angles of a configuration can be described with a vector q = (q_1, ..., q_n), where n denotes the total number of the agent's joints. The joint angle score is defined as a weighted sum of the deviations of the joint angles of q_end from those of q*, with a weight for each joint. Lower is better. 2) Link Rotation Score. According to the IBD, the torso is more or less upright. Thus, SAM considers the link rotations in the world frame when evaluating q_end. The link rotation score is defined as a weighted sum, over the links, of the misalignment between the z-axis unit vectors of each link frame in q_end and in q*. Lower is better. 3) Sitting Height. Sitting height is also an important factor in sitting, so SAM takes the sitting height into account in the evaluation of q_end. 4) Contact. SAM also counts the number of contact points of the agent's links. The numbers of contact points of all the body links can be described with a vector c = (c_1, ..., c_m), where m denotes the total number of links.

q_end is classified as a correct sitting if all of the following are satisfied: the joint angle score and the link rotation score are below their thresholds, the sitting height is above its threshold, and f(c) = 1, where f is a binary function which outputs 1 if 1) the total number of contact points is larger than a threshold and 2) the numbers of contact points for the lower-body and the upper-body links are both larger than zero, and 0 otherwise.
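The four criteria can be sketched as a single test; the score forms, weights, and threshold values below are illustrative placeholders, not SAM's calibrated values:

```python
import numpy as np

def sam_classify(q_end, q_key, z_end, z_key, w_joint, w_link,
                 height, contacts, lower_idx, upper_idx,
                 t_joint=0.5, t_link=0.3, t_height=0.1, t_contact=4):
    """Sketch of SAM's correct-sitting test; thresholds are placeholders.

    q_end/q_key: joint angles of the end and key configurations.
    z_end/z_key: per-link z-axis unit vectors (one row per link).
    contacts: number of contact points per link.
    """
    # 1) Joint angle score: weighted deviation from the key configuration.
    e_joint = np.sum(w_joint * np.abs(np.asarray(q_end) - np.asarray(q_key)))
    # 2) Link rotation score: weighted misalignment of the link z-axes.
    e_link = np.sum(w_link * np.linalg.norm(np.asarray(z_end) - np.asarray(z_key), axis=1))
    # 3) Sitting height must exceed a threshold.
    # 4) Contact: enough contact points in total, and both the lower- and
    #    upper-body links must touch the chair.
    contacts = np.asarray(contacts)
    contact_ok = (contacts.sum() > t_contact
                  and contacts[lower_idx].sum() > 0
                  and contacts[upper_idx].sum() > 0)
    return bool(e_joint < t_joint and e_link < t_link
                and height > t_height and contact_ok)
```

All four checks must pass for a trial to count as a correct sitting, mirroring the conjunction described above.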

Fig. 3: Sitting imagination. (a) Sitting imagination setting. (b) OBB transformation. (c) and (d) show snapshots at the beginning and the end of the sitting imagination, respectively. The checks and crosses in (d) indicate whether each sitting is correct or not. (e) shows the chair used for the imagination in (c) and (d). (f), (g), and (h) show the sitting poses obtained from three rotations which tie in the number of correct sittings. The black dot and the red arrow indicate the sitting position p and the yaw angle θ, respectively. (f) has the smallest averaged score and thus is selected as the imagined sitting pose for the robot to place the bear.

III-C Sitting Imagination

We physically simulate the agent sitting on the chair in different rotations to find p and θ for sitting (Fig. 3). Given the 3D model of the chair, we first compute the minimum-volume oriented bounding box (OBB) [trimesh]. As in [wu2020chair], we then apply a rigid body transformation T_OBB to the model (Fig. 3(b)). T_OBB horizontally translates the chair to align the OBB center with the origin of the world frame in the xy-plane and rotates the chair about the z-axis to align the OBB with the coordinate system of the world frame. We apply T_OBB because we notice that the back of a chair heuristically coincides with one of the OBB faces, which benefits the finding of correct sittings in the imagination. After applying T_OBB, we attach the world frame as the body frame of the chair.

The rotation of the chair in the simulation is enumerated by rotating the chair about the z-axis of the world frame in eight evenly spaced increments (Fig. 3(c)). We drop the agent from above the chair to simulate sitting (Fig. 3(d)). Unlike [wu2020chair], which simulates the drops for different chair rotations one by one, we simulate them simultaneously to reduce the runtime. Before the drop, the agent is set to a pre-sitting configuration facing the x-axis (Fig. 3(a)). The base link of the agent is placed on a plane 15 cm above the chair OBB. For each rotation, we first sample three positions on the plane to drop from: the origin and two positions translated by ±d along the x-axis from the origin, where d is scaled linearly with respect to the size of the OBB. If no more than one correct sitting is found across all rotations, four extra positions with larger positive and negative translations along the x-axis are sampled to drop from. The reason we start sampling drops around the origin of the plane, which is aligned horizontally with the center of the OBB, is that most chairs have their seats positioned close to the center of the OBB. However, for some chairs, the seat is close to the OBB periphery. Thus, if not enough correct sittings can be found, our search expands towards the periphery. In total, for a chair, we simulate 24 drops if no extra drops are needed and 56 drops otherwise.
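The drop enumeration can be sketched as follows. The eight 45-degree rotations follow from the 24/56 drop counts (24 = 8 × 3 and 56 = 8 × 7); the 0.25 offset scale and the extra ±2d, ±3d offsets are illustrative guesses, not the paper's calibrated values:

```python
import numpy as np

def sample_drops(obb_extent, extra=False):
    """Enumerate (chair rotation, drop position) pairs for the imagination.

    The eight 45-degree rotations are inferred from the paper's 24/56 drop
    counts; the 0.25 offset scale and the +/-2d, +/-3d extra offsets are
    illustrative guesses.
    """
    d = 0.25 * obb_extent                          # offset scaled with OBB size
    rotations = [k * np.pi / 4 for k in range(8)]  # 8 rotations about z
    offsets = [0.0, +d, -d]                        # 3 initial drops per rotation
    if extra:                                      # 4 extra drops per rotation
        offsets += [+2 * d, -2 * d, +3 * d, -3 * d]
    return [(theta, np.array([off, 0.0, 0.0]))
            for theta in rotations for off in offsets]
```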

The rotation with the largest number of correct sittings is selected as the best rotation for sitting. If more than one rotation ties for the largest number (Fig. 3(f)(g)(h)), we select the one with the smallest averaged sitting score (lower is better). The sitting pose of the agent in the world frame is:

p = T_OBB^{-1} p̄  (1)
R = R_OBB^{-1} R_z(θ̄) R_0  (2)

where T_OBB^{-1} is the inverse of the OBB transformation and R_OBB is its rotation component; p̄ and θ̄ are the weighted averages of the agent's position and yaw angle relative to the chair frame over all the correct sittings of the best rotation. The weight of each sitting is the reciprocal of its sitting score.
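The weighted averaging of the correct sittings can be sketched as below; using a circular mean for the yaw is our own safe choice near the ±π wrap-around, not necessarily the paper's exact averaging:

```python
import numpy as np

def average_sitting_pose(positions, yaws, scores):
    """Weighted average of correct-sitting positions and yaws (chair frame).

    Each sitting is weighted by the reciprocal of its score (lower score is
    better, hence a larger weight). A circular mean keeps the yaw average
    safe near the +/-pi wrap-around.
    """
    w = 1.0 / np.asarray(scores, dtype=float)
    w /= w.sum()
    p_bar = (w[:, None] * np.asarray(positions, dtype=float)).sum(axis=0)
    yaws = np.asarray(yaws, dtype=float)
    theta_bar = np.arctan2((w * np.sin(yaws)).sum(), (w * np.cos(yaws)).sum())
    return p_bar, theta_bar
```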

Fig. 4: Motion Planning. (a) The motion of putting the bear on the chair. (b) After putting down the bear, the robot first releases the bear by opening its hands; then, it retracts the hands and uprights its body. (c) The setting of the SE(2) planning module in the real and rviz environments. The red dashed lines in the top figure indicate the planning arena in the real world; the yellow ellipsoid in the bottom figure shows the goal.

III-D Motion Planning

Optimal Control Module. We assume the motion of placing the bear is quasi-static. Due to the complexity of the motion constraints, we formulate the planning of this motion as a trajectory optimization problem [han2020can]. We use the direct collocation method [tsang1975optimal] to solve this problem:

min_{x, u}  Σ_k ||u_k||²  (3)
subject to  x_{k+1} = x_k + u_k  (4)
||u_k|| ≤ ε  (5)
x_0 = x_init,  x_N = x_goal  (6)
CoM(x_k) ∈ P_support  (7)
x_min ≤ x_k ≤ x_max  (8)
Collision constraints  (9)

where x_k and u_k denote the state of the system (joint angles) and the control inputs at the k-th time interval, respectively. The quasi-static assumption simplifies the state transition to Eq. (4), and Eq. (5) limits the magnitude of the control inputs to satisfy the quasi-static assumption. Eq. (7) confines the horizontal projection of the robot's center of mass (COM) to be within the supporting polygon formed by the feet to ensure the stability of the robot. Eq. (9) ensures that the trajectory is collision-free.
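As a minimal illustration of this style of formulation (not the robot's actual whole-body problem), a toy single-joint quasi-static instance can be transcribed and handed to an off-the-shelf NLP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy direct-collocation instance: one joint moving from 0 to 1 rad over N
# intervals under the quasi-static transition x_{k+1} = x_k + u_k, with the
# control magnitude bounded to keep the motion quasi-static.
N, u_max = 5, 0.3
x_init, x_goal = 0.0, 1.0

def effort(u):
    return np.sum(u ** 2)          # minimize control effort

def reach_goal(u):
    # Chaining the transition x_{k+1} = x_k + u_k gives x_N = x_0 + sum(u).
    return x_init + np.sum(u) - x_goal

res = minimize(effort, np.zeros(N), method="SLSQP",
               bounds=[(-u_max, u_max)] * N,
               constraints=[{"type": "eq", "fun": reach_goal}])
trajectory = x_init + np.cumsum(res.x)  # resulting joint-angle trajectory
```

For this quadratic cost with one linear equality constraint the optimum spreads the motion evenly across the intervals; the real problem adds the COM, joint-limit, and collision constraints per time step.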

The initial configuration is a pre-defined standing posture (Fig. 4(a)). The goal configuration q_goal is generated via a constrained optimization [han2020can]:

min_q  d_COM(q) + α θ_torso(q)  (10)
subject to  q_min ≤ q ≤ q_max  (11)
CoM(q) ∈ P_support  (12)
FK(q) = T_place  (13)
Collision constraints  (14)

The cost (Eq. (10)) aims to minimize 1) d_COM, the distance between the horizontal projection of the COM and the center of the supporting polygon, and 2) θ_torso, the bending angle of the torso. A less bent torso exerts less torque on the motors, which makes uprighting the body easier after placing the bear (Fig. 4(b)). FK denotes the forward kinematics of the robot. Eq. (13) ensures that the robot reaches the goal pose such that the bear sits at the imagined sitting pose.

SE(2) Planning Module. We use RRT-Connect [kuffner2000rrt] in the OMPL library [sucan2012the-open-motion-planning-library] to plan the SE(2) trajectory for the robot to carry the bear to the chair. The robot is encapsulated in an ellipsoid for collision checking with the FCL library [pan2012fcl]. The setting is shown in Fig. 4(c).

IV System Pipeline

Fig. 2 shows the pipeline of our method. The robot arm first scans and reconstructs the 3D model of the chair (Sec. V-A). Then, the sitting imagination is conducted to find the imagined sitting pose (Sec. III-C). With the imagined sitting pose, we first determine the goal pose for the NAO to walk to and place the bear. The rotation of the goal pose is set such that the NAO faces the opposite of the sitting direction. The position of the goal pose is on a horizontal ray which originates from the projection of the sitting position on the xy-plane and points along the sitting direction. It is initially set such that the NAO is a fixed distance away from the chair. If the NAO would be in collision with the chair at the goal pose, we move the goal pose along the ray until it is collision-free. After that, if the distance between the NAO and the sitting position is too large, we move the goal pose horizontally along the ray to reduce the distance, due to the robot workspace constraint. We then use the optimal control module to pre-plan the motion to place the bear and check the validity of the goal pose. To reduce the planning time, the motion is simplified as a bilaterally symmetric motion: the motion of the left half of the body is symmetric to that of the right half. At the beginning of the motion, the bear is held in the hands facing the NAO (Fig. 4(a)); thus, the bear faces the sitting direction. We restrict the motion to one in which only the pitch joints are actuated, to maintain the facing direction of the bear throughout the motion.
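The goal-pose placement along the sitting direction can be sketched as follows; the start distance, step size, and clearance_fn below are illustrative assumptions rather than the system's calibrated values:

```python
import numpy as np

def goal_pose_on_ray(p_sit, theta_sit, clearance_fn,
                     start_dist=0.2, step=0.02, max_dist=1.0):
    """Place the NAO's SE(2) goal on the ray along the sitting direction.

    clearance_fn(x, y) -> bool reports whether the robot is collision-free
    at (x, y). start_dist, step, and max_dist are illustrative values. The
    heading faces opposite the sitting direction so the robot faces the
    chair while placing the bear.
    """
    direction = np.array([np.cos(theta_sit), np.sin(theta_sit)])
    d = start_dist
    while d <= max_dist:
        x, y = np.asarray(p_sit)[:2] + d * direction
        if clearance_fn(x, y):
            return np.array([x, y, theta_sit + np.pi])  # face the chair
        d += step  # back off further along the ray until collision-free
    return None  # no valid goal within range
```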

The trajectory to walk to the goal pose is then planned with the SE(2) planning module. If the goal pose is outside the planning arena or blocked by obstacles, no plan can be made. In this case, the NAO gives a language instruction to the human to rotate the chair about the vertical axis such that the sitting direction points towards the NAO with no obstacles in between (see HRI in Fig. 2). The instruction is generated from a template: “Please rotate the chair about the vertical axis [direction] for [angle] degrees!” The angle is the multiple of 30 degrees closest to the precise rotation angle that would make the sitting direction point towards the NAO. (We use multiples of 30 degrees instead of the precise angle because they are easier for humans to understand and act on in the HRI.) The pose of the chair during the interaction is tracked by the iterative closest point (ICP) algorithm [besl1992ICP]. The imagined sitting pose and the goal pose are transformed by the chair's tracked transformation and updated after the interaction. The SE(2) planning module then tries to plan a trajectory to the updated goal pose. This process is repeated until a valid trajectory is found. In practice, we regard the trial as a failure if no trajectory can be found after three interactions.
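The instruction generation can be sketched as below; the angle wrapping and the template wording beyond the quoted example are our assumptions:

```python
def rotation_instruction(sit_dir_deg, target_dir_deg):
    """Generate the HRI rotation instruction as a multiple of 30 degrees.

    Angles are yaw directions in degrees in the world frame. The template
    follows the quoted example; the helper itself is illustrative.
    """
    # Precise rotation that points the sitting direction at the NAO,
    # wrapped to (-180, 180].
    diff = (target_dir_deg - sit_dir_deg + 180.0) % 360.0 - 180.0
    angle = 30 * round(diff / 30)  # closest multiple of 30 degrees
    direction = "counterclockwise" if angle >= 0 else "clockwise"
    return ("Please rotate the chair about the vertical axis "
            f"{direction} for {abs(angle)} degrees!")
```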

After a valid trajectory is found, the NAO is controlled to walk and follow the waypoints along the trajectory via a PID controller. The walking motion is controlled by the NAOqi SDK (https://developer.softbankrobotics.com/nao6/naoqi-developer-guide/naoqi-apis). The bear is passed to the NAO manually before it starts walking. When the NAO arrives at the goal pose, the optimal control module plans a whole-body motion to place the bear based on the robot's current pose. Finally, the robot executes the motion, releases the bear, and uprights its body (Fig. 4(b)).

V Experiments

Fig. 1(a) shows the experiment setup. A PrimeSense Carmine 1.09 RGB-D camera is mounted on the robot arm. Besides scanning the chair, the camera is also used to track the chair during the HRI and to track the NAO, which has an ArUco tag placed on top of its head, during walking.

Fig. 5: Data. (b) shows a captured depth image when the robot is at the capturing pose shown in (a). (c) shows three capturing poses of the 3D scanning. (d) shows all the chairs in the dataset. Among the chairs in the test set, two, indicated by the red box, are not typical chairs but can still afford sitting: one is a step stool; the other is improvised by assembling books and boxes.

V-A 3D Scanning

In the experiment, the chair is placed randomly, in its upright pose, in a 50×50 cm square area in front of the robot arm. The robot arm is moved to 9 pre-defined collision-free configurations to capture depth images of the scene with the RGB-D camera (Fig. 5(a)(b)(c)). The pose of the camera at each view is obtained from the forward kinematics of the robot arm. This allows us to use TSDF fusion [curless1996volumetric] to densely reconstruct the scene and extract its point cloud. The chair is segmented from the scene by plane segmentation with the PCL library [rusu20113d].
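As a stand-in for PCL's plane segmentation, the idea can be sketched with a minimal RANSAC plane fit (this is our own minimal sketch, not PCL's actual implementation):

```python
import numpy as np

def segment_plane(points, n_iters=200, thresh=0.01, seed=0):
    """Minimal RANSAC plane fit (a stand-in for PCL's plane segmentation).

    Returns ((n, d), inlier_mask) with the plane n . p + d = 0. Removing
    the dominant plane (the table/floor) leaves the chair points.
    """
    rng = np.random.default_rng(seed)
    best_mask, best_plane = None, None
    for _ in range(n_iters):
        # Fit a candidate plane through three random points.
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n.dot(p1)
        # Keep the plane supported by the most inliers.
        mask = np.abs(points @ n + d) < thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (n, d)
    return best_plane, best_mask
```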

V-B Data

Our dataset contains 15 chairs with diverse shapes and appearances (Fig. 5(d)). They are designed for children aged 0 to 3. The reason we choose small chairs is that the size of the chair is restricted by the workspaces of the NAO and the robot arm (if the chair is too tall, the NAO will be too short to put the bear on it; if the chair is too large, the robot arm will not be able to scan it). We use 3 chairs (calibration set) to calibrate the simulation and the motion planning and control. The remaining 12 chairs (test set), which are unseen by the robot, are used for testing.

V-C Physical Simulation

PyBullet [coumanspybullet] is used as the physics engine for the sitting imagination. The chair and the virtual humanoid agent are imported from URDF files which specify the mass, COM, inertia matrix, friction coefficient, and joint properties. We use the default Coulomb friction model, and collisions in the simulation are modelled as inelastic. The physical attributes of the chair are computed with MeshLab [cignoni2008meshlab]. As the chairs in the dataset are designed for children, we set the height and weight of the agent in the simulation accordingly [onis2008child].

VI Results & Discussions

Fig. 6: Result: Accessible. (a) Snapshot of the chair. (b) Imagined sitting pose. (c) The robot walks to the chair. (d) and (e) show the beginning and the end of the motion of putting the bear on the chair. (f) and (g) show the motion of retracting the hands and uprighting the body after placing the bear. (h) Results.

We implement our method with the Robot Operating System (ROS) on a computer with an Intel Core i9-10920X CPU @ 3.5 GHz. Our unoptimized implementation takes about 70 s, 4 s, and 29 s for 3D scanning (including capturing and model reconstruction), sitting imagination, and motion planning (including both the whole-body and SE(2) motion planning), respectively. The walking and bear-placing motions take about 67 s and 24 s, respectively.

Accessible. In the first set of experiments, the chair is placed such that it is accessible for seating the bear: the sitting direction points towards the NAO within a limited angular deviation. No HRIs are needed in this case. For each chair in the test set, we place it in 3 different poses for testing, resulting in 36 trials in total (Fig. 6).

Inaccessible + Human Obeys. In the second set of experiments, the chair is placed such that it is inaccessible. That is, the sitting direction of the chair points either 1) towards the robot arm or 2) towards the edges of the planning arena (Fig. 7(a)). In both cases, no valid trajectory can be found in the initial configuration; HRIs are needed to rotate the chair and make it accessible. We recruit 6 volunteers to participate in the experiments. For each chair in the test set, we place it in 2 different inaccessible poses (24 trials in total). We ask the human to obey the instructions given by the NAO throughout the whole trial (Fig. 7).

Inaccessible + Human Disobeys. There exist many uncertainties in HRIs (e.g., the human is distracted or misunderstands the instruction) which can leave the chair inaccessible even after the interaction. In this set of experiments, we test the robustness of our method in addressing these uncertainties. For each chair in the test set, we place it in one inaccessible pose as in the Inaccessible + Human Obeys setting (12 trials in total). We ask the human to deliberately disobey the first instruction given by the NAO and obey the following instructions (Fig. 8).

We recruit 15 annotators to annotate the experiment results; each trial is annotated by five different annotators. For each trial, we show the experiment video and images of the bear at the end of the trial. The annotator is then asked 1) “Do you think the robot has been successful in seating the bear on the chair?” For the trials where the chair is inaccessible, we also ask 2) “Do you think the human obeyed the instruction given by the NAO?” for each HRI in the trial and 3) “Is the chair accessible at the end of all the human-robot interactions?” We recruit human annotators because there is perspective variance among human subjects (e.g., on whether a trial is successful). The results on the test set are shown in Tab. I.


(1) Seating Bear Success
Experiment                      Trial Num.   Positive Annotations (≥5 / ≥4 / ≥3)
Accessible                      36           26 / 34 / 34
Inaccessible + Human Obeys      24           18 / 22 / 23
Inaccessible + Human Disobeys   12           10 / 10 / 11
Total                           72           54 / 66 / 68

(2) Human Obeys in HRI
Experiment                      Interact. Num.   Positive Annotations (≥5 / ≥4 / ≥3)
Inaccessible + Human Obeys      26               20 / 23 / 23
Inaccessible + Human Disobeys   11               8 / 11 / 11
Total                           37               28 / 34 / 34

(3) Chair Accessible After HRI
Experiment                      Trial Num.   Positive Annotations (≥5 / ≥4 / ≥3)
Inaccessible + Human Obeys      23           23 / 23 / 23
Inaccessible + Human Disobeys   11           10 / 11 / 11
Total                           34           33 / 34 / 34
TABLE I: Experiment Results on the Test Set.
Fig. 7: Result: Inaccessible + Human Obeys. (a) Snapshot of the chair. (b) Imagined sitting pose. (c) Before HRI. (d) HRI. (e) After HRI. (f) and (g) show the beginning and the end of putting the bear on the chair. (h) Results.
Fig. 8: Result: Inaccessible + Human Disobeys. (a) Snapshot of the chair. (b) Imagined sitting pose. (c) Before HRI. (d) The human disobeys the instruction in the first HRI. (e) After the first HRI. (f) After the last HRI. (g) and (h) show the beginning and the end of putting the bear on the chair. (i) Results.

In table (1) of Tab. I, we show the number of trials with at least 5, 4, and 3 positive annotations to the first question. We count a trial as successful if and only if the sitting imagination found a sitting pose and more than half of the 5 annotations (i.e., at least 3) are positive. The results justify our point about perspective variance: the numbers of trials with at least 5, 4, and 3 positive annotations differ from one another. For example, some annotators allow the bear to be a bit tilted on the chair while others count that as a failure. In general, we achieve a very high success rate of 94.4% on the 12 unseen chairs in the test set (68 successful trials out of 72). The success rates of the three sets of experiments are roughly the same. The step stool accounts for all 4 failure cases in the 72 trials: two in Accessible, and one each in Inaccessible + Human Obeys and Inaccessible + Human Disobeys. In all of these failure cases, the sitting imagination fails to find the sitting pose because the depth of the seat is too shallow. A successful trial on the step stool is shown in Fig. 6. Notably, the success rate over all 6 trials of the improvised chair is 100%. This points to the promising potential of our imagination method to discover the affordance of an object which can afford the sitting functionality despite not being a typical chair.

In table (2) of Tab. I, we show the total number of interactions in all the inaccessible trials and the number of interactions with at least 5, 4, and 3 positive annotations to the second question. We only count interactions when HRIs are involved, i.e., we disregard the failure trials where the sitting imagination fails to find the sitting pose and no instructions are given. Also, for Inaccessible + Human Disobeys, the interactions in which the human deliberately disobeys the instruction are not counted. We consider the volunteer to have obeyed the NAO's instruction if more than half of all the annotations are positive. Interestingly, we observe that in some Inaccessible + Human Obeys trials, although the volunteer was told to obey the robot's instructions, he/she somehow disobeyed. In these trials, the robot was able to give new instructions based on the current pose of the chair until it became accessible, resulting in a larger number of interactions (26) than the number of trials counted (23). Perspective variance also exists in this annotation: some people require the rotation of the chair to be very close to the angle in the instruction, while others do not. In general, in 91.9% (34 of 37) of all HRIs, the human is considered to have followed the robot's instruction.

In table (3), we show the total number of trials with HRIs and the number of trials with at least 5, 4, and 3 positive annotations to the third question. Similar to table (2), we only count trials in which HRIs are involved. The goal of the HRI is to make the chair accessible for placing the bear, so we evaluate the accessibility of the chair at the end of the HRIs to assess the effectiveness of our HRI framework. We consider the chair accessible after the HRI if more than half of all the annotations are positive. The results show that our framework is very effective: it achieves a 100% success rate in making the chair accessible in all the trials with successful sitting imagination. The perspective variance in this annotation is not as significant as that in tables (1) and (2).

Besides the test set results shown in Tab. I, we also test on the three chairs in the calibration set under the same three experiment settings (18 trials in total). Although these chairs have been seen, their poses are new and different from those used in calibration. The success rate of placing the bear in the 18 trials is 100%. In the 9 trials involving HRIs, the success rate of making the chair accessible is also 100%.

VII Conclusions & Future Work

We propose a novel method to imagine the sitting pose of a previously unseen chair and develop a robotic system which can autonomously put a teddy bear on the chair via robot imagination. Moreover, we introduce a human-robot interaction (HRI) framework to change the accessibility of the chair when it is in an inaccessible pose. Experimental results show that our method enables the robot to put the bear on 12 previously unseen chairs in 72 trials with a very high success rate. The HRI framework is also shown to be very effective in making the chair accessible from inaccessible poses. Future work could adaptively change the size and shape of the humanoid agent in the sitting imagination to imagine the affordance of chairs of various sizes; mobile manipulators and larger humanoid robots could then be used to put the bear on such chairs in the real world. Exploring more versatile whole-body motion planning for placing the bear is also a promising future direction.

Acknowledgment

The authors thank Yuanfeng Han for his helpful discussions, all the volunteers for the HRI experiments, and all the annotators for the human annotations.

References