Food manipulation: A cadence of haptic signals

04/23/2018 · Tapomayukh Bhattacharjee, et al. · University of Washington

Autonomous assistive feeding is challenging because it requires manipulating food items of varying compliance, size, and shape. To better understand how humans perform a feeding task and to explore ways to adapt their strategies to robots, we collected a rich dataset of human subjects' feeding instances and compared them with position-controlled instances from a robot. In our analysis of the dataset, which includes measurements from visual and haptic signals, we demonstrate that humans vary their control policies to accommodate the compliance and shape of the food item being acquired. We propose a taxonomy of manipulation strategies for feeding to highlight such policies. Our subsequent analysis of failed feeding instances of humans and the robot highlights the importance of adapting the policy to the compliance of a food item. Finally, as a first step toward generating compliance-dependent policies, we propose a set of classifiers that categorize haptic and motion signals during bite acquisition into four compliance-based food categories. A Temporal Convolutional Network (TCN) outperforms the other classifiers with an accuracy of 82.2%.

I Introduction

Nearly 56.7 million people (18.7%) of the non-institutionalized US population had a disability in 2010 [1]. Among them, about 12.3 million needed assistance with one or more activities of daily living (ADLs) or instrumental activities of daily living (IADLs). Key among these activities is feeding, which is both time-consuming for the caregiver and socially challenging for the care recipient [2].

Although there are several automated feeding systems on the market [3, 4, 5, 6], they have lacked widespread acceptance because they use minimal autonomy, demand a time-consuming food preparation process [7], or rely on pre-cut, packaged food.

Eating free-form food is arguably one of the most intricate manipulation tasks we perform in our daily lives, demanding robust nonprehensile manipulation of a deformable, hard-to-model target. Thus, automating food manipulation is daunting. The universe of foods, cutlery, and human strategies is massive. In this paper, we take a small first step towards organizing the science of autonomous food manipulation.

First, we collect a large and rich dataset of human strategies for food manipulation, composed of 3304 trials gathered over many hours of data collection (Figure 1(a)), recording ground-truth interaction forces, torques, poses, and RGBD imagery, providing us unprecedented and in-depth data on the mechanics of food manipulation (Figure 1(c)).

Second, we analyze our dataset to build a taxonomy of food manipulation, organizing the complex interplay between fork and food (Figure 5).

A key observation from the human subject data was that the choice of a particular control policy for bite acquisition depended heavily on the compliance of the item being manipulated. People used different strategies for soft and hard items, such as tilting the fork to prevent slipping for a slice of banana, or wiggling the fork to increase pressure for a carrot. Several feeding concerns, such as how the target would bite, were also reflected in the manipulation strategies, not only during transport but also in bite acquisition.

(a) Human feeding study
(b) Robot feeding experiment
(c) Example feeding motion, rendered from one of the collected human subject trials. The subject needed two skewering attempts because the item slipped in the first attempt.
Fig. 1: Examples of a feeding task with a dinner fork.
(a) Experimental setup
(b) Instrumented fork, Forque
(c) Food items
(d) Success rate comparison
Fig. 2: Collection of a rich dataset of multimodal signals (forces, poses, RGBD) of human subjects and a robot using an instrumented fork (Forque) to acquire a bite of different food items and feed a mannequin. The robot success rate using a position-control scheme was lower than that of humans, who controlled forces and motions to acquire food items of varying compliance.

Third, the importance of choosing a control policy based on the compliance of a food item was further highlighted when, using a simple position-control strategy with a vertical skewering motion (Figure 3(a)), a robot (Figure 1(b)) had multiple failures (Figure 2(d)) in picking up soft and hard-skinned items. Human subjects, on the other hand, demonstrated various control policies to adapt to the items' compliance (Figure 3(b), Figure 3(c)).

This key insight motivated us to explore a set of classifiers for compliance-based food categorization, which take the motion and force/torque signals of the fork and output the compliance category of the food being manipulated. Such food classification based on haptic and motion signals, instead of only vision-based classification [8, 9, 10, 11, 12], is beneficial during food manipulation, as visually similar items may have different compliance and therefore may need different control policies. Our best-performing classifier, based on a Temporal Convolutional Network [13], successfully categorized food items for both the human and robot experiments despite the different manipulation strategies.

Food manipulation promises to be a fascinating new challenge for robotics. Our main contributions in this paper are a rich dataset, an intuitive taxonomy, and a haptic analysis. We are excited about further work that builds upon all three of these contributions towards a science of food manipulation.

II Related Work

Our work connects three areas of research: food manipulation, manipulation taxonomies, and haptic classification.

II-1 Food manipulation

Studies on food manipulation in the packaging industry [14, 15, 16, 17, 18, 19] have focused on the design of application-specific grippers for robust sorting and pick-and-place. Crucially, they not only identified haptic sensing as critical for manipulating non-rigid food items, but also found that few manipulators are able to deal with non-rigid foods with a wide variety of compliance [14, 15, 16, 17, 18, 19].

Research labs have explored meal preparation as an exemplar multi-step manipulation problem, baking cookies [20], making pancakes [21], separating Oreos [22], and preparing meals [7] with robots. Most of these studies either interacted with a specific food item with a fixed manipulation strategy [20, 21] or used a set of food items for meal preparation which required a different set of manipulation strategies [7]. Importantly, all of these studies emphasized the use of haptic signals (through joint torques and/or fingertip sensors) to perform key sub-tasks.

II-2 Manipulation Taxonomies

Our work is inspired by the extensive work on human grasp and manipulation taxonomies [23, 24, 25, 26, 27], which has not only organized our understanding of how humans interact with everyday objects but also inspired the design of robot hands and grasping algorithms [28].

However, unlike most of these studies, our focus here is to develop an application-specific taxonomy, with a specific focus on manipulating deformable objects for feeding. We believe this focus is critical, as feeding is both a crucial component of our everyday lives and uniquely different in how we interact with the world. In that regard, our work echoes the application-specific work in human-robot interaction on handovers, also a crucial and unique act [29, 30], where the analysis and taxonomy of human-human handovers laid the foundation for algorithms for seamless human-robot handovers [29, 30, 31].

II-3 Haptic Classification

Most studies on haptic classification use specialized or distributed sensors on robot hands or fingertips for direct robot-hand and object interactions. Our work focuses on using a tool (the Forque) to record the forces and motions of Forque-food interactions. Researchers have previously used haptic signals to classify haptic adjectives using multimodal sensing [32] from sophisticated BioTac [33] robotic fingers. Studies have also been done to categorize rigid and deformable objects using piezo-resistive rubber [34], and to classify materials using randomly-distributed strain gauges on an artificial bio-inspired finger that senses surface texture [35]. Researchers have also used tactile sensing for inferring object properties such as elasticity of deformable objects [36], hardness [37], texture, weight [38], and compliance [38, 39]. Some studies used forces for localizing features in flexible materials [40], object recognition [41, 42, 43], inferring grasp policies [44], as well as slip detection [45]. Most of these studies focus on using specific haptic exploratory behaviors (tap, squeeze, slide, etc.) to extract meaningful information from a variety of distributed sensors, and do not address the problem of classifying food items.

In related work on a meal preparation application, Gemici and Saxena [7] learn physical properties of 12 food items using end-effector forces, torques, poses, joint torques, and fingertip forces. However, they carefully designed the robotic actions (e.g., cut, split, flip-turn) using multiple tools (knife, fork, spatula) to extract meaningful sensor information to infer physical properties such as hardness, plasticity, elasticity, tensile strength, brittleness, and adhesiveness. Our objective is to classify food items into compliance-based categories using the variety of forces and motions that people use naturally when manipulating different food items.

III Experimental Setup

(a) Robot feeding
(b) Human: Wiggling for a carrot
(c) Human: Twirling for noodles
Fig. 3: Selected robot and human trajectories. The robot used simple position control with a vertical skewering motion, while humans showed diverse strategies, accommodating the item being manipulated. Note that the mannequin was placed higher for the robot to allow a feasible trajectory.

We built a specialized test rig for our experiments (Figure 2(a)). Our goal was to perform high-fidelity capture of both motion and force during feeding.

III-A Forque: A Force-Torque fork sensor

We instrumented a dinner fork (Figure 2(b)) to measure the forces, torques, and motions generated during food manipulation. We selected an ATI Nano25 F/T sensor for 6-axis force/torque (F/T) measurements due to its minimal size and weight and its appropriate sensing range and resolution for food manipulation. We designed the end of the Forque handle to attach spherical markers for motion capture with the NaturalPoint Optitrack system [46]. We designed the Forque's shape and size to mimic those of a real dinner fork. We 3D printed the handle and the tip of the Forque in plastic and metal, respectively. A wire connecting the F/T sensor to its Net F/T box runs along the length of the Forque in a special conduit to minimize interference while feeding.

III-B Perceptual data

To collect rich motion data, we installed Optitrack Flex13 [47] motion capture cameras on a specially-designed rig, with full coverage of the workspace. This provided full 6-DOF motion capture of the Forque at 120 frames per second (FPS). In addition, we installed a calibrated (both extrinsically and intrinsically) Astra RGBD [48] camera recording the scene at 30 FPS, as well as a Canon DSLR RGB camera recording videos for human labeling (Figure 2(a)).

(a) Tilted approach angle
(b) Wiggle for hard and hard skin
(c) Scraping the bowl to scoop
(d) Hitting the plate when piercing a hard skin (grape)
Fig. 4: Selected highlights: different manipulation strategies in the approach and bite acquisition phases. The plots show the applied force along the Forque's z-axis and the torque about the Forque's x-axis.

III-C Food items

We selected 12 food items, classified into four categories based on their compliance: hard skin, hard, medium, and soft, with three food items in each category. In the hard skin category, we selected pepper, (cherry) tomato, and grape, whereas for the hard category, we had carrot, celery, and apple. In the medium category, we selected cantaloupe, watermelon, and strawberry, and for the soft category, we had banana, blackberry, and egg. Section VII validates our categorization of the food items. Since all of these items could be picked up by skewering, we added noodles and potato salad (in separate containers) to diversify the manipulation strategies. Figure 2(c) shows typical plates of food offered to the subjects.

III-D Data Collection

We compiled the data as rosbag files using ROS Indigo on Ubuntu 14.04. The system clocks were synchronized to a Network Time Protocol server. We measured the average sensor delay between the Optitrack mocap signal and the force/torque signal to be 30 ms over 10 repeated trials. Our dataset is available at [49].
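For illustration, the following is a minimal sketch of how one might load a trial's rosbag and put both streams on a common clock by compensating the measured 30 ms delay. The topic names and message types below are placeholders, not the dataset's actual schema (see the documentation at [49]).

```python
import rosbag

FT_DELAY = 0.030  # measured average F/T-vs-mocap delay (seconds)

def load_trial(bag_path,
               ft_topic='/forque/wrench',    # hypothetical topic names:
               mocap_topic='/forque/pose'):  # check the dataset docs [49]
    forces, poses = [], []
    bag = rosbag.Bag(bag_path)
    for topic, msg, t in bag.read_messages(topics=[ft_topic, mocap_topic]):
        if topic == ft_topic:
            f = msg.wrench.force              # geometry_msgs/WrenchStamped
            # shift F/T timestamps onto the mocap clock
            forces.append((t.to_sec() - FT_DELAY, (f.x, f.y, f.z)))
        else:
            p = msg.pose.position             # geometry_msgs/PoseStamped
            poses.append((t.to_sec(), (p.x, p.y, p.z)))
    bag.close()
    return forces, poses
```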

IV Human Study Procedure

The task of each participant was to feed the mannequin. Before each experiment, we asked the participants to sign a consent form and fill out a pre-task questionnaire. We asked our participants to pick up different food items from a plate or bowl using the Forque and feed a mannequin head as if they were actually feeding a person. The head was placed at the height of a seated average human (Figure 2(a)).

For each session, we provided the participant with a plate of 48 pieces of food (4 pieces per item for 12 food items), a cup of potato salad, and a bowl of noodles. We asked each participant to pick up noodles and potato salad 4 times each to maintain consistency.

Before each trial, a participant held the Forque at a predefined position marked on the table with tape. When a computerized voice said “start,” the participant could pick up any food item of their choice and feed the mannequin. After the participant brought the food item near the mouth of the mannequin, they waited until the experimenter said “stop.” They then discarded the food item and began another trial. We define a trial as one instance of feeding the mannequin, from “start” to “stop”.

There were 56 trials per session. Each participant had 5 sessions with a 2 to 5 minute break between each session, and each session began with a new plate (Figure 2(c)), giving us 280 trials per participant. We had 12 participants in the range of 18 - 62 years of age. This resulted in a grand total of 3360 trials. However, due to a technical glitch, we missed recording data for one of the sessions, thus giving us 3304 trials. For a left-handed participant, we inverted the experimental setup so that they could naturally feed the mannequin with their left hand.

At the end of the experiment (after 5 sessions), we gave each participant a post-task questionnaire asking about their manipulation strategies during the task (see Supplementary Material). The experiments were done in accordance with our University’s Institutional Review Board (IRB) review.

V A Taxonomy of Feeding: Insights from Human Subject Experiments

Fig. 5: A partial taxonomy of manipulation strategies relevant to a feeding task.

Feeding is a complex task. Creating a taxonomy of manipulation behaviors for feeding helps to systematically categorize it into sub-tasks. Segmentation allows us to better understand the different strategies people use in different phases of this task. During the human subject experiments, we observed that there were a variety of force and position control strategies that subjects used to manipulate different categories of food items. We developed a partial taxonomy to better understand the manipulation strategies used for feeding. We divided the feeding task into four primary phases: 1) rest, 2) approach, 3) bite acquisition, and 4) transport (Figure 5).¹

¹Drawings in the taxonomy are derivatives of the “Fork” icon by Stephanie Szemetylo, the “Bowl” icon by Anna Evans, the “Hand” icon by Jamie Yeo, and the “Noodle” icon by Artem Kovyazin [50].

V-A The rest phase: choose which item to pick up

We define the rest phase as the phase before any motion is executed and when the arm is at rest. This phase could be used for preliminary decision making, such as deciding which item to pick up.

V-B The approach phase: prepare for bite acquisition

Once a subject decides which item to pick up, they move the Forque to acquire the item. We define the approach phase to be from the moment the subject starts moving the Forque until an active bite acquisition motion begins, such as piercing, twirling, or scooping. This phase serves as a key preparation step for successful bite acquisition. During this phase, the subject (a) decides which high-level motion strategy to use in bite acquisition and aligns the food and the Forque accordingly, and (b) uses the non-dominant hand to stabilize the container if necessary. In deciding the high-level motion, i.e., whether to skewer, scoop, or twirl, the shape and size of the item played a key role. The following are key behaviors we observed that were critical for successful bite acquisition.

V-B1 Subjects aligned the food and the Forque for better bite acquisition or feeding

For food items with asymmetric shapes or irregular curvatures, such as celery, strawberries, or peppers, seven subjects used the Forque at least once to reorient the food items and expose a flat surface so that they could pierce the food item easily. If there were any exposed flat surfaces on the food, the subjects often oriented the Forque to align with the normal of the surface. If the item was large, the subjects oriented the Forque to aim at only a corner of the item so that the person being fed could bite it without biting the Forque tines. For easier bite acquisition, they controlled the approach angle based on the compliance of the food. For hard food items, such as carrots, celery, and apples, subjects approached the food with the Forque normal to an exposed surface.

V-B2 Subjects used environment geometry to stabilize the motion of oval food items for skewering

For small oval food items, such as grapes or tomatoes, which tended to slip or roll, some subjects used the geometry of the plate (extruded edge) as a support to stabilize the items (Figure 4(d)). This strategy was observed even for relatively large items, such as hard-boiled eggs. If the egg was resting on its high curvature surface, some subjects used nearby food items or the extruded edge of the plate as a support to stabilize the egg. In one of the responses to the post-task questionnaire, one of the subjects mentioned, “I would … corner it at the edge of the plate.” Five subjects used the environment geometry at least once to stabilize food items.

V-B3 Subjects used bimanual manipulation strategies to access difficult-to-reach items using environment geometry

If there was only a little potato salad or noodles left, subjects applied bimanual manipulation strategies in which they used one hand to tilt or hold the container while the other hand acquired the food with the Forque, often using the container wall as a support (Figure 4(c)). All subjects used bimanual strategies at least once to either hold or tilt the container.

(a) Multiple attempts to acquire “enough of a bite”
(b) Tilting the Forque for feeding
Fig. 6: Selected highlights: trajectories adapted for an appropriate amount of food and easy feeding. The plots show the position of the Forque along the global y-axis and its rotation about the global z-axis.

V-C The bite acquisition phase: apply position and force control

Once the Forque made contact with the food item, subjects used various force and position control strategies to acquire the bite. We define the bite acquisition phase to be from the moment the Forque is in contact with the food item until liftoff, when the item is lifted off the plate. During this phase, the compliance of food items was a key factor in deciding the control strategy. While simple vertical skewering was common for medium-compliance items, a few interesting strategies emerged for the hard skin, hard, and soft categories.

V-C1 For hard and hard skin items, subjects used only some of the Forque tines or applied wiggling motions to increase pressure

Subjects skewered the hard and hard skin food items using only some of the Forque tines, which enabled them to put more pressure and thus pierce easily. All subjects used this strategy at least once. Eight subjects used a wiggling motion (from left to right and back) to pierce the food items (Figure 4(b)). One of the subjects mentioned, “… sometimes needed to wiggle the fork back and forth to concentrate the pressure at only one tine to break through the skin of tomato, grape, etc.”

V-C2 For soft items, subjects skewered at an angle to prevent slip

For soft items such as slices of bananas, which tended to slip off the Forque tines during liftoff, subjects skewered the item at an angle (Figure 4(a)) to prevent slip by increasing friction using gravity. All subjects used this strategy at least once. For example, one of the subjects mentioned in the post-task questionnaire, “I would try to penetrate the fork at an angle to the food to minimize slices coming out…”

V-D The transport phase: feed the target

We define the transport phase as the phase after the food item is lifted from the plate until it is brought near the mannequin. For our experiments, the subjects were instructed to feed the mannequin as they would feed a person. This played a role in deciding the manipulation strategies. In the post-task questionnaire, many subjects mentioned two key factors for feeding which affected their manipulation strategy: (a) ease of bite, and (b) appropriate amount of bite.

V-D1 Subjects skewered food items at locations and orientations that would benefit the feeding task

For long and slender items, such as carrots, some subjects skewered the item at one end so that a person could easily take a bite without hitting the Forque tines. This also played a role in selecting the orientation of the Forque when skewering the food item. For example, some subjects reported that they changed the orientation of the Forque before piercing a food item for ease of feeding. Eight subjects used these strategies.

V-D2 Subjects adapted their transport motion to prevent food from falling off

Subjects adapted their motion towards the mannequin after picking up certain food items to prevent the items from falling off. They detected (felt) this using subtle haptic signals. One subject mentioned, “Unless it felt as if it might fall, I usually moved at what I thought was about the same speed.” Another subject said, “I tried to be faster with eggs because they break apart easily and fall off the fork.” Yet another mentioned, “With many softer foods (bananas specifically), I brought my arm up in a scooping motion to the mouth.” Note, even for potato salad, one subject remarked “I had to maneuver to pick up potato chunks which I could feel were buried inside or slipping off the fork.”

V-D3 Subjects oriented the Forque to benefit the feeding task

While approaching the mannequin, the subjects oriented the Forque such that the item was easy for the person to bite (Figure 6(b)). All subjects used this strategy. Interestingly, one of the subjects said, “Since I was using a fork and not a spoon, I could re-orient solid foods quite easily … I had to re-orient the fork often after picking food up in order to make it easier to bite for the humans.”

V-D4 Subjects picked food up multiple times in one trial to acquire an appropriate amount

Although we never specified any specific food amount per bite, a few subjects attempted multiple scoops or twirls for noodles and potato salad to acquire an appropriate amount of food for a bite (Figure 6(a)). Six subjects showed such a strategy.

(a) Classification per category
(b) TCN with various feature sets
(c) TCN’s convolutional kernel outputs
Fig. 7: Figure 7(a) compares 4 classifiers using 3-fold cross-validation accuracy. Solid lines show the expected performance of a random classifier. Each classifier uses the feature set with the best results. Figure 7(b) shows a comparison of TCN models trained with various feature sets: forces and torques (with their derivatives) in the Forque's local frame, and positions and rotations (with their derivatives) in the global frame. Each feature includes its first-order derivative. The force along the principal axis of the Forque, $F_z$, is the most important feature for classification, as it alone can correctly classify 74% of the test samples. Figure 7(c) shows the TCN's convolutional layers' final output before its linear layers. The most distinctive features are found in the latter half of the time series in force, torque, and their derivatives (the red boxed regions).

V-E Humans learned from failures

The subjects were not perfect at manipulating food items, but after only a few failures, they learned to respond to the subtle haptic signals. This was probably because there was a mismatch between subjects' initial estimates of the forces and motions required to pick up a food item and the actual physical interactions. One subject mentioned, “… the celery was harder than I was expecting. So, after a couple of times, I knew to exert more force.”

For oval food items with hard skin, such as grapes and tomatoes, the food either slipped or rolled multiple times. When skewering halved hard-boiled eggs, the yolk was often separated from the white during liftoff. In an answer to a question in the post-task questionnaire, one of the subjects mentioned, “The egg was tricky. I learned to spear it by the white part and the yolk at the same time to keep it together.”

The subjects also dropped soft items multiple times. Figure 2(d) shows the bite acquisition success rate. Even when the motion led to a successful bite acquisition, there were unintended results, such as hitting the plate when piercing a hard skin food item. In these cases, after their first or second trials, subjects changed their manipulation strategies. One of the subjects remarked, “I also learned to spear grapes by just one prong of the fork.” Subjects learned from their previous failures and changed their strategies across all four compliance categories. Despite the changes in manipulation strategies, human subjects were never perfect, even at the end of their last sessions.

VI Haptic Classification

One key observation from the human subject experiments was that people use different manipulation strategies for interacting with food items of different compliance. The choice of manipulation strategy was crucial for successful bite acquisition, as can be seen in Figure 2(d), which compares humans' average success rate with that of the robot, which used a position-control policy with a vertical skewering motion. While the robot's performance was comparable to humans' in the hard and medium categories, it performed poorly on hard skin and soft items, for which humans utilized several strategies to account for compliance (Section V-C). To facilitate control policies based on compliance, we present haptic classification of food items into four compliance-based categories: hard skin, hard, medium, and soft. There were 240 such trials per subject (4 categories × 3 food items × 4 pieces per food item × 5 sessions). We missed recording data from one session (48 trials) due to a technical glitch. This resulted in 2832 trials (240 trials × 12 subjects − 48 trials).

VI-A Discriminative models using LSTM, TCN, and SVM

We use three discriminative models: Long Short-Term Memory networks (LSTM [51]), Temporal Convolutional Networks (TCN [13]), and Support Vector Machines (SVM [52]).

LSTMs are a variant of Recurrent Neural Networks (RNNs) which have been shown to be capable of maintaining long-term information. At every time step, an LSTM updates its internal state and outputs a categorical distribution across the four categories. For the LSTM, we stacked two LSTM layers with 50 hidden units, connected to a rectified linear unit (ReLU) and a linear layer. We then performed a softmax operation to get the probability distribution.
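For concreteness, the following is a minimal PyTorch sketch of such an LSTM classifier. The two stacked layers and 50 hidden units follow the description above; the batch-first layout and classifying from the final time step are illustrative choices, not necessarily the exact architecture used.

```python
import torch.nn as nn

class LSTMClassifier(nn.Module):
    # Sketch: 2 stacked LSTM layers, 50 hidden units, then ReLU + linear
    # over the four compliance categories.
    def __init__(self, n_features, n_classes=4, hidden=50):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)          # per-step hidden states
        return self.head(out[:, -1])   # logits from the final time step;
                                       # nn.CrossEntropyLoss applies softmax
```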

Unlike an LSTM, which maintains an internal state, a Temporal Convolutional Network (TCN) takes the whole trajectory as one input. It learns kernels along the temporal dimension and across features. For the TCN, we stacked four convolutional layers, each with one-dimensional temporal kernels of window size 5. Between each layer, we performed one ReLU operation and max pooling of width 2. The final output is connected to a ReLU and a linear layer before a softmax operation. For the TCN, we scaled the temporal dimension of each training sample to 64 steps using bilinear interpolation, where 64 was chosen to approximately match the average temporal length of the data. Cross-entropy loss was used for both the LSTM and the TCN.
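A minimal PyTorch sketch of the described TCN, together with the 64-step temporal rescaling, is shown below. The channel width (32) and padding are illustrative assumptions; only the four conv layers, kernel size 5, ReLU + max-pool(2), and the ReLU + linear head come from the description above.

```python
import torch.nn as nn
import torch.nn.functional as F

class TCNClassifier(nn.Module):
    # Sketch: four 1-D temporal conv layers (kernel size 5) with
    # ReLU + max-pool(2) between them, then a ReLU and a linear layer.
    def __init__(self, n_features, n_classes=4, ch=32):
        super().__init__()
        layers, in_ch = [], n_features
        for _ in range(4):
            layers += [nn.Conv1d(in_ch, ch, kernel_size=5, padding=2),
                       nn.ReLU(), nn.MaxPool1d(2)]
            in_ch = ch
        self.conv = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(ch * 4, n_classes))

    def forward(self, x):               # x: (batch, n_features, 64 steps)
        z = self.conv(x)                # 64 -> 32 -> 16 -> 8 -> 4 steps
        return self.head(z.flatten(1))  # logits over the 4 categories

def resample_64(traj):
    # Rescale a float tensor of shape (n_features, T) to 64 time steps,
    # in the spirit of the interpolation described in the text.
    return F.interpolate(traj[None], size=64, mode='linear',
                         align_corners=False)[0]
```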

For the SVM, we interpolated each time-series feature as for the TCN, concatenated the interpolated time-series features to obtain a feature vector [52, 53, 54], and then used a linear kernel [55] to train the SVM classifier. We implemented the LSTM and TCN using PyTorch [56], and the SVM using scikit-learn [57].
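The interpolate-and-concatenate step for the SVM can be sketched as follows. The helper below is illustrative, using plain NumPy resampling and scikit-learn's linear-kernel SVC; the fixed length of 64 mirrors the TCN setup.

```python
import numpy as np
from sklearn.svm import SVC

def to_feature_vector(signals, length=64):
    # signals: list of 1-D arrays for one trial (e.g. each force, torque,
    # and pose channel). Resample each to a fixed length and concatenate.
    grid = np.linspace(0.0, 1.0, length)
    return np.concatenate([np.interp(grid,
                                     np.linspace(0.0, 1.0, len(s)), s)
                           for s in signals])

# Usage sketch:
# X = np.stack([to_feature_vector(trial) for trial in trials])
# clf = SVC(kernel='linear').fit(X_train, y_train)  # linear-kernel SVM
# accuracy = clf.score(X_test, y_test)
```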

VI-B Generative models using HMMs

To use hidden Markov models (HMMs) for classification, we train one HMM per food category [52, 58, 59]. We characterize an HMM model $\lambda$ by $\lambda = (A, B, \pi)$, where $A$ is the state-transition matrix, $B$ defines the continuous multivariate Gaussian emissions, and $\pi$ is the initial state distribution [52, 58, 59]. Let $C$ be the number of food categories and let $O = (o_1, \dots, o_T)$ be a training observation sequence for contact duration $T$. During training, we estimate the model parameters $\lambda_c$ to locally maximize $P(O \mid \lambda_c)$ using the iterative Baum-Welch method [52, 58, 59]. In our case, $C = 4$ (hard skin, hard, medium, soft). For a test sequence $O'$, we assign the label $c^*$ (food category) which maximizes the likelihood of the observation [52, 58]:

$$c^* = \operatorname*{argmax}_{1 \leq c \leq C} P(O' \mid \lambda_c)$$

We implemented the HMMs using the GHMM library [60]. For each food-category HMM, we optimized the number of hidden states to give maximum validation accuracy. This resulted in 3 hidden states for all the categories. One can think of these as the hidden states that describe the Forque-food interaction once the Forque tines are inside the food item. We set a uniform prior over all the states.
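A minimal sketch of this per-category training and maximum-likelihood classification is given below. It uses hmmlearn's GaussianHMM as a stand-in for the GHMM library we used, and the data-handling conventions are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # stand-in for the GHMM library [60]

CATEGORIES = ['hard_skin', 'hard', 'medium', 'soft']

def train_hmms(trials_by_category):
    # trials_by_category: dict mapping each category to a list of
    # (T, n_features) observation arrays from the bite acquisition phase.
    models = {}
    for cat in CATEGORIES:
        trials = trials_by_category[cat]
        X = np.vstack(trials)               # concatenated observations
        lengths = [len(t) for t in trials]  # per-trial sequence lengths
        models[cat] = GaussianHMM(n_components=3, covariance_type='full',
                                  n_iter=50).fit(X, lengths)  # Baum-Welch
    return models

def classify(models, trial):
    # argmax_c P(O' | lambda_c): label with the highest-likelihood model
    return max(CATEGORIES, key=lambda c: models[c].score(trial))
```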

(a) Confusion matrix of per-class classification
(b) Confusion matrix of per-item classification
(c) Confusion matrix for robot data
(d) Robot vs. Human
Fig. 8: Confusion matrices for haptic classification using the Temporal Convolutional Network. Most confusion happens across nearby haptic categories, e.g., between hard skin and hard, or between medium and soft. In the per-item classification (Figure 8(b)), confusions across different categories are minimal compared to within-category confusion. Haptic classification of robot data shows similar trends (Figure 8(c)). However, Figure 8(d) shows that the robot (shown in red) and human (shown in black) experiments generated different forces.

VII Results

Figure 7(a) compares the performance of our four classifiers. We used 3-fold cross-validation on the 2832 trials collected from the human subject experiments. For each classifier, we tested various combinations of feature sets and display the one with the best performance. The features we tested are local forces, torques, the global pose (positions and orientations) of the Forque, and their first-order derivatives. For classifiers trained with multiple features of different magnitude scales, we normalized their values. The TCN and LSTM used all features, while the SVM and HMMs achieved their best performance with a combination of forces and positions. The best-performing classifier was the TCN, which showed 82.2% accuracy for category classification. Note that the HMM is a generative model, unlike the other classifiers presented here, and thus it classifies by modeling the distributions of the 4 categories individually; the models are not optimized to maximize the discriminative aspects of these different categories.
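As an illustration of this evaluation protocol, the sketch below runs 3-fold cross-validation with per-fold feature normalization. The specific fold and scaling choices (shuffled K-fold, standardization) are assumptions; `make_model` stands in for any classifier exposing scikit-learn's fit/score interface.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def cross_validate(X, y, make_model, n_splits=3):
    # 3-fold CV; normalize mixed-scale features using train-fold stats only
    accs = []
    for train, test in KFold(n_splits, shuffle=True).split(X):
        scaler = StandardScaler().fit(X[train])
        model = make_model().fit(scaler.transform(X[train]), y[train])
        accs.append(model.score(scaler.transform(X[test]), y[test]))
    return float(np.mean(accs))  # mean cross-validation accuracy
```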

To analyze the importance of various features in classification, we compared the performance of the TCN (the best-performing classifier) when trained with different feature sets in Figure 7(b) and Figure 7(c). We can see that forces and positions are critical for classification. In fact, the z-directional force, the force along the principal axis of the Forque, alone can correctly identify 74% of the samples.

The confusion matrix in Figure 8(a) provides further insight into where the classifier fails. The most confusion happens between nearby categories, e.g., between medium and soft, or between hard skin and hard, which have similar haptic properties. The per-item classification (Figure 8(b)) further shows that items are most likely to be misclassified as items within the same category, which validates our compliance categories.

VIII Robot Experiments

We conducted robot experiments to see whether the robot could perform the feeding task using a fixed manipulation strategy: a position-control scheme with a vertical skewering motion.

VIII-A Experimental Setup

We used a Fetch robot with a back-drivable 7 DOF arm. We modified the handle of the Forque so that it could be grasped by the robot’s gripper. Our robot experimental setup was otherwise identical to the human setup.

VIII-B Experimental Procedure

We programmed the robot using a programming by demonstration (PbD) technique [61], saving a series of waypoints (joint configurations) of the arm through human demonstrations (Figure 3(a)). We performed a total of 240 trials (4 categories × 3 food items × 4 pieces per food item × 5 sessions) of robot experiments. In each trial, the robot used a vertical skewering motion to pick up a food item from a fixed location on the plate. We randomly selected 4 such locations on the plate. After each trial, we discarded the skewered food item and manually placed another food item from the plate in that location for the next trial. After one session, we replaced the entire plate with a new plate and repeated this procedure for 5 sessions. We did not program scooping or twirling motions, and thus did not use noodles or potato salad in these experiments. We collected the same data as during the human subject experiments.

VIII-C Results

Human subjects used different forces and motions to acquire food items of varying compliance. This implies that a robot may benefit from choosing its motion strategy based on a compliance-based categorization and learning to control forces as humans do. While we defer force-control policy learning to future work, we performed the robot experiments to see if the robot could successfully feed the target using a fixed manipulation strategy with a position-control scheme and a vertical skewering motion.

Interestingly, using a different control scheme and bite acquisition motion did not affect the haptic classification accuracy. Figure 8(c) shows the confusion matrix using a 4-fold cross-validation of the robot experiments. When trained with a TCN on the robot data, we get high accuracy,² which shows that the observations from each category are different enough for haptic inference even with the position-control scheme. However, the robot experiments (position-control with a vertical skewering motion) and the human experiments lead to very different forces, as shown in Figure 8(d), which plots the forces along the principal axis of the Forque. Thus, the classifier trained on human subject data achieved much lower accuracy when classifying robot data.

²Note, using a 3-fold cross-validation scheme, we get lower accuracy, probably because of the lack of data (20 trials per food item).

However, using a different control scheme and bite acquisition motion did affect the bite acquisition success rate. This is evident from Figure 2(d), where we see that the robot failed to pick up food items in the soft and hard skin categories more often than humans did. This is probably because humans used varied forces and motions to pick up food items of different compliance (see Section V). This further shows the need for different manipulation strategies for different compliance-based categories, which we defer to future work on robot manipulation.

IX Discussion

We observed some interesting factors that could affect the forces and motions of the feeding task. Some subjects grasped the Forque much closer to the tines while others held it unusually high. Some subjects held the Forque at unusual rotations about its principal axis. These were probably due to some cultural differences. Interestingly, subjects’ personal choices could also affect their manipulation strategies. For example, one subject mentioned, “… prefer [to] avoid yolk (I hate hard-boiled eggs).” We also observed that subjects picked up noodles using both clockwise and counter-clockwise twirls.

Despite the universe of human strategies, people were never perfect at manipulating food items of varying compliance, even after learning from failures (Section V-E). One caveat is that instrumenting the Forque with an ATI F/T sensor resulted in a different weight distribution than that of a standard dinner fork, and although the dimensions of the Forque tines and handle matched those of a standard dinner fork, the material composition and tine sharpness may not exactly match. However, only one subject explicitly pointed out the difference in weight distribution. Finally, there was a wire attached to the F/T sensor, but only one subject reported that it affected their feeding motions. Nonetheless, it remains to be seen how well subjects can learn from failures given enough time to practice.

References

  • Brault [2012] Matthew W Brault. Americans with disabilities: 2010. Current population reports, 7:70–131, 2012.
  • Perry [2008] Lin Perry. Assisted feeding. Journal of advanced nursing, 62(5):511–511, 2008.
  • [3] Obi. https://meetobi.com/, [Online; Retrieved on 25th January, 2018].
  • [4] My spoon. https://www.secom.co.jp/english/myspoon/food.html, [Online; Retrieved on 25th January, 2018].
  • mea [a] Meal-mate, a. https://www.made2aid.co.uk/productprofile?productId=8&company=RBF%20Healthcare&product=Meal-Mate, [Online; Retrieved on 25th January, 2018].
  • mea [b] Meal buddy, b. https://www.performancehealth.com/meal-buddy-system, [Online; Retrieved on 25th January, 2018].
  • Gemici and Saxena [2014] Mevlana C Gemici and Ashutosh Saxena. Learning haptic representation for manipulating deformable food objects. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 638–645. IEEE, 2014.
  • Brosnan and Sun [2004] Tadhg Brosnan and Da-Wen Sun. Improving quality inspection of food products by computer vision—a review. Journal of Food Engineering, 61(1):3–16, 2004.
  • Gunasekaran [1996] Sundaram Gunasekaran. Computer vision technology for food quality assurance. Trends in Food Science & Technology, 7(8):245–256, 1996.
  • Savakar and Anami [2009] Dayanand G Savakar and Basavaraj S Anami. Recognition and classification of food grains, fruits and flowers using machine vision. International Journal of Food Engineering, 5(4), 2009.
  • Mendoza and Aguilera [2004] F Mendoza and JM Aguilera. Application of image analysis for classification of ripening bananas. Journal of food science, 69(9), 2004.
  • Ding and Gunasekaran [1994] Kexiang Ding and Sundaram Gunasekaran. Shape feature extraction and classification of food material using computer vision. Transactions of the ASAE, 37(5):1537–1545, 1994.
  • Lea et al. [2016] Colin Lea, René Vidal, Austin Reiter, and Gregory D Hager. Temporal convolutional networks: A unified approach to action segmentation. In Computer Vision–ECCV 2016 Workshops, pages 47–54. Springer, 2016.
  • Chua et al. [2003] PY Chua, T Ilschner, and DG Caldwell. Robotic manipulation of food products–a review. Industrial Robot: An International Journal, 30(4):345–354, 2003.
  • Erzincanli and Sharp [1997] F Erzincanli and JM Sharp. Meeting the need for robotic handling of food products. Food Control, 8(4):185–190, 1997.
  • Morales et al. [2014] R Morales, FJ Badesa, N Garcia-Aracil, JM Sabater, and L Zollo. Soft robotic manipulation of onions and artichokes in the food industry. Advances in Mechanical Engineering, 6:345291, 2014.
  • Brett et al. [1991] PN Brett, AP Shacklock, and K Khodabendehloo. Research towards generalised robotic systems for handling non-rigid products. In ICAR International Conference on Advanced Robotics, pages 1530–1533. IEEE, 1991.
  • Williams et al. [2001] Tomos G Williams, Jem J Rowland, and Mark H Lee. Teaching from examples in assembly and manipulation of snack food ingredients by robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 4, pages 2300–2305. IEEE, 2001.
  • Blanes et al. [2011] C Blanes, M Mellado, C Ortiz, and A Valera. Technologies for robot grippers in pick and place operations for fresh fruits and vegetables. Spanish Journal of Agricultural Research, 9(4):1130–1141, 2011.
  • Bollini et al. [2011] Mario Bollini, Jennifer Barry, and Daniela Rus. Bakebot: Baking cookies with the pr2. In The PR2 workshop: results, challenges and lessons learned in advancing robots with a common platform, IROS, 2011.
  • Beetz et al. [2011] Michael Beetz, Ulrich Klank, Ingo Kresse, Alexis Maldonado, Lorenz Mösenlechner, Dejan Pangercic, Thomas Rühr, and Moritz Tenorth. Robotic roommates making pancakes. In IEEE-RAS International Conference on Humanoid Robots, pages 529–536. IEEE, 2011.
  • [22] Oreo separator machines. https://vimeo.com/63347829, [Online; Retrieved on 1st February, 2018].
  • Cutkosky [1989] Mark R Cutkosky. On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Transactions on robotics and automation, 5(3):269–279, 1989.
  • Feix et al. [2016] Thomas Feix, Javier Romero, Heinz-Bodo Schmiedmayer, Aaron M Dollar, and Danica Kragic. The grasp taxonomy of human grasp types. IEEE Transactions on Human-Machine Systems, 46(1):66–77, 2016.
  • Napier [1956] John R Napier. The prehensile movements of the human hand. Bone & Joint Journal, 38(4):902–913, 1956.
  • Ciocarlie et al. [2007] Matei Ciocarlie, Corey Goldfeder, and Peter Allen. Dimensionality reduction for hand-independent dexterous robotic grasping. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3270–3275. IEEE, 2007.
  • Bullock et al. [2013] Ian M Bullock, Raymond R Ma, and Aaron M Dollar. A hand-centric classification of human and robot dexterous manipulation. IEEE Transactions on Haptics (TOH), 6(2):129–144, 2013.
  • Ciocarlie and Allen [2009] Matei T Ciocarlie and Peter K Allen. Hand posture subspaces for dexterous robotic grasping. The International Journal of Robotics Research, 28(7):851–867, 2009.
  • Grigore et al. [2013] Elena C Grigore, Kerstin Eder, Anthony G Pipe, Chris Melhuish, and Ute Leonards. Joint action understanding improves robot-to-human object handover. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4622–4629. IEEE, 2013.
  • Strabala et al. [2013] Kyle W Strabala, Min K Lee, Anca D Dragan, Jodi L Forlizzi, Siddhartha S Srinivasa, Maya Cakmak, and Vincenzo Micelli. Towards seamless human-robot handovers. Journal of Human-Robot Interaction, 2(1):112–132, 2013.
  • Cakmak et al. [2011] Maya Cakmak, Siddhartha S Srinivasa, Min K Lee, Jodi Forlizzi, and Sara Kiesler. Human preferences for robot-human hand-over configurations. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 1986–1993. IEEE, 2011.
  • Chu et al. [2015] Vivian Chu, Ian McMahon, Lorenzo Riano, Craig G McDonald, Qin He, Jorge Martinez Perez-Tejada, Michael Arrigo, Trevor Darrell, and Katherine J Kuchenbecker. Robotic learning of haptic adjectives through physical interaction. Robotics and Autonomous Systems, 63:279–292, 2015.
  • Lin et al. [2009] Chia-Hsien Lin, Todd W Erickson, Jeremy A Fishel, Nicholas Wettels, and Gerald E Loeb. Signal processing and fabrication of a biomimetic tactile sensor array with thermal, force and microvibration modalities. In IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 129–134. IEEE, 2009.
  • Drimus et al. [2011] Alin Drimus, Gert Kootstra, Arne Bilberg, and Danica Kragic. Classification of rigid and deformable objects using a novel tactile sensor. In ICAR International Conference on Advanced Robotics, pages 427–434, 2011.
  • Jamali and Sammut [2011] Nawid Jamali and Claude Sammut. Majority voting: Material classification by tactile sensing using surface texture. IEEE Transactions on Robotics, 27(3):508–521, 2011.
  • Frank et al. [2010] Barbara Frank, Rüdiger Schmedding, Cyrill Stachniss, Matthias Teschner, and Wolfram Burgard. Learning the elasticity parameters of deformable objects with a manipulation robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1877–1883, 2010.
  • Takamuku et al. [2007] Shinya Takamuku, Gabriel Gomez, Koh Hosoda, and Rolf Pfeifer. Haptic discrimination of material properties by a robotic hand. In IEEE 6th International Conference on Development and Learning (ICDL), pages 1–6, 2007.
  • Kaboli et al. [2014] Mohsen Kaboli, Philipp Mittendorfer, Vincent Hügel, and Gordon Cheng. Humanoids learn object properties from robust tactile feature descriptors via multi-modal artificial skin. In IEEE-RAS International Conference on Humanoid Robots, pages 187–192, 2014.
  • Bhattacharjee et al. [2017] Tapomayukh Bhattacharjee, James M Rehg, and Charles C Kemp. Inferring object properties with a tactile-sensing array given varying joint stiffness and velocity. International Journal of Humanoid Robotics, pages 1–32, 2017.
  • Platt et al. [2011] Robert Platt, Frank Permenter, and Joseph Pfeiffer. Using bayesian filtering to localize flexible materials during manipulation. IEEE Transactions on Robotics, 27(3):586–598, 2011.
  • Hosoda and Iwase [2010] Koh Hosoda and Tomoki Iwase. Robust haptic recognition by anthropomorphic bionic hand through dynamic interaction. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1236–1241, 2010.
  • Schneider et al. [2009] Alexander Schneider, Jürgen Sturm, Cyrill Stachniss, Marco Reisert, Hans Burkhardt, and Wolfram Burgard. Object identification with tactile sensors using bag-of-features. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 243–248, 2009.
  • Allen and Roberts [1989] Peter K Allen and Kenneth S Roberts. Haptic object recognition using a multi-fingered dexterous hand. In IEEE International Conference on Robotics and Automation, pages 342–347, 1989.
  • Coelho et al. [2001] Jefferson Coelho, Justus Piater, and Roderic Grupen. Developing haptic and visual perceptual categories for reaching and grasping with a humanoid robot. Robotics and Autonomous Systems, 37(2):195–218, 2001.
  • Heyneman and Cutkosky [2015] Barrett Heyneman and Mark R Cutkosky. Slip classification for dynamic tactile array sensors. The International Journal of Robotics Research, 35(4):404–421, 2015.
  • opt [a] OptiTrack markers, a. http://optitrack.com/products/motion-capture-markers/#mcm-12.7-m4-10, [Online; Retrieved on 1st February, 2018].
  • opt [b] OptiTrack Flex 13 cameras, b. http://optitrack.com/products/flex-13/, [Online; Retrieved on 1st February, 2018].
  • [48] Orbbec Astra. https://orbbec3d.com/product-astra/, [Online; Retrieved on 1st February, 2018].
  • Bhattacharjee et al. [2018] T. Bhattacharjee, H. Song, G. Lee, and S. S. Srinivasa. Replication data for: Food manipulation : A cadence of haptic signals, 2018. URL http://dx.doi.org/10.7910/DVN/8TTXZ7.
  • [50] Noun Project - icons for everything. http://thenounproject.com, [Online; Retrieved on 1st February, 2018].
  • Hochreiter and Schmidhuber [1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • Wiens et al. [2012] Jenna Wiens, Eric Horvitz, and John V Guttag. Patient risk stratification for hospital-associated c. diff as a time-series classification task. In Advances in Neural Information Processing Systems, pages 467–475, 2012.
  • Hoai et al. [2011] Minh Hoai, Zhen-Zhong Lan, and Fernando De la Torre. Joint segmentation and classification of human actions in video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3265–3272. IEEE, 2011.
  • Bagnall et al. [2017] Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3):606–660, 2017.
  • Hsu et al. [2003] Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. A practical guide to support vector classification. 2003.
  • Paszke et al. [2017] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
  • Pedregosa et al. [2011] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • Kadous et al. [2002] Mohammed Waleed Kadous et al. Temporal classification: Extending the classification paradigm to multivariate time series. University of New South Wales, 2002.
  • Rabiner [1990] Lawrence R Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In A. Waibel and K. F. Lee, editors, Readings in Speech Recognition, pages 267–296. Kaufmann, San Mateo, CA, 1990.
  • [60] General Hidden Markov Model library. http://ghmm.org/, [Online; Retrieved on 12th January, 2018].
  • Elliott et al. [2017] Sarah Elliott, Russell Toris, and Maya Cakmak. Efficient programming of manipulation tasks by demonstration and adaptation. In IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 2017.