Robot manipulation of fabric has applications in senior care and dressing assistance [11, 12, 13], sewing, ironing, laundry folding [22, 27, 44, 54], fabric upholstery manufacturing [32, 50], and handling gauze in robotic surgery. However, fabric manipulation is challenging due to its infinite-dimensional configuration space and unknown dynamics.
We consider the task of transforming fabric from a rumpled and highly disordered starting configuration to a smooth configuration via a series of grasp and pull actions. We explore a deep imitation learning approach based on a Finite Element Method (FEM) fabric simulator with an algorithmic demonstrator, and use DAgger to train policies. Using color and camera domain randomization [39, 49], learned policies are evaluated in simulation and in physical experiments with the da Vinci Research Kit (dVRK) surgical robot. Figure 1 shows examples of learned trajectories in simulation and on the physical robot.
This paper contributes: (1) a novel formulation of fabric smoothing in terms of a sequence of pick and pull actions, (2) a simulation environment for data generation and evaluation of fabric smoothing with three difficulty tiers of initial state complexity in terms of coverage and visible corners, and (3) deep imitation learning of fabric smoothing policies which transfer to physical experiments on a da Vinci surgical robot when considering coverage performance across all tiers using color or depth input images.
II Related Work
Early work proposed iteratively re-grasping the lowest hanging point of a fabric to flatten and classify fabrics. Subsequently, Kita et al. [17, 18] used a deformable object model to simulate fabric suspended in the air, allowing a second gripper to grasp at a desired point. Follow-up work generalized to a wider variety of initial configurations of new fabrics. In particular, Maitin-Shepard et al., Cusumano-Towner et al., and Doumanoglou et al. identified and tensioned corners to fold laundry or to bring clothing to desired positions. These methods rely on gravity to reveal corners of the fabric. We consider the setting where a single-armed robot adjusts a fabric strewn across a surface without lifting it entirely in midair, which is better suited for larger fabrics or for robots with a limited range of motion.
Reinforcement Learning (RL) has potential for manipulating deformable objects. In folding, Matas et al. assumed that fabric starts flat, and Balaguer et al. began with fabric gripped in midair to loosen wrinkles. In contrast, we consider the problem of bringing fabric from a highly rumpled configuration to a flat configuration. Using model-based RL, Ebert et al. trained robots to fold pants and fabric. This approach, however, requires executing many thousands of actions on a physical robot and then training a video prediction model. In surgical robotics, Thananjeyan et al. used RL to learn a tensioning policy to cut gauze, with one arm pinching at a pick point to let the other arm cut. We focus on cases where the initial fabric state may be highly rumpled and disordered.
Among the most relevant prior work on fabric smoothing, Willimon et al. present an algorithm that pulls at eight fixed angles, and then uses a six-step procedure to identify corners from depth images using the Harris Corner Detector. They present experiments on three simulated trials and one physical robot trial. Sun et al. followed up by attempting to explicitly detect and then pull at wrinkles. They measure wrinkledness as the average absolute deviation in a local pixel region for each point in a depth map of the fabric, and apply a force perpendicular to the largest wrinkle. Sun et al. evaluate on eight fixed, near-flat fabric starting configurations in simulation. In subsequent work, Sun et al. improved the detection of wrinkles by using a shape classifier as proposed in Koenderink and van Doorn. Each point in the depth map is classified as one of nine shapes, and contiguous segments of certain shapes define a wrinkle. While Sun et al. generalized the method beyond a set of hard-coded starting states, it was only tested on nearly flat fabrics, in contrast to the highly rumpled configurations we explore.
This paper extends prior work by Seita et al., which estimated only a pick point and pre-defined the pull vector. First, we learn the pick point and pull vector simultaneously. Second, by developing a simulator, we generate far more training data and perform systematic experiments comparing depth and color image inputs.
III Problem Statement
Given a deformable fabric and a flat fabric plane, each with the same rectangular dimensions, we consider the task of manipulating the fabric from a start state to a state that maximally covers the fabric plane.
Concretely, let ξ_t denote the full state of the fabric at time t, containing the positions of all its points (see Section IV). Let o_t represent the image observation of the fabric at time t, an image with H × W pixels and c channels, where c = 1 for depth images and c = 3 for color images. Let A be the set of actions the robot may take (see Section IV-A). The objective is coverage C(ξ_t), the percentage of the fabric plane covered by the fabric in state ξ_t.
We frame this as imitation learning [1, 30], where a demonstrator provides data in the form of paired observations and actions D = {(o_t, a_t)}. From D, the robot's goal is to learn a policy π that maps an observation o_t to an action a_t, and to execute actions sequentially until a coverage threshold or an iteration termination threshold is reached.
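The sequential execution described above can be sketched as a simple rollout loop. The `env` and `policy` interfaces below are hypothetical stand-ins for illustration, not the paper's actual API:

```python
def run_smoothing_episode(policy, env, coverage_threshold=0.92, max_actions=10):
    """Execute a learned policy until a coverage threshold or the
    action budget is reached. `env` and `policy` are assumed
    interfaces, not the actual implementation."""
    obs = env.reset()
    num_actions = 0
    while num_actions < max_actions and env.coverage() < coverage_threshold:
        action = policy(obs)   # maps an observation to an action
        obs = env.step(action)
        num_actions += 1
    return env.coverage(), num_actions
```

The 92% threshold and 10-action budget match the values used later in the paper's experiments.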
IV Fabric and Robot Simulator
We implemented a Finite Element Method (FEM) fabric simulator with an interface following the OpenAI Gym environment design. The fabric (Figure 2) is represented as a grid of point masses, connected by three types of springs:
Structural: between a point mass and the point masses to its left and above it.
Shear: between a point mass and the point masses to its diagonal upper left and diagonal upper right.
Flexion: between a point mass and the point masses two away to its left and two above it.
Each point mass is acted upon by both an external gravitational force, which is calculated using Newton's Second Law, and a spring correction force

F_s = k (||q_a − q_b|| − L)

for each of the springs representing the constraints above, where k is a spring constant, q_a and q_b are the positions of any two point masses connected by a spring, and L is the default spring length. We update the point mass positions using Verlet integration. Verlet integration computes a point mass's new position at time t + Δt, denoted x_{t+Δt}, as:

x_{t+Δt} = x_t + v_t Δt + (1/2) a_t Δt²

where x_t is the position, v_t is the velocity, a_t is the acceleration from all forces, and Δt is a timestep. Verlet integration approximates v_t Δt ≈ x_t − x_{t−Δt}, where x_{t−Δt} is the position at the last time step, resulting in

x_{t+Δt} = 2x_t − x_{t−Δt} + (1/2) a_t Δt²

The simulator adds damping to simulate loss of energy due to friction, and scales down the x_t − x_{t−Δt} term, leading to the final update:

x_{t+Δt} = x_t + (1 − d)(x_t − x_{t−Δt}) + (1/2) a_t Δt²

where d is a damping term, which we tuned to 0.02 based on visually inspecting the simulator.
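As a minimal sketch, the final damped update can be written for a single coordinate of a point mass. The 0.02 damping value follows the text; the function shape itself is illustrative:

```python
def verlet_step(x, x_prev, a, dt, damping=0.02):
    """One damped Verlet update for a single coordinate of a point mass:
    x_new = x + (1 - d)(x - x_prev) + 0.5 * a * dt**2.
    Only the damping value 0.02 comes from the text; the rest is a sketch."""
    return x + (1.0 - damping) * (x - x_prev) + 0.5 * a * dt * dt
```

In the simulator this update is applied independently to each of the three coordinates of every non-pinned point mass.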
We apply a constraint from Provot by correcting point mass positions so that spring lengths are at most 10% greater than their rest lengths at any time. We also implement fabric-fabric collisions by adding a force to “separate” two points if they are too close.
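A sketch of this Provot-style length correction for one spring might look as follows, assuming 3D points as tuples and a symmetric correction of both endpoints (the real simulator must additionally respect pinned points):

```python
import math

def enforce_spring_limit(pa, pb, rest_len, max_stretch=1.10):
    """If a spring is stretched beyond max_stretch * rest_len, move both
    endpoints equally toward each other so its length equals the limit.
    The 10% limit follows the text; the symmetric correction is an
    assumed implementation choice."""
    d = [b - a for a, b in zip(pa, pb)]
    length = math.sqrt(sum(c * c for c in d))
    limit = max_stretch * rest_len
    if length <= limit or length == 0.0:
        return pa, pb                       # within limit: no correction
    excess = (length - limit) / length      # fraction of length to remove
    half = [c * excess * 0.5 for c in d]
    new_a = tuple(a + h for a, h in zip(pa, half))
    new_b = tuple(b - h for b, h in zip(pb, half))
    return new_a, new_b
```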
The simulator provides access to the full fabric state ξ_t, which contains the exact positions of all points, but does not provide image observations, which are more natural and realistic for transfer to physical robots. To obtain image observations of a given fabric state, we create a triangular mesh and render using Blender (https://www.blender.org/), open-source software that can render images and simulate lighting and camera positions.
We define an action at time t as a 4D vector a_t, which includes the pick point (x, y), represented as the coordinate over the fabric plane to grasp, along with the pull direction. The simulator implements actions by grasping the top layer of the fabric at the pick point. If there is no fabric at (x, y), the grasp misses the fabric. After grasping, the simulator pulls the picked point upwards and towards direction (Δx, Δy), the deltas in the x and y directions of the fabric plane. In summary, actions are defined as

a_t = (x, y, Δx, Δy)

representing the pick point coordinates (x, y) and the pull vector (Δx, Δy) relative to the pick point.
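A sketch of the pick-point logic, grasping only the top layer and reporting a miss when no fabric is near the pick location (the planar tolerance value is an assumption, not from the paper):

```python
def pick_top_layer(points, x, y, tol=0.02):
    """Return the highest fabric point within `tol` (in the plane) of the
    pick location (x, y), or None if the grasp misses the fabric.
    `points` is a list of (px, py, pz) tuples; `tol` is an assumed value."""
    near = [p for p in points if (p[0] - x) ** 2 + (p[1] - y) ** 2 <= tol ** 2]
    if not near:
        return None                        # grasp missed the fabric
    return max(near, key=lambda p: p[2])   # grasp only the top layer
```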
IV-B Starting State Distributions
The performance of a smoothing policy depends heavily on the distribution of starting fabric states. We randomize the starting state to generate three difficulty tiers, with initial coverage estimated from 2000 simulations per tier:
Tier 1, High Coverage: starting from a flat fabric, we make two short, random pulls to slightly perturb the fabric. All fabric corners remain visible.
Tier 2, Medium Coverage: we let the fabric drop from midair on one side of the fabric plane, perform one random grasp and pull across the plane, and then do a second grasp and pull to cover one of the two fabric corners furthest from its plane target.
Tier 3, Low Coverage: starting from a flat fabric, we grip at a random pick point, pull high in the air, drag in a random direction, and then drop the fabric, usually resulting in one or two hidden corners.
Figure 3 shows examples of color and depth images of fabric initial states in simulation and real physical settings for all three tiers of difficulty. The supplementary material contains additional examples.
V Baseline Policies
We propose five baseline policies for fabric smoothing.
V-1 Random
As a naive baseline, we test a random policy that uniformly selects random pick points and pull directions.
V-2 Highest (Max z)
This policy, tested in Seita et al., grasps the highest point on the fabric. We obtain the pick point by finding the highest of the points in the fabric state. To compute the pull vector, we obtain the target coordinates by considering where that point's coordinates would be if the fabric were perfectly flat. The pull vector is then the vector from the point's current position to that target.
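A minimal sketch of this baseline, assuming parallel lists of current point positions and their flat-fabric target positions (an assumed interface):

```python
def highest_point_action(points, flat_targets):
    """Max-z baseline: grasp the highest fabric point and pull it toward
    where it would sit if the fabric were perfectly flat. `points` and
    `flat_targets` are parallel lists of (x, y, z) tuples."""
    k = max(range(len(points)), key=lambda i: points[i][2])
    px, py, _ = points[k]
    tx, ty, _ = flat_targets[k]
    return (px, py, tx - px, ty - py)   # action (x, y, dx, dy)
```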
V-3 Wrinkle
Sun et al. propose a two-stage algorithm to first identify wrinkles and then derive a force parallel to the fabric plane to flatten the largest wrinkle. The process repeats for subsequent wrinkles. We implement this method by finding the point in the fabric of largest local height variance. Then, we find the neighboring point with the next largest height variance, treat the vector between the two points as the wrinkle, and pull perpendicular to it.
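A simplified sketch of this wrinkle heuristic on a grid of heights, using 3x3 neighborhood variance as the wrinkledness measure (the exact measure and neighborhood size here are assumptions):

```python
import math

def local_height_variance(z, i, j):
    """Variance of heights in the 3x3 neighborhood of grid cell (i, j);
    z is a 2D list of heights. A simplified wrinkledness measure."""
    n, m = len(z), len(z[0])
    vals = [z[a][b] for a in range(max(0, i - 1), min(n, i + 2))
                    for b in range(max(0, j - 1), min(m, j + 2))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def wrinkle_action(z):
    """Find the highest-variance cell, then its highest-variance neighbor,
    treat the segment between them as the wrinkle, and return the pick
    cell plus a unit pull direction perpendicular to the wrinkle."""
    n, m = len(z), len(z[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    p1 = max(cells, key=lambda c: local_height_variance(z, *c))
    nbrs = [(p1[0] + di, p1[1] + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
            and 0 <= p1[0] + di < n and 0 <= p1[1] + dj < m]
    p2 = max(nbrs, key=lambda c: local_height_variance(z, *c))
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(wx, wy)
    return p1, (-wy / norm, wx / norm)   # perpendicular to the wrinkle
```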
V-4 Oracle
This policy uses complete state information to find the fabric corner furthest from its fabric plane target, and pulls it towards that target. When a corner is occluded underneath a fabric layer, this policy grasps the point directly above it on the uppermost fabric layer, and the resulting pull usually decreases coverage.
V-5 Oracle-Expose
When a fabric corner is occluded and other fabric corners are not at their targets, this policy picks above the hidden corner, but pulls away from the fabric plane target to reveal the corner for a subsequent action.
| Tier | Policy | Coverage (%) | Actions |
|---|---|---|---|
| 1 | Random | 25.0 +/- 14.6 | 2.43 +/- 2.2 |
| 1 | Highest | 66.2 +/- 25.1 | 8.21 +/- 3.2 |
| 1 | Wrinkle | 91.3 +/- 7.1 | 5.40 +/- 3.7 |
| 1 | Oracle | 95.7 +/- 2.1 | 1.76 +/- 0.8 |
| 1 | Oracle-Expose | 95.7 +/- 2.2 | 1.77 +/- 0.8 |
| 2 | Random | 22.3 +/- 12.7 | 3.00 +/- 2.5 |
| 2 | Highest | 57.3 +/- 13.0 | 9.97 +/- 0.3 |
| 2 | Wrinkle | 87.0 +/- 10.8 | 7.64 +/- 2.8 |
| 2 | Oracle | 94.5 +/- 5.4 | 4.01 +/- 2.0 |
| 2 | Oracle-Expose | 94.6 +/- 5.0 | 4.07 +/- 2.2 |
| 3 | Random | 20.6 +/- 12.3 | 3.78 +/- 2.8 |
| 3 | Highest | 36.3 +/- 16.3 | 7.89 +/- 3.2 |
| 3 | Wrinkle | 73.6 +/- 19.0 | 8.94 +/- 2.0 |
| 3 | Oracle | 95.1 +/- 2.3 | 4.63 +/- 1.1 |
| 3 | Oracle-Expose | 95.1 +/- 2.2 | 4.70 +/- 1.1 |
VI Simulation Results for Baseline Policies
We evaluate baseline fabric smoothing policies by running each for 2000 trajectories in simulation. Each trajectory draws a randomized fabric starting state from one of three difficulty tiers (Section IV-B) and lasts for a maximum of 10 actions. Trajectories can terminate earlier under two conditions: (1) a pre-defined coverage threshold is reached, or (2) the fabric goes out of bounds beyond a certain threshold. For (1) we use 92% as the threshold, which produces visually smooth fabric (e.g., see the last image in Figure 4) and prevents the demonstrator data from being dominated by short actions at the ends of trajectories. For (2) we define a fabric as out of bounds if any of its points lies at least 25% beyond the fabric plane, relative to the full length of the plane's edge. This threshold allows the fabric to go slightly off the fabric plane, though we do not allow a pick point to lie outside the fabric plane.
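The two termination conditions can be sketched as follows, assuming the plane spans [0, plane_side] in x and y (an assumed coordinate convention):

```python
def episode_done(coverage, points, plane_side=1.0, cov_thresh=0.92, oob_frac=0.25):
    """Termination test: stop on at least 92% coverage, or if any fabric
    point lies more than 25% of the plane's side length beyond the plane.
    `points` is a list of (x, y, z) tuples; thresholds follow the text."""
    if coverage >= cov_thresh:
        return True                     # condition (1): smooth enough
    lo = -oob_frac * plane_side
    hi = (1.0 + oob_frac) * plane_side
    return any(not (lo <= x <= hi and lo <= y <= hi)
               for x, y, _ in points)   # condition (2): out of bounds
```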
Table I indicates that both oracle policies attain nearly identical performance and have the highest coverage among the baseline policies, with about 95% across all tiers. The wrinkles policy is the next best policy in simulation, with 91.3%, 87.0%, and 73.6% final coverage for the three respective tiers, but requires substantially more actions per trajectory.
One reason why the oracle policy still performs well with occluded corners is that the resulting pulls can move those corners closer to their fabric plane targets, making it easier for subsequent actions to increase coverage. Figure 4 shows an example trajectory from the oracle policy on a tier 3 starting state. The second action pulls at the top layer of the fabric above the corner, but the resulting action still moves the occluded corner closer to its target.
VII Imitation Learning with DAgger
We use the oracle (not oracle-expose) policy to generate demonstrations and corrective labels. For each tier, we generate 2000 trajectories from the demonstrator and use them as offline data. We train a fabric smoothing policy in simulation using imitation learning on synthetic images. With behavior cloning [34, 33] on demonstrator data, the robot's policy will learn the demonstrator's actions on states in the training data, but generalize poorly outside the data distribution. To address this, we use Dataset Aggregation (DAgger), which requests the demonstrator to label the states the robot encounters when running its learned policy. A limitation of DAgger is the need for continued access to the demonstrator's policy, rather than just offline data. The oracle corner-pulling demonstrator is cheap to query, so in practice this does not cause problems.
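Schematically, the DAgger loop interleaves rollouts of the learner with corrective labels from the demonstrator. All interfaces below (`train`, `env`, `demonstrator`) are assumptions for illustration, not the paper's API:

```python
def dagger(train, demonstrator, env, dataset, rounds=3, horizon=10):
    """Schematic DAgger: roll out the current learned policy, label every
    visited observation with the demonstrator's action, aggregate into
    the dataset, and retrain. `train(dataset)` returns a policy."""
    policy = train(dataset)                 # behavior-cloning warm start
    for _ in range(rounds):
        obs = env.reset()
        for _ in range(horizon):
            dataset.append((obs, demonstrator(obs)))  # corrective label
            obs = env.step(policy(obs))               # but act with the learner
        policy = train(dataset)             # retrain on aggregated data
    return policy
```

The key point is that actions come from the learner while labels come from the demonstrator, so the dataset covers the states the learner actually visits.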
VII-A Policy Training Procedure
The imitation learning code uses OpenAI baselines to make use of its parallel environment support. We run the fabric simulator in ten parallel environments, which helps alleviate the major time bottleneck of training, and pool the samples into a shared dataset.
We use domain randomization during training. For color images, we randomize the fabric color by selecting RGB values uniformly at random across intervals that include shades of blue, purple, pink, red, and gray. We also vary the shading of the fabric plane. For both color and depth images, we randomize the image brightness with gamma corrections, and randomize the camera pose with independent Gaussian distributions for each of the position and orientation components.
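For instance, brightness randomization via gamma correction can be sketched as follows (the gamma interval is an assumed example, not the paper's setting):

```python
import random

def randomize_brightness(img, gamma_range=(0.7, 1.3)):
    """Apply a random gamma correction, out = in ** gamma, to an image
    with pixel values in [0, 1]. The gamma range is an assumed example;
    the paper randomizes brightness but does not state the interval here."""
    gamma = random.uniform(*gamma_range)
    return [[pixel ** gamma for pixel in row] for row in img]
```

Gamma < 1 brightens mid-tones and gamma > 1 darkens them, while leaving pure black and pure white pixels unchanged.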
We first train with a “behavior cloning (BC) phase,” where we minimize the error on the offline demonstrator data, and then use a “DAgger phase,” which rolls out the agent's policy and applies DAgger. We used 500 epochs of behavior cloning, based on when the network's error roughly converged on a held-out validation dataset. Further training details are in the supplementary material.
VII-B Simulation Experiments
For all simulated training runs, we evaluate on 50 new tier-specific starting states that were not seen during training. Figure 5 shows results across all tiers, suggesting that after behavior cloning, DAgger improves final coverage performance by 6.1% (averaged over six runs). In addition, color policies attain better coverage in simulation than depth policies, with gains of 10.8%, 8.3%, and 10.9% across the respective tiers, which may be due to the high color contrast between the fabric and fabric plane in the color images, as opposed to the depth images (see Figure 3).
In all difficulty tiers, the color policies achieve higher final coverage than the wrinkle policy (from Table I): 94.8% over 91.3%, 89.6% over 87.0%, and 91.2% over 73.6%, respectively, and get close to the corner-pulling demonstrator despite only having access to image observations. The depth policies outperform the wrinkle policy only on tier 3, with 80.3% versus 73.6% coverage.
VIII Physical Experiments
The da Vinci Research Kit (dVRK) surgical robot is a cable-driven surgical robot with imprecision as reviewed in prior work [24, 43]. We use a single arm with an end effector that can be opened to 75°, or a gripper width of 10 mm. We set a fabric plane at a height and location that allows the end-effector to reach all points on it. To prevent potential damage to the grippers, the fabric plane is made of foam rubber, which allows us to liberally set the gripper height to be lower and avoids a source of height error present in prior work. For the fabric, we cut a 5x5 inch piece from a Zwipes 735 Microfiber Towel Cleaning Cloth with a blue color within the distribution of domain-randomized fabric colors. We mount a Zivid One Plus RGBD camera 0.9 meters above the workspace, which is used to obtain color and depth images.
VIII-A Physical Experiment Protocol
We manually create starting fabric states similar to those in simulation for all tiers. Given a starting fabric, we randomly run one of the color or depth policies for one trajectory of at most 10 steps (as in simulation). Then, to make comparisons fair, we “reset” the fabric to be close to its starting state and run the other policy.
During preliminary trials, the dVRK gripper would sometimes miss the fabric by 1-2 mm, which is within the calibration error. To counter this, we measure the structural similarity of the images before and after an action to check if the robot moved the fabric. If it did not, the next action is adjusted to be closer to the center of the fabric plane, and the process repeats until the robot touches the fabric.
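A sketch of this check and retry rule, substituting a mean absolute pixel difference for structural similarity for simplicity (the threshold and step size are assumptions):

```python
def fabric_moved(before, after, threshold=0.001):
    """Cheap stand-in for the structural-similarity check: compare the
    mean absolute pixel difference of images taken before and after an
    action. Images are 2D lists with values in [0, 1]; the threshold is
    an assumed value, and the paper uses SSIM instead."""
    total = sum(abs(a - b) for row_a, row_b in zip(after, before)
                           for a, b in zip(row_a, row_b))
    count = sum(len(row) for row in before)
    return total / count > threshold

def retry_toward_center(action, step=0.1, center=(0.5, 0.5)):
    """If the grasp missed, move the pick point a fraction `step` toward
    the center of the fabric plane while keeping the pull vector,
    mirroring the retry rule in the text (`step` is an assumed value)."""
    x, y, dx, dy = action
    return (x + step * (center[0] - x), y + step * (center[1] - y), dx, dy)
```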
VIII-B Physical Experiment Results
| | (1) Start | (2) Final | (3) Max | (4) Actions |
|---|---|---|---|---|
| T1 C | 78.4 +/- 4.4 | 96.2 +/- 2.3 | 96.2 +/- 2.3 | 1.8 +/- 1.4 |
| T1 D | 77.9 +/- 3.6 | 78.8 +/- 23.6 | 90.0 +/- 9.5 | 5.5 +/- 4.1 |
| T2 C | 58.5 +/- 5.9 | 87.7 +/- 13.3 | 92.7 +/- 4.4 | 6.3 +/- 3.2 |
| T2 D | 58.7 +/- 5.3 | 64.9 +/- 19.7 | 85.7 +/- 8.0 | 8.3 +/- 3.1 |
| T3 C | 46.2 +/- 3.7 | 75.0 +/- 17.9 | 79.9 +/- 13.5 | 8.7 +/- 2.0 |
| T3 D | 47.0 +/- 3.4 | 63.2 +/- 9.1 | 74.7 +/- 9.6 | 10.0 +/- 0.0 |
We run 20 trajectories for each combination of input modality (color or depth) and tiers, resulting in 120 total as shown in Table II. We report starting coverage, ending coverage, maximum coverage across the trajectory after the initial state, and the number of actions. The maximum coverage allows for a more nuanced understanding of performance, because policies can take strong initial actions that achieve high coverage (e.g., above 80%) but a single counter-productive action at the end can substantially lower coverage.
Results suggest that, despite not being trained on real images, the learned policies can smooth fabric in the physical world. All policies improve over the starting coverage across all tiers, for both color and depth policies. Final coverage averaged across all tiers is 86.3% and 69.0% for color and depth, respectively, with net coverage gains of 25.2% and 7.8% over starting coverage. In addition, the color policy deployed on tier 1 starting states was able to hit the 92% coverage threshold 20 out of 20 times.
Qualitatively, the color-trained policy is effective at “fine-tuning” by taking several short pulls to trigger at least 92% coverage. For example, Figure 6 shows a trajectory taken by a color policy trained on tier 3 starting states, where it is able to smooth the highly wrinkled fabric in seven actions.
The depth policies do not perform as well, but this is in large part because the depth policy sometimes takes counterproductive actions after several reasonable actions. Depth policies may have lower performance due to uneven texture on the fabric we use, which is difficult to replicate in simulation.
VIII-C Experiments With Yellow Fabrics
| | (1) Start | (2) Final | (3) Max | (4) Actions |
|---|---|---|---|---|
| T1 C | 81.5 +/- 3.6 | 71.7 +/- 25.2 | 89.6 +/- 6.2 | 7.6 +/- 2.9 |
| T1 D | 83.1 +/- 2.9 | 85.9 +/- 15.3 | 91.9 +/- 5.3 | 4.6 +/- 4.4 |
To further test color versus depth policies, we used the same two policies trained on tier 1 starting states and deployed them on yellow fabric. The color distribution “covered” by domain randomization included shades of blue, purple, pink, red, and gray, but not yellow. We recreated five starting fabric conditions where, with a blue fabric, the color policy attained at least 92% coverage in just one action.
Results in Table III indicate poor performance from the color policy, as coverage decreases from 81.5% to 71.7%. Only two out of five trajectories resulted in at least 92% coverage. We observed behavior shown in Figure 7 where the policy fails to pick at a corner or to pull in the correct direction. The depth policy is invariant to colors, and is able to achieve higher ending coverage of 85.9%. This is higher than the 78.8% coverage reported in Table II due to relatively easier starting states.
VIII-D Failure Cases
The policies, particularly those trained with depth images, are susceptible to pulling near the center of the fabric for fabrics that are already nearly smooth, as shown in Figure 8. This results in poor coverage and may lead to cascading errors (as in Figure 7). One cause may be that there are several fabric corners that are equally far from their targets, which creates ambiguity in which corner should be pulled. It may be worthwhile to formulate corner picking with a mixture model to resolve this ambiguity.
IX Conclusion and Future Work
We investigate baseline and learned policies for fabric smoothing. Using a low fidelity fabric simulator and a custom environment, we train policies in simulation using DAgger with a corner pulling demonstrator. We use domain randomization to transfer policies to a surgical robot. When testing on fabric of similar color to that used in training, color-based policies achieve higher coverage than depth-based policies, but depth could be more valuable in practice for unseen colors.
In future work, we will test on fabric shapes and configurations where corner-pulling policies may achieve poor coverage. We plan to apply deep reinforcement learning, using the simulation environment for color and depth images with DDPG and other state-of-the-art RL methods, to potentially learn richer policies that can explicitly reason over multiple time steps and varying geometries. We will utilize higher-fidelity fabric simulators such as ARCSim. Finally, we would like to extend the method beyond fabric coverage to tasks such as folding and wrapping, and apply it to ropes, strings, and other deformable objects.
This research was performed at the AUTOLAB at UC Berkeley in affiliation with Honda Research Institute USA, the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative, and by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, NSF National Robotics Initiative Award 1734633. The authors were supported in part by Siemens, Google, Amazon Robotics, Toyota Research Institute, Autodesk, ABB, Samsung, Knapp, Loccioni, Intel, Comcast, Cisco, Hewlett-Packard, PhotoNeo, NVidia, and Intuitive Surgical. Daniel Seita is supported by a National Physical Science Consortium Fellowship. We thank Jackson Chui, Michael Danielczuk, Shivin Devgon, and Mark Theis.
-  (2009) A Survey of Robot Learning From Demonstration. Robotics and Autonomous Systems 57. Cited by: §III.
-  (2011) Combining Imitation and Reinforcement Learning to Fold Deformable Planar Objects. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
-  (1998) Large Steps in Cloth Simulation. In ACM SIGGRAPH, Cited by: §IV.
-  (2006) Finite Element Procedures. Prentice Hall. External Links: Cited by: §IV.
-  (2019) A Grasping-centered Analysis for Cloth Manipulation. arXiv:1906.08202. Cited by: §II.
-  (2016) OpenAI Gym. External Links: Cited by: §A-A, §IV.
-  (2011) Bringing Clothing Into Desired Configurations with Limited Perception. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  (2017) OpenAI Baselines. GitHub. Note: https://github.com/openai/baselines Cited by: §VII-A.
-  (2014) Autonomous Active Recognition and Unfolding of Clothes Using Random Decision Forests and Probabilistic Planning. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  (2018) Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control. arXiv:1812.00568. Cited by: §II.
-  (2018) Deep Haptic Model Predictive Control for Robot-Assisted Dressing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2018) Tracking Human Pose During Robot-Assisted Dressing using Single-Axis Capacitive Proximity Sensing. In IEEE Robotics and Automation Letters (RA-L), Cited by: §I.
-  (2016) Iterative Path Optimisation for Personalised Dressing Assistance using Vision and Force Information. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §I.
-  (1988) A Combined Corner and Edge Detector. In In Proceedings of the Fourth Alvey Vision Conference, Cited by: §II.
-  (2014) An Open-Source Research Kit for the da Vinci Surgical System. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I, §VIII.
-  (2015) Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), Cited by: §C-A.
-  (2009) A Method For Handling a Specific Part of Clothing by Dual Arms. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
-  (2009) Clothes State Recognition Using 3D Observed Data. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  (1992) Surface shape and curvature scales. Image and vision computing 10 (8), pp. 557–564. Cited by: §II.
-  (2017) DART: Noise Injection for Robust Imitation Learning. In Conference on Robot Learning (CoRL), Cited by: §VII.
-  (2016) Multi-Sensor Surface Analysis for Robotic Ironing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2015) Folding Deformable Objects using Predictive Simulation and Trajectory Optimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §I.
-  (2016) Continuous Control with Deep Reinforcement Learning. In International Conference on Learning Representations (ICLR), Cited by: §IX.
-  (2014) Learning Accurate Kinematic Control of Cable-Driven Surgical Robots Using Data Cleaning and Gaussian Process Regression.. In IEEE Conference on Automation Science and Engineering (CASE), Cited by: §VIII.
-  (2010) Cloth Grasp Point Detection Based on Multiple-View Geometric Cues with Application to Robotic Towel Folding. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  (2018) Sim-to-Real Reinforcement Learning for Deformable Object Manipulation. Conference on Robot Learning (CoRL). Cited by: §C-A, §II, §VIII.
-  (2012) A Geometric Approach to Robotic Laundry Folding. In International Journal of Robotics Research (IJRR), Cited by: §I.
-  (2012) Adaptive Anisotropic Remeshing for Cloth Simulation. In ACM SIGGRAPH Asia, Cited by: Appendix A, §IX.
-  (2018) Learning Dexterous In-Hand Manipulation. arXiv:1808.00177. Cited by: §C-B.
-  (2018) An Algorithmic Perspective on Imitation Learning. Foundations and Trends in Robotics 7. Cited by: §III.
-  (2007) Unfolding of Massive Laundry and Classification Types by Dual Manipulator. Journal of Advanced Computational Intelligence and Intelligent Informatics 11 (5). Cited by: §II.
-  (1983) Robotic Fabric Handling for Automating Garment Manufacturing. Journal of Manufacturing Science and Engineering 105. Cited by: §I.
Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Comput. 3. Cited by: §VII.
-  (1989) Alvinn: An Autonomous Land Vehicle in a Neural Network. Technical report Carnegie-Mellon University. Cited by: §VII.
-  (2003) Digital video and hdtv algorithms and interfaces. 1 edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. External Links: Cited by: §VII-A.
-  (1995) Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior. In Graphics Interface, Cited by: §IV, §IV.
-  (2012) Using Depth and Appearance Features for Informed Robot Grasping of Highly Wrinkled Clothes. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), Cited by: §I, §VII.
-  (2017) CAD2RL: Real Single-Image Flight without a Single Real Image. In Robotics: Science and Systems (RSS), Cited by: §I.
-  (2018) Robotic Manipulation and Sensing of Deformable Objects in Domestic and Industrial Applications: a Survey. In International Journal of Robotics Research (IJRR), Cited by: §II.
-  (2012) Experiments Towards Automated Sewing With a Multi-Robot System. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
Deep Transfer Learning of Pick Points on Fabric for Robot Bed-Making. In International Symposium on Robotics Research (ISRR), Cited by: §A-B, §II, §V-2.
-  (2018) Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §VIII.
-  (2012) A Trajectory Generation of Cloth Object Folding Motion Toward Realization of Housekeeping Robot. In International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Cited by: §I.
A Heuristic-Based Approach for Flattening Wrinkled Clothes. Towards Autonomous Robotic Systems. TAROS 2013. Lecture Notes in Computer Science, vol 8069. Cited by: §B-B, §B-B, §II, §V-3.
-  (2015) Accurate Garment Surface Analysis using an Active Stereo Robot Head with Application to Dual-Arm Flattening. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
-  (2018) Introduction to Reinforcement Learning. 2nd edition, MIT Press, Cambridge, MA, USA. Cited by: §II.
-  (2017) Multilateral Surgical Pattern Cutting in 2D Orthotropic Gauze with Deep Reinforcement Learning Policies for Tensioning. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I, §II.
-  (2017) Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §C-B, §I, §VII-A.
-  (1987) Vision Guided Robotic Fabric Manipulation for Apparel Manufacturing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (1967) Computer Experiments on Classical Fluids: I. Thermodynamical Properties of Lennard−Jones Molecules. Physical Review 159 (98). Cited by: §IV.
-  (2004-04) Image Quality Assessment: From Error Visibility to Structural Similarity. Trans. Img. Proc.. Cited by: §VIII-A.
-  (2011) Model for Unfolding Laundry using Interactive Perception. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning. In IEEE Robotics and Automation Letters (RA-L), Cited by: §I.
Appendix A Fabric Environment
The fabric simulator is implemented in Python with Cython for increased speed. The simulator is low fidelity compared to more accurate simulators such as ARCSim, but has the advantage of being easier to adapt to the smoothing task we consider. Some relevant hyperparameters are shown in Table IV.
The fabric smoothing environment is implemented with a standard OpenAI gym interface. Each action is broken up into six stages: (1) a grasp, (2) a pull up, (3) a pause, (4) a linear pull towards a target, (5) a second pause, and (6) a drop. Steps (2) through (6) involve some number of iterations, where each iteration changes the coordinates of the “pinned” points on the fabric and then calls one “update” method for the fabric simulator to adjust the other, non-pinned points.
A-A1 Grasp
We implement a grasp by first simulating a gripper moving downwards from a height above the highest fabric point, which ensures grasping only the top layer of the fabric. To decide which fabric points are grasped for a given pick point, we use a small sphere centered at the gripper with a radius of 0.003 units, where units are scaled so that 1 represents the length of a side of the fabric plane. If no points are gripped, we lower the gripper until at least one point is within the radius. In practice, this means that usually 2-5 of the points on the cloth are grasped for a given pick point. Once any of the fabric’s points are within the gripper’s radius, those points are considered fixed, or “pinned”.
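The grasp-point selection above could be sketched as follows, with NumPy and an assumed (N, 3) array of cloth point coordinates; the lowering step size is an illustrative detail, not the simulator's actual value:

```python
import numpy as np

def select_grasped_points(points, pick_xy, radius=0.003, step=0.001):
    """Return indices of cloth points grasped at pick_xy.

    Starts the gripper just above the highest cloth point and lowers it
    until at least one point falls inside the grasping sphere (radius
    0.003 units). `points` is an (N, 3) array of cloth point positions.
    """
    z = points[:, 2].max() + radius              # start just above the cloth
    center = np.array([pick_xy[0], pick_xy[1], z])
    floor = points[:, 2].min() - radius          # give up below the lowest point
    while center[2] > floor:
        dists = np.linalg.norm(points - center, axis=1)
        grasped = np.flatnonzero(dists < radius)
        if grasped.size > 0:
            return grasped                       # these points become "pinned"
        center[2] -= step                        # lower the gripper and retry
    return np.array([], dtype=int)
```

Because only points inside the sphere at the first successful height are pinned, a grasp naturally captures just the top layer of a folded cloth.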
A-A2 Pull Up
For 50 iterations, the pull adjusts the z-coordinate of every pinned point by a small fixed amount per iteration, keeping their x and y coordinates fixed. In practice, tuning this height change is important. If it is too low, an excessive number of fabric-fabric collisions can occur, but if it is too high, coverage substantially decreases. In future work, we will consider dynamically adjusting the height change so that it is lower when current fabric coverage is high.
A-A3 First Pause
For 80 iterations, the simulator keeps the pinned points fixed, and lets the non-pinned points settle.
A-A4 Linear Pull to Target
To implement the pull, we adjust the x and y coordinates of all pinned points by a small amount each time step (leaving their z-coordinates fixed), in accordance with the deltas in the action. The simulator updates the positions of the non-pinned points based on the implemented physics model. This step runs for a variable number of iterations based on the pull length.
A-A5 Second Pause
For 300 iterations, the simulator keeps the pinned points fixed, and lets the non-pinned points settle. This period is longer than the first pause because normally more non-pinned points are moving after a linear pull to the target compared to a pull upwards.
A-A6 Drop
Finally, the pinned points are “un-pinned” and thus allowed to fall due to gravity. For 1000 iterations, the simulator lets the entire cloth settle and stabilize for the next action.
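Putting the six stages together, the action driver might look like the sketch below. The `sim` interface (`pin_near`, `move_pinned`, `update`, `unpin_all`) and the per-iteration height increment are illustrative names, not the actual simulator API:

```python
# Iteration counts for the stages, as described above.
STAGE_ITERS = {"pull_up": 50, "pause_1": 80, "pause_2": 300, "drop": 1000}

def execute_action(sim, pick, delta_x, delta_y, dz=0.002, pull_iters=100):
    sim.pin_near(pick)                         # (1) grasp: pin points near pick
    for _ in range(STAGE_ITERS["pull_up"]):    # (2) raise pinned points
        sim.move_pinned(0.0, 0.0, dz)
        sim.update()
    for _ in range(STAGE_ITERS["pause_1"]):    # (3) let non-pinned points settle
        sim.update()
    for _ in range(pull_iters):                # (4) linear pull toward target
        sim.move_pinned(delta_x / pull_iters, delta_y / pull_iters, 0.0)
        sim.update()
    for _ in range(STAGE_ITERS["pause_2"]):    # (5) second, longer pause
        sim.update()
    sim.unpin_all()                            # (6) drop and stabilize
    for _ in range(STAGE_ITERS["drop"]):
        sim.update()
```

Note that every stage after the grasp calls the same `update` method once per iteration, matching the structure described above.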
A-B Starting State Distributions
We provide details on how we generate starting states from the three distributions we use (see Section IV-B).
Tier 1. We perform a sequence of two pulls with pick point randomly chosen on the fabric, pull direction randomly chosen, and pull length constrained to be short (about 10-20% of the length of the fabric plane). If coverage remains above 90% after these two pulls we perform a third short (random) pull.
Tier 2. For this tier only, we initialize the fabric in a vertical orientation with tiny noise in the direction perpendicular to the plane of the fabric. Thus the first action in Tier 2 initialization is a vertical drop over one of two edges of the plane (randomly chosen). We then randomly pick one of the two corners at the top of the dropped fabric and drag it approximately toward the center for about half of the length of the plane. Finally we grip a nearby point and drag it over the exposed corner in an attempt to occlude it, again pulling for about half the length of the plane.
Tier 3. Initialization consists of just one high pull. We choose a random pick point, lift it about 4-5 times as high as compared to a normal action, pull in a random direction for 10-25% of the length of the plane, and let the fabric fall, which usually creates less coverage and occluded fabric corners.
Each process induces a distribution over starting states, so the agent never sees the same starting fabric state twice.
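The Tier 1 procedure, for instance, could be sketched as follows; `env.pull(point, angle, length)` and `coverage_fn` are assumed interfaces for illustration, not the actual simulator API:

```python
import random

def tier1_init(env, coverage_fn):
    """Tier 1 state generation: two short random pulls, plus a third
    if coverage stays above 90%. Pull lengths are 10-20% of the side
    length of the fabric plane, as described above.
    """
    for _ in range(2):
        pick = (random.random(), random.random())   # random point on the fabric
        angle = random.uniform(0.0, 360.0)          # random pull direction
        length = random.uniform(0.10, 0.20)         # short pull
        env.pull(pick, angle, length)
    if coverage_fn() > 0.90:                        # still nearly flat?
        env.pull((random.random(), random.random()),
                 random.uniform(0.0, 360.0),
                 random.uniform(0.10, 0.20))
```

Tiers 2 and 3 follow the same pattern with their own pull sequences as described above.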
These starting state distributions do not generally produce configurations in which a single folded corner is visible on top of the fabric, as that case was frequently shown and reasonably approached in prior work.
Appendix B Details on Baseline Policies
We describe the implementation of the analytic methods from Section V in more detail.
B-A Highest (Max z)
We implement this method using the underlying state representation of the fabric, not the images. At each step, we iterate through all fabric point masses and select the one with the highest z-coordinate value. This provides the pick point. For the pull vector, we pull toward the location on the fabric plane where the chosen point would rest if the fabric were perfectly flat.
To avoid repeatedly pulling the same point (which happens if the pull length is very short and the fabric ends up “resetting” to the prior state), we select the five highest points on the fabric and randomly pick one of them.
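A minimal sketch of this baseline, assuming an (N, 3) array of cloth points and an (N, 2) array of each point's resting x-y location on the plane (both illustrative names):

```python
import numpy as np

def highest_point_action(points, flat_targets, k=5):
    """Pick one of the k highest cloth points at random and pull it
    toward where it would sit if the fabric were perfectly flat.
    `points` is (N, 3); `flat_targets` is (N, 2).
    """
    top_k = np.argsort(points[:, 2])[-k:]   # indices of the k highest points
    pick = np.random.choice(top_k)          # randomize to avoid repeated pulls
    dx, dy = flat_targets[pick] - points[pick, :2]
    return pick, (dx, dy)
```

With k=1 this reduces to always pulling the single highest point; k=5 adds the randomization described above.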
B-B Wrinkle
This implementation approximates the method in Sun et al.; implementing the full algorithm is difficult due to its complex, multi-stage nature and the lack of open source code. Their wrinkle detection method involves computing the variance in height in small neighborhoods around each pixel in the observation image, applying k-means clustering on the pixels with average variance above a threshold, and hierarchical clustering on the clusters found by k-means to obtain the largest wrinkle. We approximate their wrinkle detection by isolating the point of largest local height variance using the underlying fabric state. Empirically, this is accurate at selecting the largest wrinkle. To estimate the wrinkle direction, we find the neighboring point with the next largest variance in height.
We then pull perpendicular to the wrinkle direction. Whereas Sun et al. constrain the perpendicular angle to be one of eight cardinal directions (north, northeast, east, southeast, south, southwest, west, northwest), we find the exact perpendicular line and its two intersections with the edges of the fabric. We choose the closer of these two as the pick point and pull to the edge of the plane.
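The local-variance step described above could be sketched as follows on an (n, n) grid of cloth point heights; the 3x3 neighborhood size is an assumption for illustration:

```python
import numpy as np

def wrinkle_pick(heights):
    """Approximate wrinkle detection: compute the variance of heights
    in each grid point's 3x3 neighborhood and return the index of the
    point with the largest local variance (the pick point candidate).
    """
    n = heights.shape[0]
    var = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Clamp the window at the grid boundary.
            patch = heights[max(0, i - 1):i + 2, max(0, j - 1):j + 2]
            var[i, j] = patch.var()
    return np.unravel_index(var.argmax(), var.shape)
```

The wrinkle direction would then be estimated from the neighboring point with the next largest variance, and the pull made perpendicular to it.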
B-C Oracle
For the oracle policy, we assume we can query the four corners of the fabric and know their coordinates. Since we know the four corners, we know which of the fabric plane corners (i.e., the targets) each should be pulled to. We pick the pull based on whichever fabric corner is furthest from its target.
B-D Oracle Expose
The oracle expose policy is an extension to the oracle policy. In addition to knowing the exact position of all corners, the policy is also aware of the occlusion state of all corners. The occlusion state is a 4D boolean vector which indicates a 1 if a given corner is visible to the camera from the top-down view or a 0 if it is occluded. The oracle expose policy will try to solve all visible corners in a similar manner to the oracle policy using its extended state knowledge. If all four corners are occluded, or all visible corners are within a threshold of their target positions, the oracle expose policy will perform a revealing action on an occluded corner. We implement the revealing action as a fixed length pull 180 degrees away from the angle to the target position. This process is repeated until the threshold coverage is achieved.
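The decision rule above could be sketched as follows; the function and argument names, and the distance threshold, are illustrative rather than the actual implementation:

```python
def oracle_expose_action(corners, visible, targets, thresh=0.02):
    """Oracle expose decision rule. `corners`/`targets` are lists of
    four (x, y) fabric-corner and plane-corner positions; `visible` is
    the 4D boolean occlusion vector. Returns ("pull", i) to move a
    visible corner toward its target, ("reveal", i) to expose an
    occluded corner, or ("done", -1) when coverage is solved.
    """
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    # Visible corners still far from their targets, furthest first.
    candidates = [(dist(corners[i], targets[i]), i) for i in range(4)
                  if visible[i] and dist(corners[i], targets[i]) > thresh]
    if candidates:
        return ("pull", max(candidates)[1])
    # All corners occluded, or all visible corners solved: reveal one.
    occluded = [i for i in range(4) if not visible[i]]
    return ("reveal", occluded[0]) if occluded else ("done", -1)
```

The revealing action itself would be a fixed-length pull 180 degrees away from the corner's direction to its target, as described above.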
Appendix C Details on Imitation Learning
C-a DAgger Pipeline
TABLE V: DAgger hyperparameters.
| Hyperparameter | Value |
| Steps per environment, between gradient updates | 20 |
| Gradient updates after parallel steps | 240 |
| Demonstrator (offline) trajectories | 2000 |
| Policy learning rate | 1e-4 |
| Policy regularization parameter | 1e-5 |
| Behavior Cloning epochs | 500 |
| DAgger steps after Behavior Cloning | 50000 |
We collected demonstrations by running the oracle corner policy (not oracle expose) for 2000 trajectories for each of the three starting state tiers. We then run behavior cloning on this offline data for 500 epochs before running DAgger.
Each DAgger “iteration” rolls out 10 parallel environments for 20 steps each (hence, 200 total new samples) which are labeled by the oracle corner policy. These are added to a growing dataset of samples which includes the demonstrator’s original offline data. After 20 steps per parallel environment, we draw 240 minibatches of size 128 each for training. Then the process repeats with the agent rolling out its new policy. DAgger hyperparameters are in Table V. In practice, the regularization for the policy impacted performance significantly. We use 1e-5 and saw poor performance with 1e-3 and 1e-4. The total number of DAgger steps was limited to 50,000 due to compute and time limitations; training for substantially more steps is likely to yield further improvements.
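The iteration structure above, under assumed `policy`, `expert`, and environment interfaces (illustrative names only), might look like:

```python
def dagger_loop(policy, expert, envs, dataset, n_iters,
                steps_per_env=20, grad_updates=240, batch_size=128):
    """Sketch of one DAgger run: roll out the learner in parallel
    environments, label visited states with the oracle corner policy,
    grow the dataset, then train on sampled minibatches.
    """
    for _ in range(n_iters):
        for env in envs:                         # 10 parallel environments
            obs = env.observation()
            for _ in range(steps_per_env):       # 20 steps each -> 200 samples
                action = policy.act(obs)         # learner chooses the action
                label = expert.act(env.state())  # oracle provides the label
                dataset.append((obs, label))
                obs = env.step(action)
        for _ in range(grad_updates):            # 240 minibatches of size 128
            policy.train_on(dataset.sample(batch_size))
```

The dataset starts pre-filled with the demonstrator's 2000 offline trajectories, so early minibatches are dominated by expert data.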
The policy neural network architecture is similar to the one in Matas et al. with four convolutional layers, each with 32 filters of size 3x3, followed by fully connected layers of size 256 each, for a total of 3.44 million parameters. The parameters are, in more detail (ignoring biases for simplicity):
policy/convnet/c1      864 params      (3, 3, 3, 32)
policy/convnet/c2     9216 params      (3, 3, 32, 32)
policy/convnet/c3     9216 params      (3, 3, 32, 32)
policy/convnet/c4     9216 params      (3, 3, 32, 32)
policy/fcnet/fc1   3276800 params      (12800, 256)
policy/fcnet/fc2     65536 params      (256, 256)
policy/fcnet/fc3     65536 params      (256, 256)
policy/fcnet/fc4      1024 params      (256, 4)
Total model parameters: 3.44 million
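The per-layer shapes listed above (biases omitted) can be checked to sum to roughly 3.44 million:

```python
import numpy as np

# Weight shapes as listed in the parameter breakdown above.
shapes = {
    "c1": (3, 3, 3, 32),  "c2": (3, 3, 32, 32),
    "c3": (3, 3, 32, 32), "c4": (3, 3, 32, 32),
    "fc1": (12800, 256),  "fc2": (256, 256),
    "fc3": (256, 256),    "fc4": (256, 4),
}
counts = {name: int(np.prod(s)) for name, s in shapes.items()}
total = sum(counts.values())
assert counts["fc1"] == 3_276_800   # the first dense layer dominates
assert total == 3_437_408           # about 3.44 million parameters
```

Almost all of the capacity sits in fc1, which flattens the 32-channel convolutional feature map into the 256-unit dense stack.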
As input, the policy consumes images and produces a 4D vector, with a hyperbolic tangent applied to keep components within [-1, 1]. We optimize using Adam with learning rate 1e-4 and regularization parameter 1e-5.
C-B Domain Randomization and Simulated Images
To transfer a policy to a physical robot, we use domain randomization  during training. We randomize fabric colors, the shading of the fabric plane, and camera pose. We do not randomize the simulator’s parameters as done in OpenAI et al.  and leave this to future work.
For color images only, we apply color randomization. The cloth background and foreground colors are set at fixed default RGB values, together creating a default blue color. With domain randomization, we create a random noise vector of size three, where each component is drawn independently, and add it to both the background and foreground colors. Empirically, this creates images of various shades “centered” at the default blue value.
For both color and depth images, we apply camera pose randomization, with Gaussian noise added independently to the six components of the pose (three for position in meters, three for orientation in degrees). Gaussians are centered at zero with standard deviation 0.04 for positions and 0.9 for orientations.
After Blender produces the image, we adjust the brightness via OpenCV gamma corrections (https://www.pyimagesearch.com/2015/10/05/opencv-gamma-correction/), with separately tuned gamma values for color and depth images, and with gamma = 1 representing no brightness change. We draw gamma from one tuned range for depth images (to make the images darker, matching physical images) and from a separate range for color images.
Only after the above are applied do we independently add uniform noise to each pixel. For each full image, with pixel values between 0 and 255, we draw a uniform random variable between -15 and 15, and then draw additive noise for each pixel independently.
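The gamma correction and per-pixel noise steps could be sketched with NumPy as below. The lookup-table form of gamma correction matches the OpenCV tutorial linked above; the two-stage noise scheme (one bound per image, then per-pixel draws) is our reading of the description:

```python
import numpy as np

def gamma_correct(img, gamma):
    """Gamma correction via a lookup table: gamma = 1 leaves the image
    unchanged, gamma < 1 darkens, gamma > 1 brightens (uint8 input)."""
    inv = 1.0 / gamma
    table = np.rint((np.arange(256) / 255.0) ** inv * 255.0).astype(np.uint8)
    return table[img]

def add_pixel_noise(img, bound_lo=-15, bound_hi=15, rng=None):
    """Draw one uniform bound for the image, then independent per-pixel
    additive noise within it, clipped back to the valid pixel range."""
    if rng is None:
        rng = np.random.default_rng()
    bound = rng.uniform(bound_lo, bound_hi)
    noise = rng.uniform(-abs(bound), abs(bound), size=img.shape)
    return np.clip(img.astype(float) + noise, 0, 255).astype(np.uint8)
```

With this convention, drawing gamma below 1 darkens depth images as described above.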
Appendix D Experiment Setup Details
D-a Image Processing Pipeline
The original color and depth images come from the mounted Zivid One Plus RGBD camera, and are processed in the following order:
For depth images only, we apply in-painting to fill in missing values (represented as “NaN”s) in depth images based on surrounding pixel values.
Color and depth images are then cropped so that the entire fabric plane is visible, along with some extra background area.
For depth images only, we clip values to be within a minimum and maximum depth range, tuned to produce depth images that look reasonably similar to those processed in simulation. We convert the images to three channels by triplicating values across the channels. We then scale pixel values and apply the OpenCV equalize histogram function to all three channels.
For depth and color images, we apply bilateral filtering and then de-noising, both implemented using OpenCV functions. These help smooth the uneven fabric texture without sacrificing cues from the corners.
D-B Physical Experiment Setup and Procedures
To map from the neural network output to a position in the robot’s frame, we calibrate positions using a checkerboard on top of the fabric plane. We move the robot’s end effector with the gripper facing down to each corner of the checkerboard and record the positions. During deployment, for a given coordinate, we perform bilinear interpolation to obtain the robot position from the four surrounding known points. After calibration, the robot reached positions on the fabric plane within 1-2 mm of error. Figure 9 shows a visualization of the heuristic we employ to get the robot to grasp fabric when it initially misses by 1-2 mm.
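The bilinear interpolation step could be sketched as below, where (u, v) are the fractional coordinates of the query point within one checkerboard cell and the four point names are illustrative:

```python
import numpy as np

def bilinear_robot_position(u, v, p00, p10, p01, p11):
    """Bilinear interpolation of a robot-frame position from the four
    surrounding calibrated corner positions: (u, v) = (0, 0) maps to
    p00 and (1, 1) to p11. Each point is an (x, y, z) position recorded
    during checkerboard calibration.
    """
    p00, p10, p01, p11 = map(np.asarray, (p00, p10, p01, p11))
    bottom = (1 - u) * p00 + u * p10   # interpolate along one cell edge
    top = (1 - u) * p01 + u * p11      # and along the opposite edge
    return (1 - v) * bottom + v * top  # then blend between the two edges
```

Because the interpolation is exact at the four calibrated corners, residual error comes only from the checkerboard measurements themselves, consistent with the 1-2 mm accuracy reported above.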