Deep Imitation Learning of Sequential Fabric Smoothing Policies

by Daniel Seita, et al.
UC Berkeley

Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color or depth images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic demonstrator that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of color vs. depth images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 120 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, policies trained in simulation attain 86.3% and 69.0% coverage for color and depth inputs, respectively, suggesting the feasibility of learning fabric smoothing policies from simulation. Supplementary material is available at fabric-smoothing.





I Introduction

Robot manipulation of fabric has applications in senior care and dressing assistance [11, 12, 13], sewing [41], ironing [21], laundry folding [22, 27, 44, 54], fabric upholstery manufacturing [32, 50], and handling gauze in robotic surgery [48]. However, fabric manipulation is challenging due to its infinite dimensional configuration space and unknown dynamics.

We consider the task of transforming fabric from a rumpled and highly disordered starting configuration to a smooth configuration via a series of grasp and pull actions. We explore a deep imitation learning approach based on a Finite Element Method (FEM) fabric simulator with an algorithmic demonstrator and use DAgger [38] to train policies. Using color and camera domain randomization [39, 49], learned policies are evaluated in simulation and in physical experiments with the da Vinci Research Kit (dVRK) surgical robot [15]. Figure 1 shows examples of learned trajectories in simulation and the physical robot.

Fig. 1: Learned policies executed in simulation and with a physical da Vinci surgical robot. Policies are learned in simulation using DAgger with an algorithmic demonstrator with full state information, using structured domain randomization with color or depth images. The 4-step trajectory in simulation (top) increases coverage from 43% to 95%. The 7-step trajectory on the physical da Vinci robot (bottom) increases coverage from 49% to 92%.

This paper contributes: (1) a novel formulation of fabric smoothing in terms of a sequence of pick and pull actions, (2) a simulation environment for data generation and evaluation of fabric smoothing with three difficulty tiers of initial state complexity in terms of coverage and visible corners, and (3) deep imitation learning of fabric smoothing policies which transfer to physical experiments on a da Vinci surgical robot when considering coverage performance across all tiers using color or depth input images.

II Related Work

Well-known research on robotic fabric manipulation [5, 40] uses bilateral robots and gravity to expose corners. Osawa et al. [31] proposed a method of iteratively re-grasping the lowest hanging point of a fabric to flatten and classify fabrics. Subsequently, Kita et al. [17, 18] used a deformable object model to simulate fabric suspended in the air, allowing the second gripper to grasp at a desired point. Follow-up work generalized to a wider variety of initial configurations of new fabrics. In particular, Maitin-Shepard et al. [25], Cusumano-Towner et al. [7], and Doumanoglou et al. [9] identified and tensioned corners to fold laundry or to bring clothing to desired positions. These methods rely on gravity to reveal corners of the fabric. We consider the setting where a single-armed robot adjusts a fabric strewn across a surface without lifting it entirely in midair, which is better suited for larger fabrics or when robots have a limited range of motion.

Reinforcement Learning (RL) [47] has potential for manipulating deformable objects. In folding, Matas et al. [26] assumed that fabric is flat, and Balaguer et al. [2] began with fabric gripped in midair to loosen wrinkles. In contrast, we consider the problem of bringing fabric from a highly rumpled configuration to a flat configuration. Using model-based RL, Ebert et al. [10] were able to train robots to fold pants and fabric. This approach, however, requires executing a physical robot for many thousands of actions and then training a video prediction model. In surgical robotics, Thananjeyan et al. [48] used RL to learn a tensioning policy to cut gauze, with one arm pinching at a pick point to let the other arm cut. We focus on cases where the initial fabric state may be highly rumpled and disordered.

Among the most relevant prior work on fabric smoothing, Willimon et al. [53] present an algorithm that pulls at eight fixed angles, and then uses a six-step stage to identify corners from depth images using the Harris Corner Detector [14]. They present experiments on three simulated trials and one physical robot trial. Sun et al. [45] followed up by attempting to explicitly detect and then pull at wrinkles. They measure wrinkledness as the average absolute deviation in a local pixel region for each point in a depth map of the fabric [37] and apply a force perpendicular to the largest wrinkle. Sun et al. evaluate on eight fixed, near-flat fabric starting configurations in simulation. In subsequent work, Sun et al. [46] improved the detection of wrinkles by using a shape classifier as proposed in Koenderink and van Doorn [19]. Each point in the depth map is classified as one of nine shapes, and they use contiguous segments of certain shapes to define a wrinkle. While Sun et al. were able to generalize the method beyond a set of hard-coded starting states, it was only tested on nearly flat fabrics, in contrast to the highly rumpled configurations we explore.

This paper extends prior work by Seita et al. [42], which estimated only a pick point and used a pre-defined pull vector. First, we learn the pick point and pull vector simultaneously. Second, by developing a simulator, we generate far more training data and perform systematic experiments comparing depth and color image inputs.

III Problem Statement

Given a deformable fabric and a flat fabric plane, each with the same rectangular dimensions, we consider the task of manipulating the fabric from a start state to a state that maximally covers the fabric plane.

Concretely, let $\xi_t$ be the full state of the fabric at time $t$, containing the positions of all its points (see Section IV). Let $o_t \in \mathcal{O}$ represent the image observation of the fabric at time $t$, where $\mathcal{O} = \mathbb{R}^{H \times W \times c}$ is the space of images with $H \times W$ pixels, and $c = 1$ channel for depth images or $c = 3$ for color images. Let $\mathcal{A}$ be the set of actions the robot may take (see Section IV-A). The objective is coverage $C(\xi_t)$, the percentage of the fabric plane covered by $\xi_t$.

We frame this as imitation learning [1, 30], where a demonstrator provides data in the form of paired observations and actions $\mathcal{D} = \{(o_t, a_t)\}_{t=1}^{N}$. From $\mathcal{D}$, the robot’s goal is to learn a policy $\pi : \mathcal{O} \to \mathcal{A}$ that maps an observation to an action, and to execute it sequentially until a coverage threshold or iteration termination threshold is reached.

IV Fabric and Robot Simulator

Fig. 2: FEM fabric simulation. Left: a wireframe rendering, showing the grid of points and the spring-mass constraints. Right: the corresponding image with the white fabric plane. The coverage is 73%, measured as the percentage of the fabric plane covered.

We implement a Finite Element Method (FEM) [4] fabric simulator and interface it with an OpenAI Gym environment design [6]. The fabric (Figure 2) is represented as a grid of point masses, connected by three types of springs [36]:

  • Structural: between a point mass and the point masses to its left and above it.

  • Shear: between a point mass and the point masses to its diagonal upper left and diagonal upper right.

  • Flexion: between a point mass and the point masses two away to its left and two above it.
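The three spring types above can be enumerated with a few index checks. The following is a minimal sketch, not the simulator's actual code: the function name `build_springs`, the `(row, col)` indexing, and the grid size passed in are all illustrative assumptions.

```python
def build_springs(n):
    """Return (kind, a, b) spring tuples for an n x n grid of point masses.

    Points are indexed as (row, col); each spring links point a to point b.
    The indexing scheme is an assumption made for illustration.
    """
    springs = []
    for r in range(n):
        for c in range(n):
            # Structural: the point to the left and the point above.
            if c >= 1:
                springs.append(("structural", (r, c), (r, c - 1)))
            if r >= 1:
                springs.append(("structural", (r, c), (r - 1, c)))
            # Shear: the diagonal upper-left and diagonal upper-right points.
            if r >= 1 and c >= 1:
                springs.append(("shear", (r, c), (r - 1, c - 1)))
            if r >= 1 and c + 1 < n:
                springs.append(("shear", (r, c), (r - 1, c + 1)))
            # Flexion: the point two to the left and the point two above.
            if c >= 2:
                springs.append(("flexion", (r, c), (r, c - 2)))
            if r >= 2:
                springs.append(("flexion", (r, c), (r - 2, c)))
    return springs
```

Because each point only looks left and up, every spring is created exactly once and no duplicate filtering is needed.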

Each point mass is acted upon by both an external gravitational force, which is calculated using Newton’s Second Law, and a spring correction force

$$F_s = k_s \left( \lVert q_a - q_b \rVert_2 - \ell \right)$$

for each of the springs representing the constraints above, where $k_s$ is a spring constant, $q_a$ and $q_b$ are positions of any two point masses connected by a spring, and $\ell$ is the default spring length. We update the point mass positions using Verlet integration [51]. Verlet integration computes a point mass’s new position at time $t + \Delta t$, denoted with $x_{t+\Delta t}$, as:

$$x_{t+\Delta t} = x_t + v_t \Delta t + a_t \Delta t^2$$

where $x_t$ is the position, $v_t$ is the velocity, $a_t$ is the acceleration from all forces, and $\Delta t$ is a timestep. Verlet integration approximates $v_t \Delta t \approx x_t - x_{t-\Delta t}$, where $x_{t-\Delta t}$ is the position at the last time step, resulting in

$$x_{t+\Delta t} = 2 x_t - x_{t-\Delta t} + a_t \Delta t^2$$

The simulator adds damping to simulate loss of energy due to friction, and scales down $v_t$, leading to the final update:

$$x_{t+\Delta t} = x_t + (1 - d)(x_t - x_{t-\Delta t}) + a_t \Delta t^2$$

where $d$ is a damping term, which we tuned to 0.02 based on visually inspecting the simulator.
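As a rough illustration of the damped Verlet update described above, the sketch below advances a single point mass by one timestep. The function name `verlet_step`, the tuple representation of positions, and the example timestep are assumptions for illustration; only the damping value of 0.02 comes from the text.

```python
def verlet_step(x_t, x_prev, accel, dt, damping=0.02):
    """One damped Verlet update for a single point mass.

    x_t, x_prev: current and previous positions as equal-length tuples.
    accel: acceleration from all forces acting on the point.
    The velocity term is approximated by (x_t - x_prev) and scaled by
    (1 - damping) to model energy lost to friction.
    """
    return tuple(
        x + (1.0 - damping) * (x - xp) + a * dt * dt
        for x, xp, a in zip(x_t, x_prev, accel)
    )


# Example: a point at rest starts to fall under gravity.
new_position = verlet_step((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -9.8), dt=0.1)
```

Storing the previous position instead of an explicit velocity is what makes this scheme convenient for cloth: position corrections applied after the update implicitly change the velocity used in the next step.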

We apply a constraint from Provot [36] by correcting point mass positions so that spring lengths are at most 10% greater than the default spring length $\ell$ at any time. We also implement fabric-fabric collisions following [3] by adding a force to “separate” two points if they are too close.
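One simple way to realize such a length constraint is sketched below. It assumes that both endpoints are free to move and that the excess length is split evenly between them; the function name `enforce_max_stretch` and this symmetric split are illustrative choices, not a description of the simulator's implementation.

```python
import math


def enforce_max_stretch(q_a, q_b, rest_length, max_ratio=1.10):
    """Pull two connected points together if their spring is overstretched.

    Returns the (possibly corrected) positions of both endpoints so that
    their distance is at most max_ratio * rest_length.
    """
    delta = [b - a for a, b in zip(q_a, q_b)]
    length = math.sqrt(sum(d * d for d in delta))
    max_length = max_ratio * rest_length
    if length <= max_length:
        return tuple(q_a), tuple(q_b)
    # Assumption: each endpoint absorbs half of the excess along the spring axis.
    shift = 0.5 * (length - max_length) / length
    new_a = tuple(a + shift * d for a, d in zip(q_a, delta))
    new_b = tuple(b - shift * d for b, d in zip(q_b, delta))
    return new_a, new_b
```

If one endpoint were held by the gripper, a variant would apply the full correction to the free endpoint instead.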

The simulator provides access to the full fabric state $\xi_t$, which contains the exact positions of all points, but does not provide image observations, which are more natural and realistic for transfer to physical robots. To obtain image observations of a given fabric state, we create a triangular mesh and render it using Blender. Blender is open-source software that can render images and simulate lighting and camera positions.

IV-A Actions

We define an action at time $t$ as a 4D vector which includes the pick point $(x_t, y_t)$, represented as the coordinate over the fabric plane to grasp, along with the pull direction. The simulator implements actions by grasping the top layer of the fabric at the pick point. If there is no fabric at $(x_t, y_t)$, the grasp misses the fabric. After grasping, the simulator pulls the picked point upwards and towards direction $\Delta x_t$ and $\Delta y_t$, deltas in the $x$ and $y$ direction of the fabric plane. In summary, actions $a_t \in \mathcal{A}$ are defined as:

$$a_t = \langle x_t, y_t, \Delta x_t, \Delta y_t \rangle$$

representing the pick point coordinates $(x_t, y_t)$ and the pull vector $(\Delta x_t, \Delta y_t)$ relative to the pick point.

IV-B Starting State Distributions

Fig. 3: Initial fabric states drawn from the distributions specified in Section IV-B, with tiers grouped by columns. The first two rows show representative simulated color and depth images, respectively, while the last two rows show examples of real images from a mounted Zivid One Plus camera, after smoothing and de-noising.

The performance of a smoothing policy depends heavily on the distribution of starting fabric states. We randomize the starting state to generate three difficulty tiers, with initial coverage based on 2000 simulations:

  • Tier 1, % Coverage (High): starting from a flat fabric, we make two short, random pulls to slightly perturb the fabric. All fabric corners remain visible.

  • Tier 2, % Coverage (Medium): we let the fabric drop from midair on one side of the fabric plane, perform one random grasp and pull across the plane, and then do a second grasp and pull to cover one of the two fabric corners furthest from its plane target.

  • Tier 3, % Coverage (Low): starting from a flat fabric, we grip at a random pick point and pull high in the air, drag in a random direction, and then drop, usually resulting in one or two corners hidden.

Figure 3 shows examples of color and depth images of fabric initial states in simulation and real physical settings for all three tiers of difficulty. The supplementary material contains additional examples.

V Baseline Policies

We propose five baseline policies for fabric smoothing.

V-1 Random

As a naive baseline, we test a random policy that uniformly selects random pick points and pull directions.

V-2 Highest (Max $z$)

This policy, tested in Seita et al. [42], grasps the highest point on the fabric. We get the pick point by determining $p$, the highest of the points from $\xi_t$. To compute the pull vector, we obtain the target coordinates by considering where $p$’s coordinates would be if the fabric is perfectly flat. The pull vector is then the vector from $p$’s current position to that target.

V-3 Wrinkle

Sun et al. [45] propose a two-stage algorithm to first identify wrinkles and then to derive a force parallel to the fabric plane to flatten the largest wrinkle. The process repeats for subsequent wrinkles. We implement this method by finding the point in the fabric of largest local height variance. Then, we find the neighboring point with the next largest height variance, treat the vector between the two points as the wrinkle, and pull perpendicular to it.

V-4 Oracle

This policy uses complete state information from $\xi_t$ to find the fabric corner furthest from its fabric plane target, and pulls it towards that target. When a corner is occluded and underneath a fabric layer, this policy will grasp the point directly above it on the uppermost fabric layer, and the resulting pull usually decreases coverage.
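The corner selection at the heart of this policy can be sketched in a few lines. The function name `oracle_action`, the 2D tuple representation of corners, and the tie-breaking rule (first corner wins) are assumptions for illustration. Occlusion is not modeled here: the returned pick point is simply the corner's planar position, which matches grasping the top layer directly above a hidden corner.

```python
import math


def oracle_action(corners, targets):
    """Pick the corner furthest from its target and pull it to that target.

    corners, targets: lists of (x, y) positions on the fabric plane, given
    in matching order. Returns an action (x, y, dx, dy) whose pull vector
    is expressed relative to the pick point.
    """
    distances = [math.dist(c, t) for c, t in zip(corners, targets)]
    # Index of the corner with the largest distance to its target.
    i = max(range(len(distances)), key=distances.__getitem__)
    (x, y), (tx, ty) = corners[i], targets[i]
    return (x, y, tx - x, ty - y)


# Example: only the third corner is displaced, so it is the one pulled.
action = oracle_action(
    corners=[(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0)],
    targets=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
)
```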

V-5 Oracle-Expose

When a fabric corner is occluded, and other fabric corners are not at their targets, this policy picks above the hidden corner, but pulls away from the fabric plane target to reveal the corner for a subsequent action.

Tier  Method         Coverage (%)     Actions
1     Random         25.0 +/- 14.6    2.43 +/- 2.2
1     Highest        66.2 +/- 25.1    8.21 +/- 3.2
1     Wrinkle        91.3 +/- 7.1     5.40 +/- 3.7
1     Oracle         95.7 +/- 2.1     1.76 +/- 0.8
1     Oracle-Expose  95.7 +/- 2.2     1.77 +/- 0.8
2     Random         22.3 +/- 12.7    3.00 +/- 2.5
2     Highest        57.3 +/- 13.0    9.97 +/- 0.3
2     Wrinkle        87.0 +/- 10.8    7.64 +/- 2.8
2     Oracle         94.5 +/- 5.4     4.01 +/- 2.0
2     Oracle-Expose  94.6 +/- 5.0     4.07 +/- 2.2
3     Random         20.6 +/- 12.3    3.78 +/- 2.8
3     Highest        36.3 +/- 16.3    7.89 +/- 3.2
3     Wrinkle        73.6 +/- 19.0    8.94 +/- 2.0
3     Oracle         95.1 +/- 2.3     4.63 +/- 1.1
3     Oracle-Expose  95.1 +/- 2.2     4.70 +/- 1.1

Table I: Results from the five baseline policies discussed in Section V. We report final coverage and the number of actions per trajectory. All statistics are from 2000 trajectories, with tier-specific starting states. Both oracle policies perform the best.

Fig. 4: Example simulated trajectory of the oracle corner policy, from left to right. The policy uses the exact corner location and pulls the one furthest from its target on the white fabric plane. Overlaid circles and arrows represent the action taken after the given state. The starting state (leftmost image) is drawn from Tier 3. In the second action, the fabric corner furthest from the target is slightly underneath the fabric, and the demonstrator pulls at the fabric’s top layer. Nonetheless, the subsequent pull (third image) is then able to reveal that fabric corner. The oracle policy took five actions before triggering the 92% coverage threshold in the rightmost image.

VI Simulation Results for Baseline Policies

We evaluate baseline fabric smoothing policies by running each for 2000 trajectories in simulation. Each trajectory draws a randomized fabric starting state from one of three difficulty tiers (Section IV-B), and lasts for a maximum of 10 actions. Trajectories can terminate earlier under two conditions: (1) if a pre-defined coverage threshold is obtained, or (2) the fabric is out of bounds over a certain threshold. For (1) we use 92% as the threshold, which produces visually smooth fabric (e.g., see the last image in Figure 4) and avoids demonstrator data being dominated by taking actions of short magnitudes at the end of trajectories. For (2) we define a fabric as out of bounds if it has any point which lies at least 25% beyond the fabric plane relative to the full distance of the edge of the plane. This threshold allows the fabric to go slightly off the fabric plane, though we do not allow a pick point to lie outside the fabric plane.
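The termination rules above can be summarized in a short check. This is a sketch under stated assumptions: the function name `should_terminate`, the coordinate convention (a square fabric plane spanning `plane_min` to `plane_max` on both axes), and the order in which the conditions are tested are illustrative; the 92% threshold, the 10-action limit, and the 25% margin come from the text.

```python
def should_terminate(coverage, num_actions, points,
                     plane_min=0.0, plane_max=1.0,
                     coverage_threshold=0.92, max_actions=10, margin=0.25):
    """Return the reason a trajectory ends, or None if it should continue.

    coverage: fraction of the fabric plane currently covered, in [0, 1].
    points: iterable of (x, y) fabric point positions on the plane.
    """
    if coverage >= coverage_threshold:
        return "coverage"
    # A point is out of bounds if it lies at least margin * side length
    # beyond an edge of the plane.
    side = plane_max - plane_min
    low, high = plane_min - margin * side, plane_max + margin * side
    if any(not (low < x < high and low < y < high) for x, y in points):
        return "out_of_bounds"
    if num_actions >= max_actions:
        return "max_actions"
    return None
```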

Table I indicates that both oracle policies attain nearly identical performance and have the highest coverage among the baseline policies, with about 95% across all tiers. The wrinkles policy is the next best policy in simulation, with 91.3%, 87.0%, and 73.6% final coverage for the three respective tiers, but requires substantially more actions per trajectory.

One reason why the oracle policy still performs well with occluded corners is that the resulting pulls can move those corners closer to their fabric plane targets, making it easier for subsequent actions to increase coverage. Figure 4 shows an example trajectory from the oracle policy on a tier 3 starting state. The second action pulls at the top layer of the fabric above the corner, but the resulting action still moves the occluded corner closer to its target.

VII Imitation Learning with DAgger

We use the oracle (not oracle-expose) policy to generate demonstrations and corrective labels. For each tier, we generate 2000 trajectories from the demonstrator and use that as offline data. We train a fabric smoothing policy in simulation using imitation learning on synthetic images. With behavior cloning [33, 34] on demonstrator data, the robot’s policy learns the demonstrator’s actions on states in the training data, but generalizes poorly outside the data distribution [20]. To address this, we use Dataset Aggregation (DAgger) [38], which asks the demonstrator to label the states the robot encounters when running its learned policy. A limitation of DAgger is the need for continued access to the demonstrator’s policy, rather than just offline data. The oracle corner-pulling demonstrator is cheap to query, so in practice this does not cause problems.
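The structure of the DAgger loop can be illustrated on a toy problem. Everything in the sketch below is a stand-in chosen for brevity and is not the training code: the state is a single number, the hypothetical `expert` moves it to a goal at zero, and `fit` is a least-squares line rather than a neural network trained on images. What carries over is the pattern: roll out the learner, label the visited states with the demonstrator, aggregate, and retrain.

```python
import random


def expert(state):
    # Toy demonstrator with full state access: move straight to the goal at 0.
    return -state


def fit(dataset):
    # Least-squares fit of a linear policy a = w * s (stand-in for training).
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset)
    w = num / den if den > 0 else 0.0
    return lambda s: w * s


def dagger(offline_data, iterations=5, horizon=10, seed=0):
    rng = random.Random(seed)
    dataset = list(offline_data)
    policy = fit(dataset)  # behavior cloning on the offline demonstrations
    for _ in range(iterations):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            # Label the state visited by the learner with the expert's action.
            dataset.append((state, expert(state)))
            # Advance using the learner's own action, not the expert's.
            state = state + policy(state)
        policy = fit(dataset)  # retrain on the aggregated dataset
    return policy, dataset
```

The key design point is that the state distribution in the aggregated dataset comes from the learner's rollouts, while the labels always come from the demonstrator.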

VII-A Policy Training Procedure

The imitation learning code uses OpenAI baselines [8] to make use of its parallel environment support. We run the fabric simulator in ten parallel environments, which helps to alleviate the major time bottleneck when training, and pool together samples in a shared dataset.

We use domain randomization [49] during training. For color images, we randomize the fabric color by selecting RGB values uniformly at random across intervals that include shades of blue, purple, pink, red, and gray. We also vary the shading of the fabric plane. For both color and depth images, we randomize the image brightness with gamma corrections [35], and randomize the camera pose with independent Gaussian distributions for each of the position and orientation components.
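Two of these randomizations are easy to sketch in isolation. The code below is illustrative only: the function names, the gamma range of 0.8 to 1.2, the list-of-rows image format, and the single shared noise scale for the pose are assumptions, since the actual intervals and standard deviations are not stated here.

```python
import random


def gamma_correct(image, gamma):
    """Apply gamma correction to an image given as rows of 0-255 intensities."""
    return [[round(255.0 * (p / 255.0) ** gamma) for p in row] for row in image]


def randomize_brightness(image, rng, low=0.8, high=1.2):
    # Assumed range: gamma < 1 brightens the image, gamma > 1 darkens it.
    return gamma_correct(image, rng.uniform(low, high))


def perturb_pose(pose, rng, sigma):
    """Add independent Gaussian noise to each camera pose component."""
    return [value + rng.gauss(0.0, sigma) for value in pose]


rng = random.Random(0)
noisy_image = randomize_brightness([[0, 64, 128, 255]], rng)
noisy_pose = perturb_pose([0.0, 0.0, 0.9, 0.0, 0.0, 0.0], rng, sigma=0.01)
```

Gamma correction leaves pure black and pure white unchanged, so it shifts mid-range brightness without clipping the image.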

We first train with a “behavior cloning (BC) phase” where we minimize the error on the offline demonstrator data, and then use a “DAgger phase” which rolls out the agent’s policy and applies DAgger. We use 500 epochs of behavior cloning, chosen based on when the network’s error roughly converged on a held-out validation dataset. Further training details are in the supplementary material.

VII-B Simulation Experiments

For all simulated training runs, we evaluate on 50 new tier-specific starting states that were not seen during training. Figure 5 shows results across all tiers, suggesting that after behavior cloning, DAgger improves final coverage performance by 6.1% (averaging over six runs). In addition, color policies attain better coverage in simulation than depth policies with gains of 10.8%, 8.3%, and 10.9% across respective tiers, which may be due to high color contrast between the fabric and fabric plane in the color images, as opposed to the depth images (see Figure 3).

In all difficulty tiers, the color policies attain higher final coverage than the wrinkles policy (from Table I): 94.8% over 91.3%, 89.6% over 87.0%, and 91.2% over 73.6%, respectively, and come close to the corner-pulling demonstrator despite only having access to image observations. The depth policies outperform the wrinkles policy only on tier 3, with 80.3% versus 73.6% coverage.

VIII Physical Experiments

Fig. 5: Coverage over 50 simulated trajectories at checkpoints (shown with “X”) during behavior cloning (left) and DAgger (right), which begins right after the last behavior cloning epoch. Results, from top to bottom, are for tier 1, 2, and 3 starting states. We additionally annotate with dashed lines the average starting coverage and the demonstrator’s average final coverage.

The da Vinci Research Kit (dVRK) surgical robot [15] is a cable-driven surgical robot with imprecision as reviewed in prior work [24, 43]. We use a single arm with an end effector that can be opened to 75°, or a gripper width of 10mm. We set a fabric plane at a height and location that allows the end-effector to reach all points on it. To prevent potential damage to the grippers, the fabric plane is foam rubber, which allows us to liberally set the gripper height to be lower and avoids a source of height error present in [26]. For the fabric, we cut a 5x5 inch piece from a Zwipes 735 Microfiber Towel Cleaning Cloth with a blue color within the distribution of domain randomized fabric colors. We mount a Zivid One Plus RGBD camera 0.9 meters above the workspace, which is used to obtain color and depth images.

VIII-A Physical Experiment Protocol

We manually create starting fabric states similar to those in simulation for all tiers. Given a starting fabric, we randomly run one of the color or depth policies for one trajectory for at most 10 steps (as in simulation). Then, to make comparisons fair, we “reset” the fabric to be close to its starting state, and run the other policy.

During preliminary trials, the dVRK gripper would sometimes miss the fabric by 1-2 mm, which is within the calibration error. To counter this, we measure structural similarity [52] of the image before and after an action to check if the robot moved the fabric. If it did not, the next action is adjusted to be closer to the center of the fabric plane, and the process repeats until the robot touches fabric.

VIII-B Physical Experiment Results

Fig. 6: An example trajectory (reproduced from Figure 1) taken by a learned policy trained on color images from tier 3 starting states. (Images are taken from the camera view used to record videos.) The leftmost image shows the starting state of the fabric, set to be highly wrinkled with at least the bottom right fabric corner hidden. The policy takes seven actions in this episode, with pick points and pull vectors indicated by the overlaid black arrows. Despite the highly wrinkled starting state, the policy is able to smooth fabric to get above 92% coverage as shown at the rightmost image.

Policy  (1) Start      (2) Final       (3) Max         (4) Actions
T1 C    78.4 +/- 4.4   96.2 +/- 2.3    96.2 +/- 2.3    1.8 +/- 1.4
T1 D    77.9 +/- 3.6   78.8 +/- 23.6   90.0 +/- 9.5    5.5 +/- 4.1
T2 C    58.5 +/- 5.9   87.7 +/- 13.3   92.7 +/- 4.4    6.3 +/- 3.2
T2 D    58.7 +/- 5.3   64.9 +/- 19.7   85.7 +/- 8.0    8.3 +/- 3.1
T3 C    46.2 +/- 3.7   75.0 +/- 17.9   79.9 +/- 13.5   8.7 +/- 2.0
T3 D    47.0 +/- 3.4   63.2 +/- 9.1    74.7 +/- 9.6    10.0 +/- 0.0

Table II: Physical experiments. We ran 20 trajectories for each of the tier 1 (T1), tier 2 (T2), and tier 3 (T3) fabric conditions, with color (C) and depth (D) policies. We report: (1) starting coverage, (2) final coverage, (3) maximum coverage at any point after the start state, and (4) the number of actions per trajectory.

We run 20 trajectories for each combination of input modality (color or depth) and tiers, resulting in 120 total as shown in Table II. We report starting coverage, ending coverage, maximum coverage across the trajectory after the initial state, and the number of actions. The maximum coverage allows for a more nuanced understanding of performance, because policies can take strong initial actions that achieve high coverage (e.g., above 80%) but a single counter-productive action at the end can substantially lower coverage.

Results suggest that, despite not being trained on real images, the learned policies can smooth fabric in the physical world. Both color and depth policies improve over the starting coverage in all tiers. Final coverage averaged across all tiers is 86.3% and 69.0% for color and depth, respectively, with net coverage gains of 25.2% and 7.8% over starting coverage. In addition, the color policy deployed on tier 1 starting states was able to hit the 92% coverage threshold 20 out of 20 times.
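The cross-tier averages can be checked from the per-tier means in Table II. The short script below is only an arithmetic check on those rounded table entries; the variable names are illustrative.

```python
# Per-tier mean coverage (%) from Table II, ordered tier 1, tier 2, tier 3.
table_ii = {
    "color": {"start": [78.4, 58.5, 46.2], "final": [96.2, 87.7, 75.0]},
    "depth": {"start": [77.9, 58.7, 47.0], "final": [78.8, 64.9, 63.2]},
}


def mean(values):
    return sum(values) / len(values)


summary = {}
for modality, rows in table_ii.items():
    start, final = mean(rows["start"]), mean(rows["final"])
    summary[modality] = {"start": start, "final": final, "gain": final - start}
    print(f"{modality}: start {start:.1f}, final {final:.1f}, gain {final - start:.1f}")
```

Averaging the rounded table entries reproduces the 86.3% and 69.0% final coverage. The gains come out near 25.3 and 7.8 points, so the small difference from the reported 25.2 for color is presumably a rounding effect of working from the table rather than from per-trajectory data.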

Qualitatively, the color-trained policy is effective at “fine-tuning” by taking several short pulls to trigger at least 92% coverage. For example, Figure 6 shows a trajectory taken by a color policy trained on tier 3 starting states, where it is able to smooth the highly wrinkled fabric in seven actions.

The depth policies do not perform as well, but this is in large part because the depth policy sometimes takes counterproductive actions after several reasonable actions. Depth policies may have lower performance due to uneven texture on the fabric we use, which is difficult to replicate in simulation.

VIII-C Experiments With Yellow Fabrics

Fig. 7: An example trajectory taken by a tier 1, color-trained policy on yellow fabric. The first action (left image) picks to the upper left and pulls the fabric away from the plane (black arrow). The process repeats for several actions and, as shown in the last image, the fabric barely covers the plane.

Policy  (1) Start      (2) Final       (3) Max        (4) Actions
T1 C    81.5 +/- 3.6   71.7 +/- 25.2   89.6 +/- 6.2   7.6 +/- 2.9
T1 D    83.1 +/- 2.9   85.9 +/- 15.3   91.9 +/- 5.3   4.6 +/- 4.4

Table III: Physical experiments for 5 trajectories with yellow fabric. We report the same statistics as in Table II for the same two policies (T1 C and T1 D) trained on tier 1 starting states.

To further test color versus depth policies, we used the same two policies trained on tier 1 starting states and deployed them on yellow fabric. The color distribution “covered” by domain randomization included shades of blue, purple, pink, red, and gray, but not yellow. We recreated five starting fabric conditions where, with a blue fabric, the color policy attained at least 92% coverage in just one action.

Results in Table III indicate poor performance from the color policy, as coverage decreases from 81.5% to 71.7%. Only two out of five trajectories resulted in at least 92% coverage. We observed behavior shown in Figure 7 where the policy fails to pick at a corner or to pull in the correct direction. The depth policy is invariant to colors, and is able to achieve higher ending coverage of 85.9%. This is higher than the 78.8% coverage reported in Table II due to relatively easier starting states.

VIII-D Failure Cases

Fig. 8: A poor action from a depth-trained policy. Given the state shown in the left image, the policy picks a point near the center of the fabric. The resulting pick and pull causes a major decrease in coverage.

The policies, particularly those trained with depth images, are susceptible to pulling near the center of the fabric for fabrics that are already nearly smooth, as shown in Figure 8. This results in poor coverage and may lead to cascading errors (as in Figure 7). One cause may be that there are several fabric corners that are equally far from their targets, which creates ambiguity in which corner should be pulled. It may be worthwhile to formulate corner picking with a mixture model to resolve this ambiguity.

IX Conclusion and Future Work

We investigate baseline and learned policies for fabric smoothing. Using a low-fidelity fabric simulator and a custom environment, we train policies in simulation using DAgger with a corner-pulling demonstrator. We use domain randomization to transfer policies to a surgical robot. When testing on fabric of similar color to that used in training, color-based policies achieve higher coverage than depth-based policies, but depth could be more valuable in practice for unseen colors.

In future work, we will test on fabric shapes and configurations where corner pulling policies may get poor coverage. We plan to apply deep reinforcement learning, using the simulation environment for color and depth images with DDPG [23] and other state-of-the-art RL methods to potentially learn richer policies that can explicitly reason over multiple time steps and varying geometries. We will utilize higher-fidelity fabric simulators such as ARCSim [28]. Finally, we would like to extend the method beyond fabric coverage to tasks such as folding and wrapping, and will apply it to ropes, strings, and other deformable objects.


This research was performed at the AUTOLAB at UC Berkeley in affiliation with Honda Research Institute USA, the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative, and by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, NSF National Robotics Initiative Award 1734633. The authors were supported in part by Siemens, Google, Amazon Robotics, Toyota Research Institute, Autodesk, ABB, Samsung, Knapp, Loccioni, Intel, Comcast, Cisco, Hewlett-Packard, PhotoNeo, NVidia, and Intuitive Surgical. Daniel Seita is supported by a National Physical Science Consortium Fellowship. We thank Jackson Chui, Michael Danielczuk, Shivin Devgon, and Mark Theis.


  • [1] B. D. Argall, S. Chernova, M. Veloso, and B. Browning (2009) A Survey of Robot Learning From Demonstration. Robotics and Autonomous Systems 57. Cited by: §III.
  • [2] B. Balaguer and S. Carpin (2011) Combining Imitation and Reinforcement Learning to Fold Deformable Planar Objects. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
  • [3] D. Baraff and A. Witkin (1998) Large Steps in Cloth Simulation. In ACM SIGGRAPH, Cited by: §IV.
  • [4] K.J. Bathe (2006) Finite Element Procedures. Prentice Hall. External Links: Link Cited by: §IV.
  • [5] J. Borras, G. Alenya, and C. Torras (2019) A Grasping-centered Analysis for Cloth Manipulation. arXiv:1906.08202. Cited by: §II.
  • [6] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba (2016) OpenAI Gym. External Links: arXiv:1606.01540 Cited by: §A-A, §IV.
  • [7] M. Cusumano-Towner, A. Singh, S. Miller, J. F. O’Brien, and P. Abbeel (2011) Bringing Clothing Into Desired Configurations with Limited Perception. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [8] P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, Y. Wu, and P. Zhokhov (2017) OpenAI Baselines. GitHub. Note: Cited by: §VII-A.
  • [9] A. Doumanoglou, A. Kargakos, T. Kim, and S. Malassiotis (2014) Autonomous Active Recognition and Unfolding of Clothes Using Random Decision Forests and Probabilistic Planning. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [10] F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine (2018) Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control. arXiv:1812.00568. Cited by: §II.
  • [11] Z. Erickson, H. M. Clever, G. Turk, C. K. Liu, and C. C. Kemp (2018) Deep Haptic Model Predictive Control for Robot-Assisted Dressing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
  • [12] Z. Erickson, M. Collier, A. Kapusta, and C. C. Kemp (2018) Tracking Human Pose During Robot-Assisted Dressing using Single-Axis Capacitive Proximity Sensing. In IEEE Robotics and Automation Letters (RA-L), Cited by: §I.
  • [13] Y. Gao, H. J. Chang, and Y. Demiris (2016) Iterative Path Optimisation for Personalised Dressing Assistance using Vision and Force Information. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §I.
  • [14] C. Harris and M. Stephens (1988) A Combined Corner and Edge Detector. In Proceedings of the Fourth Alvey Vision Conference, Cited by: §II.
  • [15] P. Kazanzides, Z. Chen, A. Deguet, G. Fischer, R. Taylor, and S. DiMaio (2014) An Open-Source Research Kit for the da Vinci Surgical System. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I, §VIII.
  • [16] D. P. Kingma and J. Ba (2015) Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), Cited by: §C-A.
  • [17] Y. Kita, T. Ueshiba, E. S. Neo, and N. Kita (2009) A Method For Handling a Specific Part of Clothing by Dual Arms. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
  • [18] Y. Kita, T. Ueshiba, E. S. Neo, and N. Kita (2009) Clothes State Recognition Using 3D Observed Data. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [19] J. J. Koenderink and A. J. Van Doorn (1992) Surface shape and curvature scales. Image and vision computing 10 (8), pp. 557–564. Cited by: §II.
  • [20] M. Laskey, J. Lee, R. Fox, A. Dragan, and K. Goldberg (2017) DART: Noise Injection for Robust Imitation Learning. In Conference on Robot Learning (CoRL), Cited by: §VII.
  • [21] Y. Li, X. Hu, D. Xu, Y. Yue, E. Grinspun, and P. K. Allen (2016) Multi-Sensor Surface Analysis for Robotic Ironing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
  • [22] Y. Li, Y. Yue, D. Xu, E. Grinspun, and P. K. Allen (2015) Folding Deformable Objects using Predictive Simulation and Trajectory Optimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §I.
  • [23] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (2016) Continuous Control with Deep Reinforcement Learning. In International Conference on Learning Representations (ICLR), Cited by: §IX.
  • [24] J. Mahler, S. Krishnan, M. Laskey, S. Sen, A. Murali, B. Kehoe, S. Patil, J. Wang, M. Franklin, P. Abbeel, and K. Goldberg (2014) Learning Accurate Kinematic Control of Cable-Driven Surgical Robots Using Data Cleaning and Gaussian Process Regression. In IEEE Conference on Automation Science and Engineering (CASE), Cited by: §VIII.
  • [25] J. Maitin-Shepard, M. Cusumano-Towner, J. Lei, and P. Abbeel (2010) Cloth Grasp Point Detection Based on Multiple-View Geometric Cues with Application to Robotic Towel Folding. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [26] J. Matas, S. James, and A. J. Davison (2018) Sim-to-Real Reinforcement Learning for Deformable Object Manipulation. Conference on Robot Learning (CoRL). Cited by: §C-A, §II, §VIII.
  • [27] S. Miller, J. van den Berg, M. Fritz, T. Darrell, K. Goldberg, and P. Abbeel (2012) A Geometric Approach to Robotic Laundry Folding. In International Journal of Robotics Research (IJRR), Cited by: §I.
  • [28] R. Narain, A. Samii, and J. F. O’Brien (2012) Adaptive Anisotropic Remeshing for Cloth Simulation. In ACM SIGGRAPH Asia, Cited by: Appendix A, §IX.
  • [29] OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba (2018) Learning Dexterous In-Hand Manipulation. arXiv:1808.00177. Cited by: §C-B.
  • [30] T. Osa, J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel, and J. Peters (2018) An Algorithmic Perspective on Imitation Learning. Foundations and Trends in Robotics 7. Cited by: §III.
  • [31] F. Osawa, H. Seki, and Y. Kamiya (2007) Unfolding of Massive Laundry and Classification Types by Dual Manipulator. Journal of Advanced Computational Intelligence and Intelligent Informatics 11 (5). Cited by: §II.
  • [32] J. K. Parker, R. Dubey, F. W. Paul, and R. J. Becker (1983) Robotic Fabric Handling for Automating Garment Manufacturing. Journal of Manufacturing Science and Engineering 105. Cited by: §I.
  • [33] D. A. Pomerleau (1991) Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Comput. 3. Cited by: §VII.
  • [34] D. A. Pomerleau (1989) Alvinn: An Autonomous Land Vehicle in a Neural Network. Technical report Carnegie-Mellon University. Cited by: §VII.
  • [35] C. Poynton (2003) Digital Video and HDTV: Algorithms and Interfaces. 1st edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. ISBN 1558607927. Cited by: §VII-A.
  • [36] X. Provot (1995) Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior. In Graphics Interface, Cited by: §IV, §IV.
  • [37] A. Ramisa, G. Alenya, F. Moreno-Noguer, and C. Torras (2012) Using Depth and Appearance Features for Informed Robot Grasping of Highly Wrinkled Clothes. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [38] S. Ross, G. J. Gordon, and J. A. Bagnell (2011) A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), Cited by: §I, §VII.
  • [39] F. Sadeghi and S. Levine (2017) CAD2RL: Real Single-Image Flight without a Single Real Image. In Robotics: Science and Systems (RSS), Cited by: §I.
  • [40] J. Sanchez, J. Corrales, B. Bouzgarrou, and Y. Mezouar (2018) Robotic Manipulation and Sensing of Deformable Objects in Domestic and Industrial Applications: a Survey. In International Journal of Robotics Research (IJRR), Cited by: §II.
  • [41] J. Schrimpf and L. E. Wetterwald (2012) Experiments Towards Automated Sewing With a Multi-Robot System. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
  • [42] D. Seita, N. Jamali, M. Laskey, R. Berenstein, A. K. Tanwani, P. Baskaran, S. Iba, J. Canny, and K. Goldberg (2019) Deep Transfer Learning of Pick Points on Fabric for Robot Bed-Making. In International Symposium on Robotics Research (ISRR), Cited by: §A-B, §II, §V-2.
  • [43] D. Seita, S. Krishnan, R. Fox, S. McKinley, J. Canny, and K. Goldberg (2018) Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §VIII.
  • [44] S. Shibata, T. Yoshimi, M. Mizukawa, and Y. Ando (2012) A Trajectory Generation of Cloth Object Folding Motion Toward Realization of Housekeeping Robot. In International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Cited by: §I.
  • [45] L. Sun, G. Aragon-Camarasa, P. Cockshott, S. Rogers, and J. P. Siebert (2014) A Heuristic-Based Approach for Flattening Wrinkled Clothes. Towards Autonomous Robotic Systems. TAROS 2013. Lecture Notes in Computer Science, vol 8069. Cited by: §B-B, §B-B, §II, §V-3.
  • [46] L. Sun, G. Aragon-Camarasa, S. Rogers, and J. P. Siebert (2015) Accurate Garment Surface Analysis using an Active Stereo Robot Head with Application to Dual-Arm Flattening. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
  • [47] R. S. Sutton and A. G. Barto (2018) Reinforcement Learning: An Introduction. 2nd edition, MIT Press, Cambridge, MA, USA. Cited by: §II.
  • [48] B. Thananjeyan, A. Garg, S. Krishnan, C. Chen, L. Miller, and K. Goldberg (2017) Multilateral Surgical Pattern Cutting in 2D Orthotropic Gauze with Deep Reinforcement Learning Policies for Tensioning. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I, §II.
  • [49] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel (2017) Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §C-B, §I, §VII-A.
  • [50] E. Torgerson and F. Paul (1987) Vision Guided Robotic Fabric Manipulation for Apparel Manufacturing. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
  • [51] L. Verlet (1967) Computer Experiments on Classical Fluids: I. Thermodynamical Properties of Lennard-Jones Molecules. Physical Review 159 (98). Cited by: §IV.
  • [52] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing. Cited by: §VIII-A.
  • [53] B. Willimon, S. Birchfield, and I. Walker (2011) Model for Unfolding Laundry using Interactive Perception. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II.
  • [54] P. Yang, K. Sasaki, K. Suzuki, K. Kase, S. Sugano, and T. Ogata (2017) Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning. In IEEE Robotics and Automation Letters (RA-L), Cited by: §I.

Appendix A Fabric Environment

The fabric simulator is implemented in Python with Cython for increased speed. The simulator is low fidelity compared to more accurate simulators such as ARCSim [28], but has the advantage of being easier to adapt to the smoothing task we consider. Some relevant hyperparameters are shown in Table IV.


A-A Actions

The fabric smoothing environment is implemented with a standard OpenAI Gym [6] interface. Each action is broken up into six stages: (1) a grasp, (2) a pull up, (3) a pause, (4) a linear pull towards a target, (5) a second pause, and (6) a drop. Stages (2) through (6) involve some number of iterations, where each iteration changes the coordinates of “pinned” points on the fabric and then calls one “update” method of the fabric simulator to adjust the other, non-pinned points.

A-A1 Grasp

We implement a grasp by first simulating a gripper moving downwards from a height above the highest fabric point, which simulates grasping only the top layer of the fabric. At a given gripper height, to decide which fabric points are grasped for a given pick point, we use a small sphere centered at the gripper position with a radius of 0.003 units, where units are scaled so that 1 represents the length of a side of the fabric plane. If no points are gripped, we lower the gripper until a point is within the radius. In practice, this means that usually 2-5 of the fabric’s points are grasped for a given pick point. Once any of the fabric’s points are within the gripper’s radius, those points are considered fixed, or “pinned”.
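
The grasp test described above can be sketched as follows. This is a minimal illustration rather than the simulator's code: the function name `grasp`, the descent increment `step`, and the representation of the fabric as a list of `(x, y, z)` tuples are assumptions made for the example. Only the 0.003 radius comes from the text.

```python
import math

GRIP_RADIUS = 0.003  # from the text; the fabric plane has side length 1


def grasp(points, pick_xy, start_height, step=0.001):
    """Lower a gripper over pick_xy and return (pinned indices, final height).

    points: list of (x, y, z) fabric point masses.
    step: hypothetical descent increment per check (not from the paper).
    """
    px, py = pick_xy
    height = start_height
    while height > -GRIP_RADIUS:
        # Points inside the sphere centered at the current gripper position.
        pinned = [
            i for i, point in enumerate(points)
            if math.dist(point, (px, py, height)) <= GRIP_RADIUS
        ]
        if pinned:
            return pinned, height
        height -= step  # nothing gripped yet, so lower the gripper
    return [], height
```

Because the gripper starts above the fabric and stops at the first height where any point falls inside the sphere, a point on an upper layer is pinned before a point directly beneath it.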

A-A2 Pull Up

For 50 iterations, the pull adjusts the z-coordinate of any pinned point by a fixed height change per iteration (0.0025 units; see Table IV), and keeps their x and y coordinates fixed. In practice, tuning this height change is important. If it is too low, excessive fabric-fabric collisions can occur, but if it is too high, coverage decreases substantially. In future work, we will consider dynamically adjusting the height change so that it is lower if current fabric coverage is high.

A-A3 First Pause

For 80 iterations, the simulator keeps the pinned points fixed, and lets the non-pinned points settle.

A-A4 Linear Pull to Target

To implement the pull, we adjust the x and y coordinates of all pinned points by a small amount each time step (leaving their z coordinates fixed), in accordance with the deltas in the action. The simulator updates the position of the non-pinned points based on the implemented physics model. This step is run for a variable number of iterations based on the pull length.

A-A5 Second Pause

For 300 iterations, the simulator keeps the pinned points fixed, and lets the non-pinned points settle. This period is longer than the first pause because normally more non-pinned points are moving after a linear pull to the target compared to a pull upwards.

A-A6 Drop

Finally, the pinned points are “un-pinned” and thus are allowed to lower due to gravity. For 1000 iterations, the simulator lets the entire cloth settle and stabilize for the next action.
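
The iteration counts for stages (2) through (6) can be collected into a small schedule, sketched below. The function name `action_schedule` and the per-iteration planar displacement `pull_step` are hypothetical. The text only states that the number of linear-pull iterations depends on the pull length, so the rule used here is an assumption.

```python
import math

PULL_UP_ITERS = 50
FIRST_PAUSE_ITERS = 80
SECOND_PAUSE_ITERS = 300
DROP_ITERS = 1000
DZ = 0.0025  # height change per iteration, from Table IV


def action_schedule(dx, dy, pull_step=0.001):
    """Return (stage, iterations) pairs for one action with pull deltas (dx, dy).

    pull_step is a hypothetical planar displacement per iteration; the
    linear-pull stage gets enough iterations to cover the pull length.
    """
    pull_iters = max(1, math.ceil(math.hypot(dx, dy) / pull_step))
    return [
        ("pull_up", PULL_UP_ITERS),
        ("first_pause", FIRST_PAUSE_ITERS),
        ("linear_pull", pull_iters),
        ("second_pause", SECOND_PAUSE_ITERS),
        ("drop", DROP_ITERS),
    ]
```

With the values above, the pull-up stage raises the pinned points by 50 × 0.0025 = 0.125 units, and the fixed stages account for 1430 simulator iterations per action before the linear pull is counted.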

Hyperparameter Value
Number of Points
Damping 0.020
Spring Constant 5000.0
Self-collision thickness 0.020
Height change per iteration 0.0025
Table IV: Fabric simulator hyperparameters. The spring constant appears in Equation 1 and the damping term in Equation 4.

A-B Starting State Distributions

We provide details on how we generate starting states from the three distributions we use (see Section IV-B).

  • Tier 1. We perform a sequence of two pulls with pick point randomly chosen on the fabric, pull direction randomly chosen, and pull length constrained to be short (about 10-20% of the length of the fabric plane). If coverage remains above 90% after these two pulls we perform a third short (random) pull.

  • Tier 2. For this tier only, we initialize the fabric in a vertical orientation with tiny noise in the direction perpendicular to the plane of the fabric. Thus the first action in Tier 2 initialization is a vertical drop over one of two edges of the plane (randomly chosen). We then randomly pick one of the two corners at the top of the dropped fabric and drag it approximately toward the center for about half of the length of the plane. Finally we grip a nearby point and drag it over the exposed corner in an attempt to occlude it, again pulling for about half the length of the plane.

  • Tier 3. Initialization consists of just one high pull. We choose a random pick point, lift it about 4-5 times as high as in a normal action, pull in a random direction for 10-25% of the length of the plane, and let the fabric fall, which usually results in lower coverage and occluded fabric corners.

The process induces a distribution over starting states, so the agent never sees the same starting fabric state.

These starting state distributions do not generally produce a setting in which a single corner fold is visible on top of the fabric, as that case was frequently demonstrated and reasonably handled in prior work [42].

Appendix B Details on Baseline Policies

We describe the implementation of the analytic methods from Section V in more detail.

B-A Highest (Max z)

We implement this method using the underlying state representation of the fabric rather than the images. For each fabric state, we iterate through all fabric point masses and select the one with the highest z-coordinate value; this provides the pick point. We deduce the pull vector from where that point would lie on the fabric plane if the fabric were perfectly flat.

To avoid potentially getting stuck repeatedly pulling the same point (which happens if the pull length is very short and the fabric ends up “resetting” to the prior state), we select the five highest points on the fabric and then pick one of them at random.
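
A minimal sketch of this baseline is given below. The function name `highest_point_action` and the `flat_targets` argument (the planar location each point mass would occupy on a perfectly flat fabric) are assumptions for illustration. How the simulator stores that correspondence is not specified here.

```python
import random


def highest_point_action(points, flat_targets, k=5, rng=random):
    """Pick one of the k highest point masses and pull it to its flat location.

    points: list of (x, y, z) fabric point masses.
    flat_targets: list of (x, y) locations, one per point mass, giving where
        that point would lie if the fabric were perfectly flat (hypothetical
        input for this sketch).
    Returns (pick_xy, pull_vector).
    """
    # Rank point indices by height, highest first.
    ranked = sorted(range(len(points)), key=lambda i: points[i][2], reverse=True)
    idx = rng.choice(ranked[:k])  # randomize among the k highest points
    x, y, _ = points[idx]
    tx, ty = flat_targets[idx]
    return (x, y), (tx - x, ty - y)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when comparing baselines.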

B-B Wrinkles

The implementation approximates the method in Sun et al. [45]; implementing the full algorithm is difficult due to its complex, multi-stage nature and the lack of open source code. Their wrinkle detection method involves computing variance in height in small neighborhoods around each pixel in the observation image, k-means clustering on the pixels with average variance above a certain threshold value, and hierarchical clustering on the clusters found by k-means to obtain the largest wrinkle. We approximate their wrinkle detection method by isolating the point of largest local variance in height. Empirically, this is accurate at selecting the largest wrinkle. To estimate wrinkle direction, we find the neighboring point with the next largest variance in height.

We then pull perpendicular to the wrinkle direction. Where Sun et al. [45] constrain the perpendicular angle to be one of eight cardinal directions (north, northeast, east, southeast, south, southwest, west, northwest), we find the exact perpendicular line and its two intersections with the edges of the fabric. We choose the closer of these two as the pick point and pull to the edge of the plane.

B-C Oracle

For the oracle policy, we assume we can query the four corners of the fabric and know their coordinates. Since we know the four corners, we know which of the fabric plane corners (i.e., the targets) to pull to. We pick the pull based on whichever fabric corner is furthest from its target.
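
The corner selection rule can be sketched as below. The function name `oracle_action` and the convention that corners and targets are given as `(x, y)` pairs in matching order are assumptions for the example.

```python
import math


def oracle_action(corners, targets):
    """Pull the fabric corner that is furthest from its target plane corner.

    corners, targets: lists of four (x, y) pairs in matching order.
    Returns (pick_xy, pull_vector).
    """
    dists = [math.dist(c, t) for c, t in zip(corners, targets)]
    i = max(range(len(corners)), key=dists.__getitem__)
    (cx, cy), (tx, ty) = corners[i], targets[i]
    return (cx, cy), (tx - cx, ty - cy)
```

For example, with plane corners at (0, 0), (0, 1), (1, 1), and (1, 0) and only the third fabric corner displaced to (0.5, 0.5), the sketch picks at (0.5, 0.5) and pulls by (0.5, 0.5).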

B-D Oracle Expose

The oracle expose policy is an extension of the oracle policy. In addition to knowing the exact position of all corners, the policy is also aware of the occlusion state of all corners. The occlusion state is a 4D boolean vector whose entries are 1 if the corresponding corner is visible to the camera from the top-down view and 0 if it is occluded. The oracle expose policy tries to solve all visible corners in the same manner as the oracle policy, using its extended state knowledge. If all four corners are occluded, or all visible corners are within a threshold of their target positions, the oracle expose policy performs a revealing action on an occluded corner. We implement the revealing action as a fixed-length pull 180 degrees away from the angle to the target position. This process is repeated until the threshold coverage is achieved.
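
The decision logic above can be sketched as follows. The threshold `tol`, the reveal length `reveal_len`, the choice of the first occluded corner, and the function name are all hypothetical; the text does not give these values or specify which occluded corner is revealed.

```python
import math


def oracle_expose_action(corners, targets, visible, tol=0.05, reveal_len=0.2):
    """Return ("pull" | "reveal", pick_xy, pull_vector), or None if done.

    corners, targets: four (x, y) pairs in matching order.
    visible: four booleans, True if the corner is visible from the top-down view.
    tol, reveal_len: hypothetical threshold and fixed reveal pull length.
    """
    dist = [math.dist(c, t) for c, t in zip(corners, targets)]
    # Visible corners that are not yet within the threshold of their targets.
    unsolved = [i for i in range(4) if visible[i] and dist[i] > tol]
    if unsolved:
        i = max(unsolved, key=dist.__getitem__)
        (cx, cy), (tx, ty) = corners[i], targets[i]
        return "pull", (cx, cy), (tx - cx, ty - cy)
    occluded = [i for i in range(4) if not visible[i]]
    if not occluded:
        return None  # every corner is visible and within the threshold
    i = occluded[0]  # assumption: reveal the first occluded corner
    (cx, cy), (tx, ty) = corners[i], targets[i]
    # Pull 180 degrees away from the direction to the target.
    away = math.atan2(ty - cy, tx - cx) + math.pi
    return "reveal", (cx, cy), (reveal_len * math.cos(away), reveal_len * math.sin(away))
```

In this sketch a reveal is only attempted once no visible corner remains to be pulled, which covers both cases named in the text: all corners occluded, or all visible corners already near their targets.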

Appendix C Details on Imitation Learning

C-A DAgger Pipeline

Hyperparameter Value
Parallel environments 10
Steps per env, between gradient updates 20
Gradient updates after parallel steps 240
Minibatch size 128
Demonstrator (offline) trajectories 2000
Policy learning rate 1e-4
Policy regularization parameter 1e-5
Behavior Cloning epochs 500
DAgger steps after Behavior Cloning 50000
Table V: Hyperparameters for the main DAgger experiments.

We collect demonstrations by running the oracle corner policy (not oracle expose) for 2000 trajectories for each of the three starting state tiers. We then run behavior cloning on this offline data for 500 epochs before running DAgger.

Each DAgger “iteration” rolls out 10 parallel environments for 20 steps each (hence, 200 total new samples) which are labeled by the oracle corner policy. These are added to a growing dataset of samples which includes the demonstrator’s original offline data. After 20 steps per parallel environment, we draw 240 minibatches of size 128 each for training. Then the process repeats with the agent rolling out its new policy. DAgger hyperparameters are in Table V. In practice, the regularization for the policy impacted performance significantly. We use 1e-5 and saw poor performance with 1e-3 and 1e-4. The total number of DAgger steps was limited to 50,000 due to compute and time limitations; training for substantially more steps is likely to yield further improvements.
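
One DAgger iteration as described above can be sketched with a toy environment. Everything named here (`ToyEnv`, `expert_act`, `dagger_iteration`, `sample_minibatches`) is a hypothetical stand-in for illustration. The real environment is the fabric simulator and the real demonstrator is the oracle corner policy. Sampling minibatches with replacement is also an assumption.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the fabric environment: a 1-D state."""

    def __init__(self, rng):
        self.state = rng.uniform(-1.0, 1.0)

    def observe(self):
        return self.state

    def step(self, action):
        self.state += action


def expert_act(state):
    """Hypothetical demonstrator with full state access: move toward 0."""
    return -0.5 * state


def dagger_iteration(envs, policy_act, dataset, steps_per_env=20):
    """Roll out the learner and label every visited state with the demonstrator."""
    for env in envs:
        for _ in range(steps_per_env):
            obs = env.observe()
            dataset.append((obs, expert_act(env.state)))  # demonstrator label
            env.step(policy_act(obs))  # the learner's action drives the rollout
    return dataset


def sample_minibatches(dataset, rng, num_batches=240, batch_size=128):
    """Draw minibatches (with replacement, an assumption) from the aggregate."""
    return [rng.choices(dataset, k=batch_size) for _ in range(num_batches)]
```

With 10 environments and 20 steps each, one call to `dagger_iteration` appends 200 labeled samples to the dataset, which already holds the offline demonstrations. If the 50,000 DAgger steps in Table V count environment steps summed over all parallel environments (an assumption), this corresponds to 50,000 / 200 = 250 iterations and 250 × 240 = 60,000 gradient updates.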

The policy neural network architecture is similar to the one in Matas et al. [26], with four convolutional layers, each with 32 filters of size 3×3, followed by dense layers of size 256 each, for a total of 3.44 million parameters. The parameters, in more detail, are (ignoring biases for simplicity):

policy/convnet/c1   864 params (3, 3, 3, 32)
policy/convnet/c2   9216 params (3, 3, 32, 32)
policy/convnet/c3   9216 params (3, 3, 32, 32)
policy/convnet/c4   9216 params (3, 3, 32, 32)
policy/fcnet/fc1    3276800 params (12800, 256)
policy/fcnet/fc2    65536 params (256, 256)
policy/fcnet/fc3    65536 params (256, 256)
policy/fcnet/fc4    1024 params (256, 4)
Total model parameters: 3.44 million
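
As a quick check of the listing above, the per-layer counts follow from multiplying out each weight shape. The dictionary and function names below are chosen for the example.

```python
from math import prod

# Weight shapes copied from the listing above (biases ignored).
LAYER_SHAPES = {
    "policy/convnet/c1": (3, 3, 3, 32),
    "policy/convnet/c2": (3, 3, 32, 32),
    "policy/convnet/c3": (3, 3, 32, 32),
    "policy/convnet/c4": (3, 3, 32, 32),
    "policy/fcnet/fc1": (12800, 256),
    "policy/fcnet/fc2": (256, 256),
    "policy/fcnet/fc3": (256, 256),
    "policy/fcnet/fc4": (256, 4),
}


def count_params(shapes):
    """Return a dict mapping each layer name to its number of weights."""
    return {name: prod(shape) for name, shape in shapes.items()}


counts = count_params(LAYER_SHAPES)
total = sum(counts.values())
```

The total is 3,437,408, which rounds to the 3.44 million reported. The first dense layer dominates, contributing 3,276,800 of the weights.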

As input, the policy consumes three-channel images and produces a 4D vector, with a hyperbolic tangent applied so that each component lies within [-1, 1]. We optimize using Adam [16] with learning rate 1e-4 and a regularization parameter of 1e-5 (Table V).

Fig. 9: Correcting the dVRK when it slightly misses fabric. Left: the dVRK executes a pick point (indicated with the red circle) but barely misses the fabric. Middle: it detects that the fabric has not changed, and the resulting action is constrained to be closer to the center and touches the fabric (light pink circle). Right: the pull vector results in high coverage.

C-B Domain Randomization and Simulated Images

Fig. 10: Representative simulated color images of examples of starting fabric states drawn from the distributions specified in Section IV-B. All images are of dimension . Top row: tier 1. Middle row: tier 2. Bottom row: tier 3. Domain randomization is applied on the fabric color, the shading of the white background plane, the camera pose, and the overall image brightness, and then we apply uniform random noise to each pixel.
Fig. 11: Representative simulated depth images of examples of starting fabric states drawn from the distributions specified in Section IV-B, shown in a similar manner as in Figure 10. Images are of dimension with the depth values repeated across the three channels. Domain randomization is applied on the camera pose and the image brightness, and then we apply uniform random noise to each pixel.

To transfer a policy to a physical robot, we use domain randomization [49] during training. We randomize fabric colors, the shading of the fabric plane, and camera pose. We do not randomize the simulator’s parameters as done in OpenAI et al. [29] and leave this to future work.

Figures 10 and 11 show examples of simulated images with domain randomization applied. We specifically applied the following randomization, in order:

  • For color images only, we apply color randomization. The cloth background and foreground colors are set at default RGB values of and , respectively, creating a default blue color. With domain randomization, we create a random noise vector of size three where each component is independently drawn from and then add it to both the background and foreground colors. Empirically, this creates images of various shades “centered” at the default blue value.

  • For both color and depth images, we apply camera pose randomization, with Gaussian noise added independently to the six components of the pose (three for position using meters, three for orientation using degrees). Gaussians are drawn centered at zero with standard deviation 0.04 for positions and 0.9 for degrees.

  • After Blender produces the image, we next adjust the brightness via OpenCV gamma corrections, with separately tuned values for color and depth images, and with a gamma value of 1 representing no brightness change. We draw for depth images (to make the images darker to match physical images) and draw for color images.

Only after the above steps are applied do we independently add uniform noise to each pixel. For each full image with pixel values between 0 and 255, we draw a uniform random variable between -15 and 15. We then draw additive noise for each pixel independently.
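
The camera pose, brightness, and pixel noise steps can be sketched with NumPy as below. Several details are assumptions made for this sketch: the gamma convention (exponent 1/gamma, so values above 1 brighten), the lookup-table implementation, and the simplification of the pixel noise to independent uniform draws in [-15, 15]. The function names are hypothetical, and the gamma sampling ranges are not reproduced.

```python
import numpy as np


def randomize_camera_pose(pose, rng):
    """Add zero-mean Gaussian noise to a 6-component camera pose.

    pose: three position components (meters) then three orientation
    components (degrees). Standard deviations 0.04 and 0.9 are from the text.
    """
    noise = np.concatenate([rng.normal(0.0, 0.04, 3), rng.normal(0.0, 0.9, 3)])
    return np.asarray(pose, dtype=float) + noise


def gamma_correct(image, gamma):
    """Gamma-correct a uint8 image with a 256-entry lookup table.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma), so gamma = 1
    leaves the image unchanged.
    """
    levels = np.arange(256) / 255.0
    table = np.rint(levels ** (1.0 / gamma) * 255.0).astype(np.uint8)
    return table[image]


def add_pixel_noise(image, rng, bound=15.0):
    """Add independent uniform noise in [-bound, bound] to every pixel."""
    noise = rng.uniform(-bound, bound, size=image.shape)
    noisy = np.clip(np.rint(image.astype(float) + noise), 0, 255)
    return noisy.astype(np.uint8)
```

A typical call order under these assumptions is pose randomization before rendering, then `gamma_correct`, then `add_pixel_noise`, which matches the order described above.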

Appendix D Experiment Setup Details

D-A Image Processing Pipeline

Fig. 12: Representative examples of color images from the physical camera used for the surgical robot experiments, so there is no domain randomization applied here. We manipulated fabrics so that they appeared similar to the simulated states. The color images here correspond to the depth images in Figure 13.
Fig. 13: Representative examples of depth images from the physical camera used for the surgical robot experiments, so there is no domain randomization applied here. We manipulated fabrics so that they appeared similar to the simulated states. The depth images here correspond to the color images in Figure 12.

The original color and depth images come from the mounted Zivid One Plus RGBD camera, and are processed in the following order:

  • For depth images only, we apply in-painting to fill in missing values (represented as “NaN”s) in depth images based on surrounding pixel values.

  • Color and depth images are then cropped to be images that allow the entire fabric plane to be visible, along with some extra background area.

  • For depth images only, we clip values to be within a minimum and maximum depth range, tuned to provide depth images that looked reasonably similar to ones processed in simulation. We convert images to three channels by triplicating values across the channels. We then scale pixel values to be within and apply the OpenCV equalize histogram function for all three channels.

  • For depth and color images, we apply bilateral filtering and then de-noising, both implemented using OpenCV functions. These help smooth the uneven fabric texture without sacrificing cues from the corners.

Figures 12 and 13 show examples of real fabric images that policies take as input, after processing. These are then passed as input to the policy neural network.
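
The cropping, clipping, scaling, and channel triplication steps for depth images can be sketched with NumPy as follows. The function names, the crop parameters, and the [0, 255] output range are assumptions for this sketch. In-painting, histogram equalization, bilateral filtering, and de-noising rely on OpenCV and are omitted.

```python
import numpy as np


def crop(image, top, left, size):
    """Crop a square region; top, left, and size are hypothetical parameters."""
    return image[top:top + size, left:left + size]


def preprocess_depth(depth, d_min, d_max):
    """Clip depth to [d_min, d_max], scale, and repeat across three channels.

    depth: 2-D float array with missing values already in-painted.
    The [0, 255] output range is an assumption for this sketch.
    """
    clipped = np.clip(depth, d_min, d_max)
    scaled = (clipped - d_min) / (d_max - d_min) * 255.0
    img = np.rint(scaled).astype(np.uint8)
    # Scaling before triplicating gives the same result as the reverse order.
    return np.stack([img, img, img], axis=-1)
```

The clipping range plays the role of the tuned minimum and maximum depth mentioned above. Values outside it saturate at the ends of the output range.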

D-B Physical Experiment Setup and Procedures

To map from neural network output to a position with respect to the robot’s frame, we calibrate the positions by using a checkerboard on top of the fabric plane. We move the robot’s end effectors with the gripper facing down to each corner of the checkerboard and record positions. During deployment, for a given coordinate frame, we perform bilinear interpolation to figure out the robot position from the four surrounding known points. After calibration, the robot reached positions on the fabric plane to within 1-2 mm of error. Figure 9 shows a visualization of the heuristic we employ to get the robot to grasp fabric when it originally misses by 1-2 mm.
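
The bilinear interpolation step can be sketched as below. The function name, the argument layout, and the use of image coordinates `(u, v)` for the query are assumptions for illustration; the recorded robot positions at the four surrounding checkerboard corners are the inputs.

```python
def bilinear_interpolate(u, v, bounds, p00, p10, p01, p11):
    """Interpolate a robot position inside one checkerboard cell.

    bounds: (u0, u1, v0, v1), the cell's extent in query coordinates.
    p00, p10, p01, p11: recorded robot positions at (u0, v0), (u1, v0),
        (u0, v1), and (u1, v1), each a tuple of equal length.
    """
    u0, u1, v0, v1 = bounds
    s = (u - u0) / (u1 - u0)  # fractional position along u
    t = (v - v0) / (v1 - v0)  # fractional position along v
    return tuple(
        (1 - s) * (1 - t) * a + s * (1 - t) * b + (1 - s) * t * c + s * t * d
        for a, b, c, d in zip(p00, p10, p01, p11)
    )
```

At a cell corner the weights reduce to a single recorded position, so the interpolated map reproduces the calibration points exactly.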