
DextAIRity: Deformable Manipulation Can be a Breeze

by   Zhenjia Xu, et al.

This paper introduces DextAIRity, an approach to manipulate deformable objects using active airflow. In contrast to conventional contact-based quasi-static manipulations, DextAIRity allows the system to apply dense forces on out-of-contact surfaces, expands the system's reach range, and provides safe high-speed interactions. These properties are particularly advantageous when manipulating under-actuated deformable objects with large surface areas or volumes. We demonstrate the effectiveness of DextAIRity through two challenging deformable object manipulation tasks: cloth unfolding and bag opening. We present a self-supervised learning framework that learns to effectively perform a target task through a sequence of grasping or air-based blowing actions. By using a closed-loop formulation for blowing, the system continuously adjusts its blowing direction based on visual feedback in a way that is robust to the highly stochastic dynamics. We deploy our algorithm on a real-world three-arm system and present evidence suggesting that DextAIRity can improve system efficiency for challenging deformable manipulation tasks, such as cloth unfolding, and enable new applications that are impractical to solve with quasi-static contact-based manipulations (e.g., bag opening). Video is available at



I Introduction

Many common everyday objects are impractical to manipulate via direct contact; however, they can be manipulated indirectly via air. From blowing leaves on the street to inflating molten glass, people purposefully control airflow to effectively change the state, such as pose or shape, of objects. In this paper, we seek to imbue robots with a similar capability, which we term DextAIRity. As a form of contact-less dynamic manipulation, DextAIRity provides a set of unique advantages over conventional contact-based quasi-static manipulation:

  • Dense force application. Instead of applying force through sparse contact positions, DextAIRity allows the system to simultaneously apply dense forces to a 3D space. This property is particularly beneficial for under-actuated objects – including deformables – since it allows a robot to apply forces to those out-of-contact surfaces (e.g., Fig. 1 A,B). As a result, systems that are under-actuated with contact-based manipulation can become more controllable when manipulated via streams of air.

  • Expanded workspace. Since DextAIRity does not require direct contact to manipulate objects, it can apply forces to objects that are distant from the robot and effectively expand its workspace. This property is particularly useful when the target object has a large volume or surface area – spreading a large piece of cloth for instance – and enables small robots to manipulate large items.

  • High-speed interactions without high-speed robots. Since the high-speed interactions are produced by the emitted airflow, these actions are much safer than robot movements at a similar velocity and easier to achieve, without the need for high-end industrial hardware.

Fig. 1: DextAIRity manipulates deformable objects by controlling an active airflow. We demonstrate DextAIRity with two tasks that are particularly challenging for traditional contact-based manipulation: unfolding a large piece of cloth (top) and opening a soft bag and maintaining its opened state (bottom). By controlling the blower’s direction, the system can apply dense forces on out-of-contact surfaces (A and B) to efficiently achieve its goal.
Fig. 2: System and Task Setup. Our system setup consists of (a) three UR5 robot arms, two of which are equipped with parallel-jaw grippers and one with a commodity centrifugal air pump. (b) shows a top-down view of the workspace and robots’ reach range for the cloth unfolding task. (c) shows a side view of the workspace and the robots’ action space for the bag opening task.

However, despite the potential advantages of air-based manipulation, it remains an open and challenging problem. First, accurately modeling the aerodynamics between the airflow and the environment is computationally costly, making model-based approaches impractical. Second, the precise airflow parameters, such as the shape and volume of an airstream, are not easily observed or controlled, often resulting in highly stochastic dynamics and unexpected action effects.

Both challenges motivate a self-supervised closed-loop solution for DextAIRity that can learn and improve from data. When airflow is applied to deformable cloth, the deformation of the cloth provides insight into the state of the unobservable airflow. By combining this information with a closed-loop policy, the system can continually adjust its blowing action based on visual feedback. Because the supervision signal for the learning algorithm can be directly computed from the visual observation, the system is trained entirely with self-supervised trial-and-error, without human demonstration or annotation. We demonstrate the effectiveness of DextAIRity through two challenging tasks: (1) unfold a cloth to maximize its coverage and (2) open a bag to maximize its volume. For both tasks, we use a three-arm robotic system consisting of two arms with parallel grippers and one arm equipped with a commodity centrifugal air pump (Fig. 1).

The primary contribution of this work is to suggest a new approach for deformable object manipulation utilizing directed airstreams, DextAIRity. Our simulation and real-world experiments suggest that DextAIRity is able to improve efficiency for challenging manipulation tasks (e.g., unfolding large pieces of unknown cloth) and enable new applications that were impractical with conventional contact-based manipulations (e.g., bag opening). We also discuss the potential limitations and necessary considerations of deploying DextAIRity in real-world applications. Code and robot videos:

II Related Work

II-A Quasi-static deformable object manipulation

Manipulating deformable objects is a long-standing challenge in robotics. Methods have been developed for manipulating ropes [sundaresan2020learning], smoothing fabric [seita2020deep, lin2021VCD], folding cloth [wu2019learning, lee2020learning, ganapathi2021learning, weng2021fabricflownet], lifting bags [howard1996prototype, howard2000intelligent, seita_bags_iros_2021], and inserting rigid objects into deformable containers [weng2021graph, seita2021learning]. However, all the above methods are limited to using quasi-static pick-and-place actions. While this may be sufficient for rearranging rigid objects, this action space is generally inefficient when manipulating deformable objects. Since a deformable object has near-infinite degrees of freedom, and quasi-static actions can only be used to manipulate an object through contact (grasped area), these systems often require many, even hundreds of, interactions to achieve the goal. Tasks can sometimes be rendered impossible due to a robot’s limited reach range and sparse contact positions (one contact point per gripper).

II-B Dynamic deformable object manipulation

In contrast to quasi-static manipulation, dynamic manipulation [mason1993dynamic] additionally leverages robot-produced acceleration forces to manipulate objects. This formulation allows the system to manipulate out-of-contact regions of the deformable object by building up an object’s momentum with high-velocity actions [wang2020swingbot, zhang2021robots, zeng2020tossingbot, casting, DensePhysNet]. For example, Ha and Song [ha2021flingbot] propose a learning-based approach using high-speed flinging actions to unfold a severely crumpled cloth with greater efficiency than comparable quasi-static methods. However, effectively using dynamic actions often comes with restrictive system requirements, such as tracking the full (dense) state of cloth [jangir2019dynamic] or using a high-speed camera and high-speed robots [balaguer2011combining, yamakawa2011dynamic, shibata2010robotic]. In many of these approaches, robots move with high acceleration using high controller gains, resulting in expensive hardware requirements and systems that pose a danger to things and people in their surroundings. Taken together, these attributes introduce substantial barriers to real-world deployment. In contrast, our approach is able to produce high-speed interactions using emitted air, which is inherently safer and inexpensive to implement.

II-C Manipulation with air

The idea of using airflow to generate directed force has been adopted in many aspects of robotic mechanical design. For example, airflow has been used in robot propulsion systems to produce thrust[wasbari2017review], or as a pneumatic actuator to control a soft robot’s geometry and articulation [rus2015design, sanan2009robots]. Air manipulation is also used in many robot gripper designs, such as suction grippers [zhakypov2018origami, yamaguchi2013development, bamotra2018fabrication] and non-contact grippers, using the Bernoulli principle [erzincanli1998design, ozcelik2002non, ozcelik2005examination, davis2008end]. Early works also investigated systems that levitated rigid objects using controllable airflow [nordine1982aerodynamic, tootchi2019modeling, escano2005position] including non-contact conveyance systems [konishi1994conveyance, konishi1996two, konishi1999development, pister1990planar, biegelsen2000airjet]. In contrast to these prior approaches, which primarily focus on mechanical hardware design, our goal is to learn effective manipulation policies using off-the-shelf hardware (UR5 robots and a $20 air pump). Moreover, instead of being limited to highly structured environments and objects with known physical properties, our algorithm is able to generalize and adapt to novel deformable cloths and bags using visual feedback.

Fig. 3: Simulating Air-Cloth Interactions. Cloth is simulated as a spring-mass system, and airflow is simulated as a stream of invisible particles. Our policy only takes the color image rendered by OpenGL as input. This simulation environment is used only for training the cloth unfolding task, not the bag opening task.

III System and Task Setup

Our system setup consists of three 6-DoF UR5 robot arms, two of which are equipped with parallel-jaw grippers (a Schunk WSG50 and an OnRobot RG2), and one equipped with a commodity centrifugal air pump (footnote 1: Amazon link). The air pump is able to produce on-demand streams of focused air with a maximum flow rate of . Our setup also includes three RGB-D cameras (highlighted in Fig. 2): a top-down Azure Kinect, a front-view RealSense D415, and a side-view RealSense D415.

III-A Task Configurations

We use two tasks to examine the effectiveness of DextAIRity: one focuses on manipulating 2-D objects (unfolding cloth or garments), and the other extends to manipulating 3-D objects (opening a deformable bag).

Cloth unfolding: The objective of this task is to increase the overall coverage of the cloth as measured by a top-down camera. This task is a typical first step for many cloth manipulation tasks such as cloth folding [maitin2010cloth] or bed making [seita2018bedmaking]. As shown in Fig. 2b, the workspace is defined as a square in the x-y plane. A crumpled cloth is randomly dropped on the workspace and when fully unfolded can reach up to . Because the UR5 robot has a reach range, none of these robots can cover the entire workspace or the surface of a fully unfolded cloth.

Bag opening: The objective of this task is to open a bag and maintain that opened state. It is a common first step in many downstream applications, such as filling a bag with objects [seita_bags_iros_2021, seita2021learning]. The bag opening state is determined by thresholding the visible surface area (in pixels) of the bag observed by the side-view camera, where the threshold is selected for each bag to accommodate different bag sizes. In our experiments, we assume that the closed bag is already grasped by two of the UR5 arms, and the grasping position in the y-z plane is sampled uniformly at random in a 0.2 m × 0.14 m rectangle (the yellow region in Fig. 2c). To make the task more challenging, the distance between the two grasping points is randomly selected in , where is the bag’s maximum opening width and is . The grippers can also tilt at a random angle, , along the x-axis to randomize the opening direction. Similar to the cloth unfolding task, the policy infers blowing actions from top-down images in a closed-loop manner. We permit the execution of up to 4 blowing actions per episode before considering it a failure.

III-B Simulation environment

We use a simulated environment to train our cloth unfolding policy. The environment is built on top of the PyFleX bindings to Nvidia FleX [ha2021flingbot, lin2020softgym, li2018learning]. In addition to simulating cloth and robot end-effectors, our simulation environment also provides a model of the blowing action. The blowing effect is simulated as a stream of invisible particles shot out from the blower. Air particles can be deflected by both the cloth mesh and the table. In each simulation step, 19 particles are uniformly shot out at in a cone shape (Fig. 3). The particles then interact with the cloth using PyFleX’s contact dynamics. Each blow lasts 150 simulation steps, where 150 is set empirically to make sure the cloth can reach a relatively stable state (i.e., coverage fluctuates randomly within a small range).

Our particle-based wind effect has the key property of simulating non-uniform air effects on local regions of the cloth (for example, the targeted blown location). While PyFleX has a built-in “wind” option, it is only able to apply a global force to objects and was not suitable for policy training.
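As a rough illustration of the cone-shaped particle emission described above, the following sketch samples emission directions inside a cone around the blower axis. It is not the paper's simulator code: the function name `sample_cone_directions`, the random (rather than fixed-pattern) sampling, and the cone half-angle are all assumptions, and the particle speed is left out because it is not given in this text.

```python
import math
import random


def _normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sample_cone_directions(axis, half_angle_rad, n=19, rng=None):
    """Sample n unit directions uniformly (by solid angle) inside a cone.

    `axis` is the blower direction; `half_angle_rad` is a hypothetical
    cone half-angle. n defaults to 19, the per-step particle count
    mentioned in the text.
    """
    rng = rng or random.Random()
    ax = _normalize(axis)
    # Build an orthonormal basis (u, v, ax) around the cone axis.
    helper = (1.0, 0.0, 0.0) if abs(ax[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(ax, helper))
    v = _cross(ax, u)
    cos_min = math.cos(half_angle_rad)
    directions = []
    for _ in range(n):
        # Uniform in cos(theta) gives a uniform density on the spherical cap.
        cos_t = rng.uniform(cos_min, 1.0)
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        directions.append(tuple(
            cos_t * ax[i] + sin_t * (math.cos(phi) * u[i] + math.sin(phi) * v[i])
            for i in range(3)))
    return directions
```

In a particle-based simulator, each returned direction would be scaled by the emission speed and assigned as the initial velocity of one air particle for that step.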

It is important to note that our blowing simulation uses a heavily simplified physics model that does not reflect accurate aerodynamics, such as Bernoulli or eddy effects, which are critical for applications like bag opening. Therefore, while we show the simulation is reasonable for training a cloth unfolding policy, it is not sufficient for bag opening tasks. As a result, we directly train the bag opening policy with real-world data.

Fig. 4: Approach Overview. (a) From a top-down observation, the Grasping Network predicts scores for each grasping action (i.e., center and rotation). The one with the highest score is selected for execution. Robots will grasp, stretch, and place the cloth down following blowing steps. (b) At each blowing step, the blowing network takes the top-down observation as input and infers blowing scores for each action candidate (blower position and rotation). The blowing action with the highest score is executed.

IV Method

The key idea of DextAIRity is to leverage the interactions between active airflow and deformable objects to achieve efficient manipulation. Crucially, while controlled airflow provides the system with additional control over out-of-contact object regions, the deformation on the object also provides visual feedback about the otherwise unobserved airflow, allowing the system to continually adjust its actions in a closed-loop manner. In the following section, we will first describe our method in the context of cloth unfolding. We will then present the required modifications for bag opening.

IV-A Open-loop Grasping Policy

To perform the cloth unfolding task, the system first needs to infer how to pick up the cloth from the table. We extend the grasping framework from FlingBot [ha2021flingbot] with a modified action parameterization to improve grasp quality.

Edge-coincident grasping action parameterization: FlingBot used a dual-arm grasping action parameterized by two end points on a line segment with center , angle , and width . From these parameters, the two grasping positions are computed, allowing efficient computation of grasping positions while satisfying collision-avoidance constraints. On top of this formulation, we further constrain the grasping positions to be on the edges of the cloth. We found this constraint significantly reduced the chance of grasping multiple layers of the fabric, which is a typical failure case for FlingBot. To implement this constraint, we directly extend the line segment (defined by and ) to intersect with the cloth mask, and the two furthest intersection points are selected as the grasping positions ( and ). The z-coordinates of these two grasping points are computed from depth. The object is segmented using background subtraction. If the selected center falls in the background, or the distance between the two grasping positions is smaller than a minimum safety distance (, defined empirically), no grasp is executed, and the episode terminates with a state reset.
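The edge-coincident constraint can be sketched as a simple line march over the cloth mask. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `edge_coincident_grasp`, the pixel-space representation (a list of rows of booleans), the one-pixel step size, and the `min_dist` parameter standing in for the minimum safety distance are all hypothetical.

```python
import math


def edge_coincident_grasp(mask, center, theta, min_dist=0.0):
    """Return the two furthest mask pixels on the line through `center`.

    mask:     2D list of booleans, True where cloth is visible.
    center:   (x, y) pixel chosen as the grasp center.
    theta:    line angle in radians, measured in the image plane.
    min_dist: hypothetical minimum safety distance, in pixels.
    Returns None when no grasp should be executed.
    """
    h, w = len(mask), len(mask[0])
    cx, cy = center
    # Reject centers that fall outside the image or on the background.
    if not (0 <= cx < w and 0 <= cy < h) or not mask[cy][cx]:
        return None
    dx, dy = math.cos(theta), math.sin(theta)
    max_t = int(math.ceil(math.hypot(h, w)))
    hits = []
    # March along the extended line in both directions, one pixel per step.
    for t in range(-max_t, max_t + 1):
        x = int(round(cx + t * dx))
        y = int(round(cy + t * dy))
        if 0 <= x < w and 0 <= y < h and mask[y][x]:
            hits.append((x, y))
    # The first and last hits are the two furthest intersection points.
    p1, p2 = hits[0], hits[-1]
    if math.hypot(p2[0] - p1[0], p2[1] - p1[1]) < min_dist:
        return None
    return p1, p2
```

Because the grasp width follows from the mask rather than being predicted, a policy using this scheme only has to choose a center and an angle.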

Grasping network: To predict grasping parameters, we employ a spatial action map representation [wu2020spatial, zeng2018learning, ha2021flingbot] to leverage the translational and rotational equivariance between the actions and the physical transformations of the cloth. The top-down color image is used to produce a batch of 8 candidates, differing by rotation about the z-axis, before being fed into the grasping network, which then outputs a grasping score for each pixel corresponding to predicted cloth coverage after execution. As a result, a pixel in the image rotated by directly maps to a corresponding grasping action. The grasping policy then selects the action associated with the highest predicted score for execution. We use DeepLabv3 [chen2017rethinking] with random initialization as network architecture. We will describe training and supervision details in §IV-C.
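The decoding step of a spatial action map can be sketched as an argmax over per-rotation score maps. The sketch below assumes the network output is already available as nested lists and that the rotation angles are passed in explicitly, since the angular spacing is not stated in this text; `select_grasp` and `rotate_point` are illustrative names, and the rotation sign convention is an assumption.

```python
import math


def select_grasp(score_maps, angles):
    """Pick the highest-scoring pixel across all rotated score maps.

    score_maps: list (one per rotation) of 2D lists of predicted scores.
    angles:     list of rotation angles (radians), one per score map.
    Returns (score, angle, (x, y)) with the pixel in the rotated frame.
    """
    best = None
    for r, score_map in enumerate(score_maps):
        for y, row in enumerate(score_map):
            for x, score in enumerate(row):
                if best is None or score > best[0]:
                    best = (score, angles[r], (x, y))
    return best


def rotate_point(point, center, angle):
    """Rotate `point` about `center` by `angle` radians.

    Calling this with the negated image rotation maps a pixel from a
    rotated image back into the original frame (convention assumed).
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + dx * c - dy * s, center[1] + dx * s + dy * c)
```

Rotating the input rather than the action lets a single fully convolutional network score every (position, rotation) pair with shared weights.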

IV-B Closed-loop Blowing Policy

After the cloth is picked up, it is stretched taut using the front-view camera [ha2021flingbot]. The robot will then move the cloth to above the table surface to prepare for execution of a blowing action. Initially the blowing is directed towards the center of the workspace and is then adjusted 4 times via a closed-loop blowing policy. Each blowing step lasts after movement and the blower is kept on during all blowing steps.

Blowing action parameterization: The blowing action is parameterized by position , where represents translation along the x-axis (left to right) and orientation , where is a rotation angle around the z-axis (determining orientation in the x-y plane). Other blower parameters are fixed during the interaction – the blower’s nozzle is above table and away from the gripper holding position with a pitch angle.

Blowing network: To select effective blowing actions, we train a blowing network that takes the top-down color image and a blowing action as input, and predicts a score (i.e., the final coverage) for that action [xu2022umpnet]. At each step, we uniformly sample blowing actions to evaluate and execute the one with the highest predicted score (Fig. 4b, upper half). The blowing network consists of an image encoder (7-layer convolution network) and an action encoder (3-layer MLP), followed by a 3-layer MLP to produce the final score.
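The sample-and-score selection described above can be sketched independently of the network itself. In this illustration, `score_fn` stands in for the trained blowing network, and `bounds` and `num_samples` are hypothetical parameters, since the action ranges and the number of sampled candidates are not given in this text.

```python
import random


def select_blow_action(score_fn, observation, bounds, num_samples, rng=None):
    """Sample candidate blowing actions uniformly and return the best one.

    score_fn:    callable (observation, action) -> predicted score; a
                 stand-in for the learned blowing network.
    bounds:      list of (low, high) pairs, one per action dimension
                 (e.g. blower x-position and yaw angle).
    num_samples: number of candidates to evaluate (assumed value).
    Returns (best_action, best_score).
    """
    rng = rng or random.Random()
    candidates = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(num_samples)]
    scores = [score_fn(observation, action) for action in candidates]
    best = max(range(num_samples), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Running this once per blowing step with a fresh observation gives the closed-loop behavior: each new image re-ranks the candidates, so the blower direction tracks how the cloth actually responded.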

IV-C Training procedure

Both the grasping and blowing networks are trained via self-supervised epsilon-greedy exploration. In each training episode, five grasping actions are executed, where each grasping action is followed by four blowing actions. Each blowing action is automatically labeled with the coverage of the cloth after execution, while each grasping step is supervised by observed coverage at the end of blowing execution. To compute coverage, the system obtains a cloth mask via background subtraction. Both networks are supervised via MSE Loss between predicted and real coverage.
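The self-supervised label can be sketched as follows: a cloth mask is obtained by comparing the observation with a background image, its pixel count is the coverage label, and the networks regress that label under an MSE loss. The grayscale nested-list images, the tolerance `tol`, and the function names are assumptions for illustration only.

```python
def cloth_mask(image, background, tol=10):
    """Background subtraction on grayscale images given as 2D lists.

    A pixel counts as cloth when it differs from the background image
    by more than `tol` (a hypothetical tolerance).
    """
    return [[abs(p - b) > tol for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]


def coverage(mask):
    """Cloth coverage in pixels: the number of True entries in the mask."""
    return sum(sum(1 for m in row if m) for row in mask)


def mse(predicted, target):
    """Mean squared error between predicted and observed coverage values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

Each blowing step would be labeled with the coverage measured after that step, and each grasping step with the coverage measured after its final blowing step.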

Note that the performance of the two modules is highly coupled – the grasping score depends on the following blowing steps, and blowing performance is affected by how the cloth is grasped. This coupling can make training unstable. To solve this issue, we designed simple heuristic policies for grasping and blowing to allow independent pre-training for each module before combining them for further fine-tuning. The heuristic blowing policy is to place the blower in the middle of the workspace facing forward. Because this heuristic policy can unfold the cloth somewhat, it provides a reasonable starting place from which to bootstrap training. The heuristic grasping policy uniformly samples 100 grasping position pairs on the cloth and selects the pair with the largest distance. Both pre-training and fine-tuning are performed in simulation, taking 300 and 200 epochs respectively. Each epoch contains 32 episodes and 64 optimization steps with a batch size of 16 for the grasping network and 128 for the blowing network. Two FIFO replay buffers (size=30000) are used to store training data. Both networks are implemented in PyTorch [paszke2019pytorch] and trained using the Adam optimizer with a learning rate of 1e-4 and a weight decay of 1e-6.
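Two pieces of the training loop, the FIFO replay buffer and epsilon-greedy action selection, can be sketched with the standard library alone. The class and function names are illustrative, and the exploration rate `epsilon` is an assumption because its value is not stated in this text; only the buffer capacity of 30000 comes from the description above.

```python
import random
from collections import deque


class FIFOReplayBuffer:
    """Fixed-capacity buffer that discards the oldest sample when full."""

    def __init__(self, capacity=30000):
        # deque(maxlen=...) drops items from the left once capacity is hit.
        self.data = deque(maxlen=capacity)

    def add(self, sample):
        self.data.append(sample)

    def sample(self, batch_size, rng=random):
        """Draw a batch without replacement (smaller if the buffer is short)."""
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)

    def __len__(self):
        return len(self.data)


def epsilon_greedy(candidates, scores, epsilon, rng=random):
    """With probability epsilon pick a random candidate, else the best one."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]
```

One such buffer per network keeps the grasping and blowing samples separate, which matches the different batch sizes used for the two networks.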

IV-D Modification for Bag Opening

In the bag opening task, we assume the bag is already lifted up at a random position and the algorithm only needs to infer the blowing actions. We use the same blowing network architecture as in the unfolding task, but modify the action parameterization and reward signal and train the policy directly on real-world data.

The blowing action is parameterized by , where () represents the blower’s position in the y-z plane (the blue region in Fig. 2c), while corresponds to the pitch angle of the blower. At each blowing step, the network takes a top-down depth observation as input, and the blowing action with the highest score is executed. A binary reward, based on opening state, is computed by thresholding the surface area of the bag observed via the side-view camera. Note that the threshold is specific to each bag to accommodate different bag sizes. We choose a binary reward for training because we observe that, for this task using air, bags seldom exist in an intermediate state (half-open, for instance), and a binary reward is much more data-efficient for training. To reduce noise in reward computation, at the beginning of each blowing action the blower turns on for after movement to ensure the bag is in a relatively stable state. Then, 400 images (20 fps) are captured by the side camera in the following and the binary reward is determined by the averaged bag area over these images. We directly train the blowing network with data collected by real-world robots via random exploration. In total, we collected 4,400 interactions (4,000 for training, 400 for validation) over the course of 10 hours and trained the blowing network for 50 epochs with a standard cross-entropy loss.
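The noise-reduced binary reward can be sketched as averaging the per-frame bag area and comparing the mean against the bag-specific threshold. The function name and the pixel-area inputs are assumptions; the sketch takes the per-frame areas as given and does not model the camera capture or the settling delay.

```python
def binary_open_reward(frame_areas, threshold):
    """Return 1 if the bag counts as open, else 0.

    frame_areas: visible bag area (in pixels) for each captured
                 side-view frame of one blowing action.
    threshold:   bag-specific area threshold (chosen per bag).
    """
    mean_area = sum(frame_areas) / len(frame_areas)
    return 1 if mean_area > threshold else 0
```

Averaging over many frames means a bag that flutters briefly past the threshold is not labeled open unless it stays open for most of the capture window.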

V Evaluation

We evaluate DextAIRity on cloth unfolding (§V-A) and bag opening (§V-B). For both tasks, we evaluate task completion rate and ability to generalize to unseen cloths and bags on a real-world robot platform.

Fig. 5: Cloths and Bags used in Real-world Experiments.
Fig. 6: Cloth unfolding coverage vs. steps.

V-A Cloth unfolding

Performance is measured by cloth coverage at the end of each episode. The coverage statistic is normalized by the maximum possible coverage of the cloth in a manually flattened configuration. Each episode contains at most 5 interaction steps, where each step includes both grasping and blowing actions, and the policy terminates an episode when it predicts a pivot position outside the cloth mask.

Simulation Task Generation: We generate five tasks for training and evaluation in simulation:

  • (Train) Normal Rect contains rectangular cloths that are smaller in size than the robot’s reach range. Edge lengths are uniformly sampled from   to   .

  • (Test) Large Rect / X-Large Rect contain rectangular cloths with at least one side larger than the reach range. Edge lengths are uniformly sampled from   to   for Large,   to   for X-Large.

  • (Test) Shirts / Dresses contain a subset of shirt and dress meshes from the CLOTH3D dataset [bertiche2020cloth3d]. The average areas are and for shirts and dresses respectively, which are significantly larger than the shirt meshes used in FlingBot [ha2021flingbot] whose average area is only .

To reduce sim2real gaps, non-textured cloths are colored randomly. Cloth mass is sampled from   to   and the stiffness of stretching, bending, and shearing is fixed at , , and respectively. To generate a severely crumpled initial configuration, the cloth is grasped at a random position, held at a random height between [, ], and dropped on the table to settle .

Real-World Tasks: Fig. 5 shows pictures of the four testing cloths used in our real-world experiments:

  • Large Rect, which is and .

  • X-Large Rect, which is and .

  • Tshirt, which is and .

  • Dress, which is and .

Note that the cloth items used in this experiment are all larger than those in FlingBot [ha2021flingbot], where the largest rectangle cloth is and t-shirt is .

Ablations: We compared with the following systems:

  • Pick&Place [lee2020learning]: predicts a single-arm grasping position and movement direction for quasi-static pick-and-place.

  • FlingBot [ha2021flingbot]: predicts a dual-arm grasping action for a dynamic flinging primitive.

  • FlingBot+: an improved FlingBot that uses our grasping policy to generate edge-coincident grasps.

  • DextAIRity-fixed: uses a fixed open-loop blowing action directed at the center of the workspace.

  • DextAIRity: the system uses a learned blowing policy to produce closed-loop blowing actions from visual feedback. This is the full non-ablated method we propose.

Method | Rectangle: Large | Rectangle: X-Large | CLOTH3D: Shirt | CLOTH3D: Dress
Pick&Place | 45.9 / 14.4 | 36.9 / 11.1 | 35.7 / 11.5 | 32.5 / 9.1
FlingBot | 92.6 / 61.1 | 58.8 / 32.9 | 59.6 / 35.4 | 49.1 / 25.6
FlingBot+ | 100.0 / 68.6 | 58.6 / 32.7 | 65.4 / 41.2 | 51.2 / 27.7
DextAIRity-fixed | 86.1 / 54.6 | 79.9 / 54.0 | 67.8 / 43.5 | 65.9 / 42.4
DextAIRity | 96.6 / 65.1 | 97.1 / 71.3 | 74.6 / 50.3 | 72.4 / 48.9
TABLE I: Simulation unfolding results (final / delta coverage, in %).

Comparison with contact-based manipulation. Here, we compare [DextAIRity] with state-of-the-art manipulation methods using quasi-static [Pick&Place] and dynamic [FlingBot] actions. In [Large Rect], cloth size is within or slightly larger than the robot arm’s reach range. [DextAIRity] and [FlingBot] achieve similar performance (over 90% coverage after 3 steps), while [Pick&Place] achieves only 45.9%.

We also investigate these approaches’ performance on cloth that is significantly larger than the robot’s reach range. In Tab. I, the final coverage of [FlingBot] drops significantly when dealing with X-Large Rect (58.8%), while [DextAIRity] still achieves very high coverage (97.1%). The failure of [FlingBot] is due to its limited movement speed, which would need to be prohibitively high to fully unfold a cloth much larger than the robot arm’s reach range. However, such high velocity is often dangerous and may not be feasible for many robots (e.g., the UR5 has a maximum speed of 1 m/s). In contrast, airflow can easily gain a high initial speed and apply forces to surfaces that are far away, resulting in an efficient system that is still safe to operate around humans.

Overall, we find that quasi-static pick-and-place actions are generally inefficient for cloth unfolding. While dynamic actions such as flinging can drastically improve efficiency, existing approaches still face significant limitations when handling large or heavy cloth. Our experimental evaluation suggests that DextAIRity is a promising approach for quickly and efficiently unfolding large cloth items without the need for high-speed movements or large robots.

Fig. 7: Qualitative results of cloth unfolding. Grasp predictions are visualized on the top-down image with cloth coverage labeled on the top right (row 1,2,4). The 3rd and 5th rows show the breakdown of fling and blowing actions respectively. DextAIRity learns to grasp cloth corners and blow toward the unfolded part of the cloth to maximize coverage. While FlingBot discovered a similar grasping strategy, it can only half-unfold the long dress due to the speed constraints. Please see supplementary video for more results.

Effectiveness of learned blowing policy. Compared with [DextAIRity-fixed], [DextAIRity] achieved higher final coverage (Tab. I). Qualitative results in Fig. 7 indicate that the blowing policy learns to detect and blow towards unfolded regions of the cloth, increasing effectiveness. As a result, coverage of X-Large Rect increases +23.0%, +13.3%, +1.6%, and +0.3% at each blow step compared to the fixed-policy ablation. In practice, while four blowing actions are executed after each grasping step, two were often sufficient.

Method | Large Rect | X-Large Rect | Shirt | Dress
Pick&Place | 36.2 / 13.1 | 38.0 / 13.3 | 40.2 / 14.7 | 41.4 / 12.3
FlingBot | 61.4 / 34.8 | 56.1 / 29.7 | 65.1 / 36.6 | 60.9 / 28.7
DextAIRity | 95.2 / 62.3 | 90.2 / 60.3 | 86.6 / 54.7 | 89.6 / 56.8
TABLE II: Real-world unfolding results (final / delta coverage, in %).

Benefit of edge-coincident grasp. Compared with [FlingBot], [FlingBot+] achieves slightly better unfolding efficiency with the same fling primitive (Fig. 6a). Although the edge-coincident grasping policy provides a marginal improvement to final performance, it improves training efficiency significantly because the system no longer needs to learn the grasp width parameter. Training of [FlingBot+] takes only 300 epochs, while [FlingBot] requires over 2,000 epochs to converge.

Generalization to unseen cloth types. In this experiment, we investigate how well these approaches, trained on only rectangular cloths, generalize to novel cloth categories (shirts and dresses). Qualitative results in the real world (Fig. 7) suggest that even on out-of-distribution clothing, our learned grasping policy attempts to grasp cloth corners and the blowing policy preferentially directs air towards still-crumpled regions of cloth. While this behavior is in line with human intuition, we found the degree to which it generalized to novel cloth surprising. Similar to the in-distribution experiments, coverage plots in Fig. 6 show the learned policy typically requires only 2 actions to reach the maximum coverage (around 70% in simulation and 80% in real).

Evaluation on a physical robot. We directly evaluate the trained model with our real-world setup. To promote policy transfer from simulation to reality, we perform background removal and substitute a uniform-colored background consistent with the simulation environment. Tab. II shows performance averaged over 10 test episodes; our policy achieves over 80% coverage on all cloth types, outperforming [FlingBot] and [Pick&Place], which reach roughly 60% and 40% respectively. Our qualitative comparison (Fig. 7) suggests [FlingBot] can successfully unfold shirts whose width is within the reach range, but it fails (see the pink dress) when items become much longer. Even with the maximum fling speed, the dress can only be unfolded in half instead of fully unfolded. In contrast, both cloths can be successfully unfolded via [DextAIRity] using a commodity air blower. The running time of these three primitives is 3.6s (blow ×4), 2.9s (fling), and 1.8s (place). With comparable execution time per step, [DextAIRity] achieves the best performance with fewer interactions. Please see our video for more results.

Fig. 8: Qualitative results of bag opening. Bag state (normalized area if the bag is not opened) is labeled on the bottom right. Red arrows in the Shake column indicate the moving directions of the end-effectors. Selected blowing actions are labeled as white lines. DextAIRity generates blowing actions based on the input observation and refines the action in a closed-loop manner. Please see the supplementary video for more results.
Fig. 9: Bag opening success rate vs. steps.

V-B Bag opening

Task performance for bag opening is measured by two metrics: 1) success rate: $\frac{1}{N}\sum_{i=1}^{N}\operatorname{sgn}\big(\max(A_i - T, 0)\big)$, and 2) normalized bag area: $\frac{1}{N}\sum_{i=1}^{N}\min(A_i / T, 1)$. $N$ is the number of testing cases, $A_i$ is the bag area of case $i$, $T$ is the area threshold, and $\operatorname{sgn}$ is the sign function. Note that the bag is considered open if its area is larger than the threshold; hence the normalized area has a maximum value of 1.
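As a minimal sketch, the two metrics described above can be computed as follows. The function name `bag_opening_metrics`, its argument names, and the example areas are hypothetical and chosen only for illustration; the sketch assumes a bag counts as open when its area is strictly larger than the threshold and that the normalized area is the area divided by the threshold, capped at 1.

```python
from typing import Sequence, Tuple


def bag_opening_metrics(areas: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (success_rate, mean_normalized_area) over a set of test cases.

    areas     -- measured bag area for each test case (hypothetical units)
    threshold -- area above which a bag is counted as open
    """
    if not areas:
        raise ValueError("need at least one test case")
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    n = len(areas)
    # A case succeeds when its area is strictly larger than the threshold.
    successes = sum(1 for a in areas if a > threshold)
    # Normalized area is capped at 1 once the bag is open.
    normalized = sum(min(a / threshold, 1.0) for a in areas)
    return successes / n, normalized / n


# Illustrative values only: two of four bags exceed the threshold.
rate, norm_area = bag_opening_metrics([0.5, 1.5, 2.0, 0.75], threshold=1.0)
print(rate, norm_area)  # 0.5 0.8125
```

Capping the normalized area keeps an already-open bag from inflating the average, so the second metric mainly differentiates how close the failed cases came to opening.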

Ablations: We compare our system with the following alternative approaches for bag opening:

  • Shake: moves the bag back-and-forth by rotating the last joint and records the largest bag area during the process.

  • DextAIRity-fixed: A fixed policy which blows toward the center of the workspace.

  • DextAIRity: A closed-loop policy that predicts blowing actions based on visual observations. In each episode, we run the policy 4 times or until the bag is opened.
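The closed-loop episode structure described for DextAIRity (observe, predict a blowing action, blow, and repeat up to 4 times or until the bag is opened) can be sketched as below. This is only an illustration of the control flow, not the authors' implementation: `run_episode`, `ToyBag`, and the scalar "observation" and "action" are hypothetical stand-ins for the real visual observations and blowing parameters.

```python
from typing import Callable, Tuple


def run_episode(
    observe: Callable[[], float],
    policy: Callable[[float], float],
    blow: Callable[[float], None],
    area_threshold: float,
    max_steps: int = 4,
) -> Tuple[bool, int]:
    """Blow up to max_steps times, re-planning from a fresh observation each step.

    Returns (opened, steps_used).
    """
    steps = 0
    area = observe()
    while area <= area_threshold and steps < max_steps:
        action = policy(area)   # predict a blowing action from the latest observation
        blow(action)            # execute the blowing action
        steps += 1
        area = observe()        # closed loop: look again before deciding the next action
    return area > area_threshold, steps


class ToyBag:
    """Hypothetical stand-in for the real system: each blow enlarges the bag by `gain`."""

    def __init__(self, area: float, gain: float) -> None:
        self.area = area
        self.gain = gain

    def observe(self) -> float:
        return self.area

    def blow(self, action: float) -> None:
        self.area += self.gain


bag = ToyBag(area=0.25, gain=0.25)
opened, steps = run_episode(bag.observe, lambda obs: 0.0, bag.blow, area_threshold=0.9)
print(opened, steps)  # True 3
```

The early exit is what distinguishes this loop from the fixed baseline: the episode stops as soon as the observation indicates success, and every remaining step is planned from the most recent observation rather than from the initial one.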

Training and testing bags: Fig. 5 shows our training and testing bags. We trained on a white cotton bag with size and a mass; we evaluate the learned policy on three bags with very different size and materials: [Small] a cotton bag (, ), [Medium] a plastic bag (, ), and [Large] a plastic bag (, ).

Results. Success rate and normalized area for both the training bag and the novel testing bags are shown in Tab. III and Fig. 9. We found that the dynamic action [Shake] generally fails to open the bag, while [DextAIRity-fixed] achieves a roughly 50% success rate on the testing bags. In contrast, [DextAIRity] achieves a 60% success rate at the first interaction step and reaches a final success rate of 88% after 4 blowing steps.

| Method | Small Bag | Yellow Bag | Blue Bag |
| --- | --- | --- | --- |
| Shake | 0.00 / 0.56 | 0.00 / 0.68 | 0.00 / 0.65 |
| DextAIRity-fixed | 0.40 / 0.86 | 0.52 / 0.92 | 0.56 / 0.94 |
| DextAIRity | 0.98 / 0.99 | 0.94 / 0.99 | 0.84 / 0.99 |

TABLE III: Real-world bag opening (success rate / normalized area).
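To compare the methods at a glance, the entries of Tab. III can be averaged per method. The sketch below takes an unweighted mean over the three bag columns only; this aggregation is an assumption made for illustration and need not match aggregate figures quoted in the text, which may be computed over a different set of trials. The names `table_iii` and `mean_over_bags` are hypothetical.

```python
from typing import Dict, List, Tuple

# (success rate, normalized area) per bag, copied from Tab. III
# in the order: Small Bag, Yellow Bag, Blue Bag.
table_iii: Dict[str, List[Tuple[float, float]]] = {
    "Shake": [(0.00, 0.56), (0.00, 0.68), (0.00, 0.65)],
    "DextAIRity-fixed": [(0.40, 0.86), (0.52, 0.92), (0.56, 0.94)],
    "DextAIRity": [(0.98, 0.99), (0.94, 0.99), (0.84, 0.99)],
}


def mean_over_bags(rows: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean of (success rate, normalized area) across bags."""
    n = len(rows)
    mean_success = sum(success for success, _ in rows) / n
    mean_area = sum(area for _, area in rows) / n
    return mean_success, mean_area


summary = {method: mean_over_bags(rows) for method, rows in table_iii.items()}
for method, (success, area) in summary.items():
    print(f"{method:18s} success={success:.2f} area={area:.2f}")
```

Under this averaging, [DextAIRity-fixed] lands near 0.49 mean success across the three bags, consistent with the "roughly 50%" figure above, while [DextAIRity] averages about 0.92.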

The improvement comes from two factors: 1) the information from visual observations (e.g., bag position and orientation) is used to infer a more effective blowing action, which explains the performance gain during the first interaction step; 2) closed-loop manipulation allows the system to refine the blowing action adaptively and compensate for errors in the learned model. Qualitative results in Fig. 8 show that [DextAIRity] tends to blow horizontally in the first step and adopts a more top-down blowing action in the subsequent steps. This strategy allows the system to first cover more space using a horizontal blow and collect observations on how the bag reacts to the blowing action. Then, in the following steps, the policy generates a more targeted, and thus more effective, blowing action based on these visual observations. In addition, similar to our findings in the cloth-unfolding domain, our policy trained on the cotton bag was able to generalize to highly novel bags with different sizes and materials.

Fig. 10: Failure Cases. (a) A corner is inadvertently rolled up due to eddy effects. (b) Multiple layers of the fabric are mistakenly grasped. (c) Two layers of the bag stick together and prevent air ingress.

VI Limitations and Practical Considerations

While in this paper we demonstrate the effectiveness of directed air to manipulate deformable objects, we discuss a few limitations and practical considerations of deploying DextAIRity in real-world applications. First, during task execution, the air pump or blower is not particularly quiet. This issue could be mitigated with better sound isolation designs but could be problematic in some environments. Second, commodity air pumps do not turn on and off instantaneously, which might be an issue for applications requiring fast impulse forces. Third, accurate aerodynamic simulation is non-trivial; depending on the target application, real-world data may be required to successfully train the system, which can be expensive and time-consuming to collect (though we do note that it would not generally require human annotation).

In addition to these general constraints for air-based manipulation, Fig. 10 shows a few typical failure cases of our specific system: a) In the unfolding task, the corner of a fabric can be rolled up by the air due to unmodeled aerodynamic effects, such as eddies. This failure highlights the complexity of using aerodynamics as part of a manipulation strategy. b) While the edge-coincident grasp significantly reduced the frequency of grasping multiple layers of fabric, these poor grasps still occasionally occurred (Fig. 10 b). c) In the bag opening task, the bag can sometimes end up in challenging states where two layers of fabric stick together and prevent air ingress regardless of blow angle. To address this case, the robot would require a coordinated policy that simultaneously adjusts all three arms [ha2020learning], suggesting an exciting avenue for future exploration.

VII Conclusion

We propose a new method for deformable object manipulation that utilizes active airflow, which we term DextAIRity. We demonstrate the effectiveness of DextAIRity through two challenging deformable object manipulation tasks and deploy the algorithm on a real-world three-arm system. Experiments suggest that DextAIRity can improve system efficiency for challenging manipulation tasks like cloth unfolding and enable new applications, such as bag opening, that were not possible with quasi-static contact-based manipulations. We hope that the proposed algorithm, tasks, and results will help spur the field to explore broader and more powerful forms of non-contact-based manipulation.


We would like to thank Huy Ha, Dale McConachie, and Naveen Kuppuswamy for their helpful feedback and fruitful discussions. This work was supported by the Toyota Research Institute, NSF CMMI-2037101, and NSF IIS-2132519. We would like to thank Google for the UR5 robot hardware. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.