Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds

by Bao Thach, et al.

If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.




I Introduction

Manipulation of 3D deformable objects stands at the heart of many tasks we wish to assign to autonomous robots. For example, home-assistance robots must be able to manipulate objects such as sponges, mops, bedding, and food to help people with day-to-day life. Robots operating in warehouses should safely handle deformable containers such as bags and boxes in order to package outgoing orders. Factory robots benefit from the ability to remove deformable objects from containers. Most critically, surgical assistive robots are required to safely and precisely manipulate deformable tissue and organs.

However, 3D deformable object manipulation presents many challenges [38]. The shape of a deformable object requires a potentially infinite number of degrees of freedom (DOF) to describe, compared to only 6 DOF for rigid objects. As a result, deriving low-dimensional but accurate and expressive state representations for deformable objects is difficult. An additional challenge, compared with simpler linear deformable objects such as ropes and cloth, arises because elastic 3D deformable objects cannot be released without returning to their initial configuration. Further, deformable objects frequently have complex dynamics [44], making the process of deriving a model laborious and potentially computationally intensive. These issues all present themselves in the specific problem we examine in this work: 3D deformable object shape control. The shape control problem requires a robot to manipulate the internal DOF of a 3D deformable object to reach a desired shape.

Fig. 1: Example initial and final configurations of object shape control via shape servo with DeformerNet on a physical robot using a laparoscopic tool. We visualize in red the goal point clouds given to the controller.

While rigid-body manipulation has received a large amount of study [22], autonomous 3D deformable object manipulation remains an under-researched area [38, 10] because of the challenges listed above, despite its potential relevance and need. Existing work on 3D deformable shape control leverages hard-coded feature vectors to describe deformable object state [9], an approach that struggles to represent large sets of shapes. While learning-based methods show great promise in both rigid [18, 26] and deformable object manipulation [10, 47], these methods require a large amount of training data. Due to the difficulty of accurately simulating deformable objects, existing methods for shape control rely on data gathered via real-world setups, limiting the efficacy of learning-based approaches. Further, the ability to successfully manipulate deformable material depends heavily on where the robot grasps an object; however, current works do not provide methods for selecting grasping points conditioned on the desired post-grasp manipulation.

In this work, we take steps toward addressing each of these gaps in the context of 3D deformable shape control. Our method takes as input a partial-view point cloud representation of a 3D deformable object and a desired goal shape. We build our method around a novel neural-network architecture, DeformerNet, which is trained on a large amount of data gathered via a recently-developed high-fidelity deformable object simulator, Isaac Gym [15, 10, 21]. Our method first reasons over the initial and target shape to select a manipulation point. Following selection of this grasp point, DeformerNet takes as input the current and target point clouds of the object, embeds the shape into a low-dimensional latent space representation, and computes a change in end-effector position that moves the object closer to the goal shape. The robot executes this motion and proceeds in a closed-loop fashion generating commands from DeformerNet until reaching the desired goal shape. Figure 1 shows the initial and final configurations from an example manipulation using DeformerNet on a physical robot. Our results provide the first empirical demonstration of the importance of manipulation point selection for 3D shape control.

We focus our evaluation on the surgical robotics domain. We task a robot with manipulating three classes of object primitives into a variety of goal shapes using a laparoscopic tool. Unlike the preliminary results presented in our previous workshop paper [40], we vary the dimensions and the stiffness properties of the objects. We demonstrate effective manipulation on test objects both in simulation and on a physical robot. Importantly, we show that our method can manipulate objects that fall both inside and outside the distributions of object shape and stiffness seen in training. We show that DeformerNet outperforms both a sampling-based strategy and a model-free reinforcement learning approach on the shape control task.

We additionally present a strategy for applying our method to the common surgical task of retraction, where we reduce the goal specification from a full target shape to a single plane that the deformable tissue must end up on one side of. We demonstrate successful retraction both in simulation and on the physical robot. We make available all code and data associated with this paper at https://sites.google.com/view/deformernet/home.

II Related Work

Many approaches leverage machine learning with point cloud sensing to manipulate 3D rigid objects [27, 26, 4, 18, 17, 5]. The authors propose various neural network architectures that encode object shape for tasks such as grasp planning [26, 18, 17, 5], collision checking [27], shape completion [5], and object pose estimation [4]. In this work, we build upon these concepts to apply a learning-based approach which reasons over point cloud sensing with learned feature vectors to manipulate 3D deformable objects.

Solutions to 3D deformable object shape control [38] can be categorized into learning-based and learning-free approaches. Among the learning-free methods, a series of papers [32, 30, 33] define a set of geometric features on the object as the state representation. The authors use this representation to perform visual servoing with an adaptive linear controller. These methods only work for known objects with distinct texture and cannot generalize to a diverse set of objects. This formulation controls the displacements of individual points, which does not fully reflect the 3D shape of the object. For precise control, one must use a large number of feature points, making control highly susceptible to noise and occlusion. Other learning-free works [36, 31, 48] represent the object shape using 2D image contours; this severely limits the space of controllable 3D deformations.

Among learning-based 3D shape control methods, Hu et al. [9] represent the current state of the art. Specifically, they use extended FPFH [37] to extract a feature vector from an input point cloud and learn to predict deformation actions via a neural network to control objects to desired shapes. However, we show that this architecture over-simplifies the complex dynamics of 3D deformable objects and thus struggles to learn to control to a diverse set of target shapes [40].

There has also been work on shape control of deformable objects that exhibit lower-dimensional behavior, e.g., 1D objects such as rope and 2D objects such as cloth [44, 23, 47, 19, 24]. These methods typically either directly learn a policy using model-free RL that maps RGB images of the object to robot actions [44, 23] or learn predictive models of the object under robot actions [47, 19, 20, 24]. These 1D and 2D works do not scale to the 3D deformable object shape control problem, either because they leverage lower-dimensional object or sensing representations (e.g., RGB images) or because of the inherent physical differences between 1D, 2D, and 3D objects (e.g., 3D elastic tissue will return to its initial shape after being released).

With respect to surgical robotics, several learning-based approaches have been applied to other surgical tasks including suturing [42, 3], cutting [41, 28], tissue tracking [16], and simulation [46]. In this work we apply our method to surgical retraction. Attanasio et al. [2] propose the use of surgeon-derived heuristic motion primitives to move tissue flaps identified by a vision system. In [11], a grasp location and planar retraction trajectory are computed with a linearized potential energy model leveraging online simulation. In [25], a logic-based task planner is leveraged which guarantees interpretability; however, this work focuses on manipulating a single thin tissue sheet and does not show shape or material property generalization or validation on a physical robot. Nagy et al. [29] propose the use of stereo vision accompanied by multiple control methods; however, the method assumes a thin tissue layer and a clear view of two tissue layers. Pore et al. [34] introduce a model-free reinforcement learning method which learns safe motions for a robot’s end effector during retraction; however, it does not explicitly reason over the deformation of the tissue. We compare against a similar approach, using the same model-free reinforcement learning algorithm, but adapted to our task to explicitly reason over the tissue state.

III Problem Formulation

We address the problem of robotically manipulating a 3D deformable object from an initial shape to a goal shape. In this context, 3D refers to triparametric or volumetric objects [38] which have no dimension significantly smaller than the other two, unlike uniparametric (e.g., rope) and biparametric objects (e.g., cloth).

We define the shape of the 3D volumetric object to be manipulated as $\mathcal{O} \subset \mathbb{R}^3$, noting that it will change over time as the robot manipulates it and the object interacts with the environment. As typical robots cannot directly sense $\mathcal{O}$, we consider a partial-view point cloud $\mathcal{P} \subset \mathcal{O}$ as a subset of the points on the surface of $\mathcal{O}$, due to the prevalence of sensors that produce point clouds. We define the point cloud representing the initial shape of the object as $\mathcal{P}_i$, the goal shape for the object as $\mathcal{P}_g$, and the shape of the object at a given intermediate point in time as $\mathcal{P}_c$.

We note that the successful manipulation of a deformable object depends on the point on the object the robot grasps, i.e., the manipulation point (see Fig. 2). As such, we present the first problem as the selection of a manipulation point, which we define as $p_m$.

Fig. 2: Importance of manipulation point (MP) selection. Leftmost: goal shape; Red box: successful MP; Blue box: failed MP.

Having grasped the object, the robot can change that object’s shape by moving its end-effector and in turn moving the manipulation point of the object. We define a manipulation action as a change in the manipulation point, formally $a = \Delta p_m$. The resulting problem then becomes to define a policy $\pi$, which maps the point cloud representing the current object shape $\mathcal{P}_c$ and the goal point cloud $\mathcal{P}_g$ to an action vector describing the change in manipulation point that drives the object toward the goal shape, i.e., $\pi(\mathcal{P}_c, \mathcal{P}_g) = a$. The repeated application of a successful policy results in a manipulation point trajectory which, when executed by the robot, transforms the object from its initial shape to the goal shape.

IV Method

In this section we explain the details of our proposed approach. We first explain our shape servo [31] approach to create a feedback policy for 3D deformable object shape control. Following this we give details of the DeformerNet network architecture at the heart of our shape servo policy. Finally in this section we present our approach to selecting a manipulation point, conditioned on the goal configuration, used by the robot while performing shape control.

IV-A Shape Servo Control with DeformerNet

The shape servo formulation [31, 9] uses visual feedback, here in the form of partial-view point clouds of the object being manipulated, as input to a policy that computes a robot action that attempts to instantaneously bring the current shape, $\mathcal{P}_c$, closer to the target shape, $\mathcal{P}_g$.

Following the notation from Sec. III, we seek to construct a shape servo policy of the form $a = \pi(\mathcal{P}_c, \mathcal{P}_g)$, where $a$ is the commanded change in manipulation point. We decompose our policy into two stages: (1) a feature extraction stage and (2) a deformation controller (cf. Fig. 3, top).

The feature extractor $g$ takes a point cloud $\mathcal{P}$ as input and outputs a shape feature vector we define as $\psi = g(\mathcal{P})$. We use two parallel feature extraction channels taking as input $\mathcal{P}_c$ and $\mathcal{P}_g$ and generating feature vectors $\psi_c$ and $\psi_g$, respectively. We then take the difference of these two to define the feature displacement: $\Delta\psi = \psi_g - \psi_c$.

Our deformation control function, $f$, takes this feature displacement $\Delta\psi$ as input and outputs the desired instantaneous change in end-effector position, hence $a = f(\Delta\psi)$.

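The composite shape servo policy thus takes the form $\pi(\mathcal{P}_c, \mathcal{P}_g) = f(g(\mathcal{P}_g) - g(\mathcal{P}_c))$, where $g$ denotes the feature extractor and $f$ the deformation controller. We then use a resolved rate controller to compute the desired joint velocities following the desired end-effector displacement output by our shape servo policy.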
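The closed-loop structure described above can be illustrated with a minimal Python sketch. It is not the DeformerNet implementation: the names `shape_servo_step`, `centroid`, and `half_gain` are hypothetical, the centroid feature and proportional controller are toy stand-ins for the learned feature extractor and deformation controller, and the "object" is a rigid cloud that simply translates with the end-effector.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Cloud = List[Point]
Vector = List[float]


def shape_servo_step(
    extract: Callable[[Cloud], Vector],   # stand-in for the feature extractor
    control: Callable[[Vector], Vector],  # stand-in for the deformation controller
    current: Cloud,
    goal: Cloud,
) -> Vector:
    """One policy evaluation: control(extract(goal) - extract(current))."""
    psi_c = extract(current)
    psi_g = extract(goal)
    delta_psi = [g - c for g, c in zip(psi_g, psi_c)]
    return control(delta_psi)


def centroid(cloud: Cloud) -> Vector:
    """Toy feature: mean of the points (NOT the learned PointConv feature)."""
    n = len(cloud)
    return [sum(p[i] for p in cloud) / n for i in range(3)]


def half_gain(delta: Vector) -> Vector:
    """Toy controller: move half of the remaining feature displacement."""
    return [0.5 * d for d in delta]


if __name__ == "__main__":
    # Toy object: every point moves rigidly with the commanded displacement.
    current = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    goal = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
    for _ in range(20):  # closed loop: sense, compute, act, repeat
        dp = shape_servo_step(centroid, half_gain, current, goal)
        current = [[p[i] + dp[i] for i in range(3)] for p in current]
    print(centroid(current))
```

In the actual system, the two callables would be replaced by the trained network stages, and the loop would terminate once the observed cloud is close enough to the goal.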

Training this model takes a straightforward supervised approach. We simply record the robot manipulating an object of interest, set the terminal object point cloud as $\mathcal{P}_g$, select any previous point cloud from the trajectory as $\mathcal{P}_c$, and use the associated end-effector displacement between the two configurations as the target action $a$. We give further details of this training procedure in Sec. V.

Fig. 3: (Top) Architecture of DeformerNet; (Bottom) architecture of the feature extraction module.

IV-B DeformerNet Architecture Details

As described previously, DeformerNet consists of two stages: feature extraction and deformation control. Our feature extractor uses three successive PointConv [43] convolutional layers that successively output clouds of dimension (64, 512), (128, 256) and ultimately a 256-dimension vector that acts as the shape feature. We downsample the input current point cloud, $\mathcal{P}_c$, and goal point cloud, $\mathcal{P}_g$, to 1024 points using the furthest point sampling method from [35] before inputting them into the network. We provide full details of the architecture in the bottom of Fig. 3.
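For readers unfamiliar with furthest point sampling, the following is a minimal pure-Python sketch of the usual greedy procedure. It illustrates the general technique only and is not the implementation from [35]; the function name `furthest_point_sample` and the choice of starting index are assumptions made for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def furthest_point_sample(points: List[Point], k: int, start: int = 0) -> List[int]:
    """Return indices of k points chosen greedily to be mutually far apart."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point that is currently furthest from the chosen set.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen
```

Calling `furthest_point_sample(cloud, 1024)` on a list of 3D points would return the indices of a 1024-point subset that covers the cloud roughly uniformly. This quadratic-time version is meant for clarity rather than speed.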

The deformation control stage takes this 256-dimension differential feature vector and passes it through a series of fully-connected layers (128, 64, and 32 neural units, respectively). The fully-connected output layer produces the desired 3D displacement. We use the ReLU activation function and group normalization [45] for all convolutional and fully-connected layers except for the linear output layer.

We use the standard mean squared error loss function for training DeformerNet. We adopt the Adam optimizer with a decaying learning rate that decreases by 1/10 every 50 epochs.
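As a small illustration of such a step schedule, the sketch below interprets "decreases by 1/10 every 50 epochs" as multiplying the learning rate by 0.1 at each 50-epoch boundary; that interpretation is an assumption. The function name `step_decay_lr` is hypothetical, and the initial learning rate is left as a caller-supplied argument because no value is assumed here.

```python
def step_decay_lr(initial_lr: float, epoch: int,
                  drop: float = 0.1, every: int = 50) -> float:
    """Learning rate after `epoch` epochs under a step-decay schedule.

    Assumes the rate is multiplied by `drop` once per `every` epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * drop ** (epoch // every)
```

With a placeholder initial rate of 1.0, epochs 0 to 49 use 1.0, epochs 50 to 99 use 0.1, and so on.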

IV-C Manipulation Point Prediction

As discussed above and shown in Fig. 2, the location at which the robot grasps the object greatly influences whether the robot can reach a target shape. As such, we present here an approach to selecting an appropriate manipulation point prior to performing the shape control task. Recall that we wish to find a manipulation point on the surface of the object, $p_m$. However, we must infer this location given the initial and target point clouds, prior to acting. We propose the use of a keypoint-based heuristic to select the manipulation point. Our preliminary work [40] showed this heuristic slightly outperformed a regression-based approach.

Our heuristic follows a simple idea: points that move more should generally be closer to the manipulation point. Assume we have a set of keypoint matches $(k_i^j, k_g^j)$ between the initial and goal point clouds. We define the associated keypoint displacements as $d_j = \lVert k_g^j - k_i^j \rVert_2$. We then estimate the manipulation point as the location defined by the displacement-weighted average of the keypoints with the largest displacements.
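A minimal sketch of this displacement-weighted heuristic follows. Several details are assumptions made for illustration: the average is taken over the keypoint locations in the initial cloud, displacement is measured as Euclidean distance, and the number of retained keypoints is exposed as a hypothetical `top_n` parameter. The function name `estimate_manipulation_point` is likewise hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def estimate_manipulation_point(
    initial_kps: List[Point],
    goal_kps: List[Point],
    top_n: int,
) -> List[float]:
    """Displacement-weighted average of the top_n most-displaced keypoints."""
    if len(initial_kps) != len(goal_kps) or not initial_kps:
        raise ValueError("keypoint lists must be non-empty and matched one-to-one")
    # Displacement of each matched keypoint between initial and goal clouds.
    disp = [math.dist(a, b) for a, b in zip(initial_kps, goal_kps)]
    # Indices of the top_n keypoints, largest displacement first.
    order = sorted(range(len(disp)), key=disp.__getitem__, reverse=True)[:top_n]
    total = sum(disp[i] for i in order)
    if total == 0.0:
        raise ValueError("no keypoint moved; the heuristic is undefined")
    return [
        sum(disp[i] * initial_kps[i][axis] for i in order) / total
        for axis in range(3)
    ]
```

In practice the estimate would presumably still need to be mapped to a graspable point on the observed surface, a step this sketch leaves out.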

We use an unsupervised keypoint detection algorithm based on the Transporter network of [13]. The original Transporter network defines an unsupervised reconstruction loss between source and target image pairs from a video sequence. To adapt the Transporter network to our 3D manipulation point prediction problem, we leverage pairs of source-target point clouds collected in simulation to train the model. We convert the point cloud data to an organized, array-like point cloud format to make them compatible with the original Transporter network architecture.

V Experiments and Results

We evaluate our method in both simulation, via the Isaac Gym environment [15], and on a real robot. For both simulation and real robot experiments, training data for the learned models are generated in Isaac Gym. In Isaac Gym, we use a simulation of a patient-side manipulator of the daVinci research kit (dVRK) [12] robot to manipulate objects (see Fig. 4 (right)). For the real robot experiments, we use a Baxter research robot with a laparoscopic tool attached to its end effector and an Azure Kinect camera generating point clouds of the deformable object (see Fig. 1). In both cases, we affix one end of the deformable object to the environment and task the robot with manipulating it via one grasp point.

Fig. 4: (Left) We train on random interpolations of these shapes. (Right) Experimental setup showing a patient-side manipulator of the dVRK in Isaac Gym.

V-A Goal-Oriented Shape Servoing

We evaluate our method’s ability to deform the object to the goal point cloud. In our previous workshop paper [40], we reported the performance of our method when the model was trained and tested on one object geometry and Young’s modulus, and demonstrated that our method outperforms a current state-of-the-art method for learning-based 3D shape servoing by Hu et al. [9].

V-A1 Training Data Generation

We expand on this evaluation in this work by first evaluating our method’s ability to control the shape of a variety of 3D deformable object shape primitives, including hemispheres, rectangular boxes, and cylinders (see Fig. 4). For each primitive, we investigate three different stiffness values (represented by Young’s modulus): 1 kPa, 5 kPa, and 10 kPa, which represent stiffness properties similar to those seen across different biological tissues [8, 7]. The three shape primitives, each with three stiffness values, result in a total of nine object types for evaluation.

For each of the nine object types, we create a training dataset of objects with geometries sampled uniformly at random from interpolations between the sizes of the shapes in Fig. 4. In addition, each object for training is assigned a Young’s modulus sampled from a Gaussian distribution with means and standard deviations of (1 kPa, 0.2 kPa), (5 kPa, 1 kPa), and (10 kPa, 1 kPa) for the 1 kPa, 5 kPa, and 10 kPa test scenarios, respectively.

We generate each training dataset by randomly sampling 300 pairs of initial object configurations and manipulation points. For each pair, the robot deforms the object to 10 random shapes for a total of 3000 random trajectories. We record partial-view point clouds of the object and the robot’s end-effector positions at multiple checkpoints during the execution of each trajectory using the depth camera available inside the Isaac Gym environment. We form supervised input-output data pairs for training DeformerNet. The input consists of a point cloud $\mathcal{P}_c$ along the trajectory at an arbitrary time $t$, as well as the point cloud $\mathcal{P}_g$ at the end of this trajectory. We compute the output, $a$, as the displacement between the end-effector position at time $t$ and at the end of the trajectory. We sample 10,000 such pairs of data points for training our model.
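The construction of supervised pairs from one recorded trajectory can be sketched as follows. This is a minimal illustration under stated assumptions, not the released data pipeline: the function name `make_training_pairs` and the list-based data layout are hypothetical, and every checkpoint before the last is used, whereas the procedure above samples a subset.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Cloud = List[Point]
Sample = Tuple[Cloud, Cloud, List[float]]


def make_training_pairs(
    clouds: List[Cloud],
    ee_positions: List[Point],
) -> List[Sample]:
    """Build (current cloud, goal cloud, displacement) triples from a trajectory.

    clouds[t] and ee_positions[t] are recorded at the same checkpoint t;
    the last checkpoint plays the role of the goal.
    """
    if len(clouds) != len(ee_positions) or len(clouds) < 2:
        raise ValueError("need at least two aligned checkpoints")
    goal_cloud = clouds[-1]
    goal_pos = ee_positions[-1]
    samples = []
    for cloud, pos in zip(clouds[:-1], ee_positions[:-1]):
        # Label: end-effector displacement still needed to reach the goal.
        displacement = [g - p for g, p in zip(goal_pos, pos)]
        samples.append((cloud, goal_cloud, displacement))
    return samples
```

Because the label is simply the remaining end-effector displacement, no manual annotation is needed; any simulated trajectory yields training data.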

V-A2 Generalization Performance

We are interested in evaluating the performance of our method on test scenarios that are both inside and outside the training distributions in simulation. To generate test scenarios outside the training distribution, we sample objects with random dimensions smaller than the minimum and larger than the maximum of each of our primitive-shaped objects. We additionally sample Young’s moduli with values 2-4 standard deviations from the mean of the training distribution moduli. For each test scenario we select 10 random objects from inside the training distribution and 10 from outside the training distribution. We then sample 10 random goal shapes for each of the 20 test objects. We select the manipulation point for testing using our keypoint-based heuristic.

We use Chamfer distance as our primary evaluation metric to describe how close the final manipulated object’s point cloud is to the goal point cloud. Chamfer distance computes the average distance of each point in one point cloud to the closest point in the other point cloud. Fig. 5 visualizes the result for each object type with a boxplot recorded over the 20 test objects with 10 goal shapes each. The box represents the quartiles, the center line the median, and the whiskers the minimum and maximum final Chamfer distance. For visualization purposes, we also provide a sample snapshot of the robot performing shape servoing to a goal shape in Fig. 7.
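For concreteness, a minimal pure-Python sketch of a Chamfer distance is given below. Several variants exist (squared or unsquared distances, summed or averaged directions); this sketch assumes unsquared Euclidean distances and sums the two one-sided averages, which may differ from the exact variant used in the experiments. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def one_sided(a: List[Point], b: List[Point]) -> float:
    """Mean distance from each point in a to its nearest neighbor in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance: sum of the two one-sided averages."""
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")
    return one_sided(a, b) + one_sided(b, a)
```

The brute-force nearest-neighbor search costs time proportional to the product of the two cloud sizes; a spatial index such as a k-d tree would be the usual choice for larger clouds.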


Fig. 5: Distribution of Chamfer distance after shape servoing. “Inside” reports for object inside the training distribution and “outside” for random objects outside the training distribution.

The experimental results show that our method is capable of generalizing what it learns from training to adapt to geometries, material properties, and goal shapes it has never seen before, both inside and outside the training distribution, although predictably with some falloff in performance outside the training distribution.

V-A3 Baseline Comparisons

We also compare the performance of our method against Rapidly-exploring Random Tree (RRT) [14] and model-free Reinforcement Learning (RL) for the 3D shape servo problem. Here we restrict the task to be trained and tested on a single box object as described in [40] and use only one manipulation point throughout training and testing.

For the RRT implementation, we define the configuration space as the joint angles of the dVRK manipulator. We define a goal region as any object point cloud that has Chamfer distance less than some tolerance from the goal point cloud. We use the finite element analysis model [21] in the Isaac Gym [15] simulator to derive the forward model for RRT.

We use proximal policy optimization (PPO) [39] (as in [34]) with hindsight experience replay (HER) [1] for model-free RL. We use our DeformerNet architecture for the actor and critic networks, except that the critic output is a single scalar encoding the value function. In each episode, we condition the policy on a newly sampled goal shape. We train the RL agent with 100,000 samples, 10 times the amount of data provided to DeformerNet.

We evaluate DeformerNet, RRT, and model-free RL with 10 random goal shapes. Fig. 6 shows the success rate of the three methods at different levels of goal tolerance.
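The success-rate-versus-tolerance evaluation can be sketched as below, assuming a trial counts as a success when its final Chamfer distance is strictly below the tolerance (mirroring the goal-region definition used for RRT). The function names `success_rate` and `success_curve` are hypothetical, and no numbers from the experiments are reproduced here.

```python
from typing import Dict, Sequence


def success_rate(final_distances: Sequence[float], tolerance: float) -> float:
    """Fraction of trials whose final distance is strictly below tolerance."""
    if not final_distances:
        raise ValueError("need at least one trial")
    hits = sum(1 for d in final_distances if d < tolerance)
    return hits / len(final_distances)


def success_curve(
    final_distances: Sequence[float],
    tolerances: Sequence[float],
) -> Dict[float, float]:
    """Success rate at each tolerance level, e.g. for plotting a curve."""
    return {tol: success_rate(final_distances, tol) for tol in tolerances}
```

Sweeping the tolerance in this way turns a single list of final distances per method into a curve, which makes it easy to see where the methods separate as the goal becomes stricter.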

Fig. 6: Success rate comparison of DeformerNet to RRT and RL baselines for varying levels of goal tolerance.

We clearly see that, even with 10 times the training data provided to our method, the model-free RL agent achieves a significantly lower success rate than the other two methods. We also note that while RRT succeeds comparably to our method at looser goal tolerances, it fails more often at tighter goal tolerances. Further, unlike our method, RRT does not incorporate feedback during execution. As such, RRT cannot recover if the object shape deviates from the plan. While one might think to perform replanning, we note that RRT requires several orders of magnitude more computation time than our shape servoing approach. For instance, at a tolerance of 0.4 (where both our method and RRT achieve 100% success), over the 10 test goal shapes, the computation time required by RRT ranged from 3.3 minutes to 121.6 minutes, with a mean of 38.7 minutes and a standard deviation of 40.1 minutes. DeformerNet, however, only requires a pass through the neural network, which takes minimal time. As a result, for this task, we note a significant success rate improvement for our method over model-free RL, a success rate improvement at strict goal tolerance values over RRT, and a significant computation time improvement over RRT in all cases.

Fig. 7: Sample manipulation sequences of DeformerNet in different setups. The sparse red clouds visualize the target shapes of the object. First row: with simulated dVRK in Isaac Gym (0.18 m final Chamfer dist.). Second row: with physical robot and real goal point clouds (0.30 m final Chamfer dist.). Third row: with physical robot and simulated goal point clouds (0.39 m final Chamfer dist.).
Fig. 8: Top row: simulated retraction experiment setup (left) and a sample successful retraction sequence with target plane visualized in blue. Bottom row: visualization of target planes for physical robot retraction (left) and a successful sequence with target plane in red.

V-A4 Physical Robot Goal-Oriented Shape Servoing

We next evaluate our method’s ability to perform shape servoing on the real robot, while having been trained entirely in simulation. The experimental setup (shown in Fig. 1) leverages a foam box affixed on one side to a table. We segment the object’s point cloud out from the rest of the scene by fitting a plane to the table with RANSAC [6] and selecting the points above this plane. We filter out the black table clamp and the laparoscopic tool using pixel intensity.
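The plane-based segmentation step can be illustrated with a small pure-Python RANSAC sketch. It is not the pipeline used on the robot: the function names, the inlier threshold, the iteration count, and the assumption that "above" means the positive z direction are all choices made for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
Plane = Tuple[List[float], float]  # unit normal n and offset d, with n.p + d = 0


def plane_through(a: Point, b: Point, c: Point) -> Optional[Plane]:
    """Plane through three points, or None if they are (nearly) collinear."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(x * x for x in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    return n, -sum(n[i] * a[i] for i in range(3))


def signed_height(p: Point, plane: Plane) -> float:
    """Signed distance of p from the plane along its normal."""
    n, d = plane
    return sum(n[i] * p[i] for i in range(3)) + d


def ransac_plane(points: List[Point], threshold: float,
                 iterations: int = 200, seed: int = 0) -> Plane:
    """Return the sampled plane with the most inliers within threshold."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iterations):
        plane = plane_through(*rng.sample(points, 3))
        if plane is None:
            continue
        inliers = sum(1 for p in points
                      if abs(signed_height(p, plane)) <= threshold)
        if inliers > best_inliers:
            best, best_inliers = plane, inliers
    if best is None:
        raise ValueError("no non-degenerate plane found")
    return best


def points_above(points: List[Point], plane: Plane, margin: float) -> List[Point]:
    """Keep points more than margin above the plane (+z assumed to be up)."""
    n, d = plane
    if n[2] < 0:  # flip the normal so that it points upward
        plane = ([-x for x in n], -d)
    return [p for p in points if signed_height(p, plane) > margin]
```

When the table dominates the scene, the plane with the most inliers is the table surface, and the points above it are candidates for the object before any intensity-based filtering.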

We generate three distinct goal shapes (Fig. 10 (left)) by manually moving the object to random shapes with the laparoscopic tool and recording the resulting point cloud. Figure 9 describes the success rate of the 15 trials over different goal tolerance levels. Figure 7 visualizes a typical manipulation sequence.

Fig. 9: Success rate of DeformerNet with physical robot when given real goal point clouds and simulated goal point clouds.
Fig. 10: Left: goal point clouds from real sensor recordings. Right: goal point clouds generated in simulation.

To showcase our method’s robustness, we additionally evaluate on 3 goal point clouds obtained entirely from the simulator (Fig. 10 (right)). Figure 9 visualizes the success rate of the 15 trials over different goal tolerance levels. A sample visualization is provided in Fig. 7. Overall we note a slight drop in quantitative performance in the real world compared to simulation, while qualitatively still succeeding.

V-B Surgical Retraction

We next evaluate our method’s ability to perform a mock surgical retraction task, in which a thin layer of tissue is positioned on top of a kidney. We task the robot with grasping the tissue layer and lifting it up to expose the underlying area. Figure 8 (top, left) shows the simulation environment composed of a kidney model with a deformable tissue layer placed over it and fixed to the kidney on one side. We train DeformerNet on a box object similar in dimensions to the tissue layer, but without the kidney present.

Instead of requiring the operator (e.g., a surgeon) to provide an explicit shape for the robot to servo the tissue to, we only require them to define a plane which the tissue should be folded to one side of. An example plane can be seen in Fig. 8. We use a simple algorithm to infer a goal point cloud for the object based on this target plane. We use RANSAC [6] to find a dominant plane in the object cloud and then find the minimum rotation to align this plane with the target plane. We then apply this estimated transformation to any points not lying on the correct side of the plane and set this as the target cloud along with the points currently satisfying the goal. If, after reaching the goal point cloud, any part of the object still resides on the wrong side of the plane, we shift the target plane further into the goal region along the plane’s normal vector and repeat the entire process.
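The bookkeeping in this goal-generation procedure can be sketched as follows. The sketch assumes the target plane is given by a point `origin` and a unit `normal` that points toward the goal side, and it takes the aligning transformation as a caller-supplied function rather than computing the RANSAC-based rotation. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]
Cloud = List[Vec]


def signed_distance(p: Vec, origin: Vec, normal: Vec) -> float:
    """Positive when p lies on the side the unit normal points toward."""
    return sum((pi - oi) * ni for pi, oi, ni in zip(p, origin, normal))


def split_by_plane(cloud: Cloud, origin: Vec, normal: Vec) -> Tuple[Cloud, Cloud]:
    """Partition a cloud into points satisfying and violating the goal."""
    satisfied, violating = [], []
    for p in cloud:
        side = satisfied if signed_distance(p, origin, normal) >= 0.0 else violating
        side.append(p)
    return satisfied, violating


def retraction_goal(cloud: Cloud, origin: Vec, normal: Vec,
                    transform: Callable[[Vec], Vec]) -> Cloud:
    """Goal cloud: satisfied points kept, violating points transformed."""
    satisfied, violating = split_by_plane(cloud, origin, normal)
    return satisfied + [transform(p) for p in violating]


def shift_plane(origin: Vec, normal: Vec, step: float) -> List[float]:
    """Move the plane further into the goal region along its normal."""
    return [o + step * n for o, n in zip(origin, normal)]
```

An outer loop would servo to the inferred goal cloud, call `split_by_plane` on the resulting observation, and, if any points still violate the goal, call `shift_plane` and repeat.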

To evaluate, we sample 100 random planes with differing orientations in simulation and task the method with moving the tissue layer beyond the plane. Our approach reveals the kidney underneath with a success rate of 95%.

We also evaluate retraction on the physical robot. We affix a thin layer of foam to the table and task the robot with moving the object via the laparoscopic tool beyond a target plane. We evaluate on 3 different planes (see Fig. 8), and for each plane conduct 5 trials. We observe a 100% success rate across the 15 trials. We provide visualizations of representative retraction experiments in Fig. 8.

Vi Conclusions

In this paper we have presented a novel approach to closed-loop 3D deformable object shape control. Crucially, we demonstrate through rigorous simulated and physical-robot experiments that shape servoing with DeformerNet can manipulate objects with novel material properties or shape while only requiring a partial-view 3D point cloud as input. We further demonstrate how our shape servoing approach can be adapted to the task of surgical retraction, where only a much simpler goal representation in the form of a separating plane needs to be provided. Our future work aims to extend our manipulation approach to more surgical tasks beyond retraction, as well as to the task of manipulating deformable 3D objects common to homes and warehouses. Finally, we wish to move beyond our greedy, visual servoing approach to provide more explicit planning for longer-horizon tasks.


This work was supported in part by NSF Award #2024778.


  • [1] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba (2017) Hindsight experience replay. arXiv preprint arXiv:1707.01495. Cited by: §V-A3.
  • [2] A. Attanasio, B. Scaglioni, M. Leonetti, A. F. Frangi, W. Cross, C. S. Biyani, and P. Valdastri (2020-10) Autonomous Tissue Retraction in Robotic Assisted Minimally Invasive Surgery – A Feasibility Study. IEEE Robotics and Automation Letters 5 (4), pp. 6528–6535. External Links: ISSN 2377-3766, Document Cited by: §II.
  • [3] Z. Chiu, F. Richter, E. K. Funk, R. K. Orosco, and M. C. Yip (2021) Bimanual Regrasping for Suture Needles using Reinforcement Learning for Rapid Motion Planning. IEEE Intl. Conf. on Robotics and Automation. Cited by: §II.
  • [4] X. Deng, Y. Xiang, A. Mousavian, C. Eppner, T. Bretl, and D. Fox (2020) Self-supervised 6D Object Pose Estimation for Robot Manipulation. IEEE Intl. Conf. on Robotics and Automation, pp. 3665–3671. Cited by: §II.
  • [5] M. V. der Merwe, Q. Lu, B. Sundaralingam, M. Matak, and T. Hermans (2020) Learning Continuous 3D Reconstructions for Geometrically Aware Grasping. In IEEE International Conference on Robotics and Automation (ICRA), External Links: Link Cited by: §II.
  • [6] M. Fischler and R. Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (6), pp. 381–395. Cited by: §V-A4, §V-B.
  • [7] A. M. Handorf, Y. Zhou, M. A. Halanski, and W. Li (2015) Tissue stiffness dictates development, homeostasis, and disease progression. Organogenesis 11 (1), pp. 1–15. External Links: Link Cited by: §V-A1.
  • [8] B. Hinz (2012) Mechanical aspects of lung fibrosis: a spotlight on the myofibroblast. Proc Am Thorac Soc 9 (3), pp. 137–47. External Links: Link Cited by: §V-A1.
  • [9] Z. Hu, T. Han, P. Sun, J. Pan, and D. Manocha (2019) 3-D Deformable Object Manipulation Using Deep Neural Networks. IEEE Robotics and Automation Letters 4 (4), pp. 4255–4261. External Links: Link Cited by: §I, §II, §IV-A, §V-A.
  • [10] I. Huang, Y. Narang, C. Eppner, B. Sundaralingam, M. Macklin, T. Hermans, and D. Fox (2021) DefGraspSim: Simulation-based grasping of 3D deformable objects. In RSS Workshop on Deformable Object Simulation in Robotics (DO-Sim), External Links: Link Cited by: §I, §I.
  • [11] R. Jansen, K. Hauser, N. Chentanez, F. Van Der Stappen, and K. Goldberg (2009) Surgical retraction of non-uniform deformable layers of tissue: 2D robot grasping and path planning. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4092–4097. Cited by: §II.
  • [12] P. Kazanzides, Z. Chen, A. Deguet, G. S. Fischer, R. H. Taylor, and S. P. DiMaio (2014-05) An open-source research kit for the da Vinci® Surgical System. In Proc. IEEE Int. Conf. Robotics and Automation (ICRA), pp. 6434–6439. External Links: Document Cited by: §V.
  • [13] T. D. Kulkarni, A. Gupta, C. Ionescu, S. Borgeaud, M. Reynolds, A. Zisserman, and V. Mnih (2019) Unsupervised learning of object keypoints for perception and control. Advances in Neural Information Processing Systems 32, pp. 10724–10734. External Links: Link Cited by: §IV-C.
  • [14] S. M. LaValle and J. J. Kuffner Jr (2001) Randomized kinodynamic planning. Intl. Journal of Robotics Research 20 (5), pp. 378–400. Cited by: §V-A3.
  • [15] J. Liang, V. Makoviychuk, A. Handa, N. Chentanez, M. Macklin, and D. Fox (2018) GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning. arXiv:1810.05762. External Links: Link Cited by: §I, §V-A3, §V.
  • [16] J. Lu, A. Jayakumari, F. Richter, Y. Li, and M. C. Yip (2021) SuPer Deep: A Surgical Perception Framework for Robotic Tissue Manipulation using Deep Learning for Feature Extraction. IEEE Intl. Conf. on Robotics and Automation. Cited by: §II.
  • [17] Q. Lu, M. V. der Merwe, and T. Hermans (2020) Multi-Fingered Active Grasp Learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), External Links: Link Cited by: §II.
  • [18] Q. Lu, M. V. der Merwe, B. Sundaralingam, and T. Hermans (2020) Multi-Fingered Grasp Planning via Inference in Deep Neural Networks. IEEE Robotics & Automation Magazine (Special Issue on Deep Learning and Machine Learning in Robotics) 27 (2), pp. 55–65. External Links: Link Cited by: §I, §II.
  • [19] X. Ma, S. Chen, D. Hsu, and W. S. Lee (2020) Contrastive variational model-based reinforcement learning for complex observations. Conference on Robot Learning. Cited by: §II.
  • [20] X. Ma, D. Hsu, and W. S. Lee (2021) Learning Latent Graph Dynamics for Deformable Object Manipulation. arXiv. External Links: Link Cited by: §II.
  • [21] M. Macklin, K. Erleben, M. Müller, N. Chentanez, S. Jeschke, and V. Makoviychuk (2019) Non-Smooth Newton Methods for Deformable Multi-Body Dynamics. ACM Trans. on Graphics. Cited by: §I, §V-A3.
  • [22] M. T. Mason (2018) Toward Robotic Manipulation. Annual Review of Control, Robotics, and Autonomous Systems 1 (1), pp. 1–28. Cited by: §I.
  • [23] J. Matas, S. James, and A. J. Davison (2018) Sim-to-real reinforcement learning for deformable object manipulation. Conference on Robot Learning, pp. 734–743. Cited by: §II.
  • [24] D. McConachie and D. Berenson (2017) Bandit-Based Model Selection for Deformable Object Manipulation. arXiv:1703.10254. Cited by: §II.
  • [25] D. Meli, E. Tagliabue, D. Dall’Alba, and P. Fiorini (2021) Autonomous tissue retraction with a biomechanically informed logic based framework. arXiv preprint arXiv:2109.02316. Cited by: §II.
  • [26] A. Mousavian, C. Eppner, and D. Fox (2019) 6-dof graspnet: Variational grasp generation for object manipulation. Intl. Conf. on Computer Vision, pp. 2901–2910. Cited by: §I, §II.
  • [27] A. Murali, A. Mousavian, C. Eppner, C. Paxton, and D. Fox (2020) 6-DOF Grasping for Target-driven Object Manipulation in Clutter. IEEE Intl. Conf. on Robotics and Automation, pp. 6232–6238. Cited by: §II.
  • [28] A. Murali, S. Sen, B. Kehoe, A. Garg, S. McFarland, S. Patil, W. D. Boyd, S. Lim, P. Abbeel, and K. Goldberg (2015-05) Learning by Observation for Surgical Subtasks: Multilateral Cutting of 3D Viscoelastic and 2D Orthotropic Tissue Phantoms. In IEEE Intl. Conf. on Robotics and Automation, pp. 1202–1209. External Links: ISSN 1050-4729, Document Cited by: §II.
  • [29] T. D. Nagy, M. Takács, I. J. Rudas, and T. Haidegger (2018) Surgical subtask automation—Soft tissue retraction. In 2018 IEEE 16th World Symposium on Applied Machine Intelligence and Informatics (SAMI), pp. 000055–000060. Cited by: §II.
  • [30] D. Navarro-Alarcon, Y. Liu, J. G. Romero, and P. Li (2014) On the visual deformation servoing of compliant objects: uncalibrated control methods and experiments. Intl. Journal of Robotics Research 33 (11), pp. 1462–1480. Cited by: §II.
  • [31] D. Navarro-Alarcon and Y. Liu (2018) Fourier-Based Shape Servoing: A New Feedback Method to Actively Deform Soft Objects into Desired 2-D Image Contours. IEEE Trans. on Robotics 34 (1), pp. 272–279. External Links: Link Cited by: §II, §IV-A, §IV.
  • [32] D. Navarro-Alarcon, Y. Liu, J. G. Romero, and P. Li (2013) Visually servoed deformation control by robot manipulators. IEEE Intl. Conf. on Robotics and Automation, pp. 5259–5264. External Links: Link Cited by: §II.
  • [33] D. Navarro-Alarcon, H. M. Yip, Z. Wang, Y. Liu, F. Zhong, T. Zhang, and P. Li (2016) Automatic 3-D Manipulation of Soft Objects by Robotic Arms With an Adaptive Deformation Model. IEEE Trans. on Robotics 32 (2), pp. 429–441. External Links: Link Cited by: §II.
  • [34] A. Pore, D. Corsi, E. Marchesini, D. Dall’Alba, A. Casals, A. Farinelli, and P. Fiorini (2021) Safe Reinforcement Learning using Formal Verification for Tissue Retraction in Autonomous Robotic-Assisted Surgery. arXiv:2109.02323. Cited by: §II, §V-A3.
  • [35] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. IEEE Conf. on Computer Vision and Pattern Recognition. External Links: Link Cited by: §IV-B.
  • [36] J. Qi, G. Ma, J. Zhu, P. Zhou, Y. Lyu, H. Zhang, and D. Navarro-Alarcon (2021) Contour Moments Based Manipulation of Composite Rigid-Deformable Objects with Finite Time Model Estimation and Shape/Position Control. arXiv:2106.02424. External Links: Link Cited by: §II.
  • [37] R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu (2010) Fast 3D recognition and pose using the Viewpoint Feature Histogram. IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pp. 2155–2162. External Links: Link Cited by: §II.
  • [38] J. Sanchez, J. A. C. Ramon, B. Bouzgarrou, and Y. Mezouar (2018) Robotic Manipulation and Sensing of Deformable Objects in Domestic and Industrial Applications: A Survey. Intl. Journal of Robotics Research 37 (7), pp. 688–716. External Links: Link Cited by: §I, §I, §II, §III.
  • [39] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §V-A3.
  • [40] B. Thach, A. Kuntz, and T. Hermans (2021) DeformerNet: A Deep Learning Approach to 3D Deformable Object Manipulation. In RSS Workshop on Deformable Object Simulation in Robotics (DO-Sim), External Links: Link Cited by: §I, §II, §IV-C, §V-A3, §V-A.
  • [41] B. Thananjeyan, A. Garg, S. Krishnan, C. Chen, L. Miller, and K. Goldberg (2017) Multilateral surgical pattern cutting in 2d orthotropic gauze with deep reinforcement learning policies for tensioning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2371–2378. Cited by: §II.
  • [42] J. van den Berg, S. Miller, D. Duckworth, H. Hu, A. Wan, K. Goldberg, and P. Abbeel (2010-05) Superhuman Performance of Surgical Tasks by Robots Using Iterative Learning from Human-Guided Demonstrations. In Proc. IEEE Int. Conf. Robotics and Automation (ICRA), pp. 2074–2081. External Links: Document, ISBN 978-1-4244-5038-1 Cited by: §II.
  • [43] W. Wu, Z. Qi, and L. Fuxin (2019) PointConv: Deep Convolutional Networks on 3D Point Clouds. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 9613–9622. External Links: Link Cited by: §IV-B.
  • [44] Y. Wu, W. Yan, T. Kurutach, L. Pinto, and P. Abbeel (2020) Learning to manipulate deformable objects without demonstrations. Robotics: Science and Systems. External Links: Link Cited by: §I, §II.
  • [45] Y. Wu and K. He (2018) Group normalization. In Proceedings of the European conference on computer vision (ECCV), pp. 3–19. Cited by: §IV-B.
  • [46] J. Xu, B. Li, B. Lu, Y. Liu, Q. Dou, and P. Heng (2021) SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning. arXiv:2108.13035. Cited by: §II.
  • [47] W. Yan, A. Vangipuram, P. Abbeel, and L. Pinto (2020) Learning predictive representations for deformable objects using contrastive estimation. In Conference on Robot Learning, External Links: Link Cited by: §I, §II.
  • [48] J. Zhu, D. Navarro-Alarcon, R. Passama, and A. Cherubini (2021) Vision-based manipulation of deformable and rigid objects using subspace projections of 2d contours. Robotics and Autonomous Systems 142, pp. 103798. External Links: ISSN 0921-8890, Document, Link Cited by: §II.