I Introduction
Dexterous manipulation policies benefit from a robust estimate of the pose of the object held in-hand. Despite recent advances in pose estimation and tracking using vision feedback [49, 31, 12], in-hand object pose tracking still presents a challenge due to significant occlusions. As such, works that require in-hand object poses are currently limited to experiments where the object is mostly visible or observed by multiple cameras [3], or where the hand-object transform is fixed or known [41, 52]. To mitigate the issue of visual occlusions, previous works have studied object pose estimation via contact or tactile feedback, often by using particle filters along with knowledge of the object geometry and contact locations. These techniques have mostly been applied to the static-grasp setting, where the object is stationary and in-grasp. Extending these techniques to tracking object poses during in-hand manipulation is difficult, as it requires modeling complex object-hand contact dynamics.
To tackle the problem of in-hand object tracking during robot manipulation, we propose combining a GPU-accelerated, high-fidelity physics simulator [34] as the forward dynamics model with a sample-based optimization framework to track object poses with contact feedback (Figure 1). First, we initialize a set of concurrent simulations with the initial states of the real robot and the initial pose of the real object, which may be obtained from a vision-based pose registration algorithm, assuming the object is not occluded in the beginning. The initial poses of the simulated objects are slightly perturbed to reflect the uncertainty of the vision-based pose registration algorithm. The GPU-accelerated physics simulator runs many concurrent simulations in real time on a single GPU. As a given policy controls the real robot to approach, grasp, and manipulate the object in-hand, we run the same robot control commands on the simulated robots. We collect observations of the real robot and the simulated robots, which include terms like the magnitude and direction of contacts on the robot hand's contact sensors.
Then, a sample-based optimization algorithm periodically updates the states and parameters of all simulations according to a cost function that captures how well the observations of each simulation match those of the real world. In addition, the algorithm updates simulation parameters, such as mass and friction, to further improve the simulations' dynamics models of the real world. At any point in time, the object pose estimate is the pose of the simulated object in the lowest-cost simulation.
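As an illustration of this pipeline, the following Python sketch shows the tracking loop under hypothetical `real_robot`, `sims`, and `optimizer` interfaces; none of these names come from the paper, and cost bookkeeping is simplified:

```python
import numpy as np

def track_pose(real_robot, sims, optimizer, cost_fn, window=60):
    """Sketch of the tracking loop: mirror real actions into N concurrent
    simulations, score each against real observations, and periodically
    let a sample-based optimizer resample/perturb the simulations.
    All interfaces here (real_robot, sims, optimizer) are hypothetical."""
    costs = [[] for _ in sims]                       # per-simulation running costs
    for t, u in enumerate(real_robot.actions()):     # same controls everywhere
        y_real = real_robot.observe()
        for i, sim in enumerate(sims):
            sim.step(u)                              # shared action u
            costs[i].append(cost_fn(sim.observe(), y_real))
        # best estimate: pose of the sim with lowest average cost in window
        avg = [np.mean(c[-window:]) for c in costs]
        best = int(np.argmin(avg))
        pose_estimate = sims[best].object_pose()
        if (t + 1) % window == 0:                    # periodic optimizer update
            sims = optimizer.update(sims, avg)       # resample + perturb
            # (cost histories kept aligned for simplicity of the sketch)
    return pose_estimate
```

In practice the paper resets cost averages each update window; the sketch keeps a rolling window for brevity.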
To evaluate the proposed algorithm, we collected in-hand manipulation trajectories with different objects in simulation and in the real world. For experiments, we use the Kuka IIWA7 arm with the 4-finger Wonik Robotics Allegro hand as the end-effector, with each finger outfitted with a SynTouch BioTac contact sensor. Object manipulation trajectories are human demonstrations collected via a hand-tracking teleoperation system. Because we have ground-truth object poses in simulation, we performed detailed ablation studies in simulation experiments to study the properties of the proposed algorithm. For real-world experiments, we use a vision-based algorithm to obtain the object pose in the first and last frames of the collected trajectories, where the object is not in occlusion. The pose in the first frame is used to initialize the simulations, and the pose in the last frame is used to evaluate the accuracy of the proposed contact-based algorithm.
II Related Works
Prior works have studied identifying in-hand object pose with vision only, usually by first segmenting out the robot or human hand in an image before performing pose estimation [10, 25]. However, vision-only approaches degrade in performance under larger occlusions. Another approach is to use tactile feedback to aid object pose estimation. Tactile perception can identify object properties such as materials and pose [33], as well as provide feedback during object manipulation [9, 29, 28].
For the task of planar pushing where the object is visible, prior works have studied tracking object poses using particle filtering with contact and vision feedback [26]. The authors of [30] compared a variety of dynamics models and particle filter techniques, and they found that adding noise to applied forces instead of the underlying dynamics yielded more accurate tracking results. One work combined tactile feedback with a vision-based object tracker to track object trajectories during planar pushing tasks [27], and another applied incremental smoothing and mapping (iSAM) to combine global visual pose estimates with local contact pose readings [50].
For in-hand object pose estimation with tactile feedback, many prior works have explored this problem in the "static-grasp" context, where the robot hand grasps an object and localizes the object pose without moving. These works can be separated into two groups: 1) using point contact locations and 2) using a full tactile map to extract local geometry information around the contacts.
To use contact location feedback for pose estimation, many methods use a variation of Bayesian or particle filtering [11, 39, 6, 51, 24, 4, 48, 2, 13]. In [19] the authors perform filtering jointly over visual features, hand joint positions, force-torque readings, and binary contact modes. Similar techniques can also be applied to pose estimation when the object is not held by the robot hand, by using force probes [36, 42].
To use tactile maps for pose estimation, earlier works used large, low-resolution tactile arrays to sense contacts in a grid [37, 9], while more recent works use high-resolution tactile sensors mounted on robot fingertips. For example, the algorithm in [5] searches for similar local patches on an object surface to localize the object with respect to the contact location, and the one in [22] fuses GelSight data with a point cloud perceived by a depth sensor before performing pose estimation.
In contrast to the static-grasp setting, our work tackles the more challenging problem of tracking in-hand object pose during manipulation. Prior works have also explored this context. In [43] the authors propose an algorithm that combines contact locations with Dense Articulated Real-time Tracking (DART) [44], a depth image-based object tracking system, while in [38] the algorithm fuses contact locations with color visual features, joint positions, and force-torque readings. The former algorithm is sensitive to the initialization of the object poses, especially when the object appears small in the depth image. The latter work conducted experiments where the objects were mostly visible, so matching visual features alone would give reasonable pose estimates. In addition, neither work explicitly models the dynamics of the robot-object interaction, which limits the type of manipulation tasks during which the object pose can be tracked. To address these challenges, our approach does not assume access to robust visual features during manipulation. Instead, it uses a physics simulator to model both the kinematics and the dynamics of the robot-object system.
III Method
III-A Problem Statement
We consider the problem of tracking the pose of an object held in-hand by a robot manipulator during object manipulation. At some time $t$, let the object pose be $T_t \in SE(3)$. We define a physics dynamics model $x_{t+1} = f(x_t, u_t, \theta)$, where $x_t$ is the state of the world (positions and velocities of rigid bodies and of joint angles in articulated bodies), $u_t$ the robot controls (we use desired joint positions as the action space), and $\theta$ the fixed parameters of the simulation (e.g., mass and friction).
For a simulation model that exactly matches reality given perfect initializations of $x_0$, $T_0$, and $\theta$, pose estimation requires only playing back the sequence of actions applied to the real robot in the simulation. However, given our imperfect forward model and noisy pose initializations, pose estimation using our method can be improved via observation feedback.
Let $d$ be the number of joints the robot has and $m$ the number of its contact sensors. We define the observation vector $y_t$ as the concatenation of the joint positions of the robot $q_t \in \mathbb{R}^d$, the positions $p_{t,k}$ and rotations $R_{t,k}$ of the robot's contact sensors (located on the fingertips), the force vectors of the sensed contacts $F_{t,k}$, the unit vector in the direction of the translational slippage on the contact surface $s_{t,k}$, and the binary direction of the rotational slippage on the contact surface $r_{t,k}$, where $k$ indexes the $k$-th contact sensor. The general in-hand pose estimation problem is: given the current and past observations $y_{1:t}$, robot controls $u_{1:t}$, and the initial pose $T_0$, find the most probable current object pose $\hat{T}_t = \arg\max_{T_t} p(T_t \mid y_{1:t}, u_{1:t}, T_0)$.
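For concreteness, the observation vector can be assembled as one flat array. The field names, array layout, and quaternion encoding of sensor rotations below are illustrative choices, not the paper's API:

```python
import numpy as np

# Illustrative observation for a hand with d joints and m contact sensors.
# q: joint positions (d,), p: sensor positions (m, 3), R: sensor rotations
# as quaternions (m, 4), F: contact force vectors (m, 3),
# s: translational-slip unit vectors (m, 3), r: rotational-slip signs (m,).
def flatten_observation(q, p, R, F, s, r):
    return np.concatenate([q.ravel(), p.ravel(), R.ravel(),
                           F.ravel(), s.ravel(), r.ravel()])
```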
III-B Proposed Approach
In this work, we leverage a GPU-accelerated physics simulator as the forward dynamics model to concurrently simulate many robot-object environments to track the in-hand object pose, and we use derivative-free, sample-based optimizers to jointly tune the states and parameters of these simulations to improve tracking performance (Algorithm 1). First, we obtain an estimate of the initial object pose via a vision-based object pose estimator. We assume this pose estimator can give a reliable initial pose estimate when the robot is not in contact with the object and the object is not occluded, i.e., before grasping. Then, given the initial object pose estimate and robot configuration, we initialize $N$ concurrent simulations, and at every timestep we copy the real robot actions to all simulations. Note that the object pose can change when the hand establishes contact, and this will be modeled by the simulator. Let the object pose and the observation of the $i$-th simulation be $T_t^{(i)}$ and $y_t^{(i)}$, and the ground-truth observations be $y_t$. Given a cost function $C$, we say the current best pose estimate at time $t$ is the pose of the $i^*$-th simulation, $\hat{T}_t = T_t^{(i^*)}$, where the $i^*$-th simulation is the one that incurs the lowest average cost across some past time window of length $W$:
$$i^* = \arg\min_i \frac{1}{W} \sum_{\tau = t-W+1}^{t} C\big(y_\tau^{(i)}, y_\tau\big) \qquad (1)$$
$$\hat{T}_t = T_t^{(i^*)} \qquad (2)$$
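A minimal sketch of this selection rule (Equations 1 and 2), assuming per-step costs are stored in an $(N, T)$ array:

```python
import numpy as np

def select_best_pose(cost_history, poses, window):
    """Pick the simulation with the lowest average cost over the past
    `window` steps and return its object pose.
    cost_history: (N, T) array of per-step costs; poses: length-N list."""
    avg = cost_history[:, -window:].mean(axis=1)   # windowed average per sim
    i_star = int(np.argmin(avg))                   # lowest-cost simulation
    return poses[i_star], i_star
```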
The costs are used to periodically update the simulations and their parameters. This enables better alignment with the real robot-object system.
III-C Cost Function
The desired cost function should correlate with object pose differences during in-hand object manipulation, such that a lower cost corresponds to better pose estimates. The cost function we use has the form:
$$C\big(y^{(i)}, y\big) = w_q \big\lVert q^{(i)} - q \big\rVert + \sum_{k=1}^{m} \Big[ w_p \big\lVert p^{(i)}_{k} - p_{k} \big\rVert + w_R\, d_R\big(R^{(i)}_{k}, R_{k}\big) + w_c\, \mathbb{1}^{c}_{k} + w_F\, d_m\big(F^{(i)}_{k}, F_{k}\big) + w_a\, d_a\big(F^{(i)}_{k}, F_{k}\big) + w_s\, \mathbb{1}^{s}_{k} + w_r\, \mathbb{1}^{r}_{k} \Big] \qquad (3)$$
For the first term in the cost function, comparing $q$'s between the simulated and real-world robots is useful even if they share the same $u$, because the achieved joint positions can differ depending on the collision constraints imposed by the current pose of the object in contact with the robot hand, which might make it physically impossible for a joint to reach a commanded target angle.
A contact sensor is in contact if its force magnitude is greater than a threshold. $\mathbb{1}^{c}_{k}$ is $0$ when the binary contact state of the $k$-th contact sensor of the $i$-th simulation agrees with that of the real contact sensor and $1$ otherwise. Similarly, $\mathbb{1}^{s}_{k}$ is $0$ when the $k$-th contact sensor of the $i$-th simulation agrees with the real contact sensor on whether the sensor is undergoing translational slippage, and $1$ otherwise; $\mathbb{1}^{r}_{k}$ is the same but for rotational slippage.
For any two vectors $u$ and $v$, $d_m(u, v) = \big|\,\lVert u \rVert - \lVert v \rVert\,\big|$ gives the difference of their magnitudes, and $d_a(u, v)$ gives the angle between them. For any two rotations $R_1$ and $R_2$, $d_R(R_1, R_2)$ gives the angle of the axis-angle representation of $R_1^{\top} R_2$.
The weights of the cost terms, the $w$'s, are chosen such that the mean magnitude of each corresponding term is roughly normalized to $1$.
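A sketch of such a cost under illustrative weights, a hypothetical dictionary-based observation layout, and with the sensor position/rotation terms omitted for brevity:

```python
import numpy as np

# Sketch of the cost in Eq. (3); weights and the force threshold are
# illustrative values, not the paper's.
def cost(y_sim, y_real, w=None, force_eps=0.1):
    """y_* are dicts with keys 'q' (joint positions), 'F' (list of force
    vectors), 'slip_t'/'slip_r' (binary slip flags per sensor)."""
    w = w or {"q": 1.0, "contact": 1.0, "mag": 1.0, "ang": 1.0,
              "slip_t": 1.0, "slip_r": 1.0}
    c = w["q"] * np.abs(y_sim["q"] - y_real["q"]).sum()
    for k in range(len(y_sim["F"])):
        f_s, f_r = y_sim["F"][k], y_real["F"][k]
        in_contact_s = np.linalg.norm(f_s) > force_eps
        in_contact_r = np.linalg.norm(f_r) > force_eps
        c += w["contact"] * float(in_contact_s != in_contact_r)  # binary contact
        c += w["mag"] * abs(np.linalg.norm(f_s) - np.linalg.norm(f_r))  # d_m
        if in_contact_s and in_contact_r:                        # d_a: force angle
            cos = np.dot(f_s, f_r) / (np.linalg.norm(f_s) * np.linalg.norm(f_r))
            c += w["ang"] * np.arccos(np.clip(cos, -1.0, 1.0))
        c += w["slip_t"] * float(y_sim["slip_t"][k] != y_real["slip_t"][k])
        c += w["slip_r"] * float(y_sim["slip_r"][k] != y_real["slip_r"][k])
    return c
```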
III-D Addressing Uncertainty and the Sim-to-Real Gap
There are two sources of uncertainty regarding object pose estimation via simulation: 1) the initial pose estimate from the vision-based pose estimator is noisy, and 2) there is a mismatch between the simulated and real-world dynamics, partly caused by imperfect modeling and partly caused by the unknown real-world physics parameters $\theta$.
To address the first issue of initial pose uncertainty, 1) we perturb the initial pose estimates across the different simulations by sampling from a distribution centered around the vision-based pose estimate, and 2) we increase the number of simulations $N$. If $N$ is arbitrarily large, then with high probability the true initial pose will be sufficiently represented in the set of simulations, and a well-designed cost function will select the correct simulation with the correct pose. To perform sampling over initial object poses, we sample the translation and rotation separately. Translation is sampled from an isotropic normal distribution, while rotation is sampled by drawing zero-mean, isotropic tangent vectors in $\mathfrak{so}(3)$ and then applying them to the mean rotation [15].
To address the second issue of mismatch between simulated and real-world physics (the "sim-to-real" gap), we propose using derivative-free, sample-based optimization algorithms to tune $\theta$ during pose tracking. Specifically, after every $W$ time steps, we pass the average costs of all simulations during this window, along with the simulation states and parameters, to a given optimizer. The optimizer determines the next set of simulations with their own updated parameters. The simulations in the next set are sampled from simulations in the current set, with some added perturbations to the simulation parameters and object pose. Such exploration maintains the diversity of the simulations, preventing them from getting stuck in suboptimal simulation parameters or states due to noisy observations.
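The initial-pose sampling described above (translation from an isotropic normal, rotation from tangent vectors applied to the mean rotation) can be sketched with SciPy's rotation utilities; the sigma values are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def sample_initial_poses(t_mean, R_mean, n, sigma_t=0.01, sigma_r=0.1, seed=0):
    """Sample n perturbed initial poses around a vision-based estimate.
    Translation: isotropic normal. Rotation: zero-mean tangent (axis-angle)
    vectors in so(3), applied to the mean rotation."""
    rng = np.random.default_rng(seed)
    ts = t_mean + sigma_t * rng.standard_normal((n, 3))
    tangents = sigma_r * rng.standard_normal((n, 3))   # axis-angle vectors
    Rs = [Rotation.from_rotvec(v) * R_mean for v in tangents]
    return ts, Rs
```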
Although it is desirable to have $\theta$ converge to the true real-world parameters, this is not necessary to achieve good pose estimation. In addition, due to differences in simulated and real-world dynamics, we do not expect the optimal $\theta$ for reducing the cost to coincide with the corresponding real-world values.
To optimize the parameters of the simulations and make their simulated states more closely track that of the real world, we evaluate three derivative-free, sample-based optimizers:
III-D1 Weighted Resampling (WRS)
WRS forms a probability mass function (PMF) over the existing simulation states and samples $N$ times with replacement from that distribution to form the next set of simulations. To form the PMF, WRS applies a softmax over the simulation costs:
$$P(i) = \frac{\exp(-\bar{C}_i / \alpha)}{\sum_{j=1}^{N} \exp(-\bar{C}_j / \alpha)} \qquad (4)$$
Here, $\bar{C}_i$ denotes the average cost of the $i$-th simulation over the past window, and $\alpha$ is a temperature hyperparameter that determines the sharpness of the distribution. After resampling, we perform exploration on all simulations by perturbing 1) the simulation parameters $\theta$ and 2) the object pose.
Simulation parameters are perturbed by sampling from an isotropic normal distribution around the previous parameters: $\theta^{(i)}_{n+1} \sim \mathcal{N}(\theta^{(i)}_n, \Sigma_\theta)$, where $\Sigma_\theta$ is predefined. The subscript $n$ denotes the optimizer update step (after $n$ update steps the simulation has run for a total of $nW$ time steps).
For object pose perturbation, adding noise to the pose directly while the object is held in-hand is impractical; most delta poses would result in mesh penetration and are hence invalid. This issue was noted in [14, 30], and like those works we perturb the objects by applying perturbation forces to the object in each simulation, with each force sampled from a zero-mean isotropic normal distribution $\mathcal{N}(0, \Sigma_F)$.
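A minimal sketch of one WRS update over average costs and a parameter array follows; the pose perturbation via random forces happens inside the simulator and is omitted here:

```python
import numpy as np

def wrs_update(avg_costs, params, alpha=1.0, param_std=0.01, seed=0):
    """Weighted Resampling sketch: softmax the negated average costs into a
    PMF (Eq. 4), resample N simulations with replacement, then perturb the
    parameters of each survivor (exploration)."""
    rng = np.random.default_rng(seed)
    z = -np.asarray(avg_costs) / alpha
    pmf = np.exp(z - z.max())            # numerically stable softmax
    pmf /= pmf.sum()
    n = len(avg_costs)
    idx = rng.choice(n, size=n, replace=True, p=pmf)
    new_params = params[idx] + param_std * rng.standard_normal(params[idx].shape)
    return idx, new_params
```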
III-D2 Relative Entropy Policy Search (REPS)
Unlike [7], which also uses REPS [35] to tune simulation parameters to address the sim-to-real gap, we use a sample-based variant of REPS that computes weights for each simulation and samples from a distribution formed by the softmax of those weights. Whereas WRS uses a fixed temperature $\alpha$ to shape the distribution, REPS solves for an adaptive temperature $\eta$ that best improves the performance of the overall distribution subject to $\epsilon$, a constraint on the KL-divergence between the old and updated sample distributions.
To use REPS, we reformulate the costs as rewards by setting $R_i = -\bar{C}_i$. We compute $\eta$ at every update step by optimizing the dual function $g(\eta)$, and then we use $\eta$ to form the PMF:
$$g(\eta) = \eta \epsilon + \eta \log \left( \frac{1}{N} \sum_{i=1}^{N} \exp(R_i / \eta) \right) \qquad (5)$$
$$P(i) = \frac{\exp(R_i / \eta)}{\sum_{j=1}^{N} \exp(R_j / \eta)} \qquad (6)$$
After resampling, every simulation is perturbed in the same manner as in WRS.
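The sample-based REPS weight computation can be sketched as below; minimizing the dual over $\log \eta$ rather than $\eta$ is an implementation convenience on our part, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reps_weights(avg_costs, epsilon=1.0):
    """Sample-based REPS sketch: rewards are negated costs; solve the dual
    g(eta) = eta*epsilon + eta*log(mean(exp(R/eta))) for eta > 0, then form
    the PMF with a softmax at that temperature (Eqs. 5-6)."""
    R = -np.asarray(avg_costs)
    R = R - R.max()                      # shift for numerical stability
    def dual(log_eta):
        eta = np.exp(log_eta)            # enforce eta > 0
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    res = minimize_scalar(dual, bounds=(-10, 10), method="bounded")
    eta = np.exp(res.x)
    w = np.exp(R / eta)
    return w / w.sum(), eta
```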
III-D3 Population-Based Optimization (PBO)
Inspired by Population-Based Training (PBT) [23], this algorithm first ranks all simulations by their average costs and finds the top $K$ simulations with the lowest costs. Then, it 1) exploits by replacing the remaining $N - K$ simulations with copies of the top $K$, sampled with replacement, and 2) explores by perturbing the simulations in the same way as WRS.
PBO effectively uses a shaped cost that depends only on the relative ordering of the simulation costs and not their magnitudes, potentially making the optimizer more robust to noisy costs.
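A sketch of one PBO update, with the elite count $K$ as an assumed input:

```python
import numpy as np

def pbo_update(avg_costs, params, k, param_std=0.01, seed=0):
    """Population-Based Optimization sketch: keep the K lowest-cost
    simulations (exploit), replace the rest with copies of the elites
    sampled with replacement, then perturb all parameters (explore)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(avg_costs)        # ascending cost: uses only ranking
    elites = order[:k]
    n = len(avg_costs)
    refill = rng.choice(elites, size=n - k, replace=True)
    idx = np.concatenate([elites, refill])
    new_params = params[idx] + param_std * rng.standard_normal(params[idx].shape)
    return idx, new_params
```

Because only `argsort` is used, the update depends on the ordering of the costs and not their magnitudes, matching the robustness argument above.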
III-E Hyperparameters
Each of the proposed optimizers has a distribution-shaping hyperparameter used to balance exploration with exploitation. There are 5 additional hyperparameters for our proposed framework:
$W$: the number of time steps the algorithm waits between updates.
$N$: the number of concurrent simulations.
$\mathcal{N}(\theta_0, \Sigma_{\theta_0})$: the initial normal distribution over simulation parameters.
$\Sigma_{T_0}$: the diagonal covariance matrix of the normal distribution over initial pose perturbations.
$\Sigma_\theta$ and $\Sigma_F$: the diagonal covariances of the normal distributions of perturbations used for exploration.
A larger $N$ is generally better than a smaller one, with the caveat that the resulting simulation is slower and may not be practical in application. $\Sigma_{T_0}$ should be large enough that the actual initial pose is well represented in the initial pose distribution. However, $N$ should be increased along with the covariance $\Sigma_{T_0}$ to ensure that the density of the samples is high enough to capture the wider distribution.
We note two additional tradeoffs with these hyperparameters. One is the exploration-exploitation tradeoff in the context of optimizing for $\theta$, and the other is the tradeoff between optimizing for $\theta$ and for the object pose. Making $\Sigma_\theta$ or $\Sigma_F$ wider increases the speed at which the set of simulation parameters "move," and the optimizer will explore more than it exploits. Increasing $W$ improves the optimization for $\theta$, as the optimizer has more samples with which to evaluate each simulation. However, updating the simulation parameters too slowly might lead to drift in pose estimation if the least-cost simulation is sufficiently different from the real world, potentially leading to divergent behavior. The worst-case divergent behavior occurs when force perturbations or some simulation parameters lead to an irrecoverable configuration, where the object falls out of the hand or reaches a pose from which small force perturbations cannot bring it back to the correct pose. It is acceptable if a few samples become divergent: their costs will be high, so they will be discarded and replaced by non-divergent ones during optimizer updates.
There are connections between our approach and previous works that use particle filtering. However, prior works were mostly applied in the static-grasp setting, where the forward model of the particle filter is a constant model. Instead, we track the object during manipulation and use a physics simulator as the forward model. In addition to tracking the object pose, the proposed algorithm also identifies the context of the forward model by tuning the simulation parameters $\theta$, which are not affected by the forward or observation models. We focus on optimizers based on discrete samples rather than continuous distributions, such as the popular Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [17], because we cannot easily sample in-hand object poses from a continuous distribution due to the complicated mesh-penetration constraints imposed by contacts.
IV Experiments
We evaluate the performance of our proposed approach with both simulation and real-world experiments using an Allegro hand mounted on a Kuka IIWA7 robot arm. In-hand object manipulation trajectories are first collected with a hand-tracking teleoperation system, and we evaluate pose estimation errors by running our proposed algorithms offline against the collected trajectories. These trajectories start and end with the object not in the hand and not in occlusion. Because we have access to ground-truth object poses in simulation experiments, we perform detailed ablation studies in simulation to study the effects of different hyperparameters on algorithm performance. While we can compare pose estimation errors during the entire trajectory in simulation experiments, this is not possible in the real world. For real-world experiments, we use PoseRBPF [12], a recent RGB-D, particle-filter-based pose estimation algorithm, to obtain initial and final object poses. We treat these initial and final object poses as ground truth and compare the final pose with the one predicted by our proposed algorithm.
We mount the 4-finger, 16-DoF Allegro hand on the 7-DoF Kuka IIWA7 robot arm. To obtain contact feedback in the real world, we attach SynTouch BioTac sensors to each of the fingertips. While BioTac sensors do not explicitly give force or slippage information, past works have studied how to extract such information from the sensors' raw electrode readings to predict contact forces [16, 46], slip direction [1], and grasp stability [45, 8, 47]. For real-world experiments, we use the trained model from [46] to estimate force vectors $F$, but we do not currently estimate slippage from the BioTac sensors, so the cost function in real-world experiments does not contain the slippage terms. Simulations were conducted on a computer with an Nvidia GTX 1080 Ti GPU, an Intel i7-8700K CPU, and 16 GB of memory.
We use 3 objects from the Yale-CMU-Berkeley (YCB) object dataset (the spam can, foam brick, and toy banana), with models, textures, and point clouds obtained from the dataset published in [49]. These objects were chosen because they fit the size of the Allegro hand and are light enough that robust precision grasps can be formed (we emptied the spam can to reduce its weight).
For each object, in both simulation and real-world experiments, we give 2 demonstrations of 2 types of manipulation trajectories: 1) pick-and-place with a finger grasp and in-hand object rotation, and 2) the same but with fingertips breaking and re-establishing contact during the grasp (finger gaiting). This gives a total of 24 trajectories for analysis across simulation and real-world experiments. In both trajectory types the object undergoes translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory lasts about a minute. Given that we can run the pose estimation algorithm at about 30 Hz, we obtain a total of about 2k frames per trajectory.
The teleoperation system is described in detail in a concurrent work under review. The input to the system is a point cloud of the hand of the human demonstrator. Then, a neural network based on PointNet++ [40] maps the point cloud to an estimate of the hand's pose relative to the camera as well as the joint angles of the hand. These estimates, along with an articulated hand model [18] and the original point cloud, are then given to DART, which performs tracking by refining the neural network estimates. Finally, to perform kinematic retargeting, we solve an optimization problem that finds the Allegro hand joint angles that bring the fingertip poses close to those of the human hand.
In addition to our proposed optimizers (WRS, REPS, PBO), we also evaluate the following two baselines: Open Loop (OLP) and Identity (EYE). OLP tracks the object pose open-loop in simulation, without using observation feedback. EYE is initialized with a set of noisy initial poses and always picks the pose of the lowest-cost simulation, but it does not perform any resampling or optimizer updates.
Similar to previous works [49, 31, 12], we use Average Distance Deviation (ADD) [20] as the evaluation metric. ADD computes the average distance between corresponding points in the object point cloud situated at the ground-truth pose and at the predicted pose. Unlike [49, 31, 12], we do not use its symmetric variant, ADD-S, which does not penalize pose differences across object symmetries (e.g., for poses of a sphere that share the same translation, any rotation difference gives zero ADD-S error). This is desirable for resolving visual ambiguities during pose registration, but not for tracking.
IV-A Simulation Experiments
For simulation experiments we build upon our previous work on GPU-accelerated robotics simulation [32]. The arm and hand in simulation are controlled via a joint-angle PD controller, and we tuned the controller's gains so that the joint angle step responses are similar to those of the real robot. To speed up simulation, we simplify the collision meshes of the robot and objects. This is done first by applying TetWild [21], which gives a mesh with triangles that are more equilateral, and then Quadric Edge Collapse Decimation (https://help.sketchfab.com/hc/en-us/articles/205852789-MeshLab-Decimating-a-model). We cap the number of contacts each simulation generates during manipulation, and we run the simulations at a fixed frequency.
We performed simulation experiments with varying amounts of initial pose noise. Three levels were tested ("Low," "Med," and "High"), each specifying a translation standard deviation in millimeters and a rotation standard deviation in radians, increasing from level to level.
See Figure 2 for a comparison of the optimizers on tracking in-hand object poses across all the simulation trajectories. ADD increases as the initial pose error increases, and the mean ADD of the optimizer-based methods tends to be lower. While EYE sometimes achieves a mean ADD comparable to the optimizer methods, the latter generally have much smaller error variance and maximum error. This result is expected, as the optimizers focus the distribution of simulations towards better-performing ones over time. In the medium-noise case, REPS and PBO achieve the best mean ADD.
See Figure 3 for results of ablation studies in simulation performed over the hyperparameters governing exploration distance (how much simulations are perturbed), the number of parallel simulations, and whether or not contact and slip detection feedback is used in the cost function.
IV-B Real-World Experiments
We evaluate our algorithm on real-world trajectories similar to those collected in simulation. We use PoseRBPF to register the object pose in the first and last frames of a trajectory. The initial pose estimate is used to initialize the simulations, while the last one is used to evaluate the accuracy of our contact-based pose tracking algorithm. Unlike the simulation experiments, the real-world experiments initialize the object pose by sampling from PoseRBPF's distribution over object poses, so the initial pose samples correspond to the uncertainty of the vision-based pose estimation algorithm.
See Figure 4 for real-world experiment results. The ADDs are higher than those from the simulation experiments. This is because real-world dynamics differ from the simulations more than simulations with different parameters differ from each other, and because real-world observations are noisier than simulated ones. We observe that no optimizer is able to track the toy banana on the real-world data. The object's long moment arm and low friction coefficient make its slippage behavior difficult to model precisely. This is a failure mode of our algorithm: if all of the simulations become divergent (e.g., the banana rotates in the wrong direction or falls out of the hand), then the algorithm cannot recover in subsequent optimizer updates. The best ADD for the foam brick is achieved by PBO, and the best for the spam can by REPS.
V Conclusion
We introduce a sample-based optimization algorithm for tracking in-hand object poses during manipulation via contact feedback and GPU-accelerated robotic simulation. The parallel simulations concurrently maintain many beliefs about the real world and model object pose changes caused by complex contact dynamics. The optimization algorithm tunes simulation parameters during object pose tracking to further improve tracking performance. In future work, we plan to integrate contact sensing with vision-based pose tracking in the loop.
Acknowledgment
The authors thank Renato Gasoto, Miles Macklin, Tony Scudiero, Jonathan Tremblay, Stan Birchfield, Qian Wan, and Mike Skolones for their help with GPU-accelerated robotic simulations, Xinke Deng and Arsalan Mousavian for their help with PoseRBPF, and Balakumar Sundaralingam and Tucker Hermans for their help with BioTac sensors. Jacky Liang is in part funded by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 1745016.
References
 [1] (2018) Direction of slip detection for adaptive grasp force control with a dexterous robotic hand. In 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pp. 21–27. Cited by: §IV.
 [2] (2017) Tactile-based in-hand object pose estimation. In Iberian Robotics Conference, Cited by: §II.
 [3] (2018) Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177. Cited by: §I.
 [4] (2015) Global estimation of an object’s pose using tactile sensing. Advanced Robotics. Cited by: §II.
 [5] (2016) In-hand object pose estimation using covariance-based tactile to geometry matching. IEEE Robotics and Automation Letters. Cited by: §II.
 [6] (2013) Online in-hand object localization. In IROS, Cited by: §II.
 [7] (2019) Closing the sim-to-real loop: adapting simulation randomization with real world experience. ICRA. Cited by: §III-D2.
 [8] (2016) BiGS: BioTac grasp stability dataset. In ICRA 2016 Workshop on Grasping and Manipulation Datasets, Cited by: §IV.
 [9] (2014) Learning robot tactile sensing for object manipulation. In IROS, Cited by: §II, §II.
 [10] (2016) Using vision for pre- and post-grasping object localization for soft hands. In International Symposium on Experimental Robotics, Cited by: §II.
 [11] (2010) A measurement model for tracking hand-object state during dexterous manipulation. In ICRA, Cited by: §II.
 [12] (2019) PoseRBPF: a Rao-Blackwellized particle filter for 6D object pose tracking. RSS. Cited by: §I, §IV.
 [13] (2018) In-hand grasping pose estimation using particle filters in combination with haptic rendering models. International Journal of Humanoid Robotics. Cited by: §II.
 [14] (2011) Physical simulation for monocular 3D model-based tracking. In ICRA, Cited by: §III-D1.
 [15] (2013) Lie groups for 2D and 3D transformations. URL http://ethaneade.com/lie.pdf, revised Dec. Cited by: §III-D.
 [16] (2016) Hierarchical fingertip space: a unified framework for grasp planning and in-hand grasp adaptation. IEEE Transactions on Robotics 32 (4), pp. 960–972. Cited by: §IV.
 [17] (2003) Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation 11 (1), pp. 1–18. Cited by: §III-E.
 [18] (2019) Learning joint reconstruction of hands and manipulated objects. In CVPR, Cited by: §IV.
 [19] (2011) Fusion of stereo vision, force-torque, and joint sensors for estimation of in-hand object location. In ICRA, Cited by: §II.
 [20] (2012) Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Asian Conference on Computer Vision, pp. 548–562. Cited by: §IV.
 [21] (2018) Tetrahedral meshing in the wild. ACM Trans. Graph. 37 (4), pp. 60:1–60:14. Cited by: §IV-A.
 [22] (2017) Tracking objects with point clouds from vision and touch. In ICRA, Cited by: §II.
 [23] (2017) Population based training of neural networks. arXiv preprint arXiv:1711.09846. Cited by: §III-D3.
 [24] (2013) Efficient touch-based localization through submodularity. In ICRA, Cited by: §II.
 [25] (2019) Learning to estimate pose and shape of hand-held objects from RGB images. arXiv preprint arXiv:1903.03340. Cited by: §II.
 [26] (2015) Pose estimation for planar contact manipulation with manifold particle filters. The International Journal of Robotics Research. Cited by: §II.
 [27] (2019) Joint inference of kinematic and force trajectories with visuotactile sensing. arXiv preprint arXiv:1903.03699. Cited by: §II.
 [28] (2019) Making sense of vision and touch: learning multimodal representations for contact-rich tasks. arXiv preprint arXiv:1907.13098. Cited by: §II.
 [29] (2014) Localization and manipulation of small parts using GelSight tactile sensing. In IROS, Cited by: §II.
 [30] (2015) A comparative study of contact models for contact-aware state estimation. In IROS, Cited by: §II, §III-D1.
 [31] (2018) DeepIM: deep iterative matching for 6D pose estimation. In ECCV, Cited by: §I, §IV.
 [32] (2018) GPU-accelerated robotic simulation for distributed reinforcement learning. CoRL. Cited by: §IV-A.
 [33] (2017) Robotic tactile perception of object properties: a review. Mechatronics. Cited by: §II.
 [34] (2019) Non-smooth Newton methods for deformable multibody dynamics. arXiv preprint arXiv:1907.04587. Cited by: §I.
 [35] (2010) Relative entropy policy search. In Twenty-Fourth AAAI Conference on Artificial Intelligence, Cited by: §III-D2.
 [36] (2011) Global localization of objects via touch. IEEE Transactions on Robotics. Cited by: §II.
 [37] (2011) Object mapping, recognition, and localization from tactile geometry. In ICRA, Cited by: §II.
 [38] (2018) Fusing joint measurements and visual features for in-hand object pose estimation. IEEE Robotics and Automation Letters. Cited by: §II.
 [39] (2011) Using Bayesian filtering to localize flexible materials during manipulation. IEEE Transactions on Robotics. Cited by: §II.
 [40] (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099–5108. Cited by: §IV.
 [41] (2018) Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. RSS. Cited by: §I.
 [42] (2017) Touch-based localization of parts for high-precision manufacturing. In ICRA, Cited by: §II.
 [43] (2015) Depth-based tracking with physical constraints for robot manipulation. In ICRA, Cited by: §II.
 [44] (2014) DART: dense articulated real-time tracking. In RSS, Cited by: §II.
 [45] (2015) Force estimation and slip detection/classification for grip control using a biomimetic tactile sensor. In Humanoids, pp. 297–303. Cited by: §IV.
 [46] (2019) Robust learning of tactile force estimation through robot interaction. ICRA. Cited by: §IV.
 [47] (2018) In-hand object stabilization by independent finger control. arXiv preprint arXiv:1806.05031. Cited by: §IV.
 [48] (2017) Memory unscented particle filter for 6-DoF tactile localization. IEEE Transactions on Robotics. Cited by: §II.
 [49] (2018) PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. RSS. Cited by: §I, §IV.
 [50] (2018) Real-time state estimation with tactile and visual sensing: application to planar manipulation. In ICRA, Cited by: §II.
 [51] (2013) A dynamic Bayesian approach to real-time estimation and filtering in grasp acquisition. In ICRA, Cited by: §II.
 [52] (2019) Dexterous manipulation with deep reinforcement learning: efficient, general, and low-cost. ICRA. Cited by: §I.