Tactile Mapping and Localization from High-Resolution Tactile Imprints

April 24, 2019 · Maria Bauza, et al.

This work studies the problem of shape reconstruction and object localization using a vision-based tactile sensor, GelSlim. The main contributions are the recovery of local shapes from contact, an approach to reconstruct the tactile shape of objects from tactile imprints, and an accurate method for object localization of previously reconstructed objects. The algorithms can be applied to a large variety of 3D objects and provide accurate tactile feedback for in-hand manipulation. Results show that by exploiting the dense tactile information we can reconstruct the shape of objects with high accuracy and perform on-line object identification and localization, opening the door to reactive manipulation guided by tactile sensing. We provide videos and supplemental information on the project's website: http://web.mit.edu/mcube/research/tactile_localization.html.

I. Introduction

The correlation between hand dexterity and the spatial-and-pressure resolution of its tactile sensors has been of interest for a long time [1]. In the 19th century, Weber explored spatial acuity with the “two-point touch threshold”, i.e., the shortest distance that can be perceived as two separate pressure points. Later, Max von Frey studied the sensitivity to different levels of applied pressure [2]. It comes as no surprise that the regions with finer spatial sensor resolution, and those that are more sensitive to pressure, are the tips of our fingers and the tip of our tongue, both known for their dexterity.

This work builds from a recent interest in image-based tactile sensors such as GelSlim [3] or GelSight [4] which, by virtue of using a soft gel skin and a camera as transducer, achieve very high spatial acuity and pressure sensitivity, yielding highly discriminative tactile signals.

In this paper we study the use of tactile imprints as dense descriptors of contact, and demonstrate an approach to reconstructing the tactile shape of an object to facilitate tactile localization (Fig. 1). This work is part of an approach to robotic manipulation that has at its core:

  • Dense tactile descriptors.

    High-resolution tactile imprints are dense local descriptors of touch. This opens the door to a large set of classification and regression techniques from machine learning.

  • Crisp tactile memory.

    Touching something and recognizing with confidence that it has been touched before. This is essential for data association in estimation/reconstruction problems.

  • Fine differentiation. Detection of small differences between very similar tactile imprints can be used for part verification or comparison.

  • Contact force distribution. The deformation of the sensor skin under contact encodes the spatial distribution of internal contact forces. These are key for precise object manipulation [5, 6].

These suggest an approach to manipulation where tactile information does not play a complementary role, but rather acts as a main driver of dexterous manipulation.

Fig. 1: Tactile mapping and localization. This work addresses the problem of in-hand object identification and localization using tactile sensing. Given a new tactile imprint from the tactile sensor in the robot’s finger, we use the precomputed (offline) tactile map of an object to identify and find its location inside the grasp.

In this paper we demonstrate that we can combine tactile imprints with robot kinematics to build a tactile map of an object for localization. To do so, we present 3 contributions:

  • Local shape estimation: we use tactile imprints to estimate the shape of the contact patch using CNNs. We validate the algorithm in Section IV with known contact geometries that yield sub-millimeter accuracy.

  • Global tactile mapping: we fuse the tactile imprints and the kinematics (gripper pose and opening) of multiple grasps of a fixed object to reconstruct its global tactile shape. This includes the object geometric shape as well as a discrete representation of its tactile imprints. We validate the algorithm in Section V by recovering the main dimensions of known objects with an error lower than 5%.

  • Object tactile localization: Figure 1 illustrates how we combine tactile imprints with an estimation of the shape of the contact patch to identify and localize a grasped object. Our ICP-based algorithm uses tactile imprints for coarse data association, and contact shape for fine refinement. We validate the process in Section VI with controlled quantitative grasping experiments and in Section VII with qualitative results for unstructured picking.

II. Related Work

Effective robotic manipulation usually requires a good understanding of the manipulated objects. To this end, many estimation techniques have been proposed to recover object properties or to track and identify them. With good visibility, visual information can be enough to recover and track most objects [7, 8, 9]. However, in the context of robotic manipulation, occlusions are often unavoidable, especially when the robot actively interacts with and manipulates objects. To mitigate this problem, some works have explored tactile sensing to better estimate object properties or motion [10, 11, 12, 13, 14]. Most of these works heavily rely on visual information and tend to use tactile information for some form of contact detection.

Recent works such as Luo et al. [13] and Falco et al. [14] provide good shape reconstructions or perform object localization, but remain restricted to mostly planar objects or use tactile sensors that are bulky and impede dexterous manipulation. To enable more realistic robotic manipulation, some works have used tactile sensors that are more naturally integrated into robotic arms and hands to explore object properties.

There is extensive work based on tactile sensing that aims at recovering the shape of manipulated objects [15, 16, 17, 18, 19, 20, 21, 22, 23]. Among the most recent works, Jamali et al. [18] and Driess et al. [20] actively search for regions that need to be explored based on tactile readings. However, their sensors are low resolution, which makes the reconstructed shapes hard to use for other applications. Our approach instead uses a high-resolution sensor that not only reconstructs the local contact shape, but also reuses the resulting tactile map to enable object identification and localization. We refer the reader to Luo et al. [24] for a more extensive review on tactile estimation of object properties such as shape.

There has also been work on estimating the location of objects using tactile sensing [25, 26, 27, 28, 29, 30]. While some of these works use tactile sensing only as an indicator of contact [26], others such as Aggarwal and Kirchner [30] or Pezzementi et al. [28] aim to recover both the object motion and its shape. These works are tailored to mostly planar objects and use low resolution tactile sensors. Our approach instead generalizes to 3D objects without relying on prior knowledge.

Our work is also related to the SLAM estimation problem [31, 32, 33]. The application of SLAM to the manipulation of 3D objects, however, has been challenging because of the lack of clean information for data association. Our approach relies on building a dense tactile map of each object offline that enables coarse data association and can later be used for multiple manipulation tasks to both identify objects and locate their pose w.r.t. the tactile sensor.

In this work we use GelSlim [3], a tactile sensor based on GelSight [34] that provides high-resolution tactile imprints in the form of images. The original GelSight sensor has been used for object localization of small objects [35], to complement a vision-based tracker [36], or recently to recover 3D shapes also using vision and prior shape models [37]. However, its design is bulky for practical use in complex manipulation tasks. Instead, GelSlim is integrated in a slim finger that facilitates manipulation [38, 39]. Leveraging GelSlim’s high resolution, we show that our approach can reconstruct tactile maps of objects and use them efficiently to identify and recover their location.

Fig. 2: Local shape estimation. To estimate the shape of an object at contact, we built a system that automatically collects data and maps tactile imprints to heightmaps of the local shape. From left to right: a) objects used for training, b) robot collecting data by frontally touching random locations of an object, c) tactile image recorded during the touch, and d) heightmap of the object’s shape at contact obtained using a trained CNN.

III. Approach: Tactile mapping and localization

We present a framework to reconstruct a tactile map of an object to identify and locate it in-hand. Figure 1 illustrates the process where we combine an off-line reconstructed tactile shape of a flashlight with an on-line imprint to locate its position after being grasped. The framework is composed of three steps to address three main challenges:

  • Local shape estimation. Given a tactile imprint we estimate the local shape of an object during contact. We address this with a self-supervised approach that trains a CNN (Convolutional Neural Net) to map tactile imprints to local contact shapes. The process is described in Section IV.

  • Global tactile and shape mapping. Given a set of tactile imprints from an object and their respective local shapes, we use the kinematics of the robot to reconstruct a global tactile shape. Section V explains this off-line controlled process.

  • Identification and localization. Given an on-line tactile imprint, we compare it to previously reconstructed tactile shapes to identify the object, and to initialize an ICP-based algorithm to locate it within its global tactile map (Sections VI and VII).

IV. Local shape estimation

In this section we explain how to recover the local shape of an object from a tactile imprint. The local shape is given as a heightmap and aims to represent the local geometry of an object at contact. As illustrated in Fig. 2, we use a data-driven approach to build a map from tactile imprints to local shapes.

We automate data collection using a setup that builds upon the system described in [38, 39] which features a robotic arm (ABB IRB 1600id) with a parallel jaw gripper (WSG-50 Weiss) and a GelSlim tactile sensor at each finger [3]. In this work, we only use one tactile sensor, but our approach readily extends to both. During data collection, objects are rigidly attached to an external platform (Fig. 2) and palpated at different locations and orientations.

To find a map between tactile images and local shapes, we used the 5 controlled geometries in Fig. 2 to ensure a diverse set of tactile imprints. These objects provide ground-truth heightmaps as they produce identifiable local shapes, such as the section of a sphere, a semicone, a semipyramid, or a semicone with a cavity at its center (hollow), which can be easily localized in the imprint to facilitate the autonomous labelling of each tactile imprint. For each object, we collected 600 pairs of tactile imprints and heightmaps, holding out 100 for testing.

Network Architecture. Given that tactile images and heightmaps are 2D arrays, we leverage standard CNNs. The basic architecture of the CNN we use is a sequential model of 10 convolutional layers with 64 filters each and a 3-by-3 kernel. To improve the robustness to illumination changes, we augment the data by including random variations in the 3 channels of the tactile images. We also account for translations by adding two extra channels to the input with the x and y position of each pixel. A more detailed explanation of the data collection and training process can be found on the project’s website [40].
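
To make the architecture concrete, below is a minimal sketch in PyTorch. The 10-layer, 64-filter, 3-by-3 structure and the two pixel-coordinate channels follow the description above; the class name, padding, single-channel output head, and coordinate normalization are our own assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class LocalShapeCNN(nn.Module):
    """Sketch of the tactile-imprint -> heightmap CNN (head and padding are assumptions)."""
    def __init__(self, in_channels=5):   # 3 color channels + 2 pixel-position channels
        super().__init__()
        layers, c = [], in_channels
        for _ in range(10):               # 10 convolutional layers, 64 filters, 3x3 kernels
            layers += [nn.Conv2d(c, 64, kernel_size=3, padding=1), nn.ReLU()]
            c = 64
        layers.append(nn.Conv2d(c, 1, kernel_size=3, padding=1))  # 1-channel heightmap (assumed head)
        self.net = nn.Sequential(*layers)

    def forward(self, tactile_rgb):
        # Append normalized x/y pixel positions as two extra input channels,
        # as described in the text, so the network is aware of image position.
        b, _, h, w = tactile_rgb.shape
        ys, xs = torch.meshgrid(torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij")
        coords = torch.stack([xs, ys]).unsqueeze(0).expand(b, -1, -1, -1).to(tactile_rgb)
        return self.net(torch.cat([tactile_rgb, coords], dim=1))
```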

Evaluation. We evaluate the quality of the heightmaps on 100 test images per object. The error reported is the RMSE (Root Mean Squared Error) between the actual and predicted heightmaps relative to the size of the contact patch. To first order, we see that the RMSE decreases by adding more training data and that the average reconstruction accuracy reaches 0.1mm on the test data with only 500 datapoints and 0.060±0.016mm using 2000 datapoints.
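
As a reference, the snippet below gives one possible reading of this metric: the RMSE is computed only over pixels inside the contact patch. That restriction, and the threshold used to define the patch, are our assumptions.

```python
import numpy as np

def contact_rmse(predicted, ground_truth, contact_threshold=0.05):
    """RMSE (mm) between heightmaps, restricted to the contact patch (assumed definition)."""
    mask = ground_truth > contact_threshold       # pixels considered to be in contact (assumed threshold)
    if not mask.any():
        return 0.0
    return float(np.sqrt(np.mean((predicted[mask] - ground_truth[mask]) ** 2)))
```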

          Sphere   Cone 1   Cone 2   Hollow   Pyramid
All       0.066    0.077    0.062    0.070    0.067
Removed   0.108    0.167    0.073    0.145    0.137
TABLE I: Cross-validation between objects. Average RMSE (mm) of the heightmap reconstruction, evaluated on the whole test set (“All”) and on the test images of the object removed from training (“Removed”).
Object            Parameter      Real value   Estimation   Relative error
Cylinder          Radius         25.0         24.8         2.9%
Semicone          Base radius    20.0         19.8         2.2%
                  Slope          7.5          7.5          0.1%
Double-cylinder   Big radius     16.0         15.3         4.4%
                  Small radius   11.0         10.4         5.5%
                  Joint          60.0         61.9         3.2%
Cuboid            Side 1         25.0         26.0         4%
                  Side 2         20.0         20.9         4.5%
Semipyramid       Base side      20.0         19.7         1.5%
                  Slope          7.5          7.2          4%
TABLE II: Parameter estimation using shape reconstruction. The original table also shows each object and its reconstructed tactile shape.

Table I shows the average RMSE (mm) of the heightmap reconstruction on novel shapes using a cross-validation approach. For each training object, we remove it entirely from the training set, train a CNN with 2000 datapoints, and evaluate it both on the whole test set and just on the images from the removed object. Even for held-out objects, the reconstruction accuracy is on the order of 0.1mm. We observe experimentally that objects like semicone 1 or hollow, which have unique geometric features such as edges and holes, are more important to the training set. Smaller objects that leave a more reliable imprint on the sensor are also more informative for the training set.

Finally, Fig. 3 shows three examples of how we recover the local shape of novel complex objects. Results indicate that the map built to translate tactile imprints to heightmaps is accurate at recovering the local geometry of objects and can be extended to reconstruct their global shapes.

Fig. 3: Examples of local shapes. Tactile imprints of complex objects and the CNN-computed heightmaps of their local shapes.

V. Global Tactile and Shape Mapping

This section describes and analyzes how to combine a set of tactile imprints from an object to recover its tactile shape. This method for shape recovery relies on the accuracy of the robot kinematics and the gripper, and the precision of the heightmaps described in the previous section.

We first explain how each tactile imprint of the tactile map can be localized in the world frame. From the heightmap, we construct a point cloud in the sensor’s frame using an accurate calibration of the intrinsic parameters of the sensor’s camera. We get this calibration by probing with the tactile sensor on known points w.r.t. the robot base. Then we localize the point cloud in the world’s frame by assuming a rigid and calibrated transformation between sensor, gripper and robot arm. Finally, we stitch all the point clouds obtained from the same object by adding them together into a single point cloud. The project’s website [40] provides a more detailed description of these steps.
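A schematic version of these steps is given below, assuming a pinhole model for the sensor camera and 4x4 homogeneous transforms for the calibrated sensor-gripper-world chain; the actual calibration details in the paper may differ, and the function and variable names are illustrative.

```python
import numpy as np

def heightmap_to_sensor_cloud(heightmap, fx, fy, cx, cy):
    """Back-project an HxW heightmap into an Nx3 point cloud in the sensor frame
    using assumed pinhole intrinsics (fx, fy, cx, cy)."""
    h, w = heightmap.shape
    vs, us = np.mgrid[0:h, 0:w]                   # pixel row/column indices
    z = heightmap.ravel()
    x = (us.ravel() - cx) * z / fx
    y = (vs.ravel() - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points[z > 0]                          # keep only pixels in contact (assumed convention)

def sensor_cloud_to_world(points, T_world_gripper, T_gripper_sensor):
    """Move a sensor-frame cloud to the world frame through the calibrated rigid chain."""
    T = T_world_gripper @ T_gripper_sensor        # 4x4 homogeneous transforms
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# The global tactile shape is then the union of all per-grasp clouds, e.g.:
# global_cloud = np.vstack([sensor_cloud_to_world(c, T_wg, T_gs) for c, T_wg in grasps])
```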

To test this approach, we reconstructed the tactile shapes of the controlled objects in Table II. The data collection process is similar to the one in Section IV where we fix the position and orientation of each object, and grasp it along many locations and orientations while recording the tactile imprints. For uniformity of resolution, the grasps follow an equispaced grid with 10mm resolution along the planes defined by the gripper orientation w.r.t. the fixed object. We considered 3 orientations for the grasp, 0° and ±20° about the vertical axis, to promote diversity in the tactile imprints.

We evaluate the accuracy of the tactile maps by estimating the main dimensions of each object. We see in Table II that the error in most parameters is less than a few millimeters. From our experience, a denser grid of tactile imprints improves the accuracy of the tactile map for some objects at the cost of longer exploration times. We opted to keep the exploration the same for all the analyzed tactile maps to enable a fair comparison among objects.

VI. Tactile Localization

Fig. 4: Tactile localization using CTI-ICP-N. Given a new tactile imprint, we find its location in an offline-computed tactile map (0) by following these steps: (1) find the N touches from the map that are most similar to the new one, (2) create an auxiliary point cloud with these N touches that is a subset of the global one, and (3) use ICP to stitch the local point cloud from the new tactile imprint to the auxiliary one to locate its pose in the global shape.

Given the tactile map of an object, our goal is to effectively use it for robotic manipulation. To show this, we examine and evaluate how to localize the object based on correspondences between the tactile shape and local tactile imprints.

The proposed approach is described in Figure 4. We first recover the local shape in the sensor’s frame as a point cloud using its tactile imprint as in Section IV. Then we stitch this local point cloud to the global tactile shape of the object and infer how the resulting point cloud is located w.r.t. the tactile sensor. Finally, given the robot kinematics and the gripper opening, we estimate the actual pose of the object in the world frame. We consider 3 algorithms to stitch the local point cloud: RANDOM, CTI, and our proposed approach CTI-ICP-N, which uses CTI to provide a coarse approximation of the object pose and ICP to refine it.

RANDOM. Assumes that the sensor’s pose during contact is the same as one of the poses (randomly selected) associated with the tactile imprints used to build the global shape. This approach is naive but sets a baseline to assess the performance of other methods.

CTI. The pose of the tactile sensor for the new imprint corresponds with that of the closest tactile imprint (CTI) used to construct the global shape. To compute the similarity between touches, we first map each tactile image to a feature vector, using as encoder the predictions of a ResNet50 trained on ImageNet [41] without its last fully connected layers. Next we measure the cosine distance between images, considering only those whose gripper opening differs by less than 2mm from the current gripper opening. As a result, the new touch is stitched to the global shape as if it had been recorded at the same location as its closest tactile imprint.
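
A sketch of this similarity measure, using torchvision's pretrained ResNet50 as the encoder, is shown below; the exact preprocessing and the layout of the map entries (dicts with 'feature', 'opening', 'pose') are assumptions for illustration.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V1
encoder = resnet50(weights=weights)
encoder.fc = torch.nn.Identity()                  # drop the last fully connected layer
encoder.eval()
preprocess = weights.transforms()                 # standard ImageNet preprocessing

@torch.no_grad()
def embed(tactile_image):                         # tactile_image: PIL image of an imprint
    return encoder(preprocess(tactile_image).unsqueeze(0)).squeeze(0)

def closest_tactile_imprint(new_image, new_opening, map_entries, max_opening_diff=2.0):
    """map_entries: list of dicts with 'feature', 'opening' (mm) and 'pose' (assumed layout)."""
    new_feature = embed(new_image)
    best_entry, best_distance = None, float("inf")
    for entry in map_entries:
        if abs(entry["opening"] - new_opening) > max_opening_diff:
            continue                              # only compare grasps with a similar gripper opening
        distance = 1.0 - torch.nn.functional.cosine_similarity(new_feature, entry["feature"], dim=0)
        if distance < best_distance:
            best_entry, best_distance = entry, distance
    return best_entry
```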

CTI-ICP-N. This approach combines CTI with ICP (Iterative Closest Point) as described in Fig. 4. We start by finding the N closest images to the new tactile imprint, using the same metric as CTI. Then we build an auxiliary point cloud that includes the shapes from all those N closest imprints of the global shape. Finally, we do ICP to relocate the local point cloud (initially at the pose of the closest imprint) w.r.t. the auxiliary point cloud to better estimate its pose.

The case where N is the total number of tactile imprints is equivalent to doing ICP between the local point cloud and the global one, but in practice it does not provide better results than considering small values of N and is more time consuming. We hypothesize this is because of the sparsity and size of the global shape, which can confuse ICP. By cropping the global shape we indirectly focus ICP’s exploration, which reduces the chance that it gets stuck in a local minimum. Figure 4 exemplifies how using low values of N prioritizes comparisons of the new point cloud with regions that are more similar to it, making the approach more desirable both in terms of accuracy and computation.
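
A sketch of CTI-ICP-N using Open3D's point-to-point ICP is given below. The correspondence threshold, the data layout, and the assumption that map entries come pre-sorted by tactile similarity are ours; the paper does not specify its ICP implementation.

```python
import numpy as np
import open3d as o3d

def cti_icp_n(local_points, sorted_entries, n_closest, max_corr_dist=5.0):
    """Refine the pose of a local tactile point cloud against the N most similar map imprints.
    sorted_entries: map imprints sorted by tactile similarity to the new imprint (assumed)."""
    # 1) Auxiliary point cloud: union of the local shapes of the N closest imprints.
    aux_points = np.vstack([e["points_world"] for e in sorted_entries[:n_closest]])
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(aux_points))

    # 2) Initial guess: the pose of the single closest imprint (coarse CTI estimate).
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(local_points))
    initial_pose = sorted_entries[0]["pose"]      # 4x4 homogeneous transform

    # 3) ICP refinement of the coarse pose against the auxiliary cloud.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, initial_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```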

            Scissors   Tape   Brush   Flashlight   Gum Box
RANDOM      46.4       38.4   51.3    62.2         36.1
CTI         10.0       10.0   10.0    20.0         20.0
CTI-ICP-1   7.7        7.5    9.7     22.1         20.5
CTI-ICP-5   6.4        6.1    10.0    22.6         20.4
TABLE III: Localization error for different objects. Median of the RMSEs (mm) for each algorithm; the original table also shows each object, its reconstructed tactile shape, and RMSE histograms between 0 and 80mm for CTI-ICP-5.

Evaluation. To evaluate the approach, we considered five common objects and built their global maps as in Section V by rotating them 0°, 90°, 180° and 270°. Table III shows the objects and their reconstructed tactile shapes. The objects vary in several dimensions including weight, length, width, symmetries and texture. For instance, the gum box is small and symmetric while the brush is longer and exhibits different tactile imprints depending on the grasp.

We use a cross-validation approach to evaluate the accuracy of the algorithms. For each object, we remove one of the touches used to build its global shape and aim to localize it back. We measure the error in the relocation by computing the RMSE between the original point cloud and its final position after estimating its pose. Results in Table III show the median values of the RMSEs after removing each local point cloud once from the global tactile map. CTI, i.e., finding the closest tactile imprint to the removed one, has high accuracy: the median RMSE is on the order of 10mm for 3 of the 5 objects and 20mm for the other two. This is encouraging because the distance between grasps for the same object orientation is at least 10mm. For the gum box, we observed that even with a denser grid it is difficult to achieve an error lower than 20mm because of its symmetries: tactile imprints separated by 20mm look practically identical, which makes their local shapes very similar.
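
Concretely, the relocation error for one held-out touch can be scored as in the short sketch below; the exact error definition (RMSE between corresponding points of the original and relocated clouds) is our reading of the text.

```python
import numpy as np

def relocation_rmse(original_world_points, sensor_points, estimated_pose):
    """RMSE (mm) between a touch's original world-frame cloud and the same cloud
    placed at the estimated pose (4x4 homogeneous transform)."""
    homogeneous = np.hstack([sensor_points, np.ones((len(sensor_points), 1))])
    relocated = (homogeneous @ estimated_pose.T)[:, :3]
    return float(np.sqrt(np.mean(np.sum((relocated - original_world_points) ** 2, axis=1))))
```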

Fig. 5: Distribution of errors. Distribution of the RMSEs for the hair brush for RANDOM, CTI and CTI-ICP-5 (CTI-ICP-1 was in between CTI and CTI-ICP-5). While the distribution is spread out for RANDOM, both CTI and CTI-ICP-5 produce a large improvement in the prediction of the point cloud pose. CTI and CTI-ICP-5 still suffer from outliers, making their medians almost the same, but CTI-ICP-5 has more density to the left than CTI. This suggests that CTI-ICP-5 can refine its predictions w.r.t. CTI and be more accurate when the initial guess is good.

Fig. 6: Error vs. number of point clouds. We study the median of RMSEs for the scissors depending on the number of tactile imprints used to build the tactile map. With only 10% (around 50) of the point clouds, the median is around 20mm, but decreases to 10mm if we consider 25% (around 100) of the point clouds or more. As the number of point clouds increases, CTI does not improve, but CTI-ICP-1 and CTI-ICP-5 reduce the error by almost half by making local adjustments to the pose of the point cloud.

From the results, we also conclude that for objects with a wide distribution of contact patches, doing ICP can boost the localization accuracy. The median of RMSEs for the scissors and the tape already shows that applying CTI-ICP-5 leads to lower RMSEs than CTI. Figure 5 compares the distribution of RMSEs for RANDOM, CTI and CTI-ICP-5 for the hair brush. As expected, the distribution of errors is tighter and closer to zero for CTI and CTI-ICP-5. However, we observe that CTI-ICP-5 leads to a greater improvement in those cases where there is a good initial guess, as its distribution is more tilted to the left compared to CTI. While the improvements of CTI-ICP-5 over CTI cannot be seen directly through the median of the RMSEs, the histograms show that CTI-ICP-5 accurately refines the pose of some local point clouds.

Size of the tactile shape. We study how the number of tactile imprints used to build the tactile shape affects the accuracy. To tackle this question, we built several global maps for the scissors using only a percentage (10, 25, 50 or 75%) of the touches and evaluated the accuracy when stitching one of the discarded touches to the incomplete tactile maps. Figure 6 shows the median RMSE for CTI, CTI-ICP-1 and CTI-ICP-5 depending on the number of point clouds. As expected, adding more point clouds helps reduce the error in localization. However, after considering 25% of the point clouds the median RMSE for CTI stays at 10mm. In comparison, the CTI-ICP-N approaches improve their accuracy with more touches as they adjust the position of the stitched point cloud, reducing the error by almost half.

VII. In-hand identification and localization

Given a set of tactile shapes from explored objects and a new tactile imprint, our goal is to recover both the identity of the object and its location in-hand. We identify it by comparing the new tactile imprint to the ones used to create the global maps of each object, and assigning the identity from the most similar tactile imprint. Once identified, we stitch the new tactile imprint to its global shape as in Section VI and obtain its pose w.r.t. the tactile sensor.
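
A minimal sketch of the identification step is shown below, reusing the same feature-based similarity as in Section VI; the per-object map layout (a dictionary of per-object imprint lists) is an assumption for illustration.

```python
import torch

def identify_object(new_feature, object_maps):
    """object_maps: {object_name: [ {'feature': 1-D tensor, ...}, ... ]} (assumed layout).
    Returns the name of the object whose map contains the most similar imprint."""
    best_name, best_distance = None, float("inf")
    for name, entries in object_maps.items():
        for entry in entries:
            distance = 1.0 - torch.nn.functional.cosine_similarity(
                new_feature, entry["feature"], dim=0)
            if distance < best_distance:
                best_name, best_distance = name, distance
    return best_name
```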

Figure 7 shows 3 examples of grasped objects and their tactile imprints. Our approach correctly identifies each object and estimates its location in-hand. The solution is fast enough to provide real-time estimates of the pose, as the only steps are one CNN pass, a similarity comparison in a feature-vector space, and ICP with small point clouds.

Fig. 7: Object identification and localization. From each random grasp and tactile imprint, our approach correctly identifies the object and accurately estimates its position in-hand. The objects are identified among those in Table III and localized using the tactile maps from that table.

VIII. Discussion and Future work

This paper presents an approach to build a tactile shape and use it to identify and localize an object in-hand. Given a tactile imprint, we learn an accurate model of the local shape at contact. By combining several tactile imprints we build a tactile map of an object and use it for localization. We compare several algorithms for object localization given a new tactile imprint and show that the accuracy of the estimates is tightly related to the resolution of the tactile shape, but can be improved by combining CTI for coarse localization and ICP for fine refinement. As a result, this approach yields accurate tactile feedback that can be used in real manipulation tasks.

While our approach has the potential to significantly enhance in-hand manipulation through tactile feedback, there are many opportunities for improvement:

Using visual information. The approach presented is purely tactile-based, but this reflects our aim to prove a point: tactile sensing is an enabler of manipulation. A practical system would also exploit visual feedback to create more complete maps. We believe this is possible as our tactile maps provide an estimate of the global shape in the form of a point cloud.

Considering prior information. For many tasks, object information is already available offline. Using shape priors or object properties such as mass or texture can help our approach produce a more accurate tactile map.

Dealing with hard-to-localize objects. Some objects are harder or impossible to accurately localize with a single tactile imprint because of their symmetries or lack of texture. In these situations, our approach can still provide a set of possible poses for the object. The combination of multiple imprints could help to better estimate its actual pose.

Using multiple tactile sensors. We only use one tactile sensor, but extending to multiple sensors is relatively straightforward. For a two-jaw gripper, two sensors would likely make our approach more discriminative and accurate at localization.

Handling negative information. Our high-resolution sensor does not only provide information about where contact happens, but also about where it does not. However, our current solution does not use this information, although we believe adding these constraints could greatly enhance performance.

Improving tactile similarity. Super-sizing tactile data collection can provide many more tactile imprints and allow us to train a better feature space to discriminate between them.

Improving stitching technique. Using ICP on the point clouds from the tactile imprints has proven more challenging than we expected. We believe there are ways to mitigate this issue by choosing the right metrics to deal with the sparsity of tactile point clouds.

To make sure our approach is robust and useful for complex manipulation tasks, our next steps include addressing many of the points discussed above. We believe our method provides a reliable and accurate source of tactile feedback that will open a lot of research possibilities within a tactile approach to dexterous manipulation.

References

  • Jones and Lederman [2006] L. A. Jones and S. J. Lederman, Human hand function.    Oxford University Press, 2006.
  • Norrsell et al. [1999] U. Norrsell, S. Finger, and C. Lajonchere, “Cutaneous sensory spots and the “law of specific nerve energies”: history and development of ideas,” Brain research bulletin, vol. 48, no. 5, pp. 457–465, 1999.
  • Donlon et al. [2018] E. Donlon, S. Dong, M. Liu, J. Li, E. Adelson, and A. Rodriguez, “Gelslim: A high-resolution, compact, robust, and calibrated tactile-sensing finger,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
  • Yuan et al. [2017a] W. Yuan, S. Dong, and E. H. Adelson, “Gelsight: High-resolution robot tactile sensors for estimating geometry and force,” Sensors, vol. 17, no. 12, p. 2762, 2017.
  • Dong et al. [2018] S. Dong, D. Ma, E. Donlon, and A. Rodriguez, “Maintaining grasps within slipping bound by monitoring incipient slip,” CoRR, vol. abs/1810.13381, 2018. [Online]. Available: http://arxiv.org/abs/1810.13381
  • Ma et al. [2018] D. Ma, E. Donlon, S. Dong, and A. Rodriguez, “Dense Tactile Force Distribution Estimation using GelSlim and inverse FEM,” arXiv e-prints, p. arXiv:1810.04621, Oct 2018.
  • Newcombe et al. [2011] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Oct 2011, pp. 127–136.
  • Salas-Moreno et al. [2013] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, “Slam++: Simultaneous localisation and mapping at the level of objects,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
  • OpenAI et al. [2018] OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba, “Learning dexterous in-hand manipulation,” 2018.
  • Björkman et al. [2013] M. Björkman, Y. Bekiroglu, V. Högman, and D. Kragic, “Enhancing visual perception of shape through tactile glances,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nov 2013, pp. 3180–3186.
  • Allen et al. [1999] P. K. Allen, A. T. Miller, P. Y. Oh, and B. S. Leibowitz, “Integration of vision, force and tactile sensing for grasping,” Int. J. Intelligent Machines, vol. 4, pp. 129–149, 1999.
  • Ilonen et al. [2014] J. Ilonen, J. Bohg, and V. Kyrki, “Three-dimensional object reconstruction of symmetric objects by fusing visual and tactile sensing,” The International Journal of Robotics Research, vol. 33, no. 2, pp. 321–341, 2014. [Online]. Available: https://doi.org/10.1177/0278364913497816
  • Luo et al. [2017a] S. Luo, W. Mou, K. Althoefer, and H. Liu, “Localizing the object contact through matching tactile features with visual map,” CoRR, vol. abs/1708.04441, 2017. [Online]. Available: http://arxiv.org/abs/1708.04441
  • Falco et al. [2017] P. Falco, S. Lu, A. Cirillo, C. Natale, S. Pirozzi, and D. Lee, “Cross-modal visuo-tactile object recognition using robotic active exploration,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on.    IEEE, 2017, pp. 5273–5280.
  • Strub et al. [2014] C. Strub, F. Wörgötter, H. Ritter, and Y. Sandamirskaya, “Using haptics to extract object shape from rotational manipulations,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 2179–2186.
  • Luo et al. [2018] S. Luo, W. Mou, K. Althoefer, and H. Liu, “iclap: shape recognition by combining proprioception and touch sensing,” Autonomous Robots, pp. 1–12, 2018.
  • Martinez-Hernandez et al. [2013] U. Martinez-Hernandez, G. Metta, T. J. Dodd, T. J. Prescott, L. Natale, and N. F. Lepora, “Active contour following to explore object shape with robot touch,” in 2013 World Haptics Conference (WHC), April 2013, pp. 341–346.
  • Jamali et al. [2016] N. Jamali, C. Ciliberto, L. Rosasco, and L. Natale, “Active perception: Building objects’ models using tactile exploration,” in 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), Nov 2016, pp. 179–185.
  • Yi et al. [2016] Z. Yi, R. Calandra, F. Veiga, H. van Hoof, T. Hermans, Y. Zhang, and J. Peters, “Active tactile object exploration with gaussian processes,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on.    IEEE, 2016, pp. 4925–4930.
  • Driess et al. [2017] D. Driess, P. Englert, and M. Toussaint, “Active learning with query paths for tactile object shape exploration,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 65–72.
  • Sommer et al. [2014] N. Sommer, M. Li, and A. Billard, “Bimanual compliant tactile exploration for grasping unknown objects,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 6400–6407.
  • Zhang et al. [2017] M. M. Zhang, N. Atanasov, and K. Daniilidis, “Active end-effector pose selection for tactile object recognition through monte carlo tree search,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).    IEEE, 2017, pp. 3258–3265.
  • Mao et al. [2017] H. Mao, J. Xiao, M. M. Zhang, and K. Daniilidis, “Shape-based object classification and recognition through continuum manipulation,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).    IEEE, 2017, pp. 456–463.
  • Luo et al. [2017b] S. Luo, J. Bimbo, R. Dahiya, and H. Liu, “Robotic tactile perception of object properties: A review,” Mechatronics, vol. 48, pp. 54–67, 2017.
  • Petrovskaya and Khatib [2011] A. Petrovskaya and O. Khatib, “Global localization of objects via touch,” IEEE Transactions on Robotics, vol. 27, no. 3, pp. 569–585, 2011.
  • Koval et al. [2015] M. C. Koval, N. S. Pollard, and S. S. Srinivasa, “Pose estimation for planar contact manipulation with manifold particle filters,” The International Journal of Robotics Research, vol. 34, no. 7, pp. 922–945, 2015.
  • Moll and Erdmann [2004] M. Moll and M. A. Erdmann, “Reconstructing the shape and motion of unknown objects with active tactile sensors,” in Algorithmic Foundations of Robotics V.    Springer, 2004, pp. 293–309.
  • Pezzementi et al. [2011] Z. Pezzementi, C. Reyda, and G. D. Hager, “Object mapping, recognition, and localization from tactile geometry,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on.    IEEE, 2011, pp. 5942–5948.
  • Ottenhaus et al. [2016] S. Ottenhaus, M. Miller, D. Schiebener, N. Vahrenkamp, and T. Asfour, “Local implicit surface estimation for haptic exploration,” in Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International Conference on.    IEEE, 2016, pp. 850–856.
  • Aggarwal and Kirchner [2014] A. Aggarwal and F. Kirchner, “Object recognition and localization: The role of tactile sensors,” Sensors, vol. 14, no. 2, pp. 3227–3266, 2014.
  • Besl and McKay [1992] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611.    International Society for Optics and Photonics, 1992, pp. 586–607.
  • Dellaert and Kaess [2006] F. Dellaert and M. Kaess, “Square root sam: Simultaneous localization and mapping via square root information smoothing,” The International Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, 2006.
  • Yu et al. [2015] K.-T. Yu, J. Leonard, and A. Rodriguez, “Shape and Pose Recovery from Planar Pushing,” in IROS, 2015.
  • Yuan et al. [2017b] W. Yuan, C. Zhu, A. Owens, M. A. Srinivasan, and E. H. Adelson, “Shape-independent hardness estimation using deep learning and a gelsight tactile sensor,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
  • Li et al. [2014] R. Li, R. Platt, W. Yuan, A. ten Pas, N. Roscup, M. A. Srinivasan, and E. Adelson, “Localization and manipulation of small parts using gelsight tactile sensing,” in Intelligent Robots and Systems (IROS), 2014 IEEE/RSJ International Conference on., 2014.
  • Izatt et al. [2017] G. Izatt, G. Mirano, E. Adelson, and R. Tedrake, “Tracking objects with point clouds from vision and touch,” in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 4000–4007.
  • Wang et al. [2018] S. Wang, J. Wu, X. Sun, W. Yuan, W. T. Freeman, J. B. Tenenbaum, and E. H. Adelson, “3D Shape Perception from Monocular Vision, Touch, and Shape Priors,” ArXiv e-prints, Aug. 2018.
  • Zeng et al. [2018] A. Zeng, S. Song, K. Yu et al., “Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
  • Hogan et al. [2018] F. R. Hogan, M. Bauzá, O. Canal, E. Donlon, and A. Rodriguez, “Tactile regrasp: Grasp adjustments via simulated tactile transformations,” CoRR, vol. abs/1803.01940, 2018.
  • [40] Project website. [Online]. Available: web.mit.edu/mcube/research/tactile_localization
  • He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.