Tactile sensing can augment, complement, and sometimes substitute for vision when dealing with occluded and concealed objects [nichols2015methods], transparent or highly reflective materials [taira20103d], and liquids [nagai2020tactile], for which optical sensing is not well suited. Tactile sensing is also an attractive modality because established evidence from the biological sciences indicates that the tactile sense is “amplified” when used alone [weaver2007attention]. Because optical information is limited or unavailable in the aforementioned cases, this paper studies a tactile-sensing-based exploration strategy. An example application scenario is robotic surgery, where a considerable part of the diagnostic and procedural knowledge comes from tactile exploration, which is used to identify tissue growth, inflammation, and the presence of foreign bodies within the human anatomy [nichols2015methods]. Likewise, in other disciplines, such as machinery inspection and maintenance, operators rely on tactile perception to acquire part properties such as object mass, the center of mass [kaboli2016tactile], texture properties [fishel2012bayesian], friction coefficient [chen2018tactile], irregularities, impurities, and cracks [palermo2020implementing]. While some of these can be found through specialized optical equipment, the use of hands comes naturally to people, is intuitive, and is costless.
Most works in robotics use some form of optical acquisition device to model the environment in a contactless fashion [schmid2020efficient, struckmeier2019vita, carrillo2012comparison, yang2013gaussian]. Creating the same profile from tactile feedback can be challenging, as it requires specially fabricated, miniaturized sensors whose detection range is small and not comparable with that of optical cameras. Consequently, to increase the knowledge about the object properties (e.g., shape and material), either multiple touches with small contact areas or fewer touches with larger contact areas are required. Enlarging the contact area generally requires larger contact forces to maintain the necessary pressure, which can inadvertently reposition the object [suresh2020tactile]. All these factors affect the accuracy and reliability of the tactile feedback.
Continuous efforts have been devoted to the design of novel tactile sensors that reduce contact intrusiveness. In particular, biomimetic designs have been shown to deliver accurate information with low fabrication costs [pearson2011biomimetic, fox2012tactile, lepora2019pixels]. Among biomimetic sensors, whisker-based tactile sensors are inspired by animals and insects, such as mice, cockroaches, cats, and horses, which rely on this type of sensing for effective perception and navigation in the environment [lee1999review]. In contrast to more conventional tactile sensors, whisker-based sensing enables larger exploration areas according to the sensor’s volume, the number of probes, and the probes’ density. Whiskers are flexible, allowing compliance during “sweeping” motions over the target objects while using only minimal contact forces.
In this paper, we present a compliant whisker tactile sensor that is capable of exploring unknown environments. The sensing mechanism follows the same principle as whiskers in nature [solomon2006robotic, hartmann2001active]. The proposed artificial whisker sensor consists of fiber filaments that act as whisker probes, and barometer sensors that measure the pressures caused by the bending displacements of the probes with respect to their resting positions. The system can be applied to diverse applications that benefit from the low contact force required for contact detection, such as suspicious object inspection [aggarwal2015haptic] and surgical diagnosis [nichols2015methods].
To facilitate autonomous tactile perception and understanding, the whisker sensor is used in conjunction with an active exploration policy, which directly determines the characteristics of the collected samples and, in turn, the task performance. Previous works have studied acquisition policies for exploring indoor and outdoor scenarios through remote sensing (e.g., vision [schmid2020efficient, yang2013gaussian] and sonar [fang2010coverage]). In such works, physical contacts during exploration were commonly reduced or avoided altogether. A key question that remains is how to plan the exploration path with the tactile modality alone. Unlike previous work, we approach the problem using a hybrid exploration policy, which consists of an active object searching policy and a reactive contour tracing policy that explicitly considers the contact events. The exploration strategy is efficient in collecting contact points, which facilitates object classification using a deep neural network.
Our technical contributions are as follows. First, a low-cost whisker array tactile sensor is designed, manufactured, and discussed (Sec. 3). Second, an autonomous tactile exploration policy is proposed to search for and reconstruct the contours of multiple objects in unknown scenes (Sec. 4). Third, a deep learning classifier (CT-Net) is proposed to classify object contours. Last, the performance of the proposed work is evaluated both in simulation and on a real robot (Sec. 6, Sec. 7, Sec. 8).
2 Related Work
2.1 Tactile Sensors
Tactile sensors are devices that acquire tactile information through physical interaction with the environment. The most common tactile sensors are based on capacitive, piezoresistive, thermoresistive, inductive, piezoelectric, magnetic, and optical sensing mechanisms [tiwana2012review]. For a comprehensive survey of tactile sensors, we refer readers to the survey by Tiwana et al. [tiwana2012review].
Our focus is on tactile sensors that are suitable for exploring environments without object priors. However, most of the aforementioned sensors are not suitable for spatial exploration tasks, because their design gives them only a very narrow sensing range. For example, commercial force-sensitive resistors (FSR), piezoresistive sensors, and capacitive sensors are all manufactured as thin films (or plates), which constrains the contact to a local surface region [schofield2016effect, maiolino2013flexible]. To address this limitation, we developed a whisker tactile sensor, which has a wide sensing range and high sensitivity. We report satisfactory results in exploring a variety of challenging scenes.
2.2 Whisker Sensors
Whisker (vibrissa) sensing is commonly found in aquatic mammals, rodents, insects [prescott2011vibrissal, ahl1986role], and even viruses [kostyuchenko2005tail], in which hair-like or bristle-like structures are used for the tactile perception of the surroundings. In nature, whiskers serve a variety of functions, including enclosure, compliance, separation, heat dissipation, navigation, and wave propagation [pearson2011biomimetic].
Artificial whiskers, which are inspired by the biological structure of the vibrissae, have been adopted to enhance the sensing capabilities of robots. The earliest whisker sensor can be traced back to the 1970s [wang1978sensors], and more recently whisker sensors have been used for applications including obstacle avoidance [mckerrow1991introduction], ground proximity [hirose1985titan], and object localization [fox2012tactile].
What makes whiskers so effective in the animal kingdom is the quality and quantity of the sensory information obtained. These advantages can be attributed to the whiskers’ sensing range, high sensitivity, and compliance. For example, a rat has around 30 whiskers on each side of its face, with lengths ranging from 20 to 100 mm [brecht1997functional], which enables it to detect small objects in a wide range around its head. These properties are also desirable on robots, which need to explore the distribution of objects ahead of time. To reach an information acquisition efficiency similar to that of animals, efforts have been devoted to fabricating whiskers as an array [harada2014fully, fend2003active]. More recently, Struckmeier et al. [struckmeier2019vita] improved the sensing mechanism by reproducing the active whisking behavior observed in rodents with a motor “musculature” capable of delivering a sweeping motion. The work most relevant to our sensor is the lightweight whisker array designed by Deer et al. [deer2019lightweight], which shows that barometers with extended whiskers can be used as the sensing component to detect micro-forces (e.g., to detect air or fluid velocity around the whiskers). Based on this sensing mechanism, we made various adaptations in the mechanical, electrical, and signal processing aspects to facilitate the use of barometer-based whiskers in active tactile exploration (refer to Sec. 3).
2.3 Active Tactile Spatial Exploration
Active spatial exploration concerns the acquisition of the spatial features of a scene or object using active exploration. Spatial exploration allows the agent to gather the information necessary to address challenges such as scene reconstruction [driess2017active], object recognition [zhang2017active], pose estimation [suresh2020tactile], and planning manipulation policies [dragiev2013uncertainty], which are stepping stones towards realistic applications leveraging robotics and machine perception.
Most active spatial exploration problems rely on the visual sensing modality. For example, Active Simultaneous Localization and Mapping (Active SLAM) uses an active policy to guide the map reconstruction while simultaneously localizing the robot’s pose [carrillo2012comparison]. The same problem is more challenging when conditioned solely on the tactile modality, and it has been less investigated. Compared with the visual modality, the tactile sensing range is much shorter, resulting in reduced efficiency in information acquisition. For the same reason, a single contact is generally not sufficiently informative about the object’s properties. While this problem can be alleviated by accumulating evidence from a large number of probes, there is a movement cost associated with the finger transitions [zhang2017active]. Besides, in some applications such as bomb disposal, any probing motion could lead to fatal outcomes.
To sample using an optimal strategy with a reduced number of probes, motion planners for information acquisition have been proposed. This is part of a research theme referred to as Informative Path Planning (IPP) [hitz2017adaptive, schmid2020efficient, yordanova2020coverage]. Among IPP approaches developed for tactile sensors, a commonly used approach is to explore uncertain regions using discrete sampling. For example, the next probing point can be obtained by solving a Bayesian optimization problem over a continuous function [srinivas2009gaussian]. Other approaches have focused on developing efficient sampling policies to accelerate uncertainty reduction, and have been shown to increase tactile sampling efficiency by Jamali et al. (2016) [jamali2016active], Martinez et al. (2017) [martinez2017feeling], and Kaboli et al. (2019) [kaboli2019tactile]. The downside of such approaches is that discrete probes convey no observations during the transition between two probing events.
Conversely, sliding or sweeping motions (i.e., sampling continuously along the object surface) can generate more efficient exploration paths [driess2017active]. For example, it has been shown that continuous informative tactile sampling can be achieved through a sliding motion on flat surfaces [abraham2017ergodic]. More recently, sliding motion has also been demonstrated on curved object surfaces. For instance, Driess et al. [driess2017active, driess2019active] used a compliant controller to facilitate data collection along object surfaces, and simultaneously used Gaussian Process Regression to estimate the object shape. Similar methods have also been adopted by Rosales et al. (2018) [rosales2018gpatlasrrt] and Ottenhaus et al. (2018) [ottenhaus2018active]. While the works mentioned above indicate that simple geometric shape estimation can be accomplished through sliding sensor motion, the effectiveness of such an approach on complex object surfaces remains to be shown. This gap in knowledge and technology exists because discontinuities in the surface curvature add complex motion constraints to planning and control.
2.4 Tactile Object Recognition
Tactile object recognition is an essential component of tactile intelligence, which is the ability of machines to make sense of observations based on tactile sensing [luo2017robotic]. In most cases, the sensor configuration determines the modality attributes used in the recognition task. Therefore, a specific design will be more sensitive to some attributes (e.g., geometry, texture, material stiffness, or mass) than to others [tiwana2012review]. In particular, geometric shape is a commonly used feature since it is one of the most intuitive object representations and can be easily measured with tactile sensing. The method developed in this paper is based on object shape information. In this category, conventional methods can be based on local shape descriptors, which are mainly adopted from techniques for processing visual information. These descriptors include LBP [li2013sensing], SIFT [lowe1999object] and Tactile-SIFT [luo2015novel], MR-8 [varma2005statistical], Normalized momentum [pezzementi2011tactile], etc. Other methods are based on global shape features, e.g., global point matching [meier2011probabilistic], the enveloping polyhedral model [casselli1995robustness], and the triangle parameter histogram [zhang2016triangle].
In this paper, we propose a novel approach that recognizes objects by their projected contour. Compared to other shape attributes, the contour can be easily obtained and is a “cheap” proxy for shape (it requires fewer computational resources than volumetric attributes). In addition, we show that the contour shape is sufficiently discriminative for object classification among a large number of categories when used together with a deep-learning-based point cloud classifier.
3 Whisker Sensor Design
A prototype of a novel artificial whisker sensor is presented. The sensor’s working principle is similar to the sensing mechanism of rodents’ vibrissae. The whisker-based tactile sensing device consists of three parts: 1) a whisker-based sensing probe, made of plastic-based soft materials that transduce the contact force through deformation, 2) a barometer-based pressure sensing device that interfaces with the whisker rod, and 3) a programmable microprocessor for data processing and communication.
Five barometers were soldered on the top surface of a Printed Circuit Board (PCB) in a pentagonal layout, as shown in Fig. 2. Plastic tubes with a diameter of 6 mm and a height of 7 mm were then fixed on the barometers using epoxy resin (Devcon). Ecoflex 00-30 (Smooth-On) was applied to connect the whiskers with the barometers. Part A and part B of Ecoflex were mixed uniformly at a weight ratio of 1:1 and degassed for 10 min under vacuum. Then, the Ecoflex was injected into the plastic tubes using a syringe. Each whisker, with a diameter of 0.8 mm and a length of 7 cm, was then inserted into a plastic tube. The whiskers were fixed horizontally with the assistance of a supporting skeleton. Finally, the PCB with the fixed whiskers was cured in an oven at 65 °C for an hour to crosslink the Ecoflex. After curing, the whole PCB was attached to the tool flange (3D printed using PLA).
One DPS310 digital barometer (Infineon Technologies) was used to sense the pressure at each whisker’s root. The sampling rate of all barometers was configured to 64 Hz. To control and read data from the 5 barometers, a low-cost, 8-bit microprocessor STM8S003F3P6 (STMicroelectronics) was used. An SPI bus enabled serial communication between the microprocessor and the onboard sensors.
The system also includes a computer that reads the sensor data and runs the proposed algorithms. RS-485 serial communication was used for real-time communication between the sensor board and the computer. The RS-485 bus requires only two wires and allows up to 256 sensor boards on a single bus segment. With this bus, the design scales to a large sensor array, as each pentagon board has 4 pairs of communication ports, which can be connected to any other sensor board seamlessly. To eliminate data packet collisions in transmission, a communication protocol was designed based on token ring: the transmission of the first sensor is triggered by the computer, and the transmission of each subsequent sensor can only be triggered by the sensor preceding it.
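The token-passing idea described above can be illustrated with a small simulation. This is only a sketch under assumptions: the function name `poll_ring`, the zero-based board indexing, and the pressure values are hypothetical, and the actual frame format on the RS-485 bus is not modeled.

```python
from typing import List, Optional, Tuple


def poll_ring(readings: List[float]) -> List[Tuple[int, float]]:
    """Simulate one polling cycle on a shared bus with token passing.

    readings[i] is a hypothetical value held by sensor board i.
    The computer triggers board 0; every board transmits once and then
    triggers the next board, so only one board talks at any time.
    """
    frames: List[Tuple[int, float]] = []
    token: Optional[int] = 0 if readings else None  # computer triggers board 0
    while token is not None:
        frames.append((token, readings[token]))  # board `token` transmits
        nxt = token + 1
        # Hand the token to the next board, or end the cycle after the last one.
        token = nxt if nxt < len(readings) else None
    return frames


cycle = poll_ring([101.3, 101.2, 101.4])
```

Because a board can only transmit while it holds the token, two boards never drive the bus at the same time, which is the collision-avoidance property the protocol relies on.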
Pressure drift is a commonly observed issue in barometers. It is mainly caused by coupling effects from the temperature and by the partial inelastic strain recovery of the gel when a large force is applied [koiva2020barometer, deer2019lightweight]. The drift is harmful because it reduces the signal-to-noise ratio and thus may degrade the confidence in detecting contacts. Since the drift mainly contains low-frequency components, it can be removed by a high-pass filter. For this, a first-order high-pass filter with a cutoff frequency of 11.3 Hz was used. The output signal of this high-pass filter was then rectified to be positive. The noise was then removed by a first-order low-pass filter with a cutoff frequency of 31.8 Hz. A contact event is recognized if the filter output exceeds a threshold (0.0005 kPa in our case).
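The filtering chain above (high-pass, rectification, low-pass, threshold) can be sketched as follows. The sampling rate, cutoff frequencies, and threshold are taken from the text; the discretization of the first-order filters (a simple RC-style recursion) and the name `detect_contacts` are assumptions made for illustration, since the text does not state how the filters were implemented.

```python
import math
from typing import List

FS_HZ = 64.0            # barometer sampling rate (from the text)
HP_CUTOFF_HZ = 11.3     # high-pass cutoff (from the text)
LP_CUTOFF_HZ = 31.8     # low-pass cutoff (from the text)
THRESHOLD_KPA = 0.0005  # contact threshold (from the text)


def detect_contacts(pressure_kpa: List[float]) -> List[bool]:
    """Return one contact flag per pressure sample (assumed RC discretization)."""
    dt = 1.0 / FS_HZ
    rc_hp = 1.0 / (2.0 * math.pi * HP_CUTOFF_HZ)
    rc_lp = 1.0 / (2.0 * math.pi * LP_CUTOFF_HZ)
    a_hp = rc_hp / (rc_hp + dt)  # high-pass coefficient
    a_lp = dt / (rc_lp + dt)     # low-pass smoothing coefficient
    hp = 0.0
    lp = 0.0
    prev = pressure_kpa[0] if pressure_kpa else 0.0
    flags: List[bool] = []
    for x in pressure_kpa:
        hp = a_hp * (hp + x - prev)   # high-pass: suppress slow drift
        prev = x
        rectified = abs(hp)           # rectify to a positive signal
        lp += a_lp * (rectified - lp)  # low-pass: suppress noise
        flags.append(lp > THRESHOLD_KPA)
    return flags
```

With this sketch, a slow ramp in pressure produces no contact flags, while a sudden step triggers a flag that fades within a few samples once the pressure stops changing, which matches the stationary-contact behavior discussed next.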
A special case is that the high-frequency signal component does not exist when the whisker rod is attached to the object in a stationary state. We solve this by adding a rotatory whisking motion to the robot end-effector’s z-axis. In our case, a sinusoid motion of degree magnitude at 0.5 Hz was applied.
4 Autonomous Tactile Exploration
4.1 Problem Formulation
Consider a dexterous end-effector with tactile sensing ability. A contact point in the world coordinate frame can be obtained when an object is in contact with the tactile sensor. As an example, refer to the scene in Fig. 1, in which the task space is a bounded rectangular region in a horizontal plane. The boundaries of this region can be relaxed to an irregular polygon. Also, the plane can be relaxed to a smooth surface if the end-effector can slide along it. It is also common that only the end-effector is equipped with the tactile sensor. To prevent parts of the robot that are not equipped with tactile skin from unintentionally moving an object, only the tactile sensor is placed inside the task space.
Inside the robot’s task space, there exists an unknown number of objects, each with its own contour curve. Any pair of points lying on the contours of two different objects is separated by at least a minimum distance; this is the spatial isolation condition of Eq. (1).
This distance constraint ensures that objects do not touch each other and that the sensor is able to move and explore freely between them. It is a sufficient condition for separating the objects spatially using only tactile observations.
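As an illustration, the spatial isolation condition can be checked numerically on sampled contours. This is a minimal sketch: the names `min_gap`, `spatially_isolated`, and `d_min` are hypothetical, each contour is approximated by a finite list of 2D points, and the use of a non-strict inequality is an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def min_gap(contour_a: Sequence[Point], contour_b: Sequence[Point]) -> float:
    """Smallest Euclidean distance between sampled points of two contours."""
    return min(math.dist(p, q) for p in contour_a for q in contour_b)


def spatially_isolated(contours: List[Sequence[Point]], d_min: float) -> bool:
    """True if every pair of distinct objects keeps a gap of at least d_min."""
    return all(min_gap(a, b) >= d_min for a, b in combinations(contours, 2))


square_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_b = [(3.0, 0.0), (4.0, 0.0), (4.0, 1.0), (3.0, 1.0)]
ok = spatially_isolated([square_a, square_b], d_min=1.5)
```

In practice, `d_min` would be chosen at least as large as the sensor footprint so that the sensor can pass between neighboring objects.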
The goal of tactile exploration is to localize and spatially characterize each object. While this can be achieved through exhaustive exploration, doing so would be inefficient in terms of motion, because every single “touch” incurs costs in travel distance, movement time, energy, and computation. Compared to visual exploration, efficiency matters more in tactile exploration because of these costs and the limited amount of information conveyed per “touch”. Reducing the number of “touches” also lowers the chance of repositioning the target when intrusive tactile sensors are used.
For this reason, the aim is to minimize the traveling cost along the exploration path starting at the robot’s current position. To this end, we define an acquisition function at each time step and consider the traveling time of the path. The corresponding optimization problem, Eq. (2), minimizes the traveling time subject to the acquired information reaching a given threshold.
The threshold ensures that the object boundaries are sufficiently explored. The problem falls within the Informative Path Planning (IPP) framework, restricted to the region around the object contours rather than the whole task space. Solving this problem optimally is NP-hard (Feige et al. [feige1998threshold]). In practice, a dual problem is generally solved to obtain suboptimal solutions, i.e., maximizing the information acquisition within a cost budget.
The path planning problem is shaped by the unique properties of tactile sensing. There are two key differences from traditional IPP frameworks based on optical sensing: 1) the sensing range is the same as the contact range, leading to unavoidable contacts with objects; 2) observations are associated with motion constraints, which are incrementally added to the task space as contacts occur. Although versatile informative path planners have been proposed to generate continuous exploration paths, most of them are designed either for contact-free task spaces [binney2010informative, hitz2017adaptive] or for obstacles detected ahead of time [schmid2020efficient, wei2020informative]. Such algorithms may not be capable of tackling scenarios with unpredictable contacts. To collect samples from the object surface, one adaptation is to follow a path re-planned from partial observations, which is re-instantiated every time a new contact occurs. However, this raises additional issues related to motion consistency and exploration efficiency. Besides, making contacts in this pattern is computationally inefficient due to frequent interruptions, each of which requires additional computation cycles for path re-planning. In addition, these approaches may not have theoretical guarantees of full coverage of the exploration region (i.e., they cannot ensure object discovery). To address all these issues, a hybrid exploration policy is introduced in the next section.
4.2 Hybrid Exploration Policy
A hybrid tactile exploration policy is proposed to tackle the aforementioned shortcomings of existing techniques. The rationale is adapted from the tactile exploration strategies of blind people (e.g., searching for objects by touch, or exploring a tactile symbol image for comprehension), which rely on multiple observations or simultaneous force feedback. According to the human study of Zhang et al. (2018) [zhang2018image], this exploration is characterized by five different procedures, each exhibiting a distinct motion pattern. These motion patterns are: 1) Frame Following (FF), which traces the scene boundary to obtain its size, 2) Contour Following (CF), which is used to learn an object’s size and shape, 3) Surface Swiping (SS), which explores an object’s internal structure, 4) Relative (RE) and 5) Absolute (AB), which obtain the object’s relative and absolute position, respectively, by moving a finger back and forth.
Let us define the tactile exploration of a robotic system by three independent procedures: 1) Object Searching (OS), which is used to actively search for and localize an object, 2) Contour Tracing (CT), which explores the object contour, and 3) Feature Sampling (FS), which is used for actively gathering object features. Examples of such procedures are given in Fig. 3. OS and CT are adapted from the exploration strategies of blind people mentioned above. For these two stages, the corresponding policies are discussed in Sec. 4.3 and Sec. 4.4. In addition, there is an FS procedure to collect additional information required by the task (Sec. 5.1). In contrast to exploration by blind people, the RE and AB procedures are not always involved because robots have access to accurate positioning, which reduces the number of steps compared to the human counterpart. Conversely, blind people rely on kinesthetic and cognitive motor functions to obtain the location of objects [zhang2018image, weaver2007attention], which makes the RE and AB procedures necessary for them.
4.3 Contour Tracing (CT)
Contour tracing (CT) is the movement most commonly performed by blind people for object recognition [zhang2018image]. It is a greedy policy for sampling the contours of objects during autonomous exploration, and it also gathers additional information about the occupancy state of the region enclosed by the path.
Proposition 1: Contour tracing brings a higher information acquisition rate than a pure exploration policy (an iterative process that collects information about areas that have not been explored up to that iteration [shyam2019model]) for closed tracing paths.
Proof: Consider a robot exploring the task space by tracing the contour of an object using path . Since the object has not been visited before, it is labeled as “not occupied” in an equivalent grid representation. A pure exploration policy has an information acquisition rate . This definition is proportional to the total amount of information gathered , and is inversely proportional with the travel distance . When the path is closed, the region inside can be segmented by the tracing path in , and thus labeling the region as occupied. This is equivalent to an exploration rate of , where is the shortest coverage path inside region . This shows analytically that the exploration rate of the latter case is greater than the former case. Thereby enclosing the contour tracing path brings a higher exploration rate.
Contour tracing can be accomplished in multiple ways. For instance, a robot can sweep a force sensor along the object’s perimeter [ahmad1990shape]. If equipped with a multi-taxel tactile sensor, contour tracing can be accomplished by pressing on an edge and simultaneously following the edge direction [lepora2019pixels]. Such approaches require surfaces to be continuous and uniform enough for the sensor to slide smoothly. In contrast, many real objects have sharp edges or corners that violate this smoothness assumption. To address this problem, reactive rhythmic spiral movement patterns generated by a Hopf bifurcation [hassard1981theory] are introduced. The generator is referred to as the Hopf oscillator (introduced later in Sec. 4.5), which has been applied to planning versatile robotic locomotion, such as swimming [hu2014parameter], hopping [buchli2005dynamical], and quadruped walking gaits [liu2017hopf].
4.4 Object Searching (OS)
Let us define the search space as the subspace of the task space where object searching is performed. At time step 0, the search space equals the task space. If, by a given time step, objects occupying certain regions have been discovered, the search space at that time is obtained by removing those occupied regions from the task space, i.e., by taking the set difference between the task space and the union of the occupied regions. Following the definition of the search space, we can state Remark 1.
Remark 1: Searching for objects in the search space can be simplified to a pure exploration problem.
Since the search space does not include any observed occupied regions, there are no historical observations acting as priors for predicting the locations of the objects yet to be discovered. Besides, when the locations of objects are assumed to be independent of each other, it is also not possible to use discovered objects as priors. As a result, the only feasible policy for searching for objects in the search space is to collect information from unexplored regions, which matches the definition of the pure exploration problem.
Next, let us define the occupancy function , which is given in Eq. (4). Gathering observations from different locations builds up the observation set . We assume the occupancy state of a position can be acquired from a tactile sensor in the proposed implementation.
The acquisition function is defined in Eq. (5) as the standard deviation of the estimated occupancy function. When ignoring the transition path, the sampling process follows the uncertainty sampling convention in the Bayesian optimization setting [blanchard2020informative]. Note that this quantity evolves as the sampling proceeds.
We leverage Gaussian Process Regression (GPR) to estimate the occupancy function. The estimator is given by Eq. (6), which relies on a kernel function evaluated between the observed positions and the position to be queried. In our setting, we use the Radial Basis Function (RBF) as the kernel function.
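A minimal sketch of uncertainty sampling with a GPR posterior and an RBF kernel is given below, using only the Python standard library. It uses the textbook posterior variance of a Gaussian process; the length scale, the noise term, and the names `rbf`, `solve`, and `posterior_std` are assumptions for illustration and are not taken from Eq. (6).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rbf(p: Point, q: Point, length_scale: float = 0.1) -> float:
    """RBF kernel between two 2D positions (length scale is an assumption)."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-0.5 * d2 / length_scale ** 2)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def posterior_std(query: Point, observed: Sequence[Point],
                  noise: float = 1e-6) -> float:
    """GP posterior standard deviation at `query` given observed positions."""
    if not observed:
        return math.sqrt(rbf(query, query))
    k_mat = [[rbf(p, q) + (noise if i == j else 0.0)
              for j, q in enumerate(observed)]
             for i, p in enumerate(observed)]
    k_vec = [rbf(query, p) for p in observed]
    alpha = solve(k_mat, k_vec)  # K^{-1} k_*
    var = rbf(query, query) - sum(k * a for k, a in zip(k_vec, alpha))
    return math.sqrt(max(var, 0.0))


observed = [(0.0, 0.0), (0.5, 0.5)]
candidates = [(0.0, 0.05), (1.0, 0.0)]
# Uncertainty sampling: pick the candidate with the largest posterior std.
next_target = max(candidates, key=lambda c: posterior_std(c, observed))
```

The posterior standard deviation depends only on where samples were taken, not on the measured occupancy values, so the acquisition is low near visited positions and high in unexplored regions.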
The goal of the object searching task is to bring to zero the unexplored regions in the search space as the number of samples reaches infinity, so that an object can be eventually detected regardless of its size. We show this can be achieved by following proposition 2.
Proposition 2: Complete coverage of the obstacle-free regions in the task space can be achieved by the following hybrid policy:
1) object searching policy: search objects by walking towards . When is reached at time , continue to search object by replanning a new target , and then iterate the above procedures.
2) contour tracing policy: apply contour tracing immediately when an object is encountered at time before reaching . The object should be fully enclosed into . After that, resume to the object searching policy i.e., to follow a path that is towards a new target , where .
Proof: The proof proceeds by analyzing the two sub-policies in turn:
For the object searching policy, is reached at time step , Then the robot will continue moving to the next planned target . Repeating this target chasing approach infinitely will lead to complete coverage of the obstacle-free area in the task space. This is the space-filling property of the Maximum Squared Error (MSE) sampling, which is referred to Theorem 6 and Theorem 7 in literature [vazquez2010convergence].
The contour tracing policy starts to execute at the same time when a contact event occurs, by which the object searching policy is interrupted before reaching . Since an occupied region will be found, this will result in a reduced search space (remark 1). Given the fact that there is a finite number of objects in the task space ( objects in total), transition to contour tracing policy will occur times at most. After that, no objects will be in the search space and maximum task space coverage will be attained by the object searching policy.
4.5 Policy Design
Proposition 2 defined the basic framework of the algorithm proposed. We implemented this algorithm by a finite state machine with two states: 1) Object Searching (OS), and 2) Contour Tracing (CT). The robot starts in object searching (OS) state until a contact event occurs. Then, the robot switches to contour tracing to acquire contact points, until the contour tracing path is closed. After that, the state machine transfers back to object searching (OS) to search for the next object.
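The two-state machine described above can be sketched as a pure transition function. The names `State` and `next_state` and the two boolean inputs are hypothetical; how a contact event or a closed contour is detected is outside the scope of this sketch.

```python
from enum import Enum, auto


class State(Enum):
    OBJECT_SEARCHING = auto()
    CONTOUR_TRACING = auto()


def next_state(state: State, contact: bool, contour_closed: bool) -> State:
    """Return the next state of the hybrid exploration policy."""
    if state is State.OBJECT_SEARCHING and contact:
        # A contact interrupts the search and starts tracing the new object.
        return State.CONTOUR_TRACING
    if state is State.CONTOUR_TRACING and contour_closed:
        # The traced path is closed, so resume searching for the next object.
        return State.OBJECT_SEARCHING
    return state


state = State.OBJECT_SEARCHING
state = next_state(state, contact=True, contour_closed=False)
```

Keeping the transition logic free of side effects makes it straightforward to test each switching condition in isolation.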
4.5.1 Contour Tracing
A Hopf oscillator is used to generate a forward propagation path along the object contour. For a given central point, the dynamic equation of the Hopf oscillator in Cartesian coordinates is given by Eq. (9).
Here, the state is the sensor’s position in Cartesian coordinates, and the oscillatory frequency is a parameter of the oscillator. When a contact event occurs, the algorithm changes the velocity direction of the tooltip (on which the sensor is mounted) in such a way that it causes a “bouncing off” effect. For a binary touch sensor, this is implemented by updating the central point according to Eq. (10).
Alternatively, if the contact direction vector is observable, the central point can also be calculated by Eq. (11), which can reduce the variance in the interval distances between contact points.
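To illustrate the rhythmic motion, the sketch below integrates the textbook form of a Hopf oscillator around a fixed central point with forward Euler steps. It is not a reproduction of Eq. (9): the parameter names `mu` (squared limit-cycle radius) and `omega`, the step size, and the function names are assumptions, and the contact-driven center updates of Eq. (10) and Eq. (11) are not modeled.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def hopf_step(pos: Point, center: Point, mu: float,
              omega: float, dt: float) -> Point:
    """One forward-Euler step of a standard Hopf oscillator around `center`."""
    dx = pos[0] - center[0]
    dy = pos[1] - center[1]
    r2 = dx * dx + dy * dy
    vx = (mu - r2) * dx - omega * dy  # radial convergence + rotation
    vy = (mu - r2) * dy + omega * dx
    return (pos[0] + vx * dt, pos[1] + vy * dt)


def simulate(start: Point, center: Point, mu: float = 1.0,
             omega: float = 2.0 * math.pi, dt: float = 1e-3,
             steps: int = 10000) -> List[Point]:
    """Integrate the oscillator and return the visited positions."""
    path = [start]
    for _ in range(steps):
        path.append(hopf_step(path[-1], center, mu, omega, dt))
    return path


center = (0.5, 0.5)
path = simulate(start=(0.6, 0.5), center=center)
final_radius = math.dist(path[-1], center)
```

Regardless of the starting offset, the trajectory spirals onto a circle of radius close to `sqrt(mu)` around the central point, which is why shifting the central point at each contact can steer the tooltip along the contour.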
4.5.2 Object Searching
The object searching procedure aims to plan an informative path based on the information acquisition function and an occupancy map. Here, the required estimate is obtained from the Gaussian Process Regression estimator introduced in Sec. 4.4. In each planning iteration, each new observation extends the kernel matrix in Eq. (6) by one row and one column.
The proposed informative path planner follows a sampling-based IPP framework [schmid2020efficient]. A Rapidly-exploring Random Tree (RRT) [lavalle1998rapidly] is expanded inside the search space. We name the proposed algorithm the Tactile Object Searching (TOS) planner. The pseudo-code of TOS is given in Alg. 2. A tree structure is maintained by a vertex set and an edge set . The tree is expanded in the same way as RRT, in which a tree branch steers towards a stochastic target from its nearest node [karaman2011sampling]. Unlike canonical RRT, which aims to reach a single target, TOS uses a heuristic sampling process to guide the tree expansion towards regions with high values in . To generate each tree node , a total number of candidate samples in the search space are generated uniformly. The is chosen by roulette selection (Eq. (12), with ) from candidate points. Fig. 5 gives a comparison between exploration trees from heuristic sampling and the canonical uniform sampling using the same number of samples.
Once the tree has been expanded, can be approximated by using vertices in the vertex set (i.e., ). The planned path can be obtained by backtracking from to the robot position (tree root).
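The heuristic sampling step can be sketched as a roulette selection over uniformly drawn candidates, followed by a standard RRT steering step. The sketch assumes selection probabilities directly proportional to the acquisition values, which may differ from the exact form and parameter of Eq. (12); `roulette_select`, `steer`, and `scores` are hypothetical names.

```python
import numpy as np


def roulette_select(candidates, scores, rng):
    """Pick one candidate with probability proportional to its score.

    `scores` stands in for acquisition values at the candidates and must be
    non-negative; a uniform draw is used if all scores are zero.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    if total > 0:
        probs = scores / total
    else:
        probs = np.full(len(scores), 1.0 / len(scores))
    return candidates[rng.choice(len(candidates), p=probs)]


def steer(nearest, target, step):
    """Move from `nearest` toward `target` by at most `step` (RRT-style)."""
    delta = target - nearest
    dist = np.linalg.norm(delta)
    return target if dist <= step else nearest + step * delta / dist
```

Because candidates with higher scores are drawn more often, the tree grows preferentially into regions where the acquisition function is large, while low-score regions still receive occasional samples.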
4.6 Object Contour Extraction
The object contour can be estimated by directly connecting the contact points according to the contact sequence found through the Hopf oscillator. While this is a simple technique, it brings clear advantages when compared to conventional point cloud segmentation approaches using clustering (e.g., mean-shift clustering [kaboli2019tactile]). For our approach, object segmentation and point segmentation (attributing points to objects) can be simultaneously achieved, because each transition from OS to CT can only introduce one new object, which all the following contact samples belong to. This is in contrast to clustering for which determining the number of clusters, their locations, and point labeling are all treated as individual steps. It has been reported that the performance of clustering techniques is degraded when objects are spatially entangled, or when the point cloud density has high variance [wiwie2015comparing]. In comparison, segmentation by contour tracing is unlikely to be affected by these factors, as shown in Sec. 7.3.
Second, the α-shape [edelsbrunner1983shape], an algorithm widely used to calculate the concave hull polygon of a point cluster, involves a higher computational cost and an ad-hoc parameter that requires manual tuning. In contrast, our method is computationally cheap and parameter-free.
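Since the contact points arrive already ordered along the contour, the polygon is simply the closed chain through them, and quantities such as its area and centroid follow from the shoelace formula. The helper below is a minimal sketch with a hypothetical name; it assumes a simple (non-self-intersecting) polygon with non-zero area.

```python
import numpy as np


def polygon_area_centroid(points):
    """Signed area and centroid of the closed polygon through ordered points.

    `points` is an (n, 2) array of contact points in contact order; the last
    point is implicitly connected back to the first.
    """
    x, y = points[:, 0], points[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)   # next vertex along the contour
    cross = x * yn - xn * y
    area = 0.5 * cross.sum()
    cx = ((x + xn) * cross).sum() / (6.0 * area)
    cy = ((y + yn) * cross).sum() / (6.0 * area)
    return area, np.array([cx, cy])
```

The sign of the area indicates the tracing direction (positive for counter-clockwise), and the centroid is the kind of quantity needed when a probing location inside the contour has to be chosen.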
5 Object Recognition
5.1 Problem Definition
The key idea presented in this section relies on the fact that contour shapes convey categorical information. Such information can enable object classification among a wide range of object categories when used appropriately. For this purpose, a deep learning classifier is used to classify objects from their contours. In addition, the discriminability of the proposed classifier is improved by incorporating very few probing points in space.
The problem of classifying an object using contour points can be expressed as finding a function that predicts the object label . is a point set that belongs to an object contour . is a point set in that is within the object surface and is reachable by the end-effector. can be obtained by contour tracing (Sec. 4.3). is collected during the Feature Sampling (FS) procedure described in Sec. 4.2. While the method for obtaining is task-dependent, adding a few discrete contacts inside the estimated contour polygon from above the object provides a general approach. Given the number of total contacts , and planar locations that are inside the contour polygon , the end-effector is lowered at each of those locations from above the object, until the sensor reaches the object at . If no contact is detected, the is set to zero, indicating that the object is hollow at that point. Note that the position of the first contact for each object is always chosen as the centroid point of the object’s estimated contour polygon .
5.2 Design of Classifier
Point clouds have been classified by deep neural networks [luo2017robotic, qi2017pointnet]. We leverage this to design a classifier that uses both contour points and probing points as the input observations. The proposed network is named CT-Net. The network architecture is described in Fig. 6. In this network, contour points in are transformed into by increasing one dimension (appending a zero), and are then concatenated with probing points in . Then, each point is transformed to higher dimensions progressively using a Multi-Layer Perceptron (MLP) with the ReLU activation function. Side branches are added to alleviate the vanishing gradient problem by creating shortcuts for intermediate layers. This is accomplished by concatenating all feature branches first (“+” operator in figure), and then reducing the feature dimension by using another MLP. The transformed intermediate features are fused into the last MLP layer’s output by a sum operation (“” operator in figure).
Since the point sequence can vary (even for planar contours, since the sampling does not always start from the same point), an aggregation function is used to make the result invariant to the point order. To achieve such invariance, a max aggregation [qi2017pointnet] is used to calculate the object latent feature. Next, a canonical MLP classifier with a softmax function is used to calculate the categorical probability vector.
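The order invariance obtained from max aggregation can be illustrated with a stripped-down forward pass. The sketch below is not CT-Net itself: it omits the side branches, uses random untrained weights, and the layer sizes in `dims` are arbitrary assumptions; only the lifting of contour points, the shared per-point MLP, the max aggregation, and the softmax follow the description above.

```python
import numpy as np


def init_params(rng, dims=(3, 32, 64), n_classes=11):
    """Random weights for a shared per-point MLP and a linear classifier."""
    layers = [(rng.normal(0, 0.1, (i, o)), np.zeros(o))
              for i, o in zip(dims[:-1], dims[1:])]
    head = (rng.normal(0, 0.1, (dims[-1], n_classes)), np.zeros(n_classes))
    return layers, head


def forward(contour_xy, probes_xyz, params):
    """Class probabilities from 2-D contour points and 3-D probing points."""
    layers, (w_out, b_out) = params
    # Lift contour points to 3-D by appending a zero, then stack with probes.
    lifted = np.hstack([contour_xy, np.zeros((len(contour_xy), 1))])
    h = np.vstack([lifted, probes_xyz])
    for w, b in layers:                  # same weights applied to every point
        h = np.maximum(h @ w + b, 0.0)   # ReLU
    feature = h.max(axis=0)              # order-invariant max aggregation
    logits = feature @ w_out + b_out
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()
```

Because the maximum over points does not depend on their order, starting the contour trace at a different point leaves the latent feature, and hence the predicted probabilities, unchanged.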
6 Results on Whisker Sensing
The performance of the whisker sensors was characterized by pushing the whiskers under different conditions. A linear motor (LinMot PS01-23 × 80) was used to supply a linear reciprocating motion at a speed of 1 m/s. The PCB was mounted on a holder flange and fixed onto a base plate. The fixing pose ensures that the whisker is perpendicular to the moving direction of the motor’s pusher, as shown in Fig. 7 (a) and (b). Initially, the motor and the whisker were aligned to be in the critical contact state (zero air gap with no pressure offset observed from the barometer). The pushing distance of the motor was then set to 0 mm, 5 mm, 10 mm, 15 mm, 20 mm, 25 mm, and 30 mm, respectively. It could also be observed that the sensor’s output is a function of the “root distance”, which is defined as the distance between the contact point and the barometer in the critical contact state. The root distance was set to 60 mm, 50 mm, and 40 mm, respectively. This results in a combination of 7 × 3 = 21 experiment configurations.
The resultant pressures from the barometer sensors are shown in Fig. 7 (c). B-spline interpolation was used to obtain the intermediate values (surface) based on the sensor’s raw pressure outputs (dots). First, when the root distance is fixed, the pressure reading increases monotonically as the pushing distance increases. This monotonicity implies that when the robot can actively determine the pushing distance (e.g., by horizontally offsetting the tool from the initial contact position), it is theoretically possible to calculate the root distance as a function of the sensor’s pressure reading, by which the contact position can be estimated. However, the sensitivity may decrease if the pushing distance is too large, as the slope of the pressure gradually decreases with the pushing distance. Second, under the same pushing distance, the pressure reading decreases when the root distance increases (farther from the barometer). This implies that there may exist a maximum length limit when choosing the whisker filaments.
Fig. 8 shows the sensor’s output in the time domain. The experiment was conducted with a real robot controlled by the Hopf oscillator. Overall, there are three contact events that correspond to three signal peaks. In addition, the raw pressure reading from the barometer drifted by 0.017 kPa over a period of 30 seconds, from 101.433 kPa to 101.450 kPa. Nevertheless, the drifting effect can be mitigated by the filter module introduced in Sec. 3, by which the magenta curve with a higher signal-to-noise ratio was obtained.
7 Results on Tactile Exploration
7.1 Experiment Configuration
The proposed tactile spatial exploration algorithm was evaluated both in simulation and with a real robot. A UR16e robot was equipped with the developed whisker sensor. The algorithm parameters used in both evaluation settings are shown in Table 1. In particular, the evaluations focused on the following: 1) the efficiency of exploration, and 2) the quality of the object shape recovered from the tactile observations.
|Parameter||Simulation||Real robot|
|Oscillator convergence ratio||10||10|
|No. of nodes in tree expansion||1000||1000|
|Neighborhood distance threshold||0.10||0.10|
|RBF kernel lengthscale||0.08||0.08|
|GP noise level||0.02||0.02|
7.2 Exploration Efficiency
The exploration efficiency can be quantified using two metrics: 1) Scene Uncertainty measures the acquisition function averaged over all positions in the task space (Eq. (13)). Low values are preferred as they indicate more knowledge about the environment.
2) Contour Uncertainty measures the acquisition function averaged over all positions within the object contour (Eq. (14)). Low values are preferred, as they indicate that the points collected are highly informative of the object’s shape. Unlike scene uncertainty which focuses on the scene, contour uncertainty focuses on the obtained knowledge around objects.
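On a discretized task space, both metrics reduce to averages of the acquisition function over a set of grid cells. The sketch below assumes the acquisition values are stored in a 2-D array and that the contour region is given as a boolean mask of the same shape; the function names and this grid representation are assumptions, and Eq. (13) and Eq. (14) remain the authoritative definitions.

```python
import numpy as np


def scene_uncertainty(acq):
    """Mean acquisition value over every grid cell of the task space."""
    return float(np.mean(acq))


def contour_uncertainty(acq, contour_mask):
    """Mean acquisition value over the cells selected by `contour_mask`."""
    return float(np.mean(acq[contour_mask]))
```

For example, with `acq = [[0, 1], [2, 3]]` and a mask selecting the right column, the scene uncertainty is 1.5 and the contour uncertainty is 2.0.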
The proposed algorithm was compared both qualitatively and quantitatively with two baselines. The first baseline is the method proposed by Kaboli et al. (2019) [kaboli2019tactile], which collects contact samples by progressively selecting line paths that are parallel to the x and y axes. To select line parameters, a constraint is specified so that the line crosses , thereby increasing the information acquisition. We denote this method as “Line Sweep” due to its motion pattern. The second baseline is the TOS planner alone, which is a component of our hybrid tactile exploration policy, but with the Hopf oscillator disabled (also denoted as “pure object searching policy”). This comparative experiment can reveal the contribution of contour tracing to object characterization.
The simulation experiment was conducted over 6 scenes. The results of the object contour extraction are presented in Fig. 9. Quantitative efficiency results are reported for two of these scenes: 1) an environment with 5 small objects (Fig. 9 (a)), and 2) a scene with non-convex objects with inter-occlusion (Fig. 9 (d)). The efficiency metrics and as functions of the trajectory length are shown in Fig. 10. The curves show the mean values and the shaded regions show the 95% confidence intervals, calculated from 5 experiments per algorithm in each scene.
First, it can be seen that when the hybrid policy (blue curve) was used, both and were reduced at a faster rate than with the line sweep baseline [kaboli2019tactile] (green curve). This superior performance can still be noticed even when the contour tracing was removed (orange curve), showing the efficiency advantages in object searching. Next, the contour tracing can improve the metric , as reflected by the hybrid policy outperforming the two baseline algorithms in Fig. 10 (b) and (d). In addition, it can also be observed that the confidence interval of the hybrid policy is smaller than those of the baselines in scene (d). The reason is that the hybrid policy can maintain a consistent movement pattern in contact-rich scenes with a stable exploration rate. In comparison, the paths from the two baseline policies were frequently interrupted by collisions and had to be re-planned.
Fig. 11 shows plots of the contact points, and the predicted values of the occupancy function using the Gaussian Process at the end of this exploration session. First, the contact points from the hybrid policy are distributed evenly around the contour by virtue of the contour tracing policy. This effect cannot be seen in the two baselines. Second, the hybrid and the pure object searching baseline policy enabled successful object localization even when inter-occlusion exists between objects, as shown in scene (d) in Fig. 11. This is by virtue of the dense coverage guarantee from Proposition 2. In contrast, the line sweep baseline algorithm failed to localize the square in the middle of scene (d), which is occluded by two L-shaped objects.
Fig. 12 and Fig. 13 show the changes of over travel distance, and the trajectory of the hybrid policy, respectively. Two facts can be observed from these figures. First, the area covered by exploration paths increased with the travel distance. Note that at the end of this exploration session, regions with high uncertainty remained only inside the contours, which cannot be visited by the sensor. Second, the hybrid policy avoided planning paths that cross through discovered objects by explicitly using their contour polygons. This is in contrast to the two baselines, where collision paths are unavoidable if the object information is incomplete.
7.3 Contour Reconstruction
Object contours can be extracted by the method proposed in Sec. 4.6. Fig. 9 shows the contour extraction results from 6 different scenes. Specifically, we tested on primitive shapes in scenes (a), (b), and (c). In particular, the objects in scenes (a) and (c) are relatively small and thus difficult to find during object searching. Despite this challenge, all objects were successfully localized. For small objects, contact points in scene (c) may not be sufficiently dense to represent the ground truth shape accurately, because the oscillator radius used is relatively large compared to the object size (squares with a side length of 0.09). This issue can be solved by choosing a smaller value for when detecting small objects, at the cost of a longer traveling distance due to more contact events.
Scenes (d), (e), and (f) are experiments with non-convex objects. For scene (d), the gap distances between objects are relatively small compared to the object scale. For this reason, it is intractable to correctly segment the different objects by clustering. In comparison, our object extraction method (Sec. 4.6) succeeds in reconstructing all three objects. In scene (e), the hybrid policy succeeds in localizing all pentagrams. But similar to (b), improving the reconstruction of the sharp corners of the pentagrams requires a smaller . Last, the results from scene (f) show that the Hopf oscillator can go in and out of small chambers in the “S” shape, showing the robustness of the contour tracing motion.
7.4 Experiment with Whiskers on the Robot
The proposed approaches were implemented on a UR16e robot. The tactile feedback was obtained by using the developed whisker sensor, which was mounted onto the robot’s end-effector using a 3D-printed flange, as shown in Fig. 14 (a). During object searching and contour tracing, the robot moved in a horizontal plane (, where the desk surface is defined as ), as shown in Fig. 14 (b). Once an object has been localized, it is possible to sample points directly in volumetric space in order to enhance the object feature discriminability. This was done by approaching the object from the top until a contact event occurred, as shown in Fig. 14 (c). The experiments included 11 real objects, which are shown in Fig. 14 (d). The real objects include: a tape, a rubber airplane, an eyeglasses case, an apple, a game controller, a TV remote control, a bottle, a banana, a Realsense camera package box, and a book. The objects chosen have different masses, dimensions, affordances, materials and textures, friction coefficients, and convexities. Since the area of the task space was limited by the reachability of the robot arm, 3-6 objects were selected for each experiment.
It was found that 60-120 contact events were required in each real experiment to characterize the objects. To accomplish this, the contact detection procedure had to be robust while at the same time non-disruptive (otherwise it may reposition the object, which may deform the observed contour and lead to misrecognition). Fig. 15 shows the appearance of the real objects in the first row, together with the contact samples and extracted polygon contours in the second row. It can be seen from the figure that there is a close shape correspondence between the real object layouts and their descriptor representations. In addition, there were no object repositioning events thanks to the compliance provided by the whiskers. This indicates that our system is sufficiently robust to handle all 11 objects regardless of their shapes, configurations, and positions. We refer the reader to the link below Fig. 1 for the video recording of the exploration process.
8 Object Classification Results
8.1 Real-World Object Classification
In this experiment, object contours from real objects were classified using the deep neural network introduced in Sec. 5.2. The data was collected using the same method as in Sec. 7.4. Between 4 and 10 observations were collected for each object, leading to 79 observations in total. Fig. 17 shows one example observation per object. To use batch training, the number of points in needs to be the same among different object observations. Thus, was interpolated to 64 points at equal distance intervals based on the polygon obtained in the previous step.
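The interpolation to a fixed number of points can be sketched as an arc-length resampling of the closed contour polygon. The function name and the use of linear interpolation along the polygon edges are assumptions made for illustration.

```python
import numpy as np


def resample_closed_polygon(vertices, n_points=64):
    """Sample n_points at equal arc-length spacing along a closed polygon."""
    closed = np.vstack([vertices, vertices[:1]])            # repeat first vertex
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)   # edge lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])             # arc length at vertices
    targets = np.linspace(0.0, s[-1], n_points, endpoint=False)
    x = np.interp(targets, s, closed[:, 0])
    y = np.interp(targets, s, closed[:, 1])
    return np.column_stack([x, y])
```

Resampling by arc length makes the input size independent of how many contact events were needed to trace a particular object, so observations of different objects can be stacked into one batch.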
The ability to learn from scarce observations (data efficiency) is important, since acquiring a large dataset of observations from real scenes is expensive. We first quantified the data efficiency by training the proposed network with a reduced amount of data. Only approximately 20% of the instances in the dataset were used for training, and the rest were used for validation (setting A). The results were then compared to setting B, in which around 80% of the observations were used for training and the remaining 20% for validation. To reduce the performance variance, the network was evaluated by 5-fold cross-validation. The final metric is the average accuracy over all folds. Last, an ablation study was conducted to quantify the contribution from . For this, the accuracies of the following two settings were compared: 1) training and validating the network with only , and 2) training and validating the network with and together.
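The two data splits can be sketched with a single 5-fold partition in which either one fold (setting A) or four folds (setting B) are used for training. This is an illustrative, unstratified split with a hypothetical function name; the actual fold assignment used in the experiments may differ, for instance by balancing the folds per object.

```python
import numpy as np


def kfold_splits(n_samples, k=5, train_on_single_fold=False, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    train_on_single_fold=True  -> train on 1 fold, validate on k-1 (about 20/80).
    train_on_single_fold=False -> train on k-1 folds, validate on 1 (about 80/20).
    """
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        rest = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield (folds[i], rest) if train_on_single_fold else (rest, folds[i])
```

With 79 observations, each training set in the single-fold configuration contains 15 or 16 instances under this unstratified split.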
|Experiment Setting||Fold 1||Fold 2||Fold 3||Fold 4||Fold 5||Avg|
The results of these trials can be found in Table 2. In this table, each entry is a classification accuracy obtained from an individual validation session. The network was trained using a learning rate at for 500 episodes with data augmentation. The data augmentation was accomplished by rotating the contour around its centroid point by a random angle (sampled from a uniform distribution). It can be observed that CT-Net is effective: when only 20% of the observation instances were used for training (setting A, with one point in ), the averaged validation accuracy was 82.7%. This accuracy increased to 98.3% when the training split was increased to 80% of the instances. Besides, we conclude that the proposed network is data efficient, because each fold of training data in setting A includes only 12-20 observations. This amount of data is considerably small given that there are 11 categories. Last, it can be observed that the validation accuracy in the setting achieves equivalent or better performance than using only in all cases. This indicates that using the point set can improve the object discriminability.
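The rotation-based augmentation can be sketched as follows. Two assumptions are made: the mean of the sampled contour points stands in for the polygon centroid, and the angle is drawn uniformly from [0, 2π), since the exact range is not restated here. The function names are hypothetical.

```python
import numpy as np


def rotate_about_centroid(points, angle):
    """Rotate (n, 2) contour points by `angle` radians about their mean point."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    center = points.mean(axis=0)
    return (points - center) @ rot.T + center


def augment(points, rng):
    """Apply a random rotation; the [0, 2*pi) range is an assumption."""
    return rotate_about_centroid(points, rng.uniform(0.0, 2.0 * np.pi))
```

A rotation preserves the contour shape while changing its orientation on the desk, so each augmented sample remains a valid observation of the same object.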
8.2 Scalability and Ablation Studies
An experiment was conducted to show that the proposed classification network is capable of tackling classification problems with a larger dataset of observation instances and categories. For this, we used the 3DNet dataset [wohlkinger20123dnet], which is composed of aligned mesh models from 222 categories. Because the categorical sample distribution in 3DNet is highly imbalanced, only categories with more than 15 objects were used. That is, a total of 2098 objects belonging to 68 categories were adopted. 70% of those objects were used for training and the remaining 30% for validation.
Due to the availability of mesh models, the points in the object contour set were obtained directly by raycasting, without the need for measurements by physical interaction. This was done in three steps, as shown in Fig. 18. First, the surface vertices were projected onto a planar canvas at . Second, the contour estimation was generated by calculating the concave hull polygon through the α-shape algorithm [edelsbrunner1983shape]. Last, a given number of points were sampled from the polygon to create the contour point set . For , a given number of points were generated using the same method described in Sec. 8. The was then obtained at each point by raycasting from in a line perpendicular to , which is defined as the first intersection point between the ray and the mesh model.
The network was trained for 200 episodes using the Adam optimizer. The learning rate was at the beginning and was then halved every 30 episodes. The classification accuracies are shown in Table 3. Overall, the highest classification accuracy achieved is 61.1%. This supports the algorithm’s discriminability, considering that there are 68 categories in total. This is also reflected by the comparison with PointNet [qi2017pointnet], which is a deep neural network designed for recognizing volumetric points only. For a PointNet model that was trained and tested with 64 volumetric points in (Experiment 7⃝), the accuracy achieved is 37.5%, which is lower than the accuracies achieved by CT-Net.
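The step decay of the learning rate can be written as a one-line schedule. The initial value is passed in as a parameter because it is not restated here; `step_decay_lr` is a hypothetical helper, and episodes are assumed to be counted from zero.

```python
def step_decay_lr(initial_lr: float, episode: int, period: int = 30) -> float:
    """Learning rate halved after every `period` episodes (0-indexed episode)."""
    return initial_lr * 0.5 ** (episode // period)
```

Over 200 episodes this schedule applies six halvings, so the final learning rate is 1/64 of the initial one.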
An ablation study was conducted to investigate how the accuracy of CT-Net is affected by the number of contour points in and the number of volumetric points in . For , it was found that the accuracy was noticeably affected when had fewer than 25 points. Conversely, the influence decreased for the points added afterwards. For , the accuracy increased significantly for the first observation (44.4% → 57.5% with 50 points in ), but the accuracy was less affected by adding more observations. For instance, adding 63 more points afterward yields only a 0.1% accuracy boost (2⃝ and 3⃝). To reduce the sampling cost, only one point was sampled for in the real experiment (Sec. 8.1).
|Index||Architecture||No. pts in||No. pts in||Accuracy (%)|
A t-SNE analysis was used to visualize the discriminability between categories, as shown in Fig. 19. The visualization was created using the validation split of the 3DNet dataset, with the same neural network as setting 3⃝ in Table 3. The data distribution indicates that samples within the same category tend to be spatially clustered together, while different categories tend to be spatially separated from each other.
The main objective of this work is to search for and recognize objects by using the tactile modality alone, which allows gathering information from a scene when the vision modality is not available. Currently, our hardware system is based on a UR16e robot with a whisker sensor manufactured at a cost of around $15 each. Without loss of generality, the proposed approach can be applied to other robotic systems with at least a binary tactile sensor. The algorithm was also tested with the embedded 6-axis Force/Torque (F/T) sensor of the UR16e robot, with which the scene exploration task was conducted with satisfactory results for average-sized to large objects.
The advantages of the techniques presented in this paper can be summarized by the following highlights. First, high information gathering efficiency was achieved by planning informative paths. Second, the intrusiveness of interaction was reduced by using a compliant whisker sensor, together with the passive dynamics for contour tracing. Third, to further facilitate exploration in volumetric space, the surface of the object was probed around the object’s position. Fourth, the planar and volumetric observations were integrated in such a way that high object discriminability was achieved, which allows recognizing new objects by only very few observations.
9.1 Broader Applications
This work potentially has broad impacts on other spatial exploration applications as well. For instance, in telesurgery, autonomous palpation policies have been developed to localize tumors beneath the skin [nichols2015methods]. We envision that tumor searching and contour segmentation can be accomplished by the method proposed in this paper as well. When sliding a tactile sensor on the skin surface, a tumor can be localized if a sudden change in the material’s stiffness is observed. For this, the object searching policy can find the tumor with minimal traveling cost. After that, the contour tracing policy can be used to trace its boundary. This technique has the potential to improve the task efficiency due to the advantages described in Sec. 4.2. Similarly, the same method can be used to approach the problem of finding concealed or buried hazardous objects, and recognizing their categories by the shape of their contours without the need to extract them [patel2020digger].
In addition, the blind grasping problem [wu2019mat] can be approached by the method proposed in this work. The object’s location, and a rough estimation of its shape can be obtained by our tactile exploration technique. Once a sufficient number of features is accumulated, it is possible to detect grasps based on the priors collected, and then lift the objects. This enables the robot to be applied to retrieving objects underwater. This is a challenging task when using the visual modality due to the water surface reflection, medium refraction, and turbid water conditions. We believe that our techniques can solve these challenging problems in an efficient manner.
The main limitation is the inability to handle cluttered scenes, where objects are in close contact with each other. In those cases, the contact observations cannot provide complete descriptions about the object contour due to inter-occlusion, which may lead to misrecognition. We envision this can be addressed by involving planar pushing techniques such as the method by Suresh et al. [suresh2020tactile] for contour tracing and object separation.
So far, the proposed method is reliable in scenes where objects are not in contact, and could be reached from at least one assured direction. Problems such as the exploration tasks within complex volumetric environments with layered structures are not consistent with this precondition, and may require further efforts to be solved.
In this paper, we studied the problem of how to utilize the tactile modality to explore, characterize, and understand the environment as well as the objects within it. Unlike many commercially used tactile sensors that may require a relatively large force or pressure to obtain tactile observations, our compliant whisker-based tactile sensor has high sensitivity and a long sensing range. This allowed acquiring contact samples with minimal intrusiveness. Tactile intelligence is then introduced by designing tactile exploration policies, such as the proposed hybrid exploration pattern. This allows the robot to actively search for objects by planning informative paths, and to reactively trace the object contour by making physical contact.
The feasibility of the proposed methods was evaluated not only in simulation, but also on a real robot with the developed whisker sensor. The object contours, as well as the volumetric contact points, were successfully obtained with a variety of real everyday objects. A deep neural network was used to classify these observations, which shows that the contour shape is informative about the object category. In addition, experiments were conducted to show that this classification approach generalizes to large datasets.
In the future, we plan to extend this framework to assist humans in teleoperation tasks with low visual information due to the medium or the lack of suitable sensors. We envision the proposed techniques can be applied to applications that include but are not limited to: blind grasping and object manipulation, underwater object localization and recognition, autonomous palpation in telesurgery, and navigation by visual-tactile fusion.
This material is based upon work supported by the National Science Foundation under Grant NSF NRI #1925194 and #2140612. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.