I Introduction
The potential market of service and entertainment humanoid robots has attracted great research interests. One of the most fundamental and challenging steps is to allow robots to interact and walk autonomously within a real world scenario. Ideally, they should be able to accept high level human commands and, then, autonomously walk in a reallife environment consisting of floors and stairs without colliding with obstacles. Thus, it is necessary for humanoids to be able to identify candidate traversable places within acceptable stability limits. This includes, for instance, flat surfaces or surfaces with limited slope and roughness values.
This paper presents a pipeline to compute a mobility map in realtime using an RGBD sensor mounted on the iCub, as shown in Figure 1
. As in most navigation scenarios the terrain information is vital to classify what is traversable and what is not. In particular, leggedtype robots such as humanoids require precise information on the surrounding terrain, not only for determining locomotion strategies, but also for coordinating complex body motions, such as generating obstacleavoiding freeleg trajectories during walking.
For humanoids, and from a simplified kinematic point of view, a surface is considered as a part of an obstacle if its geometric properties do not allow for a safe foot step planning. Useful surface properties may include its slope, roughness, rugosity and size. Such properties can be measured or estimated from 3D data, which can be provided, for instance, by using a structured light sensor. Such sensors are considered cheaper and lighter than their counterpart laser scanners. They also provide richer visual information, which can be useful for tasks including object recognition, classification and segmentation.
Ii Related Work
In recent years, the interest in scene modelling and understanding algorithms has widely spread. Specifically, segmentation and classification of outdoor workspaces using 3D depth data [Dima3], in order to allow mobile robots to move in real world environments while navigating between obstacles safely.[Ferri10].
The literature is rich also of research examples that deals with the problem of indoor scene understanding in order to give to the humanoid robot the capability of moving in autonomy in a domestic environment
[Li(b)]. The terrain plays a key role in works like this, and an overview on terrain traversabilty methods can be found in [Papadakis]. Most of the methods use 3D geometric features (e.g. normals) to interpret the scene that in general are very computationally demanding. For this reason these approaches are not appropriate to manage dynamic obstacles, for which the map has to be updated frequently and then the computation has to be repeated several times. On the other hand approaches that learn the scene through proprioceptive informations are more suitable for online applications. This approach is demonstrated in [Rover] for a MarsRover platform where the authors represent the environment in proprioception space in terms of expected slip. Moreover proprioceptive informations can be used in addition of 3D data to increase the reliability of the slope inference and then the accuracy of the classification [KarumanchiABS7].More in general the reliability in obstacle detection and scene classification can be augmented relying on multiple sensing modalities such as color and 3D data [Ras4]. Visual and structural modalities are clearly complementary: vision alone may be inadequate or unreliable in the presence of strong shadows, while depth measurement of sensors like structured light sensors can be misled by the sunlight interference with the IR sensor. In literature can be found works that try to solve the same problem, but using different tools to obtain this kind of information as laser rangefinder [Ras4], or stereo vision [Takaoka(h)], but vision techniques are in general challenged by darkness or other extreme lighting conditions and the accuracy that they can provide is too low to manage a such complex issue as scene modelling and interpretation.
Iii Our Implementation
This section describes our mobility map implementation, where Figure 2 highlights different data processing and estimation steps.
Data reduction: In this paper, we used the Asus Xtion PRO LIVE. This sensor produces depth maps at 30 fps with a resolution of 640480. The depth map includes over 300 thousands of points. In the next steps, it is required to evaluate a surface properties at any point of interest using its nearest neighbours. The computation cost is, hence, , where is the number of points in the depth map. Based on the application, the way the sensor is mounted on the robot and the sensor field of view, we are able to reduce the computational cost of the depth map by downsampling it into voxels of 1cm1cm1cm. Further data reduction can be achieved by limiting the depth range of the sensor to include only points with acceptable depth uncertainty. In this paper, we removed points at distances longer than 1.52 meters.
Denoising: 3D information contained in the point cloud is often contaminated by noise. This is due to many causes including the interference of daylight with the IR sensor and intrinsic errors made by the sensor performing the triangulation. Before attempting to estimate the characteristics of a point with respect to its surrounding, it is important to analyse if the surrounding neighbourhood is a good representation of the underlying sampled surface. Thus, following the method proposed in [Rusu08RAS2], for each point in a cloud , the mean distance to its nearest neighbours is first computed. Then a distribution over the mean distance space for the entire point cloud is assembled and its mean
are estimated. Our motivation is to keep points with mean distance to the nearest neighbours that is statistically similar to those by rest of the points. Thus, the new downsampled point cloud can be extracted using,(1) 
where is a desired density restrictiveness factor.
Normal Computation: For a good representation and segmentation we have to use features with high discriminating power such as normals and curvature. They are two of the most widely used geometric features because they provide information on the orientation of surfaces, indispensable for scene understanding. They are treated as local features, because they characterize the information provided by the nearest neighbours of each point. Their estimated values are sensitive to sensor noise and the selection of the neighbours.
After we have determined the neighbourhood of a query point
, we can use it to compute a local feature that represents the geometry around the query point. One surface point feature can be computed as the normal vector of the tangent plane, which can be estimated solving a leastsquare plane fitting problem over
[shakarji1998least]. As explained in [RusuDoctoralDissertation] this plane can be represented by a point x and a normal vector , and, given the distance from a point to the plane as , the values of x and are computed in a leastsquare sense such that .Assume
(2) 
as centroid of , we can then solve the fitting problem obtaining
as the eigenvector
of the smallest eigenvalue
of the covariance matrix of , expressed as(3) 
To resolve for the sign of , however, we need the viewing point of the sensor . To define the sign of the normals, each has to satisfy
(4) 
In this way all the normals point towards the viewpoint of the sensor.
RGBN coding: Before segmentation, we colourcode the point cloud using the computed normals. In the previous step, we have computed, for each point , a normal , function of its neighbourhood. We assign to the following RGBcoding
(5) 
where and are respectively the maximum and minimum value of the component among all the normals contained in the cloud. By doing so, pointcloud’s surfaces with similar normal, and thus orientation, will have consistent and similar colors (Fig. 2.c). This is an important step which allows us to use color differences to segment regions with common geometric features.
Ground removal: Ground removal is an important step for robust segmentation results. It provides two main benefits: First, it has the effect of isolating the object from the background, and thus improving segmentation. Second, it allows us to further reduce the computational cost of the entire system, because we exclude from the analysis the segment that has the majority of the points in the cloud.
There are many ways to segment the floor from the rest of the scene, some of them use images [GroundImages2] but, despite their robustness, they are often afflicted by the differences in light conditions of the scene. One way to achieve robustness is by using RANSAC, using the following steps:

Randomly select three noncollinear unique points from the point cloud ;

Compute the model coefficients from the three points ;

Compute the distances from all to the plane model ;

Count the number of points whose distance d to the plane model falls between , where represents a user specified threshold.
The last step represents a way of “scoring” a specific model that we used to find the best plane in the cloud. Every set of points is stored, and the above steps are repeated for iterations. The number of iterations is defined as follow. If
is the probability of picking a sample that produces a bad estimate (
i.e.outlier), then is the probability of picking at least one good sample (i.e. inlier). This means that the probability of picking good samples becomes . For trials, the probability of failure becomes . If is the desired probability of success (e.g. = 0.99), then:(6) 
After the algorithm is terminated, the set with the largest number of points (inliers) is selected as the support for the best planar model found, the ground in our case.
Color segmentation: After removing the ground, our objective is to segment the remaining regions in the cloud. A RANSACbased method wont suffice for this task, because it assumes that all the objects in the scene can be mathematically modelled. Thus, we utilize Region Growing [Adams:1994:SRG:628313.628615] for this task. This method follows a flood fill approach. The method aims at selecting a set of homogeneous points by optimising for a given interregional constraint. A seed point is first selected. At each optimisation step, the surrounding of the seed point is iteratively allowed to grow by including more points into the computations of the local constraint. This process is iterated until this local constraint is satisfied. Typical constraints may include the Euclidean distance between the selected point and the seed point or local features, including geometric and photometric features. For more efficient computations, we utilise the previously colorcoded pointcloud based on the computed surface normals as a feature, and formulate our regional constraints using the Euclidean distance. This choice simply allows us to extract regions with homogeneous surface orientations. Algorithm 1 summarizes steps of Region Growing on 3D pointclouds, where
is a colorcoded pointcloud, is a color threshold, is the computed Euclidean distance between points and in the color space (), is the region list, is the neighbour finding function, and is the available point list, that it is initialized using all the points of .
Surface properties estimation: After dividing the cloud in clusters with similar geometric properties (Fig. 2.e), we estimate those properties that will be used to determine the mobility scoring of individual segments in the pointcloud. First, we compute the slope, which is a measure of steepness for planes or more in general for flat regions. First, we estimate the best plane that fits the points within the ith segment using leastsquares and RANSAC. The slope is then calculated as the angle between the fitted plane normal and the vector representing the normal vector of the removed ground segment. This can be simply computed as:
(7) 
In addition to the slope, our mobility function considers the surface roughness, which represents as a measure of the asperity of a certain surface. Surface roughness is widely used in mobile robotics navigation planning [shockparam13], because of its ability to limit the mobility of various robotic platforms. For humanoids, safe and stable foot step planning is highly correlated to terrain roughness.
There are many ways to estimate the roughness mostly because a canonical definition has not already formulated. We followed an approach already used in[Friedman9]. For each segment of the cloud we compute the roughness index as:
(8) 
where is the area of the segment, and is the area of the segment projection on the corresponding estimated plane, which was computed using RANSAC and leastsquares. To compute we utilize Delaunay triangulation. Given a set of vertices , such that and , and represents the vertex described by its point coordinates. The triangles of the surface are contained in the set , where , such that and represents a triangle defined by three vertices in . Thus, , and similarly , are computed using:
(9) 
where is the area of the jth triangle (), which is computed as half the magnitude of the cross product of the vectors ( and ) representing two adjacent sides of the triangle. Thus,
(10) 
Mobility map: The classification of the scene as a function of robots mobility is performed using a simple and intuitive approach. Every segment is assigned a score between 0 and 1, where 0 corresponds to untraversable surfaces while 1 corresponds to traversable surfaces that does not require any additional foot step planning, which is shown in table I. This mobility rule is very discrete, and hence limits its applications to multiple foot step configurations. In order to accommodate finer mobility decisions, we further improve the resolution of our decisions on the traversability of surfaces by applying Gaussian Process Regression (GPR) with a Squared Exponential (SE) covariance function [Rasmussen:2005:GPM:1162254] (see Fig. 3):
(11) 
where are input pairs containing surface features, namely the estimated slope and roughness, ,
is the variance (we assumed,
) and is the length scale which defined the smoothness of the mobility score (we assumed, ). Since we would like our mobility to be a function of both slope and roughness, we train our GPR model using two the inputs ( and ) with one output mobility score . Thus, we can obtain the score as an inference instance using our trained GPR model:(12) 
where, is the covariance matrix, to denote the vector of covariances between the test point and the training points , is the expected noise in the measured mobility , given the normals. The range of input dimensions was limited by the walking capabilities and mechanical limits of the robot.
Iv Experiments
In this Section we present supporting experimental results to demonstrate the performance of the proposed mobility map computation pipeline. The collected dataset comprised 3D depth maps and associated RGB images. Due to the limited mobility of the iCub robot and the sensitivity of the depth sensor to daylight, we tested our implementation only on data sets collected for indoor environments. For example, laboratory corridors with various objects being located along the way. The method of this paper was implemented using C++, and is available on GitHub^{1}^{1}1https://github.com/Nicogene/MobilityMapBuilder..
Iva RGBD Sensor Model
The validation of our system has been made through the projection of the 3D mobility maps on the images acquired by the RGB camera of Asus Xtion Pro Live(Fig. 4, 6). The mapping has been done using the following pinhole camera model,
(13) 
where are the image coordinate, are the 3D point coordinates, define the optical center, are the focal lengths and are the factory offsets between the IR sensor frame and the RGB camera frame.
IvB Free corridor
Fig. 4 shows mobility map estimation and pointcloud segmentation results in the image frame using our pipeline for the case of obstacle free path. Our motivation is to initially test our implementation of ground removal, since the performance of our method relies on it. Also, showing the results in the image frame provides us with a way to validate the accuracy of our implementation. On the other hand, Fig. 5 shows the verification of our segmentation results in 3D space.
IvC Corridor with obstacles
IvD Processing Requirements
The dataset was processed on an Intel Core i5 CPU@2.30GHz4. In table II we report the time required for each step of the algorithm measured on a sample cloud. Notice that the variations in the number of points processed at each step is due to modifications to the pointcloud done by the previous step, including sampling, outliers removal, and groud separation. This analysis demonstrates that the algorithm can be used to analyse a scene at the rate of two frames per second, which is suitable for foot step planning.
V Conclusion and Future work
This paper has presented an approach for scene segmentation and mobility estimation based on the analysis of 3D data acquired by a RGBD sensor. The method was used to successfully and efficiently classify real world indoor workspaces into traversable and untraversable regions. We have achieved a mobility map computation pipeline that runs at 2Hz. Our future work will investigate removing many existing assumptions, including our ground removal algorithm which assumes that the largest segment in any depth map corresponds to the ground. We will also be looking at integrating vision with the depth data to build more robust features for segmentation.
Comments
There are no comments yet.