Long-term autonomy is an essential capability for service mobile robots deployed in human environments. One challenging aspect of long-term autonomy in a human environment is robustness to changes in the environment.
Many approaches have been proposed to reason about a changing environment, including estimating the latest state of the environment [18, 13], estimating different environment configurations [2, 6], or modeling the dynamics of the environment [5, 16, 17, 1, 19]. However, in large environments, it may not be feasible to make the periodic observations required by these approaches. Therefore, we model observations in human environments as arising from three distinct types of features : Dynamic Features (DFs) or moving objects such as people or carts, Short- Term Features (STFs) or movable objects such as tables or chairs, and Long-Term Features (LTFs) which persist over long periods of time, such as office walls or columns. Episodic non-Markov Localization (EnML)  simultaneously reasons about global localization information from LTFs, and local relative information from STFs. A key requirement to EnML is an estimate of the Long-Term Vector Map: the features in the environment that persist over time, represented in line segment or vector form (Fig. 1).
In this paper, we introduce an algorithm to build and update Long-Term Vector Maps indefinitely, utilizing observations from all deployments of all the robots in an environment. Our proposed algorithm filters out observations corresponding to DFs from a single deployment using a signed distance function (SDF) . Merging the SDFs from multiple deployments then filters out the short-term features. Remaining observations correspond to LTFs, and are used to build a vector map via robust local linear regression. Uncertainty estimates of the resultant Long-Term Vector Map are calculated by a novel Monte Carlo uncertainty estimator. Our proposed approach thus consists of the following steps:
Filter: Use the most recent observations to compute an SDF and discard points based on weights and values given by the SDF (Section IV).
Line Extraction: Use greedy sequential local RANSAC  and non-linear least-squares fitting to extract line segments from the filtered observations (Section V).
Estimate Feature Uncertainty: Compute segment endpoint covariance estimates via Monte Carlo resampling of the observations (Section VI).
Map Update: Add, merge, and delete lines using a de-coupled scatter matrix representation  (Section VII).
Our approach takes advantage of the robust filtering provided by the SDF while avoiding dependency on a discrete world representation and grid resolution by representing LTFs as line segments in . Evaluation of our approach is detailed in section VIII. Across all metrics examined in this study, we find vector maps constructed via SDF filtering comparable or favorable to occupancy grid based approaches.
Ii Related Work
The problem of long-term robotic mapping has been studied extensively, with most algorithms relying on one of two dominant representations: occupancy grids [10, 14] and geometric or polygonal maps [8, 20]. Recently, work towards metric map construction algorithms that are able to cope with dynamic and short-term features has accelerated.
Most approaches fall into one of four categories: dynamics on occupancy grids, latest state estimation, ensemble state estimation, and observation filters.
One common approach models map dynamics on an occupancy grid using techniques such as learning non-stationary object models 
or modeling grid cells as Markov chains[16, 17]. Alternatively, motivated by the widely varying timescales at which certain aspects of an environment may change, some approaches seek to leverage these differences by maintaining information relating to multiple timescales within one or more occupancy grids [1, 19].
Other approaches estimate the latest state of the world, including dynamic and short-term features. Dynamic pose-graph SLAM  can be used in low-dynamic environments, and spectral analysis techniques  attempt to predict future environment states on arbitrary timescales.
Instead of estimating solely the latest state, some approaches estimate environment configurations based on an ensemble of recent states. Temporal methods such as recency weighted averaging  determine what past information is still relevant, and other techniques such as learning local grid map configurations  borrow more heavily from the dynamic occupancy grid approach.
Another approach filters out all observations corresponding to non-LTFs, resulting in a “blueprint” map. Previous algorithms have had some success filtering dynamic objects, specifically people , but have struggled to differentiate between STFs and LTFs. Furthermore, all of the methods mentioned above rely on an occupancy grid map representation, whereas our method produces a polygonal, vector map.
Iii Long-Term Vector Mapping
Long-term vector mapping runs iteratively over multiple robot deployments, operating on the union of all registered laser observations from the given deployment, aligned to the same frame. We call these unions composite scans and denote them , where is a single laser scan. Composite scans are processed in batch after each deployment, and may be generated via Episodic non-Markov Localization  or a similar localization algorithm.
After each deployment, a short-term signed distance function (ST-SDF) given by a set of weights and values is computed over the area explored by the robot by considering the composite scan . The ST-SDF is then used to update the long-term SDF (LT-SDF), given by and , which aggregates information over all previous deployments. The updated LT-SDF is denoted and , and is used to determine a filtered composite scan , containing observations corresponding exclusivley to LTFs. We call this process SDF-filtering. Note that after only one deployment, it is not possible to distinguish STFs from LTFs. However, as the number of deployments increases, the number of observations corresponding to STFs in approaches zero.
The next step in our algorithm is feature (line) extraction. Line extraction does not rely on any persistent data, using only the filtered composite scan to extract a set of lines . Each is defined by endpoints and , a scatter matrix , a center of mass , and a mass .
Uncertainties in the endpoints of each line segment are computed by analyzing a distribution of possible endpoints generated via Monte Carlo resampling the initial set of observations and subsequently refitting the samples. For a given line the uncertainty estimation step takes as input the endpoints and and a set of inliers , and produces covariance estimates and for these endpoints.
The long-term vector map is updated based on the newest set of extracted lines and the current SDF. Similar lines are merged into a single line, obsolete lines are deleted, and uncertainties are recomputed. Thus, this step takes the results from the most recent deployment, , as well as the existing map given by , , and , and outputs an updated map, . Fig. 2 presents an overview of the algorithm as it operates on data from a single deployment.
Iv SDF-based Filtering
Let be a composite scan, where is a single observation providing depth and angle with respect to the laser’s frame and the robot pose at which and were recorded. That is, . The filtering problem is to determine such that all originate from LTFs and all originate from STFs and DFs.
Determining is a three-step process. First, we construct a ST-SDF over the area observed by the robot during the deployment corresponding to . Next, we update the LT-SDF based on the ST-SDF. Finally, we use the updated LT-SDF to decide which observations correspond to LTFs.
Iv-a SDF Construction
SDFs operate over discretized space, so we create a grid of resolution containing all . Each pixel in the SDF maintains two measurements, a value and a weight . For every observation , all pixels that lie along the given laser ray update their respective values according to and , where and are the current distance and weight values, respectively, and and are the distance and weight values for the given reading . and are given by
where and is the signed distance from the range reading to the pixel, with pixels beyond the range reading having and those in front having . and are parameters that depend on the accuracy of the sensor. Pixels that are located along the ray but are more than beyond the detection point are not updated since we do not know whether or not they are occupied. Fig. 3 illustrates a single pixel update during SDF construction. Note that this process is parallelizable since weights and values for each pixel may be calculated independently. Thus, the SDF construction step, outlined in Algorithm 1, runs in time proportional to .
The intuition for the weight and value functions comes from two observations. First, capping by helps keep our SDF robust against anomalous readings. Second, follows common laser noise models. Other choices for and may yield similar results; however, these choices have already been successfully adopted elsewhere .
Iv-B SDF Update
Once the ST-SDF is calculated we normalize the weights:
Here, is the maximum weight over all pixels and is a dynamic feature threshold. The LT-SDF is then updated as the weighted average of the ST-SDF and the LT-SDF, i.e. WeightedAverage. Pixel weights loosely map to our confidence about the associated value, and values are an estimate for how far a pixel is from the nearest surface.
Iv-C SDF Filter
Given an up-to-date SDF, we determine
using bicubic interpolation on the position of each observationand the updated LT-SDF. The filtering criteria are if BicubicInterpolation and BicubicInterpolation, where is a STF threshold and is a threshold that filters observations believed to be far from the surface of any object. Lines 11-15 in Algorithm 1 detail the filtering process. Thus, after running SDF Filtering(), we obtain a filtered composite scan , used to find LTFs. Additionally, SDF Filtering updates the LT-SDF needed for the map update step.
V Line Extraction
Given , extracting lines requires solving two problems. First, for each , a set of observations must belong to line . Second, endpoints and defining line must be found. We take a greedy approach, utilizing sequential local RANSAC to provide plausible initial estimates for line endpoints and . Points whose distance to the line segment is less than , where is proportional to the noise of the laser, are considered members of the inlier set . Once a hypothetical model and set of inliers with center of mass have been suggested by RANSAC (lines 5-7 in Algorithm 2), we perform a non-linear least-squares optimization using the cost function
The new endpoints and are then used to find a new set of inliers . is the projection of a point onto the infinite line containing and . Iteration terminates when , where is a convergence threshold.
This cost function has several desireable properties. First, when all points lie between the two endpoints, the orientation of the line will be identical to the least-squares solution. Second, when many points lie beyond the ends of the line segment, the endpoints are pulled outward, allowing the line to grow and the region in which we accept inliers to expand. Last, the term allows the line to shrink in the event that the non-linear least-squares solver overshoots the appropriate endpoint. Once a set of lines has been determined by running Line Extraction() we complete our analysis of a single deployment by estimating our uncertainty in feature locations.
Vi Uncertainty Estimation
Given a line , with endpoints and and a set of inliers , uncertainty estimation produces covariance estimates and for and , respectively. To estimate and we resample using the covariance of each range reading. is derived based on the sensor noise model in . In world coordinates is given by
are standard deviations for range and angle measurements for the sensor, respectively. Resamplingtimes, as shown in Fig. 4, produces a distribution of and . We then construct a scatter matrix from the set of hypothetical endpoints, and compute covariances and by using the SVD of .
The Monte Carlo approach, detailed in Algorithm 3, is motivated by the following factors: 1) There is no closed-form solution to covariance for endpoints of a line segment. 2) A piece-wise cost function makes it difficult to calculate the Jacobian reliably. 3) Resampling is easy since we already have and can calculate .
Vii Map update
Given the current map , the LT-SDF, and a set of new lines, , where every is specified by a set of endpoints and , a set of covariance matrices and , a center of mass , a mass , and a partial scatter matrix , the update step produces an updated map .
The map updates are divided into two steps outlined in Algorithm 4. First, we check if all current lines in the map are contained within high-weight regions. That is, we check that the weight of every pixel a given line passes through satisfies . If a line lies entirely within a high-weight region, it remains unchanged. Similarly, if a line lies entirely outside all high-weight regions, it is removed. If only part of a line remains within a high-weight region, we can lower bound the mass of the remaining region by , where and are the extreme points of the segment remaining in the high-weight region (line 11). We then resample points uniformly along the line between and , adding Gaussian noise in the perpendicular direction with a standard deviation based on the sensor noise model. The points are then fit using the cost function given in (4). The sampling and fitting process is executed a fixed number of times (lines 12-16), and the distribution of fit results is then used to compute covariance estimates for the new line.
The second part of the update involves merging new lines, , with lines from the current map, . This process consists of first finding candidates for merging (lines 22-23) and then computing the updated parameters and covariances (lines 24-25). Note that the mapping from new lines to existing lines may be onto, but without loss of generality we consider merging lines pairwise. Because our parameterization uses endpoints, lines which ought to be merged may not have endpoints near one another. So, we project and from onto , resulting in and , respectively. and are merged if they pass the chi-squared test:
where , and is given by a linear interpolation of the covariance estimates for and determined by where is projected along .
We would like our merged LTFs and their uncertainty measures to remain faithful to the entire observation history. However, storing every observation is infeasible. Instead, our method implicitly stores observation histories using decoupled scatter matrices , reducing the process of merging lines with potentially millions of supporting observations to a single matrix addition.
The orientation of the new line is found via eigenvalue decomposition of the associated scatter matrix, and new endpoints are found by projecting the endpoints from the original lines onto the new line and taking the extrema. Thus, after executingMap Update(), we have a set of vectors in corresponding to LTFs.
Table I displays the major parameters and physical constants needed for long-term vector mapping.
|Line Merge Criteria|
|Sensor Noise Threshold||meters|
|RANSAC Inlier Criteria||meters|
|Line Fit Convergence||meters|
|SDF Max Value||meters|
To demonstrate the effectiveness of our algorithm we mapped 4 different environments, including standard data. Data was also collected from three separate environments, in addition to the MIT Reading Room: AMRL, Wall-occlusion, and Hallway-occlusion, using a mobile robot platform and a Hokuyo UST-10LX laser range finder. The AMRL, Wall, and Hall datasets consist of 8, 5, and 5 deployments, respectively. MIT Reading Room contains 20 deployments. Each deployment contains hundreds of scans, corresponding to hundreds of thousands of observations. Datasets are intended to display a high amount of clutter and variability, typical of scenes mobile robots need to contend with.
The MIT and AMRL data sets are intended to test the accuracy of our algorithm over larger numbers of deployments. Both rooms contain multiple doors, walls of varying lengths, and achieve many different configurations, testing our ability to accurately identify LTF positions. The Wall- and Hallway-occlusion datasets are intended to measure our algorithm’s robustness in environments where LTFs are heavily occluded.
Quantitatively we are concerned with accuracy and robustness, defining accuracy as pairwise feature agreement, where inter-feature distances are preserved, and feature-environment correspondance, where vectors in the map correspond to LTFs in the environment. Vectors should also be absent from areas where there are no long-term features such as doorways. Metric ground truth is established by either measuring wall lengths or hand-fitting the parts of the data we know correspond to LTFs. Robustness refers to a map’s ability to deal with large numbers of STFs and lack of degradation over time.
Over all datasets, our approach correctly identifies all 7 doorways (4 in MIT, 2 in AMRL, 1 in Hallway), and correctly ignores all 73 STFs. Using the AMRL and MIT datasets, we find the average difference between pair-wise feature separation in the generated map versus ground truth to be on the order of 2cm. Our line extraction method yields MSE values in the order of meters, while marching squares yields a MSE of roughly , about three times as large. Additionally, marching squares yields over features while LTVM maintains on the order of LTFs. Furthermore, the maps do not degrade over the timescales tested in this paper, with no noticeable difference in average pair-wise feature disagreement between maps generated after one deployment and those considering all deployments. Fig. 5 displays qualitative results for all environments.
In this paper, we introduced an SDF-based approach to filter laser scans over multiple deployments in order to build and maintain a long-term vector map. We experimentally showed the accuracy and robustness of the generated maps in a variety of different environments and further evaluated the effectiveness of long-term vector mapping compared to more standard, occupancy grid techniques. Currently this approach processes observations in batch post deployment, but modifying the algorithm to run in an online setting seems an appropriate next step.
The authors would like to thank Manuela Veloso from Carnegie Mellon University for providing the CoBot4 robot used to collect the AMRL datasets, and to perform experiments at University of Massachusetts Amherst.
-  Temporal occupancy grids: a method for classifying the spatio-temporal properties of the environment. In IROS, 2002., Cited by: §I, §II.
-  (2005) Dynamic maps for long-term operation of mobile service robots. In RSS, Cited by: §I, §II.
-  (2012) Planar polygon extraction and merging from depth images. In IROS, Cited by: item 4, §VII.
-  (2014) Episodic non-markov localization: reasoning about short-term and long-term features. In ICRA, Cited by: §I, §III.
-  (2002) Towards object mapping in non-stationary environments with mobile robots. In IROS, Cited by: §I, §II.
-  (2007) Autonomous navigation in dynamic environments. pp. 3–28. Cited by: §I, §II.
-  (2013) Real-time camera tracking and 3d reconstruction using signed distance functions. In RSS, Cited by: §IV-A.
-  (1985) Position referencing and consistent world modeling for mobile robots. In Robotics and Automation., Cited by: §II.
-  (1996) A volumetric method for building complex models from range images. In ACM, Cited by: §I.
-  (1989) Using occupancy grids for mobile robot perception and navigation. Computer. Cited by: §II.
-  (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. ACM. Cited by: item 2.
-  (2002) Map building with mobile robots in populated environments. In IROS, Cited by: §II.
-  (2014) Spectral analysis for long-term robotic mapping. In ICRA, Cited by: §I, §II.
-  (1989) A Bayesean method for certainty grids. In AAAI Spring Symposium on Robot Navigation, Cited by: §II.
-  (2003) Weighted line fitting algorithms for mobile robot map building and efficient data representation. In ICRA, Cited by: §VI.
-  Independent markov chain occupancy grid maps for representation of dynamic environment. In IROS, 2012, Cited by: §I, §II.
-  (2013) Lifelong localization in changing environments. IJRR. Cited by: §I, §II.
-  (2012) Dynamic pose graph slam: long-term mapping in low dynamic environments. In IROS, Cited by: §I, §II.
-  (2005) Mobile robot simultaneous localization and mapping in dynamic environments. Autonomous Robots. Cited by: §I, §II.
-  (2000) Line segment based map building and localization using 2d laser rangefinder. In ICRA, Cited by: §II.