Curating Long-term Vector Maps

Autonomous service mobile robots need to consistently, accurately, and robustly localize in human environments despite changes to such environments over time. Episodic non-Markov Localization (EnML) addresses the challenge of localization in such changing environments by classifying observations as arising from Long-Term, Short-Term, or Dynamic Features. However, in order to do so, EnML relies on an estimate of the Long-Term Vector Map (LTVM) that does not change over time. In this paper, we introduce a recursive algorithm to build and update the LTVM over time by reasoning about visibility constraints of objects observed over multiple robot deployments. We use a signed distance function (SDF) to filter out observations of short-term and dynamic features from multiple deployments of the robot. The remaining long-term observations are used to build a vector map by robust local linear regression. The uncertainty in the resulting LTVM is computed via Monte Carlo resampling of the observations arising from long-term features. By combining occupancy-grid-based SDF filtering of observations with continuous-space regression of the filtered observations, our proposed approach builds, updates, and amends LTVMs over time, reasoning about all observations from all robot deployments in an environment. We present experimental results demonstrating the accuracy, robustness, and compact nature of the extracted LTVMs from several long-term robot datasets.



I Introduction

Long-term autonomy is an essential capability for service mobile robots deployed in human environments. One challenging aspect of long-term autonomy in a human environment is robustness to changes in the environment.

Many approaches have been proposed to reason about a changing environment, including estimating the latest state of the environment [18, 13], estimating different environment configurations [2, 6], or modeling the dynamics of the environment [5, 16, 17, 1, 19]. However, in large environments, it may not be feasible to make the periodic observations required by these approaches. Therefore, we model observations in human environments as arising from three distinct types of features [4]: Dynamic Features (DFs) or moving objects such as people or carts, Short-Term Features (STFs) or movable objects such as tables or chairs, and Long-Term Features (LTFs) which persist over long periods of time, such as office walls or columns. Episodic non-Markov Localization (EnML) [4] simultaneously reasons about global localization information from LTFs, and local relative information from STFs. A key requirement of EnML is an estimate of the Long-Term Vector Map: the features in the environment that persist over time, represented in line segment or vector form (Fig. 1).

Fig. 1: Observations at different stages of the LTVM pipeline. In alphabetical order: raw data from all deployments, weights computed by the SDF, filtered data, final LTVM.

In this paper, we introduce an algorithm to build and update Long-Term Vector Maps indefinitely, utilizing observations from all deployments of all the robots in an environment. Our proposed algorithm filters out observations corresponding to DFs from a single deployment using a signed distance function (SDF) [9]. Merging the SDFs from multiple deployments then filters out the short-term features. Remaining observations correspond to LTFs, and are used to build a vector map via robust local linear regression. Uncertainty estimates of the resultant Long-Term Vector Map are calculated by a novel Monte Carlo uncertainty estimator. Our proposed approach thus consists of the following steps:

  1. Filter: Use the most recent observations to compute an SDF and discard points based on weights and values given by the SDF (Section IV).

  2. Line Extraction: Use greedy sequential local RANSAC [11] and non-linear least-squares fitting to extract line segments from the filtered observations (Section V).

  3. Estimate Feature Uncertainty: Compute segment endpoint covariance estimates via Monte Carlo resampling of the observations (Section VI).

  4. Map Update: Add, merge, and delete lines using a de-coupled scatter matrix representation [3] (Section VII).

Our approach takes advantage of the robust filtering provided by the SDF while avoiding dependency on a discrete world representation and grid resolution by representing LTFs as line segments in ℝ². Evaluation of our approach is detailed in Section VIII. Across all metrics examined in this study, we find vector maps constructed via SDF filtering comparable to or better than occupancy-grid-based approaches.

II Related Work

The problem of long-term robotic mapping has been studied extensively, with most algorithms relying on one of two dominant representations: occupancy grids [10, 14] and geometric or polygonal maps [8, 20]. Recently, work towards metric map construction algorithms that are able to cope with dynamic and short-term features has accelerated.

Most approaches fall into one of four categories: dynamics on occupancy grids, latest state estimation, ensemble state estimation, and observation filters.

One common approach models map dynamics on an occupancy grid using techniques such as learning non-stationary object models [5] or modeling grid cells as Markov chains [16, 17]. Alternatively, motivated by the widely varying timescales at which certain aspects of an environment may change, some approaches seek to leverage these differences by maintaining information relating to multiple timescales within one or more occupancy grids [1, 19].

Other approaches estimate the latest state of the world, including dynamic and short-term features. Dynamic pose-graph SLAM [18] can be used in low-dynamic environments, and spectral analysis techniques [13] attempt to predict future environment states on arbitrary timescales.

Instead of estimating solely the latest state, some approaches estimate environment configurations based on an ensemble of recent states. Temporal methods such as recency weighted averaging [2] determine what past information is still relevant, and other techniques such as learning local grid map configurations [6] borrow more heavily from the dynamic occupancy grid approach.

Another approach filters out all observations corresponding to non-LTFs, resulting in a “blueprint” map. Previous algorithms have had some success filtering dynamic objects, specifically people [12], but have struggled to differentiate between STFs and LTFs. Furthermore, all of the methods mentioned above rely on an occupancy grid map representation, whereas our method produces a polygonal, vector map.

III Long-Term Vector Mapping

Long-term vector mapping runs iteratively over multiple robot deployments, operating on the union of all registered laser observations from the given deployment, aligned to the same frame. We call these unions composite scans and denote them , where is a single laser scan. Composite scans are processed in batch after each deployment, and may be generated via Episodic non-Markov Localization  [4] or a similar localization algorithm.

After each deployment, a short-term signed distance function (ST-SDF) given by a set of weights and values is computed over the area explored by the robot by considering the composite scan . The ST-SDF is then used to update the long-term SDF (LT-SDF), given by and , which aggregates information over all previous deployments. The updated LT-SDF is denoted and , and is used to determine a filtered composite scan , containing observations corresponding exclusively to LTFs. We call this process SDF-filtering. Note that after only one deployment, it is not possible to distinguish STFs from LTFs. However, as the number of deployments increases, the number of observations corresponding to STFs in approaches zero.

The next step in our algorithm is feature (line) extraction. Line extraction does not rely on any persistent data, using only the filtered composite scan to extract a set of lines . Each is defined by endpoints and , a scatter matrix , a center of mass , and a mass .

Uncertainties in the endpoints of each line segment are computed by analyzing a distribution of possible endpoints generated via Monte Carlo resampling the initial set of observations and subsequently refitting the samples. For a given line the uncertainty estimation step takes as input the endpoints and and a set of inliers , and produces covariance estimates and for these endpoints.

The long-term vector map is updated based on the newest set of extracted lines and the current SDF. Similar lines are merged into a single line, obsolete lines are deleted, and uncertainties are recomputed. Thus, this step takes the results from the most recent deployment, , as well as the existing map given by , , and , and outputs an updated map, . Fig. 2 presents an overview of the algorithm as it operates on data from a single deployment.

Fig. 2: Flow of information during processing of a single deployment, deployment . Boxes 1, 2, and 3 correspond to SDF filtering, line finding and uncertainty estimation, and map updating, respectively.

IV SDF-based Filtering

Let be a composite scan, where is a single observation providing depth and angle with respect to the laser’s frame and the robot pose at which and were recorded. That is, . The filtering problem is to determine such that all originate from LTFs and all originate from STFs and DFs.

Determining is a three-step process. First, we construct a ST-SDF over the area observed by the robot during the deployment corresponding to . Next, we update the LT-SDF based on the ST-SDF. Finally, we use the updated LT-SDF to decide which observations correspond to LTFs.

IV-A SDF Construction

SDFs operate over discretized space, so we create a grid of resolution containing all . Each pixel in the SDF maintains two measurements, a value and a weight . For every observation , all pixels that lie along the given laser ray update their respective values according to and , where and are the current distance and weight values, respectively, and and are the distance and weight values for the given reading . and are given by


where and is the signed distance from the range reading to the pixel, with pixels beyond the range reading having and those in front having . and are parameters that depend on the accuracy of the sensor. Pixels that are located along the ray but are more than beyond the detection point are not updated since we do not know whether or not they are occupied. Fig. 3 illustrates a single pixel update during SDF construction. Note that this process is parallelizable since weights and values for each pixel may be calculated independently. Thus, the SDF construction step, outlined in Algorithm 1, runs in time proportional to .

Fig. 3: SDF construction from a single laser ray. Pixels along the laser ray are updated if they are free, or if they are just beyond the obstacle. Over many ray casts, pixels may be marked as belonging to more than one category (boundary, interior, exterior) due to sensor noise. The SDF’s main advantage is that it ignores erroneous readings.

The intuition for the weight and value functions comes from two observations. First, capping by helps keep our SDF robust against anomalous readings. Second, follows common laser noise models. Other choices for and may yield similar results; however, these choices have already been successfully adopted elsewhere  [7].
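For illustration, the per-pixel update along a ray can be sketched as follows. This is a simplified, hypothetical model rather than the paper's exact functions (which this extraction omits): the value is a truncated signed distance, and the weight increment is a constant, whereas the paper's weight follows a laser noise model.

```python
import numpy as np

def sdf_update_ray(values, weights, cells, dists, d_max=0.2, w_hit=1.0):
    """Update SDF pixels along one laser ray (hedged sketch).

    values, weights : 2D arrays holding the running weighted averages
    cells           : (row, col) pixels traversed by the ray
    dists           : signed distance from the range reading to each pixel
                      (positive in front of the reading, negative beyond it)
    d_max           : truncation distance; pixels more than d_max beyond
                      the detection point are left untouched, since their
                      occupancy is unknown
    """
    for (r, c), d in zip(cells, dists):
        if d < -d_max:                       # far behind the obstacle: skip
            continue
        d_new = min(d, d_max)                # cap the value for robustness
        w_new = w_hit                        # constant weight (simplification)
        w_old = weights[r, c]
        # running weighted average of signed distance values
        values[r, c] = (w_old * values[r, c] + w_new * d_new) / (w_old + w_new)
        weights[r, c] = w_old + w_new
    return values, weights
```

Because each pixel's running average depends only on its own history, rays (and pixels) can be updated independently, which is what makes the construction parallelizable.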

IV-B SDF Update

Once the ST-SDF is calculated, we normalize the weights:


Here, is the maximum weight over all pixels and is a dynamic feature threshold. The LT-SDF is then updated as the weighted average of the ST-SDF and the LT-SDF, i.e. WeightedAverage. Pixel weights loosely map to our confidence about the associated value, and values are an estimate for how far a pixel is from the nearest surface.
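A minimal sketch of this update is given below. The exact normalization formula is omitted from this extraction, so the sketch assumes one plausible reading: short-term weights are divided by the maximum weight, weights below the dynamic-feature threshold are zeroed, and the long-term values are then the weighted average of old and new estimates.

```python
import numpy as np

def update_lt_sdf(lt_vals, lt_w, st_vals, st_w, df_thresh=0.1):
    """Fold a short-term SDF into the long-term SDF (hedged sketch).

    Pixels whose normalized short-term weight falls below the
    dynamic-feature threshold contribute nothing, so rarely-observed
    (dynamic) surfaces are suppressed over deployments.
    """
    w_max = st_w.max()
    st_norm = np.where(st_w / w_max >= df_thresh, st_w / w_max, 0.0) if w_max > 0 else st_w
    total = lt_w + st_norm
    # avoid division by zero where neither map has evidence
    safe = np.where(total > 0, total, 1.0)
    new_vals = (lt_w * lt_vals + st_norm * st_vals) / safe
    return new_vals, total
```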

IV-C SDF Filter

Given an up-to-date SDF, we determine using bicubic interpolation on the position of each observation and the updated LT-SDF. The filtering criteria are if BicubicInterpolation and BicubicInterpolation, where is an STF threshold and is a threshold that filters observations believed to be far from the surface of any object. Lines 11-15 in Algorithm 1 detail the filtering process. Thus, after running SDF Filtering(), we obtain a filtered composite scan , used to find LTFs. Additionally, SDF Filtering updates the LT-SDF needed for the map update step.

1:Input: Raw composite scan , long-term SDF weights and values
2:Output: Filtered composite scan , updated SDF weights and values
3: empty image
4: empty image
5:for all range readings  do
6:      valueUpdate
7:      weightUpdate
8: normalize
11:for all range readings  do
12:      BicubicInterpolation
13:      BicubicInterpolation
14:     if  and  then
Algorithm 1 SDF Filtering
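The filtering criteria can be sketched as follows. Bilinear rather than bicubic interpolation is used here for brevity, and the threshold values are illustrative rather than the paper's.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate img at continuous pixel coordinates (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1] +
            (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def filter_scan(points, values, weights, w_thresh=0.5, v_thresh=0.05):
    """Keep only observations the LT-SDF supports as long-term features.

    A point survives if the interpolated weight at its position is high
    (the surface there is observed consistently) and the interpolated
    signed distance is near zero (the point lies on that surface).
    """
    kept = []
    for (x, y) in points:
        if (bilinear(weights, x, y) >= w_thresh and
                abs(bilinear(values, x, y)) <= v_thresh):
            kept.append((x, y))
    return kept
```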

V Line Extraction

Given , extracting lines requires solving two problems. First, for each , a set of observations must belong to line . Second, endpoints and defining line must be found. We take a greedy approach, utilizing sequential local RANSAC to provide plausible initial estimates for line endpoints and . Points whose distance to the line segment is less than , where is proportional to the noise of the laser, are considered members of the inlier set . Once a hypothetical model and set of inliers with center of mass have been suggested by RANSAC (lines 5-7 in Algorithm 2), we perform a non-linear least-squares optimization using the cost function


The new endpoints and are then used to find a new set of inliers . is the projection of a point onto the infinite line containing and . Iteration terminates when , where is a convergence threshold.

This cost function has several desirable properties. First, when all points lie between the two endpoints, the orientation of the line will be identical to the least-squares solution. Second, when many points lie beyond the ends of the line segment, the endpoints are pulled outward, allowing the line to grow and the region in which we accept inliers to expand. Last, the term allows the line to shrink in the event that the non-linear least-squares solver overshoots the appropriate endpoint. Once a set of lines has been determined by running Line Extraction(), we complete our analysis of a single deployment by estimating our uncertainty in feature locations.

1:Data: Filtered composite scan
2:Result: Set of new lines
4:while  not empty do
5:     Propose via RANSAC
6:      findInliers(, )
7:     , fitSegment()
8:     while  do
9:          findInliers(, )
11:         , fitSegment()      
Algorithm 2 Line Extraction
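The extraction loop can be sketched as follows. The paper's piecewise cost function is not reproduced in this extraction, so this sketch substitutes a total-least-squares fit with endpoints taken as extremal projections, and gathers inliers by perpendicular distance to the supporting line.

```python
import numpy as np

def fit_segment(pts):
    """Total-least-squares fit: principal direction, extremal projections."""
    pts = np.asarray(pts, float)
    com = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - com)
    d = vt[0]                                # principal direction
    t = (pts - com) @ d                      # coordinates along the line
    return com + t.min() * d, com + t.max() * d

def line_dist(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    d = (b - a) / np.linalg.norm(b - a)
    v = p - a
    return abs(v[0] * d[1] - v[1] * d[0])

def extract_line(pts, eps=0.05, n_iter=10, rng=None):
    """One greedy extraction: propose a line from two random points
    (RANSAC-style), then alternate inlier search and refitting."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pts = np.asarray(pts, float)
    i, j = rng.choice(len(pts), size=2, replace=False)
    a, b = pts[i], pts[j]
    for _ in range(n_iter):
        inliers = np.array([p for p in pts if line_dist(p, a, b) < eps])
        a, b = fit_segment(inliers)
    return a, b, inliers
```

In the full algorithm this step is applied greedily: inliers of an accepted segment are removed and the process repeats until no points remain.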

VI Uncertainty Estimation

Given a line , with endpoints and and a set of inliers , uncertainty estimation produces covariance estimates and for and , respectively. To estimate and we resample using the covariance of each range reading. is derived based on the sensor noise model in  [15]. In world coordinates is given by


where and are standard deviations for range and angle measurements for the sensor, respectively. Resampling times, as shown in Fig. 4, produces a distribution of and . We then construct a scatter matrix from the set of hypothetical endpoints, and compute covariances and by using the SVD of .

The Monte Carlo approach, detailed in Algorithm 3, is motivated by the following factors: 1) There is no closed-form solution to covariance for endpoints of a line segment. 2) A piece-wise cost function makes it difficult to calculate the Jacobian reliably. 3) Resampling is easy since we already have and can calculate .

1:Input: Set of new lines , number of samples
2:Output: Set of endpoint covariance estimates ,
4:for all  do
5:     ,
6:      inliers associated with
7:     for  iterations do
9:         for all  do
10:               sample
12:         , fitSegment()
13:         ,      
14:      estimateCovariance
15:      estimateCovariance
Algorithm 3 Feature Uncertainty Estimation
Fig. 4: Monte Carlo uncertainty estimation of feature endpoints. Given an initial set of observations and their corresponding covariances represented by ellipses, we resample the observations and fit a line times. The resulting distribution of endpoints is used to estimate endpoint covariance.
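A hedged sketch of the Monte Carlo estimator follows, simplified to isotropic per-point noise in place of the full range/angle covariance of [15].

```python
import numpy as np

def endpoint_covariance(inliers, sigma=0.01, n_samples=200, seed=0):
    """Monte Carlo covariance of segment endpoints (hedged sketch).

    Each inlier is resampled with isotropic Gaussian noise, the segment
    is refit by total least squares, and the sample covariance of each
    endpoint over all trials is returned.
    """
    rng = np.random.default_rng(seed)
    inliers = np.asarray(inliers, float)
    p1s, p2s = [], []
    for _ in range(n_samples):
        noisy = inliers + rng.normal(0.0, sigma, inliers.shape)
        com = noisy.mean(axis=0)
        _, _, vt = np.linalg.svd(noisy - com)
        d = vt[0]
        t = (noisy - com) @ d
        e1, e2 = com + t.min() * d, com + t.max() * d
        # order endpoints consistently so samples are comparable
        if (e1[0], e1[1]) > (e2[0], e2[1]):
            e1, e2 = e2, e1
        p1s.append(e1); p2s.append(e2)
    return np.cov(np.array(p1s).T), np.cov(np.array(p2s).T)
```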

VII Map Update

Given the current map , the LT-SDF, and a set of new lines, , where every is specified by a set of endpoints and , a set of covariance matrices and , a center of mass , a mass , and a partial scatter matrix , the update step produces an updated map .

The map updates are divided into two steps outlined in Algorithm 4. First, we check if all current lines in the map are contained within high-weight regions. That is, we check that the weight of every pixel a given line passes through satisfies . If a line lies entirely within a high-weight region, it remains unchanged. Similarly, if a line lies entirely outside all high-weight regions, it is removed. If only part of a line remains within a high-weight region, we can lower bound the mass of the remaining region by , where and are the extreme points of the segment remaining in the high-weight region (line 11). We then resample points uniformly along the line between and , adding Gaussian noise in the perpendicular direction with a standard deviation based on the sensor noise model. The points are then fit using the cost function given in (4). The sampling and fitting process is executed a fixed number of times (lines 12-16), and the distribution of fit results is then used to compute covariance estimates for the new line.

The second part of the update involves merging new lines, , with lines from the current map, . This process consists of first finding candidates for merging (lines 22-23) and then computing the updated parameters and covariances (lines 24-25). Note that the mapping from new lines to existing lines may be onto, but without loss of generality we consider merging lines pairwise. Because our parameterization uses endpoints, lines which ought to be merged may not have endpoints near one another. So, we project and from onto , resulting in and , respectively. and are merged if they pass the chi-squared test:


where , and is given by a linear interpolation of the covariance estimates for and determined by where is projected along .
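The paper's exact test statistic is omitted from this extraction, so the gate below is an assumption: one common form of such a test, a squared Mahalanobis distance between corresponding endpoints under the summed covariances, compared against a chi-squared threshold.

```python
import numpy as np

def endpoints_mergeable(p_new, q_proj, cov_new, cov_map, chi2_thresh=5.99):
    """Chi-squared gate for merging two corresponding line endpoints.

    p_new: endpoint of the new line; q_proj: its projection onto the map
    line. 5.99 is the 95th-percentile chi-squared value for 2 degrees of
    freedom.
    """
    diff = np.asarray(p_new, float) - np.asarray(q_proj, float)
    cov = np.asarray(cov_new, float) + np.asarray(cov_map, float)
    d2 = float(diff @ np.linalg.solve(cov, diff))
    return d2 <= chi2_thresh
```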

We would like our merged LTFs and their uncertainty measures to remain faithful to the entire observation history. However, storing every observation is infeasible. Instead, our method implicitly stores observation histories using decoupled scatter matrices  [3], reducing the process of merging lines with potentially millions of supporting observations to a single matrix addition.

The orientation of the new line is found via eigenvalue decomposition of the associated scatter matrix, and new endpoints are found by projecting the endpoints from the original lines onto the new line and taking the extrema. Thus, after executing Map Update(), we have a set of vectors in ℝ² corresponding to LTFs.
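The scatter-matrix bookkeeping can be sketched as follows, using a parallel-axis correction to combine scatters taken about each line's own center of mass; this matches the spirit of the decoupled representation in [3], though the exact bookkeeping there may differ.

```python
import numpy as np

def merge_lines(com1, m1, S1, ends1, com2, m2, S2, ends2):
    """Merge two line features via their decoupled scatter matrices.

    Each line carries (center of mass, mass, scatter matrix about its own
    COM, endpoints). The merged scatter is the sum of the two scatters
    plus parallel-axis corrections, so millions of supporting observations
    reduce to one matrix addition. Orientation comes from the principal
    eigenvector; endpoints are the extremal projections of the old ones.
    """
    m = m1 + m2
    com = (m1 * np.asarray(com1, float) + m2 * np.asarray(com2, float)) / m
    def shift(S, c, w):
        d = np.asarray(c, float) - com
        return np.asarray(S, float) + w * np.outer(d, d)
    S = shift(S1, com1, m1) + shift(S2, com2, m2)
    # principal direction of the merged scatter
    evals, evecs = np.linalg.eigh(S)
    d = evecs[:, np.argmax(evals)]
    t = [(np.asarray(p, float) - com) @ d for p in list(ends1) + list(ends2)]
    return com + min(t) * d, com + max(t) * d, com, m, S
```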

Table I displays the major parameters and physical constants needed for long-term vector mapping.

Fig. 5: Raw data, filtered data, and resultant maps for MIT (a-c), AMRL (d-f), Wall-occlusion (g-i) and Hallway-occlusion (j-l) datasets. Data shown in the left and center columns are aggregates of all deployments and are never actually stored while the algorithm is operating. The last column is the resultant LTVM which is stored in full, requiring only a few KB. Note the absence of STFs from the final maps, as well as the presence of doorways. In the MIT dataset, some doors were open only once over the entire deployment history. The Hallway-occlusion dataset demonstrates the algorithm’s robustness to STFs, as it is able to distinguish the column in the hallway even as it is partially or completely occluded on every deployment.
Name Symbol Domain Our Value
DF Threshold
STF Threshold
Line Merge Criteria
Sensor Noise Threshold meters
RANSAC Inlier Criteria meters
Line Fit Convergence meters
SDF Max Value meters
TABLE I: Thresholds and physical constants
1:Input: Set of new lines , set of long-term lines , long-term SDF ,
2:Output: Updated long-term lines
3:for all  do
4:     trace along
5:     if  exists entirely outside high-weight regions then
7:     if  exists partially outside high-weight regions then
10:         for all remaining parts of ,  do
11:               getEndpoints()
12:              for  iterations do
13:                   regenerateInliers()
14:                   fitSegment()
17:              estimateCovariance()
20:for all  do
21:     for all  do
23:         if  for  then
24:               merge
26:         else
Algorithm 4 Map Update

VIII Results

To demonstrate the effectiveness of our algorithm we mapped four different environments. In addition to the standard MIT Reading Room dataset, data was collected from three environments, AMRL, Wall-occlusion, and Hallway-occlusion, using a mobile robot platform and a Hokuyo UST-10LX laser range finder. The AMRL, Wall-occlusion, and Hallway-occlusion datasets consist of 8, 5, and 5 deployments, respectively; the MIT Reading Room dataset contains 20. Each deployment contains hundreds of scans, corresponding to hundreds of thousands of observations. The datasets are intended to display a high amount of clutter and variability, typical of the scenes mobile robots must contend with.

The MIT and AMRL datasets are intended to test the accuracy of our algorithm over larger numbers of deployments. Both rooms contain multiple doors and walls of varying lengths, and take on many different configurations, testing our ability to accurately identify LTF positions. The Wall-occlusion and Hallway-occlusion datasets are intended to measure our algorithm's robustness in environments where LTFs are heavily occluded.

Quantitatively, we are concerned with accuracy and robustness. We define accuracy as pairwise feature agreement, where inter-feature distances are preserved, and feature-environment correspondence, where vectors in the map correspond to LTFs in the environment. Vectors should also be absent from areas where there are no long-term features, such as doorways. Metric ground truth is established either by measuring wall lengths or by hand-fitting the parts of the data known to correspond to LTFs. Robustness refers to a map's ability to deal with large numbers of STFs and its lack of degradation over time.

Over all datasets, our approach correctly identifies all 7 doorways (4 in MIT, 2 in AMRL, 1 in Hallway) and correctly ignores all 73 STFs. Using the AMRL and MIT datasets, we find the average difference between pair-wise feature separation in the generated map versus ground truth to be on the order of 2 cm. Our line extraction method yields MSE values on the order of meters, while marching squares yields an MSE of roughly , about three times as large. Additionally, marching squares yields over features while LTVM maintains on the order of LTFs. Furthermore, the maps do not degrade over the timescales tested in this paper, with no noticeable difference in average pair-wise feature disagreement between maps generated after one deployment and those considering all deployments. Fig. 5 displays qualitative results for all environments.

IX Conclusion

In this paper, we introduced an SDF-based approach to filtering laser scans over multiple deployments in order to build and maintain a long-term vector map. We experimentally showed the accuracy and robustness of the generated maps in a variety of environments, and further evaluated the effectiveness of long-term vector mapping against more standard occupancy-grid techniques. Currently, this approach processes observations in batch after each deployment; modifying the algorithm to run in an online setting is an appropriate next step.

X Acknowledgements

The authors would like to thank Manuela Veloso from Carnegie Mellon University for providing the CoBot4 robot used to collect the AMRL datasets and to perform experiments at the University of Massachusetts Amherst.


  • [1] D. Arbuckle, A. Howard, and M. Mataric (2002) Temporal occupancy grids: a method for classifying the spatio-temporal properties of the environment. In IROS, Cited by: §I, §II.
  • [2] P. Biber (2005) Dynamic maps for long-term operation of mobile service robots. In RSS, Cited by: §I, §II.
  • [3] J. Biswas and M. Veloso (2012) Planar polygon extraction and merging from depth images. In IROS, Cited by: item 4, §VII.
  • [4] J. Biswas and M. Veloso (2014) Episodic non-markov localization: reasoning about short-term and long-term features. In ICRA, Cited by: §I, §III.
  • [5] R. Biswas, B. Limketkai, S. Sanner, and S. Thrun (2002) Towards object mapping in non-stationary environments with mobile robots. In IROS, Cited by: §I, §II.
  • [6] W. Burgard, C. Stachniss, and D. Hähnel (2007) Autonomous navigation in dynamic environments. pp. 3–28. Cited by: §I, §II.
  • [7] E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers (2013) Real-time camera tracking and 3d reconstruction using signed distance functions. In RSS, Cited by: §IV-A.
  • [8] R. Chatila and J. Laumond (1985) Position referencing and consistent world modeling for mobile robots. In Robotics and Automation., Cited by: §II.
  • [9] B. Curless and M. Levoy (1996) A volumetric method for building complex models from range images. In ACM, Cited by: §I.
  • [10] A. Elfes (1989) Using occupancy grids for mobile robot perception and navigation. Computer. Cited by: §II.
  • [11] M. A. Fischler and R. C. Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. ACM. Cited by: item 2.
  • [12] D. Hahnel, D. Schulz, and W. Burgard (2002) Map building with mobile robots in populated environments. In IROS, Cited by: §II.
  • [13] T. Krajnik, J. P. Fentanes, G. Cielniak, C. Dondrup, and T. Duckett (2014) Spectral analysis for long-term robotic mapping. In ICRA, Cited by: §I, §II.
  • [14] H. P. Moravec and D. W. Cho (1989) A Bayesian method for certainty grids. In AAAI Spring Symposium on Robot Navigation, Cited by: §II.
  • [15] S. T. Pfister, S. I. Roumeliotis, and J. W. Burdick (2003) Weighted line fitting algorithms for mobile robot map building and efficient data representation. In ICRA, Cited by: §VI.
  • [16] J. Saarinen, H. Andreasson, and A. J. Lilienthal (2012) Independent Markov chain occupancy grid maps for representation of dynamic environment. In IROS, Cited by: §I, §II.
  • [17] G. D. Tipaldi, D. Meyer-Delius, and W. Burgard (2013) Lifelong localization in changing environments. IJRR. Cited by: §I, §II.
  • [18] A. Walcott-Bryant, M. Kaess, H. Johannsson, and J. J. Leonard (2012) Dynamic pose graph slam: long-term mapping in low dynamic environments. In IROS, Cited by: §I, §II.
  • [19] D. F. Wolf and G. S. Sukhatme (2005) Mobile robot simultaneous localization and mapping in dynamic environments. Autonomous Robots. Cited by: §I, §II.
  • [20] L. Zhang and B. K. Ghosh (2000) Line segment based map building and localization using 2d laser rangefinder. In ICRA, Cited by: §II.