I Introduction
This paper explores the role of topological understanding and the concomitant benefits it brings to the SLAM framework. Figure 1 shows an erroneous pose graph labelled ‘a’, while the corresponding topological graph is labelled ‘b’ in the same figure. Each node in the topological graph is labelled by a deep convolutional network. The topological graph is converted to a Manhattan Graph, wherein the Manhattan properties of the nodes (length or width of the topology) and edges are gleaned from the pose graph. While the Manhattan relations facilitate seamless loop detection between a pair of Manhattan nodes, such relations, when integrated with a backend SLAM framework, enable recovery of an optimized pose graph and corresponding map. The crux of the paper lies in detailing the framework and its efficacy in the challenging real-world settings of two different warehouses.
There have been a number of works in this area and a detailed review of such methods can be found in [garcia2015vision]. Prominent and well-cited among these are [ulrich2000appearance, sunderhauf2012switchable, pronobis2006discriminative, ranganathan2006rao, kosecka2003qualitative]. Most of these methods focus exclusively on vision-based loop detection with invariant descriptors. Many treat an individual image as a distinct topology of the scene without relating such nodes to a meta-level label such as a rackspace, corridor, or intersection. Seminal works such as [agarwal2013robust] show recovery from wrong loop closures in a Manhattan environment, though they do not discuss how the original Manhattan Graph is constructed or how such relations can be integrated into a backend SLAM. [ranganathan2006bayesian] shows how Bayesian inference over topologies can be performed to obtain more accurate topological maps. However, its topological constructs are at a local image level rather than at the larger meta-level considered in this paper. In other words, [ranganathan2006bayesian] does not entertain notions of meta-level topological labels that go beyond an immediate lower-level topology restricted to the scene seen by the robot. In this paper, we distinguish ourselves by showing how higher-level/meta-level topological constructs that go beyond an immediate frame/scene, and the relations they enjoy amongst themselves, percolate to a lower-level pose graph and elevate its metric relations. In fact, we recover close-to-ground-truth floor plans from a highly disorganized map at the start. This is the essential contribution of the paper. In addition, the following constitute our contributions:

A deep convolutional network capable of learning warehouse topologies and ablation studies over the same.

A Siamese Neural Network based relational classifier which resolves topological element ambiguity and helps achieve an accurate pose graph purely from topological relations. Ablation studies demonstrate the effectiveness of the classifier.

We showcase a backend SLAM framework that integrates loop closure relations from an intermediate-level Manhattan Graph into the lowest-level Pose Graph and elevates a disoriented, unoptimized map to a structured, optimized map that closely resembles the floor plan of the warehouse. Apart from the loop closure relations, the SLAM backend integrates other Manhattan relations into the pose graph. Ablation studies show the utility of both loop and Manhattan constraints, as well as the superior performance of an incremental topological SLAM over a full batch topological SLAM (refer to Table III).

We also show how the two-way exchange between the pose graph and the Manhattan Graph further improves the accuracy of the recovered map. This two-way exchange between the various levels of representation is unique to this effort. Refer to the bottom two rows in Table III.
Through the above formulation, the paper essentially exploits the Manhattan properties present in indoor warehouse scenes to perform PG recoveries.
II Methodology
Consider an unoptimized pose graph represented by its nodes and edges. The edge relations are of the following kinds:

Odometry relation between successive nodes.

Loop closure relation between a pair of nodes.

Manhattan relation between a pair of nodes.
We obtain odometry relations from fused ICP and wheel odometry estimates, which gives us an initial pose graph that is highly erroneous. We then leverage topological and Manhattan-level awareness to generate the loop closure and Manhattan relations and use them for pose graph optimization to recover accurate graphs. The whole process is divided into three parts:

Topological categorization using a convolutional neural network classifier and its graph construction.

Constructing a Manhattan Graph from the obtained Topological Graph and predicting loop closure constraints using a Multi-Layer Perceptron (MLP).

Pose graph optimization using obtained Manhattan and loop closure constraints.
Each part of the pipeline is described in its own subsection below; experiments and results are presented in the next section.
II-A Topological Categorization and Graph Construction
Every node is associated with a topological label, which for a warehouse scene is one of a fixed set of categories. To obtain these topological labels from visual data, we train a Convolutional Neural Network (CNN) configured for classification. The training data consists of resized RGB images paired with topological node labels. For our warehouse setting, the labels are Rackspace, Corridor, and Intersection:

Rackspace: Location on path between two rackspaces

Corridor: Location on the warehouse boundary path common to rackspaces

Intersection: A transition location on the path
Figure 1 shows examples of frames and their topological labels. We train a ResNet18 [he2016deep] architecture pretrained on ImageNet [imagenet_cvpr09], with its final layer replaced by a fully connected layer whose neurons correspond to the possible topological node labels. During training, we optimize the network to minimize cross-entropy loss. To account for class imbalance, we use a class-weighted loss [johnson2019survey] with per-class weights for Rackspace, Corridor, and Intersection. The CNN is fine-tuned using the Adam optimizer with separate learning rates for the pretrained ResNet18 layers and the final layer weights. We stop training when the validation loss starts to increase. For training, we use minibatches of images from two warehouses; a held-out set of images is used to evaluate the trained network. The results are presented in the next section. After obtaining the inferred labels from the CNN, we group together adjacent nodes that share the same label. Thus, a node in the Topological Graph records two positions from the dense Pose Graph, i.e. the starting and ending positions of that topology.
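The class-weighted cross-entropy objective can be sketched as below. This is a minimal numpy illustration, not the paper's implementation: the exact per-class weights and hyperparameters were not reproduced above, so the weight values here are assumptions.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy over a minibatch.

    logits: (N, C) raw network outputs; labels: (N,) integer class ids;
    class_weights: (C,) per-class weights (hypothetical values below).
    """
    # softmax with the usual max-shift for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    n = labels.shape[0]
    w = class_weights[labels]                  # weight of each sample's true class
    nll = -log_probs[np.arange(n), labels]     # per-sample negative log-likelihood
    # weighted mean, normalised by total weight (PyTorch-style reduction)
    return (w * nll).sum() / w.sum()

# Classes: 0 = Rackspace, 1 = Corridor, 2 = Intersection.
# Up-weighting the rarer Intersection class (assumed weights).
weights = np.array([1.0, 1.0, 3.0])
logits = np.array([[2.0, 0.1, 0.1],
                   [0.2, 1.5, 0.3],
                   [0.3, 0.4, 0.2]])
labels = np.array([0, 1, 2])
loss = weighted_cross_entropy(logits, labels, weights)
```

Up-weighting the Intersection class counters its under-representation in the training set: a misclassified intersection frame contributes proportionally more to the loss.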
II-B Manhattan Graph Construction and Constraint Prediction using MLP
We now explain how the Topological Graph of the last section is converted to a Manhattan Graph (MG). We denote each node in the MG as a meta-node, where each meta-node corresponds to a collection of pose graph nodes that share the same topological label. A new meta-node is formed whenever there is a change in the label.
Figure 1 portrays the pose graph nodes, their corresponding topology labels (shown in the color denoting each label), and the collections of such nodes that constitute the meta-nodes of the MG (shown in the same colors).
The MG relies on two essential measurements for its construction:

The length of traversal, i.e. the length of a topology such as a corridor or a rackspace.

The angle made between two corridors/two rackspaces/rackspace and corridor via an intersection.
The length of the traversal is obtained by integrating fused odometry and ICP-based transformations between successive nodes of the Pose Graph that belong to the same meta-node in the Manhattan Graph. The angle made as the robot moves from one topology (rackspace/corridor) to another (rackspace/corridor) via an intersection is estimated by fusing odometry and scan-matching ICP measurements and integrating them over the traversal through the intersection. This angle is binned to the closest multiple of 90°, i.e. one of 0°, 90°, 180°, or 270°. We use these obtained lengths and angles, along with the category of the meta-node, as input attributes to a Siamese-style MLP neural network in order to determine whether any two nodes in the MG are the same instance of a topological construct. In other words, the MLP determines whether any two nodes in the MG correspond to the same topological area of the workspace.
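The turn accumulation and binning step can be sketched as follows. This is an illustrative sketch: the function names and the noisy yaw increments are hypothetical, and the 90° bins follow the Manhattan assumption stated above.

```python
import math

# Manhattan angle bins: multiples of 90 degrees.
MANHATTAN_ANGLES = [0, 90, 180, 270]

def integrate_turn(yaw_deltas):
    """Sum per-step yaw increments (degrees) accumulated while the
    robot traverses an Intersection meta-node."""
    return sum(yaw_deltas)

def bin_to_manhattan(angle_deg):
    """Snap an accumulated turn angle to the closest Manhattan angle,
    using circular (wrap-around) distance."""
    a = angle_deg % 360.0
    return min(MANHATTAN_ANGLES,
               key=lambda m: min(abs(a - m), 360.0 - abs(a - m)))

# e.g. noisy fused odometry/ICP yaw increments summing to roughly a left turn
turn = integrate_turn([10.2, 31.5, 28.9, 17.1])   # ~87.7 degrees
corner = bin_to_manhattan(turn)                    # snaps to 90
```

The binning makes the MG edge attributes discrete and drift-tolerant: small angular errors accumulated through an intersection do not change the Manhattan label of the turn.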
The training data for the MLP consists of the “meta-nodes” described above. Each meta-node is a tuple of four values: the starting and ending displacement coordinates of a particular node with respect to a global origin (the point from where the robot starts moving in the warehouse), two in the x-direction and two in the y-direction. We create training data on the fly: since we know the general structure of our warehouse, we can create nodes synthetically using random numbers with similar lengths. The architecture is a Siamese network [bromley1994signature] consisting of two hidden layers. We apply a contrastive loss on the output of the Siamese network to constrain semantically similar “meta-node” representations to lie closer to each other. During inference, the MLP compares two nodes of the Manhattan Graph and predicts whether they correspond to the same topological instance. We base our approach on two strong assumptions:

Each node comprises one contiguous region of a single category.

Each node has displacement along only one axis (x or y).
The classification that results from the MLP is particularly powerful due to its ability to classify two topological instances as the same even when viewed from opposing viewpoints. This is shown in Figure 2, where the same topology is viewed from opposite viewpoints and the views have little in common. The MLP’s accurate classification of them as the same instance becomes particularly useful for the Pose Graph optimization described in the next section.
The MLP’s non-reliance on perceptual inputs also comes in handy for repetitive topologies. Warehouse scenes are often characterized by repetitive structure and are prone to perceptual aliasing. The classification accuracy of the MLP is unaffected by such repetitiveness in the environment since it bypasses perceptual inputs. The MLP does use perceptual inputs minimally, however, in that it attempts to decide whether two nodes in the MG are the same instance only if the topological labels of the two nodes are predicted to be the same by the CNN.
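The contrastive objective applied to the Siamese embeddings can be sketched as below in numpy. The margin, threshold, and function names are assumptions; the paper's network and values are not specified above.

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """Contrastive loss on a pair of Siamese embeddings:
    pulls embeddings of the same topological instance together and
    pushes different instances at least `margin` apart.
    e1, e2: embedding vectors; same: 1 if the two meta-nodes are the
    same instance, 0 otherwise. The margin value is an assumption."""
    d = np.linalg.norm(e1 - e2)
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2

def predict_same(e1, e2, threshold=0.5):
    """At inference, two Manhattan Graph nodes are declared the same
    instance when their embeddings are closer than an (assumed) threshold."""
    return np.linalg.norm(e1 - e2) < threshold
```

Because the pairing decision depends only on embedding distance over length/angle/category attributes, it is immune to the viewpoint and aliasing problems discussed above.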
II-C Pose Graph Optimization
The Manhattan relation that exists between two MG nodes serves as a Manhattan constraint between the corresponding nodes in the pose graph (in a manner consistent with the edge relations of pose-graph libraries such as G2O [kummerle2011_g2o] and GTSAM [kaess2008isam]). The relative orientation is typically 0° or 180°, depending on whether the topology is being revisited with the same or opposing orientation.
The output of the MLP classifier is also used to invoke loop closure constraints. A pair of nodes classified as the same topological construct by the MLP corresponds to two sets of pose-graph nodes in the unoptimized graph belonging to the same area. Multiple loop closure relations are thus obtained between the pose-graph nodes of these two sets. Apart from these, there exist immediate Manhattan relations between two adjacent rackspaces, two adjacent corridors, or a rackspace adjacent to a corridor mediated through an intersection. All such relations in the Manhattan Graph, as well as the loop closure relations, percolate to the nodes in the PG as described below.
In effect the optimizer solves for [sunderhauf2011brief]:

X^* = \arg\max_{X} P(X \mid U, Z)

where P(X | U, Z) is the posterior probability of the pose graph X over the set of constraints Z, and X and U are the poses and controls of the robot. The loop closure relation between a pair of pose-graph nodes is obtained using ICP. In principle, the number of possible loop closure relations between two topological constructs is the product of the cardinalities of their collection sets (the collection sets of the Manhattan nodes described before); in practice we only sample a subset of such relations to constrain the graph. Similarly, the graph is also constrained by Manhattan relations invoked between the pose-graph nodes within the neighbourhoods of the loop pair.
More formally, let L be the set that enumerates all loop closure pairs discovered by the MLP over a Manhattan Graph MG, where each element of the set is a loop closure pair of MG nodes. For every pair in L, we obtain a set of loop closure relations by sampling from the loop closures possible for that pair. Similarly, we obtain a set of Manhattan relations for every pair from the neighbouring nodes in the unoptimized graph. These two sets of constraints, together with the odometry edges, enter the optimization above.
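The enumeration-and-sampling step can be sketched as follows. This is an illustrative sketch under assumptions: the data layout, function name, and the sampling budget `k` are hypothetical, since the paper does not state how many relations are drawn per pair.

```python
import itertools
import random

def loop_closure_constraints(loop_pairs, collections, k=5, seed=0):
    """Sample pose-graph loop closure candidates from MLP loop pairs.

    loop_pairs:  [(i, j), ...] Manhattan Graph node pairs the MLP
                 declared to be the same topological instance.
    collections: {i: [pose_graph_node_ids]} -- the collection set of
                 pose-graph nodes inside meta-node i.
    For each pair there are |S_i| * |S_j| candidate relations; only k
    of them (an assumed budget) are sampled here.
    """
    rng = random.Random(seed)
    constraints = []
    for i, j in loop_pairs:
        candidates = list(itertools.product(collections[i], collections[j]))
        rng.shuffle(candidates)
        constraints.extend(candidates[:k])   # subset, not all |S_i|*|S_j|
    return constraints

# one MLP-proposed loop pair between meta-nodes 0 and 1
pairs = loop_closure_constraints([(0, 1)],
                                 {0: [0, 1, 2], 1: [10, 11, 12, 13]}, k=5)
```

Each sampled pose-node pair would then be turned into an ICP-verified relative-pose edge for the optimizer; sampling keeps the graph sparse while still tying the two revisited regions together.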
III Experimentation and Results
III-A Topological Categorization in a Real Warehouse Setting
The performance of the topological node classification CNN (Section II-A) can be viewed in Table I. For the combined dataset, the network is able to classify the rackspace and corridor classes with very low false positives and false negatives, with precision and recall above 94% each. However, it is relatively difficult to classify the third class, intersection, as there is not much semantic consistency as the robot moves from one topology to another; this is reflected in its low recall of about 78%. We explain how this inaccuracy affects the downstream modules in Section III-D.

Warehouse dataset  Accuracy 

1&2  93.75 
1  95.15 
2  89.06 
Metrics for Combined Data (1&2)  
Category  Precision  Recall 

Rackspace  94.2  96.3 
Corridor  96.3  96.4 
Intersection  85.6  78.1 
III-B Efficacy of Loop Closure Constraint Prediction using MLP
Network Type  Warehouse1  Warehouse2 
Accuracy  71.2  67.7 
We showcase our pipeline on two different warehouses. There were two experiments performed. First, we sample our training data according to the layout and length constraints of warehouse1 and use the datapoints of warehouse1 as the lone testing data. In our second experiment, we train our MLP specifically according to the layout and length constraints of warehouse2.
The detection of nodes belonging to the same topology was observed to be accurate in the initial phase of the trajectory. The latter part of the trajectory had drift, due to which the detection of node pairs was inaccurate (unoptimized Pose Graph shown in Figure 3). We were able to improve the accuracy of the MLP and generate accurate loop pairs by performing optimization on the Pose Graph in a cyclic fashion, as shown in Figure 3.
We performed the experiments on both warehouses. The accuracy is calculated as the percentage of true node pairs. An accuracy of 71.2% and 67.7% was observed for the first and the second warehouse respectively.
III-C Pose Graph Optimization Results
The ablation study on the type of constraints has been carried out in five stages. The robustness of map recovery increases with each stage, which is reflected in the Absolute Trajectory Error (ATE) in Table III.
III-C1 Stages of Map Recovery


Manhattan constraints: Only Manhattan constraints are used to optimize the pose graph (PG). Constraints are extracted from the Manhattan graph (MG) between nodes proposed by the MLP to be similar.

Loop Closure and Manhattan constraints: Apart from Manhattan constraints, loop closure constraints as explained in Section II-C are also used to constrain the PG.

Dense Proposals from MLP: We consider nodes that have been classified to belong to the same instance with low confidence along with those classified to be the same with high confidence. This increases the number of constraints improving the optimization performance. The wrongly detected loops are filtered based on the loop closure (ICP) residual cost and do not make it to the optimization.

Dense Proposals by MLP in Feedback Loop: A feedback loop is invoked on the optimized PG from the previous stage. A new MG is computed on the optimized PG, this Manhattan graph is fed to the MLP, and sets of constraints are generated in a cyclic manner. This feedback mechanism improves MLP performance, as shown in Table IV, and also helps achieve a very low Absolute Trajectory Error (ATE) of 1.82 meters on four different maps, down from 11.57 meters in the unoptimized map. This corresponds to the fourth contribution mentioned in Section I.

Incremental formulation: Performing the feedback strategy from the previous stage in an incremental formulation in iSAM [kaess2008isam] helps us achieve the lowest ATE of 1.45 meters in our system. This confirms the robustness of our system in recovering from highly unoptimized trajectories.
III-C2 Qualitative Results
We evaluate our system in two challenging real warehouse settings. The warehouses contain rackspaces with intermediate corridors and intersections. All experiments start with highly deformed trajectories. In all cases, we were able to recover trajectories close to the ground truth. Note that in our case, the ground-truth trajectory is the optimized map from Cartographer, which has been confirmed against the warehouse floor plan by our collaborators. These results are shown in Figure 5. The top row shows highly distorted pose graph trajectories, the middle row showcases the results of our optimization framework, and the last row depicts the ground-truth trajectories. The overall pipeline is best illustrated by Figure 3.
III-D Robustness Analysis
We analyze the performance of the topological SLAM under errors in topological classification by the CNN and failures to detect loops by the MLP. Errors in topology classification manifest as wrong loop detections in the MG. Therefore, the analysis is one of robustness to wrong loop detection, wherein both false positive and false negative cases are considered. The robustness of the pose graph optimization stems from the following features:

Residuals in the ICP-estimated loop closures serve as priors to the elements of the dynamically scaled covariance matrix [agarwal2013robust], which acts as a robust kernel providing for backend topology recovery even as the number of wrong loop closures increases.

An optimized PG feeds back to the MG and alleviates its error. The improved MG improves the loop detection performance of the MLP, which percolates to the PG nodes and further improves their accuracy. Over time, this iterative exchange of information between the various representations improves the robustness of the PG backend.
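The Dynamic Covariance Scaling kernel referenced above [agarwal2013robust] admits a closed-form scaling factor, sketched below. The value of the kernel parameter phi is an assumption; the formula itself is the published DCS scaling.

```python
def dcs_scale(chi2, phi=1.0):
    """Dynamic Covariance Scaling [agarwal2013robust]:
        s = min(1, 2 * phi / (phi + chi2))
    chi2 is the squared residual of a loop closure edge (e.g. from ICP)
    and phi is the kernel parameter (value here is an assumption).
    The loop edge's error is scaled by s, so consistent loops
    (small chi2) keep full weight while outliers are smoothly
    down-weighted instead of being rejected outright.
    """
    return min(1.0, 2.0 * phi / (phi + chi2))
```

This smooth down-weighting is what lets the backend tolerate a growing fraction of wrong loop closures in the analysis below: a grossly inconsistent edge is scaled towards zero influence rather than dragging the whole graph.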
To analyse the performance of our robust kernel exclusively in the presence of outliers, we synthetically introduced false positive and false negative loop closure pairs into the constraints. In Figure 4, the x-axis represents the percentage of loop closure pairs in the dataset, with equal amounts of false positive and false negative pairs, and the y-axis represents the ATE with respect to the ground truth. Figure 4 thus depicts the performance of our framework under errors in topology classification and loop detection. From the analysis of the results in Figure 4, it is evident that a gradual increase in outliers can be tolerated by the robust kernel with DCS [agarwal2013robust], as compared with non-robust optimization techniques.
Method Type  W2.1  W2.2  W1.1  W1.2  Avg 
ATE (m)  ATE (m)  ATE (m)  ATE (m)  ATE (m)  
Unoptimized  4.7  7.5  16.3  17.8  11.57 
MLP Manhattan (G2O)  3.42  2.85  4.5  7.4  4.54 
MLP Manhattan + LC (G2O)  3.09  2.7  3.9  1.67  2.84 
Dense MLP Manhattan + LC (G2O)  1.98  1.96  2.75  1.65  2.08 
Dense MLP Manhattan + LC (In Feedback Loop) (G2O)  1.67  1.8  2.21  1.6  1.82 
Dense MLP Manhattan + LC (In Feedback Loop) (iSAM)  1.6  0.98  1.02  2.2  1.45 
True Positive  False Positive  Accuracy  
119  81  59.5  
133  55  70.7 
IV Conclusion and Future Work
This paper shows how higher-level abstractions of an indoor workspace, such as real warehouses, can be used to effectively improve the lower-level backend modules of localization and mapping. Specifically, we show how higher and intermediate level abstractions in the form of a Topological Graph and a Manhattan Graph can recover from backend pose graph optimization failures. Further, by constant information exchange between the various levels of map abstraction, we quantitatively improve the ATE by more than 87.4% starting from very distorted pose graphs. We further show the method is robust to failures in the higher-level representations, which occur when the deep CNN architecture wrongly classifies a topological construct or when the Siamese-style classifier wrongly detects or fails to detect loops in the Manhattan graph. The results shown are on two different real warehouse scenes filled with many repetitive topologies in the form of corridor areas and rackspaces. Future results are intended to be shown on a variety of indoor topologies and office spaces, such as those found in the Gibson environment [xiazamirhe2018gibsonenv].