Look Around You: Sequence-based Radar Place Recognition with Learned Rotational Invariance

03/10/2020 ∙ by Matthew Gadd, et al. ∙ University of Oxford

This paper details an application which yields significant improvements to the adeptness of place recognition with Frequency-Modulated Continuous-Wave (FMCW) radar – a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos taken by visual sensors. Due to the complete horizontal field of view inherent to the radar scan formation process, we show how this off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction. We demonstrate the efficacy of the approach on 26 km of challenging urban driving taken from the largest radar-focused urban autonomy dataset released to date – showing a boost of 30% in recall at high levels of precision over a nearest neighbour approach.


I Introduction

In order for autonomous vehicles to travel safely at higher speeds or operate in wide-open spaces where there is a dearth of distinct features, a new level of robust sensing is required. FMCW radar satisfies these requirements, thriving in all environmental conditions (rain, snow, dust, fog, or direct sunlight), providing a 360° view of the scene, and detecting targets at ranges of up to hundreds of metres with centimetre-scale precision. Indeed, there is a burgeoning interest in exploiting FMCW radar to enable robust mobile autonomy, including ego-motion estimation [cen2018precise, cen2019radar, 2019ICRA_aldera, 2019ITSC_aldera, Barnes2019MaskingByMoving, UnderTheRadarArXiv], localisation [KidnappedRadarArXiv, tang2020rsl], and scene understanding [weston2019probably].

Figure 1 shows an overview of the pipeline proposed in this paper, which extends our recent work in extremely robust radar-only place recognition [KidnappedRadarArXiv], in which a metric space for embedding polar radar scans was learned, facilitating topological localisation using nearest-neighbour (NN) matching. We show that this learned metric space can be leveraged within a sequence-based topological localisation framework to bolster matching performance, mitigating both the visual similarities caused by the planarity of the sensor and the failures due to sudden obstruction in dynamic environments. Due to the complete horizontal field-of-view (FoV) of the radar scan formation process, we show how the off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction.

This paper proceeds by reviewing related literature in Section II. Section III describes our approach for a more canny use of a metric space in which polar radar scans are embedded. Section IV details our implementation, evaluation procedure, and dataset. Section V discusses results from this evaluation. Sections VI and VII summarise the findings and suggest further avenues for investigation.

Figure 1: An overview of our pipeline. The offline stages include enforcing a metric space by training a fully convolutional neural network (FCNN) which takes polar radar scans as input, and encoding a trajectory of scans (the map) by forward passes through this network (c.f. Section III-B). The online stages involve inference to represent the place the robot currently occupies in terms of the learned knowledge, and querying the space (c.f. Section III-A), which – in contrast to our prior work – involves a search for coherent sequences of matches rather than a single globally closest frame in the embedding space.

II Related Work

Recent work has shown the promise of FMCW radar for robust place recognition [KidnappedRadarArXiv, gskim2020mulran] and metric localisation [UnderTheRadarArXiv]. None of these methods account for temporal effects in the radar measurement stream.

SeqSLAM [milford2012seqslam] and its variants have been extremely successful at tackling large-scale, robust place recognition with video imagery in the last decade. Progress along these lines has included automatic scaling for viewpoint invariance [pepperell2015automatic], probabilistic adjustments to the search technique [hansen2014visual], and dealing with challenging visual appearance change using GANs [latif2018addressing].

The work presented in this paper is most closely influenced by work integrating feature embeddings learned by CNNs [dongdong2018cnn], omnidirectional cameras [cheng2019panoramic], and LiDAR [yin2018synchronous] within the SeqSLAM framework.

III Methodology

Broadly, our method can be summarised as leveraging very recent results in deep learning which provide good metric embeddings for the global location of radar scans within a robust sequence-based trajectory matching system. We begin the discussion with a brief overview of the baseline SeqSLAM algorithm, follow with a light description of the learned metric space, and conclude with the application which unifies these systems – the main contribution of this paper.

III-A Overview of SeqSLAM

Our implementation of the proposed system is based on an open-source, publicly available port of the original algorithm: https://github.com/tmadl/pySeqSLAM.

Incoming images are preprocessed by downsampling (to thumbnail resolution) and patch normalisation. A difference matrix is constructed storing the Euclidean distance between all image pairs. This difference matrix is then contrast enhanced. Examples of these matrices can be seen in Figure 4. For more detail, a good summary is available in [sunderhauf2013we].
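As a rough illustration of these steps – a minimal numpy sketch under assumed thumbnail and patch sizes, not the pySeqSLAM implementation itself:

```python
import numpy as np
from skimage.transform import resize  # assumed dependency for downsampling

def preprocess(image, thumb_shape=(32, 64), patch=8):
    """Downsample a scan/image to thumbnail resolution, then patch-normalise."""
    thumb = resize(image, thumb_shape, anti_aliasing=True)
    out = np.zeros_like(thumb)
    for i in range(0, thumb_shape[0], patch):
        for j in range(0, thumb_shape[1], patch):
            p = thumb[i:i + patch, j:j + patch]
            out[i:i + patch, j:j + patch] = (p - p.mean()) / (p.std() + 1e-8)
    return out

def difference_matrix(reference_images, live_images):
    """Euclidean distance between every (reference, live) pair of thumbnails."""
    R = np.stack([preprocess(im).ravel() for im in reference_images])
    L = np.stack([preprocess(im).ravel() for im in live_images])
    return np.linalg.norm(R[:, None, :] - L[None, :, :], axis=-1)
```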

When looking for a match to a query image, SeqSLAM sweeps through the contrast-enhanced difference matrix to find the best matching sequence of adjacent frames.

In the experiments (c.f. Sections IV and V) we refer to this baseline search as SeqSLAM.
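A minimal sketch of this sweep, assuming constant-velocity straight-line searches through the difference matrix D (reference places along rows, live frames along columns); the window length and velocity set are placeholders:

```python
import numpy as np

def best_sequence_match(D, query_idx, ds=10, velocities=(0.8, 1.0, 1.25)):
    """Find the reference index whose straight-line sequence ending at the
    query column accumulates the lowest sum of differences."""
    best_ref, best_score = -1, np.inf
    for ref in range(D.shape[0]):
        for v in velocities:
            score = 0.0
            for k in range(ds):
                r, q = int(round(ref - v * k)), query_idx - k
                if r < 0 or q < 0:
                    score = np.inf
                    break
                score += D[r, q]
            if score < best_score:
                best_ref, best_score = ref, score
    return best_ref, best_score
```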

Figure 2: The off-the-shelf sequence matching SeqSLAM system is manipulated in this paper to facilitate backwards loop closure detection. This is achieved by mirroring the set of candidate trajectories considered (blue) so that their time-reversed counterparts (red) are also searched. Importantly, this is not a useful adjustment under a naïve application of SeqSLAM to radar images, and is only beneficial if a rotationally invariant representation is used to construct the difference matrices.

III-B Overview of Kidnapped Radar

To learn filters and cluster centres which help distinguish polar radar images for place recognition, we use NetVLAD [arandjelovic2016netvlad] with VGG-16 [simonyan2014very] as a front-end feature extractor – both popularly applied to the place recognition problem. Importantly, we make alterations such that the network is invariant to the orientation of input radar scans, including: circular padding [wang2018omnidirectional], anti-aliasing blurring [zhang2019making], and azimuth-wise max-pooling.
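A minimal PyTorch sketch of two of these alterations – wrap-around (circular) padding along the azimuth axis and azimuth-wise max-pooling – assuming scans are laid out as (batch, channels, azimuth, range) tensors; this is illustrative rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AzimuthCircularConv(nn.Module):
    """Convolution that pads circularly along azimuth (the scan wraps around
    360°) and with zeros along range, so no artificial edge is introduced."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k)  # padding handled manually
        self.p = k // 2

    def forward(self, x):  # x: (B, C, azimuth, range)
        x = F.pad(x, (0, 0, self.p, self.p), mode='circular')  # wrap azimuth
        x = F.pad(x, (self.p, self.p, 0, 0))                   # zero-pad range
        return self.conv(x)

class AzimuthMaxPool(nn.Module):
    """Max-pool over the whole azimuth axis: a rotation of the scan is a
    circular shift along this axis, which the max operation discards."""
    def forward(self, x):  # x: (B, C, azimuth, range)
        return x.max(dim=2).values  # -> (B, C, range)
```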

To enforce the metric space, we perform online triplet mining and apply the triplet loss described in [schroff2015facenet]. Loop closure labels are taken from a ground truth dataset (c.f. Section IV).
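As a concrete sketch of the loss from [schroff2015facenet] – here using PyTorch's built-in helper, with a placeholder margin value:

```python
import torch.nn.functional as F

def triplet_step(net, anchor, positive, negative, margin=0.5):
    """Embed a mined triplet and compute the hinge-style triplet loss:
    pull the anchor towards a scan of the same place (positive) and
    push it away from a scan of a different place (negative)."""
    ea, ep, en = net(anchor), net(positive), net(negative)
    return F.triplet_margin_loss(ea, ep, en, margin=margin)
```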

The interested reader is referred to [KidnappedRadarArXiv] for more detail.

In the experiments (c.f. Sections IV and V) we refer to representations obtained in this manner as kRadar.

III-C Sequence-based Radar Place Recognition

We replace the image preprocessing step with inference on the network described in Section III-B, resulting in fixed-length radar scan descriptors.

The difference matrix is obtained by calculating the Euclidean distance between every pair of embeddings taken from places along the reference and live trajectories within a window of fixed length. This distance matrix is then locally contrast enhanced in fixed-length sections.
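A sketch of both operations, assuming the embeddings are stacked row-wise in numpy arrays and using a placeholder section length:

```python
import numpy as np

def embedding_difference_matrix(ref_emb, live_emb):
    """Euclidean distance between every pair of (reference, live) embeddings."""
    return np.linalg.norm(ref_emb[:, None, :] - live_emb[None, :, :], axis=-1)

def local_contrast_enhance(D, section=60):
    """Normalise each entry against the mean/std of its local section of
    reference places, as in SeqSLAM's contrast enhancement step."""
    E = np.zeros_like(D)
    for i in range(D.shape[0]):
        lo, hi = max(0, i - section // 2), min(D.shape[0], i + section // 2)
        patch = D[lo:hi, :]
        E[i, :] = (D[i, :] - patch.mean(axis=0)) / (patch.std(axis=0) + 1e-8)
    return E
```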

When searching for a match to a query image, we perform a sweep through this contrast-enhanced difference matrix to find the best matching sequence of frames based on the sum of sequence differences. In order to be able to detect matches in reverse, this procedure is repeated with a time-reversed live trajectory – a method which would not be applicable to narrow-FoV cameras but is appropriate here as the radar has a 360° FoV.

A simple visualisation of the process is shown in Figure 2. In this case, the forward search (blue lines) is mirrored to perform a backwards search (red lines), which results in the selection of the best match (solid black line). In the experiments (c.f. Sections IV and V) we refer to this modified search as LAY (“Look Around You”).
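A minimal sketch of the mirrored search, reusing the best_sequence_match sketch from Section III-A (both are illustrative helpers, and we assume query_idx >= ds):

```python
def look_around_you(D, query_idx, ds=10):
    """Score forward and time-reversed sequence hypotheses; keep the better."""
    fwd_ref, fwd_score = best_sequence_match(D, query_idx, ds=ds)
    # Mirror the most recent ds live columns to model traversing the same
    # stretch in the opposite direction - only meaningful because the
    # representation is rotationally invariant.
    D_rev = D.copy()
    window = slice(query_idx - ds + 1, query_idx + 1)
    D_rev[:, window] = D[:, window][:, ::-1]
    bwd_ref, bwd_score = best_sequence_match(D_rev, query_idx, ds=ds)
    return (fwd_ref, fwd_score) if fwd_score <= bwd_score else (bwd_ref, bwd_score)
```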

This procedure is performed for each template; a threshold is then applied to the match scores to select the best matches. Section V discusses the application of the threshold and reports the results in comparison to the original SeqSLAM approach; in particular, Figure 4 shows visual examples of the discussed methodology.

IV Experimental Setup

This section details our experimental design in obtaining the results to follow in Section V.

IV-A Vehicle and radar specifications

Data was collected using the Oxford RobotCar platform [RobotCarDatasetIJRR]. The vehicle, as described in the Oxford Radar RobotCar Dataset [RadarRobotCarDatasetArXiv], is fitted with a CTS350-X Navtech FMCW scanning radar.

IV-B Ground truth database

The ground truth database is curated offline to capture the sets of nodes that are within a maximum distance of a query frame, creating a graph-structured database that yields triplets of nodes for training the representation discussed in Section III-B.
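One way such a database can be curated, sketched with scipy's KD-tree; the distance threshold here is a placeholder rather than the paper's value:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_triplet_database(positions, max_dist=5.0):
    """positions: (N, 2) ground truth locations. For each query frame, record
    positives (nodes within max_dist) and negatives (all other nodes)."""
    tree = cKDTree(positions)
    database = []
    for i, p in enumerate(positions):
        positives = set(tree.query_ball_point(p, r=max_dist)) - {i}
        negatives = set(range(len(positions))) - positives - {i}
        database.append((i, sorted(positives), sorted(negatives)))
    return database
```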

To this end, we adjust the accompanying ground truth odometry described in [RadarRobotCarDatasetArXiv] in order to build a database of ground truth locations. We manually selected a moment during which the vehicle was stationary at a common point and trimmed each ground trace accordingly. We also aligned the ground traces by introducing a small rotational offset to account for differing attitudes.

Figure 3: Visualisation of a ground truth matrix showing the Euclidean distance between the global positions at which pairs of radar scans were captured. Each trajectory pair is associated with such a matrix. Values in these matrices are scaled from distant (white) to nearby (black). In this region of the dataset, the vehicle revisits the same stretch of the route in the opposite direction – visible as the contours perpendicular to the main diagonal.

IV-C Trajectory reservation

Each trajectory through the Oxford city centre was divided into three distinct portions: train, valid, and test.

The network is trained with ground truth topological localisations between two reserved trajectories in the train split.

The test split, upon which the results presented in Section V are based, was specifically selected to feature vehicle traversals over portions of the route in the opposite direction; data from this split are not seen by the network during training.

The results focus on a teach-and-repeat (TR) scenario, in which all remaining trajectories in the dataset are localised against a map built from the first trajectory that we did not use for learning – each trajectory pair sharing the same map but with a different localisation run.

IV-D Measuring performance

In the ground truth database, all locations within a radius of a ground truth location are considered true positives whereas those outside are considered true negatives, a more strictly imposed boundary than in [KidnappedRadarArXiv].

Evaluation of precision and recall (PR) differs between the sequence- and NN-based approaches. For the NN-based approach of [KidnappedRadarArXiv] we perform a ball search of the discretised metric space out to a varying embedding distance threshold. For the sequence-based approach advocated in this paper, we vary the minimum match score.
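For the sequence-based case, the evaluation can be sketched as follows, sweeping a threshold on per-query match scores (lower taken as better here) against the ground truth labels:

```python
import numpy as np

def pr_curve(match_scores, is_true_match, n_thresholds=50):
    """match_scores: best match score per query; is_true_match: whether that
    match lies within the ground truth radius. Returns precision/recall arrays."""
    scores = np.asarray(match_scores, dtype=float)
    truth = np.asarray(is_true_match, dtype=bool)
    precisions, recalls = [], []
    for t in np.linspace(scores.min(), scores.max(), n_thresholds):
        accepted = scores <= t
        tp = np.sum(accepted & truth)
        precisions.append(tp / max(accepted.sum(), 1))
        recalls.append(tp / max(truth.sum(), 1))
    return np.array(precisions), np.array(recalls)
```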

As useful summaries of PR performance, we analyse the Area Under the Curve (AUC) as well as several F-scores: F1, F2, and F0.5 [pino1999modern].
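These are instances of the general F-measure,

```latex
F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2\,P + R}
```

where β = 1 gives the harmonic mean of precision P and recall R, β = 2 weights recall more heavily, and β = 0.5 weights precision more heavily.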

Figure 4: Difference matrices upon which the SeqSLAM variants perform trajectory searches (top row) and the corresponding match-score matrices (bottom row). These are constructed by matching the representations of radar scans in two trajectories (rows-versus-columns for each matrix) – VGG-16/NetVLAD on the left side (a, b, e and f) and kRadar on the right side (c, d, g and h). (a) and (c) are the difference matrices before enhancement – directly used by the NN search in [KidnappedRadarArXiv] – and (b) and (d) are the respective enhanced forms – on which SeqSLAM performs its searches. (e) and (f) are constructed using embeddings inferred by VGG-16/NetVLAD in the enhanced form (b), while (g) and (h) use embeddings inferred by kRadar in the enhanced form (d). (e) and (g) are computed using the forward-style match score method employed by standard SeqSLAM; in contrast, (f) and (h) employ the proposed backward-style match score method. The match-score matrices are undefined for the first window of columns, as SeqSLAM must fill a buffer of frames before any matching is possible.
Figure 5: Binarised match score matrices for (a) the baseline and (b) the mirrored SeqSLAM variants. The threshold applied for binarisation is higher for (a) (kRadar, SeqSLAM) than for (b) (kRadar, LAY). This is in order to show qualitatively that even when increasing numbers of potential matches are allowed in SeqSLAM (high recall), the true backwards loop closures do not feature; for LAY (right), they do. From these it is evident that the tailored changes to the fundamental SeqSLAM search strategy are better suited to discovering loop closures as the vehicle revisits the same route section with opposing orientation – a common scenario in structured, urban driving.
Figure 6: Precision-Recall curves showing the benefit of, firstly, using learned metric embeddings as opposed to radar scans directly and, secondly, the tailored changes to the baseline SeqSLAM search algorithm when performing sequence-based radar place recognition.

IV-E Hyperparameter tuning

To produce a fair comparison of the different configurations we can utilise to solve the topological localisation problem, we performed hyperparameter tuning on the various algorithms; we selected two random trials and excluded them from the final evaluation. The window width for the trajectory evaluation and the enhancement window were chosen through a grid search procedure. The final values are the ones which produced precision-recall curves with the highest value of precision at high recall.
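A sketch of this grid search, with illustrative parameter ranges (not the paper's exact grid) and a hypothetical evaluate callback that returns precision at the chosen high-recall operating point:

```python
import itertools

def tune(evaluate, window_widths=range(28, 41, 3), enhance_sections=(50, 60, 70)):
    """Grid search over the sequence window width and the contrast-enhancement
    section length, keeping the pair with the highest returned precision."""
    best, best_precision = None, -1.0
    for ds, section in itertools.product(window_widths, enhance_sections):
        precision = evaluate(ds=ds, section=section)
        if precision > best_precision:
            best, best_precision = (ds, section), precision
    return best, best_precision
```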

V Results

This section reports results for the metrics discussed in Section IV-D.

The hyperparameter optimisation (c.f. Section IV-E) results in the parametrisation of the systems for comparison as enumerated in Table I.

Representation    Search    Window width    Enhancement window
VGG-16/NetVLAD    SeqSLAM   34              50
kRadar            SeqSLAM   37              60
VGG-16/NetVLAD    LAY       31              60
kRadar            LAY       37              60
Table I: Hyperparameter summary resulting from the grid-search optimisation.

Figure 6 shows a family of Precision-Recall curves for the various ways in which SeqSLAM can be performed with radar data. Here, only a single trajectory pair is considered (one as the map trajectory, the other as the live trajectory). From Figure 6 it is evident that:

  1. Performance when using the learned metric embeddings is superior to using either polar or Cartesian radar scans directly,

  2. Sequence-based matching of trajectories outperforms NN-based searches,

  3. Performance when using the baseline architecture is outstripped by the rotationally-invariant modifications, and

  4. Performance when using the modified search algorithm is boosted.

Observation 1 can be attributed to the fact that the learned representation is designed to encode only knowledge concerning place, whereas the imagery is subject to sensor artefacts. Observation 2 can be attributed to perceptual aliasing along straight, canyon-like sections of an urban trajectory being mitigated. Observation 3 can be attributed to the rotationally-invariant architecture itself. Observation 4 can be attributed to the ability of the adjusted search to detect loop closures in reverse.

Table II provides further evidence for these findings by aggregating PR-related statistics over the entirety of the dataset discussed in Section IV-C – the map trajectory is kept constant while the live trajectory varies over forays spanning a month of urban driving.

While it is clear that we outperform the NN techniques presented in [KidnappedRadarArXiv], the F-scores in Table II present a mixed result when comparing the standard SeqSLAM search and the modified search discussed in Section III-C. However, consider Figure 5: here, the structure of the backwards loop closures is discovered more readily by the backwards search.

It is important to remember when inspecting the results shown in Figure 6 and Table II that the data in this part of the route (c.f. Section IV-C) is unseen by the network during training and is particularly challenging. This is a necessary analysis of the generalisation of learned place recognition methods, but is not a requirement when deploying the learned knowledge in teach-and-repeat modes of autonomy.

The takeaway message is that we have improved the recall at good precision levels by about 30% by applying sequence-based place recognition techniques to our learned metric space.

Representation    Search    AUC    max F1    max F2    max F0.5    Recall at high precision
VGG-16/NetVLAD    NN        0.26   0.34      0.37      0.29        0.03    0.00
kRadar            NN        0.37   0.41      0.40      0.41        0.16    0.06
VGG-16/NetVLAD    SeqSLAM   0.47   0.49      0.37      0.60        0.40    0.31
kRadar            SeqSLAM   0.52   0.53      0.41      0.64        0.45    0.37
VGG-16/NetVLAD    LAY       0.46   0.48      0.36      0.60        0.39    0.32
kRadar            LAY       0.53   0.53      0.42      0.62        0.46    0.36
Table II: Summary statistics for various radar-only SeqSLAM techniques (including representation of the radar frame and style of search) as aggregated over a month of urban driving; the final two columns report recall at two high-precision operating points. All quantities are expressed as a mean value. As discussed in Section IV-D, the requirement imposed on matches (as true/false positives/negatives) is stricter than that presented in [KidnappedRadarArXiv], with consequently worse performance than previously published for the VGG-16/NetVLAD, kRadar, and NN systems. The key message of this paper is that sequence-based exploitation of these learned metric embeddings (bottom four rows) is beneficial in comparison to NN matching in a discretised search space (top two rows).

VI Conclusion

We have presented an application of recent advances in learning representations for imagery obtained by radar scan formation to sequence-based place recognition. The proposed system is based on a manipulation of off-the-shelf SeqSLAM, with prudent adjustments made to account for the complete sweep made by scanning radar sensors. We have further proven the utility of our rotationally invariant architecture – a crucial enabling factor of our SeqSLAM variant. Crucially, we achieve a boost of 30% in recall at high levels of precision over our previously published NN approach.

VII Future Work

In the future we plan to retrain and test the system on the all-weather platform described in [kyberd2019], a significant factor in the development of which was to explore applications of FMCW radar to mobile autonomy in challenging, unstructured environments. We also plan to integrate the system presented in this paper with our mapping and localisation pipeline, which is built atop the scan-matching algorithm of [2018ICRA_cen, 2019ICRA_cen].

Acknowledgment

This project is supported by the Assuring Autonomy International Programme, a partnership between Lloyd’s Register Foundation and the University of York, as well as UK EPSRC programme grant EP/M019918/1. We would also like to thank our partners at Navtech Radar.

References