In order for autonomous vehicles to travel safely at higher speeds or operate in wide-open spaces where there is a dearth of distinct features, a new level of robust sensing is required. FMCW radar satisfies these requirements, thriving in all environmental conditions (rain, snow, dust, fog, or direct sunlight), providing a 360° view of the scene, and detecting targets at ranges of up to hundreds of metres with centimetre-scale precision. Indeed, there is burgeoning interest in exploiting FMCW radar to enable robust mobile autonomy, including ego-motion estimation [cen2018precise, cen2019radar, 2019ICRA_aldera, 2019ITSC_aldera, Barnes2019MaskingByMoving, UnderTheRadarArXiv], localisation [KidnappedRadarArXiv, tang2020rsl], and scene understanding [weston2019probably].
Figure 1 shows an overview of the pipeline proposed in this paper, which extends our recent work in extremely robust radar-only place recognition [KidnappedRadarArXiv], in which a metric space for embedding polar radar scans was learned, facilitating topological localisation using nearest-neighbour (NN) matching. We show that this learned metric space can be leveraged within a sequence-based topological localisation framework to bolster matching performance, both by mitigating visual similarities caused by the planarity of the sensor and by reducing failures due to sudden obstruction in dynamic environments. Owing to the complete horizontal FoV of the radar scan formation process, we show how the off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction.
This paper proceeds by reviewing related literature in Section II. Section III describes our approach for a more canny use of a metric space in which polar radar scans are embedded. We describe in Section IV details for implementation, evaluation, and our dataset. Section V discusses results from such an evaluation. Sections VI and VII summarise the findings and suggest further avenues for investigation.
II Related Work
Recent work has shown the promise of FMCW radar for robust place recognition [KidnappedRadarArXiv, gskim2020mulran] and metric localisation [UnderTheRadarArXiv]. None of these methods account for temporal effects in the radar measurement stream.
SeqSLAM [milford2012seqslam] and its variants have been extremely successful at tackling large-scale, robust place recognition with video imagery in the last decade. Progress along these lines has included automatic scaling for viewpoint invariance [pepperell2015automatic], probabilistic adjustments to the search technique [hansen2014visual], and dealing with challenging visual appearance change using GANs [latif2018addressing].
The work presented in this paper is most closely influenced by the use of learned feature embeddings within the SeqSLAM framework, whether produced by CNNs [dongdong2018cnn], omnidirectional cameras [cheng2019panoramic], or lidar [yin2018synchronous].
Broadly, our method can be summarised as leveraging very recent results in deep learning which provide good metric embeddings for the global location of radar scans within a robust sequence-based trajectory matching system. We begin the discussion with a brief overview of the baseline SeqSLAM algorithm, followed by a light description of the learned metric space, and conclude with the application which unifies these systems – the main contribution of this paper.
III-A Overview of SeqSLAM
Our implementation of the proposed system is based on an open-source, publicly available port of the original algorithm (https://github.com/tmadl/pySeqSLAM).
Incoming images are preprocessed by downsampling (to thumbnail resolution) and patch normalisation. A difference matrix is constructed storing the Euclidean distance between all image pairs. This difference matrix is then contrast enhanced. Examples of these matrices can be seen in Figure 3. For more detail, a good summary is available in [sunderhauf2013we].
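As a concrete illustration, the preprocessing and difference-matrix construction can be sketched as follows; the patch size here is an illustrative choice, not necessarily the value used in the original implementation:

```python
import numpy as np

def patch_normalise(img, patch=8):
    """Zero-mean, unit-variance normalisation of non-overlapping patches."""
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            p = img[y:y + patch, x:x + patch].astype(float)
            std = p.std()
            out[y:y + patch, x:x + patch] = (p - p.mean()) / std if std > 0 else 0.0
    return out

def difference_matrix(ref, live):
    """D[i, j] = Euclidean distance between reference image i and live image j."""
    ref = np.stack([r.ravel() for r in ref])
    live = np.stack([l.ravel() for l in live])
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (ref ** 2).sum(1)[:, None] + (live ** 2).sum(1)[None, :] - 2 * ref @ live.T
    return np.sqrt(np.maximum(sq, 0.0))
```

The vectorised distance computation avoids an explicit double loop over image pairs, which matters when trajectories contain thousands of frames.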
When looking for a match to a query image, SeqSLAM sweeps through the contrast-enhanced difference matrix to find the best matching sequence of adjacent frames.
III-B Overview of Kidnapped Radar
To learn filters and cluster centres which help distinguish polar radar images for place recognition, we use NetVLAD [arandjelovic2016netvlad] with VGG-16 [simonyan2014very] as a front-end feature extractor – both popularly applied to the place recognition problem. Importantly, we make alterations such that the network is invariant to the orientation of input radar scans, including circular padding [wang2018omnidirectional] and anti-aliasing blurring [zhang2019making].
To enforce the metric space, we perform online triplet mining and apply the triplet loss described in [schroff2015facenet]. Loop closure labels are taken from a ground truth dataset (c.f. Section IV).
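A minimal sketch of the triplet loss and a simple in-batch hard-negative mining step is given below; the margin value is illustrative, and the mining helper assumes all candidate embeddings are already known to come from different places:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, ||a-p|| - ||a-n|| + margin), averaged over the batch."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def mine_hard_negatives(anchor, candidates):
    """For each anchor, pick the closest candidate embedding
    (the hardest negative in the batch), assuming every candidate
    is labelled as a different place."""
    d = np.linalg.norm(anchor[:, None, :] - candidates[None, :, :], axis=2)
    return candidates[d.argmin(axis=1)]
```

Online mining of hard negatives keeps the loss informative as training progresses, since randomly drawn negatives quickly become too easy.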
The interested reader is referred to [KidnappedRadarArXiv] for more detail.
III-C Sequence-based Radar Place Recognition
We replace the image preprocessing step with inference on the network described in Section III-B, resulting in a fixed-length descriptor for each radar scan.
The difference matrix is obtained by calculating the Euclidean distance between every pair of embeddings taken from places along the reference and live trajectories within a search window. This distance matrix is then locally contrast enhanced over short sections.
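The embedding-based difference matrix and its local contrast enhancement might be sketched as follows; the enhancement window length is a placeholder for the tuned value:

```python
import numpy as np

def embedding_difference_matrix(ref_emb, live_emb):
    """Pairwise Euclidean distances between reference and live embeddings."""
    sq = ((ref_emb[:, None, :] - live_emb[None, :, :]) ** 2).sum(-1)
    return np.sqrt(sq)

def local_contrast_enhance(D, window=10):
    """Normalise each row against a local band of neighbouring rows,
    in the spirit of SeqSLAM's local contrast enhancement."""
    E = np.zeros_like(D)
    n = D.shape[0]
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        patch = D[lo:hi, :]
        E[i, :] = (D[i, :] - patch.mean(axis=0)) / (patch.std(axis=0) + 1e-9)
    return E
```

The local normalisation suppresses globally similar-looking regions so that only locally best matches stand out in the subsequent sequence search.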
When searching for a match to a query image, we perform a sweep through this contrast-enhanced difference matrix to find the best matching sequence of frames based on the sum of sequence differences. In order to be able to detect matches in reverse, this procedure is repeated with a time-reversed live trajectory – this method would not be applicable to narrow-FoV cameras but is appropriate here as the radar has a complete 360° horizontal FoV.
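The reverse-matching manipulation can be illustrated with a simplified sweep; the sequence length and velocity range below are hypothetical, and the scoring function is a stripped-down stand-in for the full SeqSLAM search:

```python
import numpy as np

def best_sequence_score(D, j, seq_len=10, vmin=0.8, vmax=1.2, n_v=5):
    """Minimum summed difference over constant-velocity lines ending at
    live index j (a simplified SeqSLAM sweep)."""
    n_ref = D.shape[0]
    best = np.inf
    js = j - np.arange(seq_len)[::-1]            # live indices of the sequence
    if js[0] < 0:
        return best
    for v in np.linspace(vmin, vmax, n_v):
        for i_end in range(n_ref):
            is_ = np.round(i_end - v * np.arange(seq_len)[::-1]).astype(int)
            if is_[0] < 0:
                continue
            best = min(best, D[is_, js].sum())
    return best

def lay_score(D, j, **kw):
    """'Look Around You': evaluate the sweep on both the forward and the
    time-reversed live trajectory and keep the better match."""
    fwd = best_sequence_score(D, j, **kw)
    bwd = best_sequence_score(D[:, ::-1], D.shape[1] - 1 - j, **kw)
    return min(fwd, bwd)
```

Reversing the live axis of the difference matrix is all that is needed to expose sequences traversed in the opposite direction, since the rotation-invariant embeddings already match scans regardless of heading.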
A simple visualisation of the process is shown in Figure 2. In this case, the forward search (blue lines) is mirrored to perform a backwards search (red lines), which results in the selection of the best match (solid black line). In the experiments (c.f. Sections V and IV) we refer to this modified search as LAY (“Look Around You”).
This procedure is performed for each template, after which a threshold is applied to select the best matches. Section V discusses the application of the threshold and reports the results in comparison to the original SeqSLAM approach; in particular Figure 4 shows visual examples of the discussed methodology.
IV Experimental Setup
This section details our experimental design in obtaining the results to follow in Section V.
IV-A Vehicle and radar specifications
Data was collected using the Oxford RobotCar platform [RobotCarDatasetIJRR]. The vehicle, as described in the Oxford Radar RobotCar Dataset [RadarRobotCarDatasetArXiv], is fitted with a Navtech CTS350-X FMCW scanning radar.
IV-B Ground truth database
The ground truth database is curated offline to capture, for each query frame, the set of nodes within a maximum distance of that frame, creating a graph-structured database that yields triplets of nodes for training the representation discussed in Section III-B.
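One way to realise such a database is a brute-force radius query over the trajectory positions; the 5 m radius below is a placeholder, not the paper's value:

```python
import numpy as np

def build_gt_database(positions, max_dist=5.0):
    """For each node, the indices of all other nodes within max_dist metres --
    the positive set used to form (anchor, positive, negative) triplets."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    return [np.flatnonzero((row <= max_dist) & (np.arange(len(row)) != i))
            for i, row in enumerate(d)]

def sample_triplet(positions, db, rng):
    """Draw an (anchor, positive, negative) index triplet from the database."""
    anchors = [i for i, pos_set in enumerate(db) if len(pos_set) > 0]
    a = rng.choice(anchors)
    p = rng.choice(db[a])
    negs = np.setdiff1d(np.arange(len(positions)), np.append(db[a], a))
    n = rng.choice(negs)
    return a, p, n
```

For trajectories with many thousands of nodes, a spatial index (e.g. a KD-tree) would replace the quadratic distance computation, but the logic is the same.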
To this end, we adjust the accompanying ground truth odometry described in [RadarRobotCarDatasetArXiv] in order to build a database of ground truth locations. We manually selected a moment during which the vehicle was stationary at a common point and trimmed each ground trace accordingly. We also aligned the ground traces by introducing a small rotational offset to account for differing attitudes.
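The trimming and rotational alignment amount to re-origining each trace at the common stationary point and applying a small 2D rotation, e.g.:

```python
import numpy as np

def align_trace(xy, start_idx, theta):
    """Trim a ground trace to a common stationary start point, then rotate
    it by a small angle theta (radians) about that point to correct for a
    differing initial attitude."""
    xy = xy[start_idx:] - xy[start_idx]          # trim and re-origin
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return xy @ R.T
```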
IV-C Trajectory reservation
Each trajectory through the Oxford city centre was divided into three distinct portions: train, valid, and test.
The network is trained with ground truth topological localisations between two reserved trajectories in the train split.
The test split, upon which the results presented in Section V are based, was specifically selected to feature vehicle traversals over portions of the route in the opposite direction; data from this split are not seen by the network during training.
The results focus on a teach-and-repeat scenario, in which all remaining trajectories in the dataset are localised against a map built from the first trajectory that we did not use for learning – each pairing uses the same map but a different localisation run.
IV-D Measuring performance
In the ground truth database, all locations within a fixed radius of a ground truth location are considered true positives whereas those outside are considered true negatives, a more strictly imposed boundary than in [KidnappedRadarArXiv].
Evaluation of precision-recall (PR) is different for the sequence- and NN-based approaches. For the NN-based approach of [KidnappedRadarArXiv] we perform a ball search of the discretised metric space out to a varying embedding distance threshold. For the sequence-based approach advocated in this paper, we vary the minimum match score.
As useful summaries of PR performance, we analyse the area under the curve (AUC) as well as F-scores at several settings of the weighting parameter β [pino1999modern].
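The PR sweep and F-scores can be computed as below; note that this sketch assumes higher scores mean more confident matches, whereas SeqSLAM's raw difference scores would be thresholded in the opposite direction:

```python
import numpy as np

def pr_curve(scores, labels, thresholds):
    """Precision/recall as the match-score threshold is varied.
    labels are 1 for true loop closures, 0 otherwise."""
    P, R = [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        P.append(tp / (tp + fp) if tp + fp else 1.0)
        R.append(tp / (tp + fn) if tp + fn else 0.0)
    return np.array(P), np.array(R)

def f_beta(p, r, beta=1.0):
    """F-measure weighting recall beta times as much as precision."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) if p + r else 0.0
```

Setting β > 1 favours recall and β < 1 favours precision, which is why reporting several settings gives a fuller picture than F1 alone.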
IV-E Hyperparameter tuning
To produce a fair comparison of the different configurations we can utilise to solve the topological localisation problem, we performed hyperparameter tuning on the various algorithms; we selected two random trials and excluded them from the final evaluation. The window width for the trajectory evaluation and the contrast-enhancement window were chosen through a grid search procedure. The final values are those which produced precision-recall curves with the highest precision at high recall.
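The grid search over the two window parameters is straightforward to sketch; here `evaluate` is a hypothetical stand-in for a full localisation run scored, for example, by precision at a fixed recall:

```python
import numpy as np
from itertools import product

def grid_search(evaluate, seq_lens, enh_windows):
    """Evaluate every (sequence length, enhancement window) pair on held-out
    trials and return the configuration with the best score."""
    best, best_cfg = -np.inf, None
    for ds, enh in product(seq_lens, enh_windows):
        score = evaluate(ds, enh)
        if score > best:
            best, best_cfg = score, (ds, enh)
    return best_cfg, best
```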
V Results
This section presents results in terms of the metrics discussed in Section IV-D.
Figure 6 shows a family of PR curves for the various ways in which SeqSLAM can be performed with radar data. Here, only a single trajectory pair is considered (one as the map trajectory, the other as the live trajectory). From Figure 6 it is evident that:
1) Performance when using the learned metric embeddings is superior to using either polar or Cartesian radar scans directly,
2) Sequence-based matching of trajectories outperforms NN-based searches,
3) Performance when using the baseline architecture is outstripped by the rotationally-invariant modifications, and
4) Performance when using the modified search algorithm is boosted.
Observation 1 can be attributed to the fact that the learned representation is designed to encode only knowledge concerning place, whereas the imagery is subject to sensor artefacts. Observation 2 can be attributed to perceptual aliasing along straight, canyon-like sections of an urban trajectory being mitigated. Observation 3 can be attributed to the rotationally-invariant architecture itself. Observation 4 can be attributed to the ability of the adjusted search to detect loop closures in reverse.
Table II provides further evidence for these findings by aggregating PR statistics over the entirety of the dataset discussed in Section IV-C – the map trajectory is kept constant and the live trajectory varies over forays spanning a month of urban driving.
While it is clear that we outperform the NN techniques presented in [KidnappedRadarArXiv], the F-scores in Table II present a mixed result when comparing the standard SeqSLAM search and the modified search discussed in Section III-C. However, consider Figure 5. Here, the structure of the backwards loop closures is discovered more readily by the backwards search.
It is important to remember when inspecting the results shown in Figure 6 and Table II that the data in this part of the route (c.f. Section IV-C) is unseen by the network during training, and particularly challenging. This is a necessary analysis of the generalisation of learned place recognition methods but is not a requirement when deploying the learned knowledge in teach-and-repeat modes of autonomy.
The takeaway message is that we have markedly improved the recall at good precision levels by applying sequence-based place recognition techniques to our learned metric space.
VI Conclusion
We have presented an application of recent advances in learning representations for imagery obtained by radar scan formation to sequence-based place recognition. The proposed system is based on a manipulation of off-the-shelf SeqSLAM with prudent adjustments made taking into account the complete sweep made by scanning radar sensors. We have further proven the utility of our rotationally invariant architecture – a crucial enabling factor of our SeqSLAM variant. Crucially, we achieve a boost in recall at high levels of precision over our previously published NN approach.
VII Future Work
In the future we plan to retrain and test the system on the all-weather platform described in [kyberd2019], a significant factor in the development of which was to explore applications of FMCW radar to mobile autonomy in challenging, unstructured environments. We also plan to integrate the system presented in this paper with our mapping and localisation pipeline which is built atop the scan-matching algorithm of [2018ICRA_cen, 2019ICRA_cen].
Acknowledgements
This project is supported by the Assuring Autonomy International Programme, a partnership between Lloyd’s Register Foundation and the University of York, as well as UK EPSRC programme grant EP/M019918/1. We would also like to thank our partners at Navtech Radar.