EchoVPR: Echo State Networks for Visual Place Recognition

by   Anil Ozdemir, et al.

Recognising previously visited locations is an important, but unsolved, task in autonomous navigation. Current visual place recognition (VPR) benchmarks typically challenge models to recover the position of a query image (or images) from sequential datasets that include both spatial and temporal components. Recently, Echo State Network (ESN) varieties have proven particularly powerful at solving machine learning tasks that require spatio-temporal modelling. These networks are simple, yet powerful neural architectures that – exhibiting memory over multiple time-scales and non-linear high-dimensional representations – can discover temporal relations in the data while still maintaining linearity in the learning. In this paper, we present a series of ESNs and analyse their applicability to the VPR problem. We report that the addition of ESNs to pre-processed convolutional neural networks led to a dramatic boost in performance in comparison to non-recurrent networks in four standard benchmarks (GardensPoint, SPEDTest, ESSEX3IN1, Nordland) demonstrating that ESNs are able to capture the temporal structure inherent in VPR problems. Moreover, we show that ESNs can outperform class-leading VPR models which also exploit the sequential dynamics of the data. Finally, our results demonstrate that ESNs also improve generalisation abilities, robustness, and accuracy further supporting their suitability to VPR applications.


page 1

page 2

page 3

page 4

page 5

page 6

page 7

page 8


Learning Spatio-Temporal Representation with Local and Global Diffusion

Convolutional Neural Networks (CNN) have been regarded as a powerful cla...

Convolutional Neural Network-based Place Recognition

Recently Convolutional Neural Networks (CNNs) have been shown to achieve...

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework

Video deraining is an important task in computer vision as the unwanted ...

DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding

Many of the leading approaches for video understanding are data-hungry a...

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Models based on deep convolutional networks have dominated recent image ...

Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts

Understanding sequential information is a fundamental task for artificia...

Please sign up or login with your details

Forgot password? Click here to reset