Supervised Fine-tuning Evaluation for Long-term Visual Place Recognition

11/14/2022
by Farid Alijani, et al.

In this paper, we present a comprehensive study of deep convolutional neural networks equipped with two state-of-the-art pooling layers, which are placed after the convolutional layers and fine-tuned end-to-end, for the visual place recognition task under challenging conditions, including seasonal and illumination variations. We extensively compare the performance of deep learned global features trained with three different loss functions, namely triplet, contrastive and ArcFace, in terms of the fraction of correct matches during deployment. To verify the effectiveness of our results, we use two real-world place recognition datasets covering both indoor and outdoor environments. Our investigation demonstrates that fine-tuning the architectures end-to-end with the ArcFace loss outperforms the other two losses by approximately 1 to 4% at certain thresholds for visual place recognition tasks.
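
For readers unfamiliar with the three losses named in the abstract, the following is a minimal sketch of their textbook forms, not the authors' implementation. The function names, margins, scale value, and the use of non-squared Euclidean distance in the triplet loss are illustrative assumptions; the paper's actual hyperparameters are not given here. The ArcFace sketch also omits the handling some implementations add for angles that exceed pi after the margin is applied.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_place, margin=0.7):
    """Pull matching pairs together; push non-matching pairs beyond `margin`.

    The margin value is a hypothetical default.
    """
    d = euclidean(a, b)
    if same_place:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Require the positive to be closer to the anchor than the negative by `margin`.

    The margin value is a hypothetical default.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def arcface_loss(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Softmax cross-entropy with an additive angular margin on the true class.

    `class_weights` holds one weight vector per place class (an assumption
    about how places are grouped into classes for training).
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos = sum(x * y for x, y in zip(e, l2_normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if j == label:
            cos = math.cos(math.acos(cos) + margin)  # add the angular margin
        logits.append(scale * cos)
    # Numerically stable log-sum-exp, then cross-entropy for the true class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

The contrastive and triplet losses operate on pairs or triplets of image descriptors, whereas ArcFace treats place recognition as classification over place classes and penalizes the angle between a descriptor and its class weight vector.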


