DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

by Min Yang, et al.

Image retrieval is the fundamental task of finding images in a database that are similar to a query image. A common practice is to first retrieve candidate images via similarity search using global image features and then re-rank the candidates using their local features. Previous learning-based studies mainly focus on either global or local image representation learning to tackle the retrieval task. In this paper, we abandon the two-stage paradigm and seek to design an effective single-stage solution by integrating the local and global information in an image into a compact image representation. Specifically, we propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It first attentively extracts representative local information with multi-atrous convolutions and self-attention. Components orthogonal to the global image representation are then extracted from the local information. Finally, the orthogonal components are concatenated with the global representation as a complement, and aggregation is performed to generate the final representation. The whole framework is end-to-end differentiable and can be trained with image-level labels. Extensive experimental results validate the effectiveness of our solution and show that our model achieves state-of-the-art image retrieval performance on the Revisited Oxford and Paris datasets.
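
The orthogonal fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy two-dimensional features, and the choice of average pooling as the aggregation are all assumptions made for illustration. The sketch removes from each local feature its projection onto the global feature, concatenates the remainder with the global feature, and averages over positions.

```python
# Minimal sketch of orthogonal local/global fusion.
# All names and the average-pooling aggregation are illustrative assumptions,
# not details taken from the paper.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_component(f_local, f_global):
    """Return the part of f_local that is orthogonal to f_global.

    Assumes f_global is a nonzero vector.
    """
    scale = dot(f_local, f_global) / dot(f_global, f_global)
    return [a - scale * g for a, g in zip(f_local, f_global)]

def fuse(local_feats, f_global):
    """Concatenate f_global with each orthogonal local component,
    then average over positions (assumed aggregation)."""
    fused = [f_global + orthogonal_component(f, f_global) for f in local_feats]
    n = len(fused)
    return [sum(col) / n for col in zip(*fused)]

# Toy example: two local features and one global feature, each of dimension 2.
local_feats = [[1.0, 2.0], [3.0, 0.0]]
f_global = [1.0, 0.0]
descriptor = fuse(local_feats, f_global)
print(descriptor)  # [1.0, 0.0, 0.0, 1.0]
```

In this toy case the second local feature is parallel to the global feature, so its orthogonal component is zero and it adds nothing beyond what the global feature already encodes. That is the sense in which the orthogonal components act as a complement.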




Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval

Image retrieval targets to find images from a database that are visually...

DALG: Deep Attentive Local and Global Modeling for Image Retrieval

Deeply learned representations have achieved superior image retrieval pe...

Unifying Deep Local and Global Features for Efficient Image Search

A key challenge in large-scale image retrieval problems is the trade-off...

Leveraging Implicit Spatial Information in Global Features for Image Retrieval

Most image retrieval methods use global features that aggregate local di...

Deep Learning Based Image Retrieval in the JPEG Compressed Domain

Content-based image retrieval (CBIR) systems on pixel domain use low-lev...

Coarse2Fine: Two-Layer Fusion For Image Retrieval

This paper addresses the problem of large-scale image retrieval. We prop...

Learning Super-Features for Image Retrieval

Methods that combine local and global features have recently shown excel...
