Benchmarking unsupervised near-duplicate image detection

07/03/2019
by Lia Morra, et al.

Unsupervised near-duplicate detection has many practical applications, ranging from social media analysis and web-scale retrieval to digital image forensics. It entails running a threshold-limited query on a set of descriptors extracted from the images, with the goal of identifying all possible near-duplicates while limiting the false positives due to visually similar images. Since the rate of false alarms grows with the dataset size, a very high specificity is required, up to 1 - 10^-9 for realistic use cases; this important requirement, however, is often overlooked in the literature. In recent years, descriptors based on deep convolutional neural networks have matched or surpassed traditional feature extraction methods in content-based image retrieval tasks. To the best of our knowledge, ours is the first attempt to establish the performance range of deep learning-based descriptors for unsupervised near-duplicate detection on a range of datasets, encompassing a broad spectrum of near-duplicate definitions. We leverage both established and new benchmarks, such as the Mir-Flickr Near-Duplicate (MFND) dataset, in which a known ground truth is provided for all possible pairs over a general, large-scale image collection. To compare the specificity of different descriptors, we reduce the problem of unsupervised detection to that of binary classification of near-duplicate vs. not-near-duplicate image pairs. The latter can be conveniently characterized using the Receiver Operating Characteristic (ROC) curve. Our findings generally favor fine-tuning deep convolutional networks over using off-the-shelf features, but differences at high specificity settings depend on the dataset and are often small. The best performance was observed on the MFND benchmark, achieving 96% sensitivity at a false positive rate of 1.43 × 10^-6.
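
To make the pairwise formulation concrete, the following is a minimal sketch of a threshold-limited query evaluated as binary classification over image pairs. It is not the authors' code: the Euclidean distance, the function names (`pairwise_distances`, `sensitivity_at_fpr`, `expected_false_alarms`) and the toy descriptors are all assumptions made for illustration. It also shows why the specificity requirement is so strict: the expected number of false alarms is the false positive rate multiplied by the number of pairs, which grows quadratically with the collection size.

```python
import numpy as np


def pairwise_distances(desc):
    """Return index arrays (i, j) and the Euclidean distance of every pair i < j."""
    i, j = np.triu_indices(len(desc), k=1)
    return i, j, np.linalg.norm(desc[i] - desc[j], axis=1)


def sensitivity_at_fpr(dist, is_dup, max_fpr):
    """Sensitivity reached by the loosest threshold whose FPR stays <= max_fpr.

    A pair is flagged as near-duplicate when its distance is strictly below
    the returned threshold. Assumes at least one positive and one negative pair.
    """
    neg = np.sort(dist[~is_dup])
    pos = dist[is_dup]
    # Number of negative pairs we may wrongly flag without exceeding max_fpr.
    allowed = int(np.floor(max_fpr * len(neg)))
    # Strictly below neg[allowed] there are at most `allowed` negative pairs.
    threshold = neg[allowed] if allowed < len(neg) else np.inf
    return float(np.mean(pos < threshold)), float(threshold)


def expected_false_alarms(n_images, fpr):
    """Expected false alarms when all n*(n-1)/2 pairs are compared at a given FPR."""
    return n_images * (n_images - 1) // 2 * fpr


# Toy, hand-made 2-D "descriptors" (hypothetical data, for illustration only).
desc = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [10.0, 0.0]])
dup_pairs = {(0, 1), (2, 3)}  # assumed ground-truth near-duplicate pairs

i, j, dist = pairwise_distances(desc)
is_dup = np.array([(a, b) in dup_pairs for a, b in zip(i.tolist(), j.tolist())])

tpr, thr = sensitivity_at_fpr(dist, is_dup, max_fpr=0.0)
print(f"sensitivity={tpr:.2f} at threshold={thr:.3f}")
print(f"expected false alarms: {expected_false_alarms(1_000_000, 1e-9):.1f}")
```

Under these assumptions, a collection of one million images contains roughly 5 × 10^11 pairs, so even a false positive rate of 10^-9 still leaves several hundred expected false alarms. On a real benchmark the exhaustive pair enumeration would be replaced by an indexed or batched search; the sketch keeps it only for clarity.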


research
03/14/2022

Dataset and Case Studies for Visual Near-Duplicates Detection in the Context of Social Media

The massive spread of visual content through the web and social media po...
research
06/11/2015

Techniques for effective and efficient fire detection from social media images

Social media could provide valuable information to support decision maki...
research
11/10/2015

Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing

A typical image retrieval pipeline starts with the comparison of global ...
research
03/19/2016

Large scale near-duplicate image retrieval using Triples of Adjacent Ranked Features (TARF) with embedded geometric information

Most approaches to large-scale image retrieval are based on the construc...
research
11/06/2020

Efficient image retrieval using multi neural hash codes and bloom filters

This paper aims to deliver an efficient and modified approach for image ...
research
11/25/2021

GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval

Even though it has extensively been shown that retrieval specific traini...
research
12/11/2018

Rotation Invariant Descriptors for Galaxy Morphological Classification

The detection of objects that are multi-oriented is a difficult pattern ...
