Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning

10/17/2020
by Pavlos Avgoustinakis, et al.

In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach, which captures temporal patterns of audio similarity between video pairs. To compute a robust similarity between two videos, we first extract representative audio-based video descriptors through transfer learning from a Convolutional Neural Network (CNN) trained on a large-scale dataset of audio events, and then calculate a similarity matrix from the pairwise similarities of these descriptors. The similarity matrix is subsequently fed to a CNN that captures the temporal structures within it. We train the network by generating triplets and optimizing the triplet loss function. To evaluate the proposed approach, we manually annotated two publicly available video datasets based on audio duplication between their videos. The proposed approach achieves very competitive results compared to three state-of-the-art methods. Moreover, unlike the competing methods, it is very robust when retrieving audio duplicates generated with speed transformations.
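
The two computational steps named in the abstract, the pairwise similarity matrix and the triplet loss, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity measure, margin, or network is used, so cosine similarity, the hinge form of the loss, the margin value, and the function names `l2_normalize`, `similarity_matrix`, and `triplet_loss` are all assumptions made for illustration. The CNN that turns a similarity matrix into a single video-level score is omitted; the loss simply takes two such scores as inputs.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(desc_a, desc_b):
    """Pairwise cosine similarity between two sequences of audio descriptors.

    desc_a and desc_b hold one descriptor per audio segment of each video.
    Entry [i][j] compares segment i of video A with segment j of video B.
    Cosine similarity is an assumption; the abstract only says "pairwise
    similarity".
    """
    a = [l2_normalize(v) for v in desc_a]
    b = [l2_normalize(v) for v in desc_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def triplet_loss(sim_pos, sim_neg, margin=1.0):
    """Hinge-style triplet loss on video-level similarity scores.

    sim_pos: score between the anchor and a duplicate (positive) video.
    sim_neg: score between the anchor and an unrelated (negative) video.
    The loss is zero once the positive outscores the negative by `margin`.
    The margin value here is hypothetical.
    """
    return max(0.0, sim_neg - sim_pos + margin)


# Two toy videos with two 2-dimensional segment descriptors each.
video_a = [[1.0, 0.0], [0.0, 1.0]]
video_b = [[2.0, 0.0], [1.0, 1.0]]
matrix = similarity_matrix(video_a, video_b)
```

In this toy case, `matrix` has one row per segment of `video_a` and one column per segment of `video_b`. Under these assumptions, a duplicated audio stretch would appear as a run of high values along a diagonal of the matrix, which is the kind of temporal structure a CNN applied to the matrix could learn to detect.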

research
08/20/2019

ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

In this paper we introduce ViSiL, a Video Similarity Learning architectu...
research
03/14/2020

Emotions Don't Lie: A Deepfake Detection Method using Audio-Visual Affective Cues

We present a learning-based multimodal method for detecting real and dee...
research
09/03/2023

Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm

This paper presents a novel semi-supervised deep learning algorithm for ...
research
11/10/2022

3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicat...
research
03/15/2023

Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation

Existing audio analysis methods generally first transform the audio stre...
research
07/04/2019

LumièreNet: Lecture Video Synthesis from Audio

We present LumièreNet, a simple, modular, and completely deep-learning b...
research
12/27/2017

A Robust Zero-Watermark Scheme with Similarity-based Retrieval for Copyright Protection of 3D Video

The copyright protection of 3D videos has become a crucial issue. In thi...
