Attention-Based Audio Embeddings for Query-by-Example

10/16/2022
by   Anup Singh, et al.

An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high signal distortion levels. This paper presents an audio retrieval system that generates noise- and reverberation-robust audio fingerprints using a contrastive learning framework. Using these fingerprints, the method performs a comprehensive search to identify the query audio and precisely estimate its timestamp in the reference audio. Our framework trains a CNN to maximize the similarity between pairs of embeddings extracted from a clean audio segment and its distorted, time-shifted version. We employ a channel-wise spectral-temporal attention mechanism that better discriminates the audio by giving more weight to the salient spectral-temporal patches in the signal. Experimental results indicate that our system is efficient in computation and memory usage, is more accurate than competing state-of-the-art systems, particularly at higher distortion levels, and scales to a larger database.
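
The training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not name the exact loss, so the code below assumes a one-directional InfoNCE-style contrastive loss: each clean embedding is pulled toward the embedding of its own distorted version and pushed away from the other distorted embeddings in the batch. The function names (`l2_normalize`, `contrastive_loss`) and the `temperature` value are hypothetical choices for illustration, not taken from the paper, and the embeddings are plain Python lists standing in for CNN outputs.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def contrastive_loss(clean, distorted, temperature=0.1):
    """Assumed InfoNCE-style loss (not confirmed by the source).

    clean[i] and distorted[i] form a positive pair; every other
    distorted[j] in the batch acts as a negative for clean[i].
    Returns the mean cross-entropy over the batch.
    """
    clean = [l2_normalize(v) for v in clean]
    distorted = [l2_normalize(v) for v in distorted]

    total = 0.0
    for i, anchor in enumerate(clean):
        # Cosine similarity of the anchor to every distorted embedding,
        # scaled by the temperature.
        logits = [
            sum(a * b for a, b in zip(anchor, candidate)) / temperature
            for candidate in distorted
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denominator - logits[i]
    return total / len(clean)


if __name__ == "__main__":
    clean_batch = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]      # distorted versions stay close to their sources
    mismatched = [[0.1, 0.9], [0.9, 0.1]]   # pairs swapped, so positives look like negatives
    print(contrastive_loss(clean_batch, matched))
    print(contrastive_loss(clean_batch, mismatched))
```

Under these assumptions the loss is small when each distorted embedding stays closest to its own clean source and grows when the pairs are confused, which is the behavior that minimizing the loss encourages during training.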


research
11/20/2022

Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example

Audio fingerprinting systems must efficiently and robustly identify quer...
research
06/16/2023

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

This paper explores grading text-based audio retrieval relevances with c...
research
10/22/2020

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most of existing audio fingerprinting systems have limitations to be use...
research
09/28/2022

Audio Retrieval with WavText5K and CLAP Training

Audio-Text retrieval takes a natural language query to retrieve relevant...
research
05/16/2023

Robust and lightweight audio fingerprint for Automatic Content Recognition

This research paper presents a novel audio fingerprinting system for Aut...
research
05/29/2019

A new definition of the distortion matrix for an audio-to-score alignment system

In this paper we present a new definition of the distortion matrix for a...
research
10/04/2021

AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks

Artefacts that differentiate spoofed from bona-fide utterances can resid...
