Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio Text Augmentations

08/24/2022
by Paul Primus, et al.

The absence of large labeled datasets remains a significant challenge in many application areas of deep learning. Researchers and practitioners typically resort to transfer learning and data augmentation to alleviate this issue. We study these strategies in the context of audio retrieval with natural language queries (Task 6b of the DCASE 2022 Challenge). Our proposed system uses pre-trained embedding models to project recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close to each other. We apply various data augmentation techniques to the audio and text inputs and systematically tune their hyperparameters with sequential model-based optimization. Our results show that the augmentation strategies used reduce overfitting and improve retrieval performance.
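To illustrate what retrieval in a shared audio-caption space amounts to, the following is a minimal sketch, not the authors' implementation. It assumes that each recording and the text query have already been mapped to vectors of the same dimension by some pre-trained encoders, and it assumes cosine similarity as the closeness measure (the abstract does not name one). The file names, the toy vectors, and the helper names `cosine_similarity` and `rank_recordings` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_recordings(query_embedding, audio_embeddings):
    """Return recording ids sorted from most to least similar to the query."""
    scores = {
        rec_id: cosine_similarity(query_embedding, emb)
        for rec_id, emb in audio_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-made vectors standing in for the outputs of pre-trained encoders
# (assumption: real embeddings would have hundreds of dimensions).
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "traffic.wav": [0.1, 0.9, 0.2],
}
# Hypothetical embedding of a caption such as "a dog barks".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_recordings(query_embedding, audio_embeddings)
print(ranking)
```

In this toy setup the recording whose vector points in nearly the same direction as the query is ranked first. A trained system would learn the projections so that matching audio-caption pairs end up close in this sense.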

