Few-shot learning of new sound classes for target sound extraction

06/14/2021
by Marc Delcroix, et al.

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds. It can be realized using a neural network that extracts the target sound conditioned on a 1-hot vector representing the desired AE class. With this approach, the embedding vectors associated with the AE classes are directly optimized for extracting the sound classes seen during training. However, this framework is not easy to extend to new AE classes, i.e., classes unseen during training. Recently proposed approaches to speech, music, or AE sound extraction based on enrollment audio of the desired sound offer the potential of extracting any target sound from a mixture given only a short audio signal of a similar sound. In this work, we propose combining 1-hot- and enrollment-based target sound extraction, allowing optimal performance for seen AE classes and simple extension to new classes. In experiments with synthesized sound mixtures generated from Freesound Dataset (FSD) data, we demonstrate the benefit of the combined framework for both seen and new AE classes. We also propose adapting the embedding vectors obtained from a few enrollment audio samples (few-shot) to further improve performance on new classes.
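The two conditioning paths described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network, so the dimensions, the random weights, the linear "enrollment encoder", and all function names below are hypothetical stand-ins. Averaging the embeddings of a few enrollment samples is likewise only an assumed starting point for a new class; the adaptation step the abstract mentions is not shown.

```python
import random
from typing import List, Sequence

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
NUM_SEEN_CLASSES = 4   # AE classes seen during training
EMBED_DIM = 8          # size of the conditioning embedding
FEAT_DIM = 16          # size of one enrollment feature frame

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rows: int, cols: int) -> Matrix:
    """Random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One embedding row per seen class (learned jointly with the extractor in practice).
class_embeddings = random_matrix(NUM_SEEN_CLASSES, EMBED_DIM)
# Assumed enrollment encoder: a single linear projection per frame.
enroll_proj = random_matrix(FEAT_DIM, EMBED_DIM)


def vec_mat(vec: Sequence[float], mat: Matrix) -> Vector:
    """Multiply a row vector by a matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def mean_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def one_hot_embedding(class_index: int) -> Vector:
    """1-hot path: the 1-hot vector selects one row of the embedding table."""
    one_hot = [0.0] * NUM_SEEN_CLASSES
    one_hot[class_index] = 1.0
    return vec_mat(one_hot, class_embeddings)


def enrollment_embedding(frames: Sequence[Vector]) -> Vector:
    """Enrollment path: project each frame, then average over time."""
    return mean_vectors([vec_mat(frame, enroll_proj) for frame in frames])


def few_shot_embedding(enrollments: Sequence[Sequence[Vector]]) -> Vector:
    """New class: average the embeddings of a few enrollment samples."""
    return mean_vectors([enrollment_embedding(e) for e in enrollments])
```

In this sketch, a seen class would be conditioned with `one_hot_embedding(k)`, while a new class would use `few_shot_embedding(...)` computed from its enrollment samples; either vector has the same size and could be fed to the same extraction network.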

research
04/08/2022

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

In many situations, we would like to hear desired sound events (SEs) whi...
research
04/02/2022

Improving Target Sound Extraction with Timestamp Information

Target sound extraction (TSE) aims to extract the sound part of a target...
research
06/10/2020

Listen to What You Want: Neural Network-based Universal Sound Selector

Being able to control the acoustic events (AEs) to which we want to list...
research
12/19/2021

Detect what you want: Target Sound Detection

Human beings can perceive a target sound that we are interested in from ...
research
03/08/2022

Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

Existing systems for sound event localization and detection (SELD) typic...
research
04/12/2022

Text-Driven Separation of Arbitrary Sounds

We propose a method of separating a desired sound source from a single-c...
research
02/26/2020

An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

The problem of training a deep neural network with a small set of positi...
