SpliceOut: A Simple and Efficient Audio Augmentation Method

09/30/2021
by Arjit Jain, et al.

Time masking has become a de facto standard augmentation technique for speech and audio tasks, including automatic speech recognition (ASR) and audio classification, most notably as part of SpecAugment. In this work, we propose SpliceOut, a simple modification to time masking that makes it computationally more efficient. SpliceOut performs comparably to (and sometimes outperforms) SpecAugment on a wide variety of speech and audio tasks, including ASR for seven different languages with varying amounts of training data, speech translation, and sound and music classification, which establishes it as a broadly applicable audio augmentation method. SpliceOut also provides additional gains when used in conjunction with other augmentation techniques. Beyond the fully supervised setting, we also demonstrate that SpliceOut can complement unsupervised representation learning, with performance gains in the semi-supervised and self-supervised settings.
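The abstract does not spell out the mechanism, so the following is only a minimal sketch under a stated assumption: that SpliceOut, as its name suggests, removes the selected time intervals and splices the remaining frames together, whereas time masking overwrites those frames in place. The function names, the `fill` value, and the interval sampler are hypothetical and are not taken from the paper. Under this assumption, the spliced sequence is shorter than the masked one, which would be one plausible source of the efficiency gain.

```python
import random


def time_mask(frames, intervals, fill=0.0):
    """Time masking: overwrite every frame inside each [start, end) interval."""
    out = [list(f) for f in frames]
    for start, end in intervals:
        for t in range(start, min(end, len(out))):
            out[t] = [fill] * len(out[t])
    return out


def splice_out(frames, intervals):
    """Assumed SpliceOut behaviour: drop the frames inside each [start, end)
    interval and join what remains, so the output has fewer frames."""
    drop = set()
    for start, end in intervals:
        drop.update(range(start, min(end, len(frames))))
    return [list(f) for t, f in enumerate(frames) if t not in drop]


def sample_intervals(num_frames, num_intervals, max_width, rng):
    """Hypothetical sampler: uniform width in [0, max_width], uniform start."""
    intervals = []
    for _ in range(num_intervals):
        width = rng.randint(0, max_width)
        start = rng.randint(0, max(0, num_frames - width))
        intervals.append((start, start + width))
    return intervals


# Toy "spectrogram": 10 time frames with 2 feature bins each.
frames = [[float(t), float(t) + 0.5] for t in range(10)]
intervals = [(2, 4), (7, 8)]

masked = time_mask(frames, intervals)    # same length, frames 2, 3, 7 zeroed
spliced = splice_out(frames, intervals)  # frames 2, 3, 7 removed
```

With these fixed intervals, `masked` keeps all 10 frames while `spliced` keeps 7. In a training pipeline, the intervals would typically be drawn per utterance, for example with `sample_intervals(len(frames), 2, 3, random.Random(0))`.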

