Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

10/26/2022
by Marie Kunešová, et al.

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection. First, we concentrate on only one task – speaker change detection – where our proposed system surpasses the previously reported results on four different corpora, and achieves comparable performance even when trained on out-of-domain data from an artificially designed dataset. Then we expand our approach to tackle all three tasks in a single multitask system with state-of-the-art performance on the AMI corpus. The implementation of the algorithms in this paper is publicly available at https://github.com/mkunes/w2v2_audioFrameClassification.
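To make the three tasks concrete, the sketch below shows one way that speaker-segment annotations could be turned into per-frame targets for voice activity, overlapped speech, and speaker change. It is an illustration only and is not taken from the authors' repository. The function name `frame_labels`, the `(speaker, start, end)` segment format, the 20 ms frame step (roughly the output rate of wav2vec 2.0 on 16 kHz audio), and the choice to mark a speaker change as a single frame at the start of a new speaker's turn are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical annotation format: (speaker_id, start_sec, end_sec)
Segment = Tuple[str, float, float]

# Assumed frame step: wav2vec 2.0 emits roughly one output frame per 20 ms
FRAME_SEC = 0.02


def frame_labels(segments: List[Segment], duration: float,
                 frame_sec: float = FRAME_SEC):
    """Return (vad, osd, scd) lists with one 0/1 label per frame.

    vad: 1 where at least one speaker is active
    osd: 1 where two or more speakers are active
    scd: 1 at the frame where a turn by a different speaker begins
         (a simplification chosen for this sketch)
    """
    n_frames = int(round(duration / frame_sec))
    counts = [0] * n_frames  # number of active speakers per frame

    for _, start, end in segments:
        first = max(0, int(round(start / frame_sec)))
        last = min(n_frames, int(round(end / frame_sec)))
        for i in range(first, last):
            counts[i] += 1

    vad = [1 if c >= 1 else 0 for c in counts]
    osd = [1 if c >= 2 else 0 for c in counts]

    # Mark a change wherever consecutive segments (ordered by start time)
    # belong to different speakers.
    scd = [0] * n_frames
    ordered = sorted(segments, key=lambda seg: seg[1])
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] != prev[0]:
            i = int(round(cur[1] / frame_sec))
            if 0 <= i < n_frames:
                scd[i] = 1

    return vad, osd, scd


if __name__ == "__main__":
    # Speaker A talks for 0.10 s; speaker B starts at 0.08 s and overlaps briefly.
    example = [("A", 0.0, 0.10), ("B", 0.08, 0.16)]
    vad, osd, scd = frame_labels(example, duration=0.2)
    print(vad)
    print(osd)
    print(scd)
```

A multitask system of the kind described in the abstract would predict all three label sequences from the same audio; how the published system defines and smooths its targets may well differ from this simplified version, so the linked repository remains the reference.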

