USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

09/14/2023
by   Guanlong Zhao, et al.

We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. The model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable for handling both tasks with negligible additional computational cost.
