Speech Activity Detection Based on Multilingual Speech Recognition System

10/23/2020
by   Seyyed Saeed Sarfjoo, et al.

To better model contextual information and improve the generalization ability of a voice detection system, this paper leverages a multi-lingual Automatic Speech Recognition (ASR) system to perform Speech Activity Detection (SAD). Sequence-discriminative training of the multi-lingual Acoustic Model (AM) with the Lattice-Free Maximum Mutual Information (LF-MMI) loss function effectively extracts the contextual information of the input acoustic frame. The index of the maximum output posterior serves as the frame-level speech/non-speech decision function. Majority voting and logistic regression are applied to fuse the language-dependent decisions. The multi-lingual ASR is trained on 18 languages of the BABEL datasets, and the resulting SAD is evaluated on 3 different languages. On out-of-domain datasets, the proposed SAD model performs significantly better than the baseline models. On the Ester2 dataset, without using any in-domain data, this model outperforms the WebRTC, phoneme-recognizer-based VAD (Phn_Rec), and Pyannote baselines by 7.1, 1.7, and 2.7 absolute, respectively, in the DetER metric. Similarly, on the LiveATC dataset, this model outperforms the WebRTC, Phn_Rec, and Pyannote baselines by 6.4, 10.0, and 3.7 absolute, respectively, in the DetER metric.
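
As a rough illustration of the frame-level decision and the majority-voting fusion described in the abstract, the sketch below takes per-frame output posteriors, picks the index of the maximum posterior, and maps it to speech or non-speech. The function names, the `speech_units` set (which output indices count as speech), and the tie-breaking rule are assumptions made for this example; they are not taken from the paper, and the logistic-regression fusion is not shown.

```python
from typing import List, Sequence, Set


def frame_decisions(posteriors: Sequence[Sequence[float]],
                    speech_units: Set[int]) -> List[int]:
    """Return 1 (speech) or 0 (non-speech) for each frame.

    posteriors: one row of acoustic-model output posteriors per frame.
    speech_units: hypothetical set of output indices treated as speech.
    """
    decisions = []
    for frame in posteriors:
        # Index of the maximum output posterior for this frame.
        best = max(range(len(frame)), key=lambda k: frame[k])
        decisions.append(1 if best in speech_units else 0)
    return decisions


def majority_vote(per_language: Sequence[Sequence[int]]) -> List[int]:
    """Fuse language-dependent 0/1 decisions frame by frame.

    A frame is labeled speech only if strictly more than half of the
    language-dependent decisions say speech (ties go to non-speech,
    which is an assumption of this sketch).
    """
    n = len(per_language)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*per_language)]


# Toy example: three frames, three output units (unit 0 = non-speech).
toy_posteriors = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]
lang_a = frame_decisions(toy_posteriors, speech_units={1, 2})
fused = majority_vote([lang_a, [1, 1, 0], [0, 1, 0]])
```

In this toy setup each inner list passed to `majority_vote` stands for the decisions derived from one language-dependent output; a real system would obtain them from the trained acoustic model rather than from hand-written values.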


research
07/07/2022

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Multilingual automatic speech recognition (ASR) systems mostly benefit l...
research
06/02/2021

Attention-based Contextual Language Model Adaptation for Speech Recognition

Language modeling (LM) for automatic speech recognition (ASR) does not u...
research
02/27/2023

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Multi-lingual speech recognition aims to distinguish linguistic expressi...
research
12/28/2020

Building Multi lingual TTS using Cross Lingual Voice Conversion

In this paper we propose a new cross-lingual Voice Conversion (VC) appro...
research
11/07/2018

Analysis of Multilingual Sequence-to-Sequence speech recognition systems

This paper investigates the applications of various multilingual approac...
research
08/13/2020

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Voice activity detection (VAD) makes a distinction between speech and no...
research
10/17/2022

A Treatise On FST Lattice Based MMI Training

Maximum mutual information (MMI) has become one of the two de facto meth...
