Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

06/01/2020
by Chander Chandak, et al.

This paper presents our modeling and architecture approaches for building a highly accurate, low-latency language identification system to support multilingual spoken queries for voice assistants. A common approach to multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language. Conventionally, LID relies on acoustic-only information to detect the input language. We propose an approach that learns and combines acoustic-level representations with embeddings estimated on ASR hypotheses, resulting in up to 50% [...] that uses acoustic-only features. Furthermore, to reduce the processing cost and latency, we exploit a streaming architecture to identify the spoken language early, when the system reaches a predetermined confidence level, alleviating the need to run multiple ASR systems until the end of the input query. The combined acoustic and text LID, coupled with our proposed streaming runtime architecture, results in an average of 1500 ms early identification for more than 50% [...] improved results by adopting a semi-supervised learning (SSL) technique using the newly proposed model architecture as a teacher model.
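The early-exit idea described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical upstream model that emits one posterior vector over the candidate languages per audio chunk; the function name `streaming_lid`, the 0.9 threshold, and the chunking are illustrative assumptions, not details taken from the paper.

```python
from typing import Iterable, Optional, Sequence, Tuple


def streaming_lid(
    chunk_posteriors: Iterable[Sequence[float]],
    languages: Sequence[str],
    threshold: float = 0.9,  # assumed confidence level, not from the paper
) -> Tuple[Optional[str], int, bool]:
    """Consume per-chunk LID posteriors and stop once one is confident enough.

    Returns (language, chunks_consumed, exited_early). If the threshold is
    never reached, the decision from the last chunk is returned instead.
    """
    best_lang: Optional[str] = None
    consumed = 0
    for posteriors in chunk_posteriors:
        consumed += 1
        # Index of the most probable language for this chunk.
        best_idx = max(range(len(posteriors)), key=lambda i: posteriors[i])
        best_lang = languages[best_idx]
        if posteriors[best_idx] >= threshold:
            # Confident enough: the other ASR systems could be stopped here.
            return best_lang, consumed, True
    return best_lang, consumed, False


# Hypothetical posteriors for a two-language setup, one vector per chunk.
stream = [[0.55, 0.45], [0.70, 0.30], [0.93, 0.07], [0.99, 0.01]]
decision = streaming_lid(stream, ["en-US", "es-US"])
```

In this toy stream the decision is reached on the third chunk, so the fourth chunk is never processed. In a real system the saving would come from no longer running the ASR systems for the rejected languages.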


