A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

11/04/2022
by Jian Xue, et al.

In this paper, we introduce a Streaming Multilingual Speech Model (SM2) that can transcribe or translate multiple spoken languages into text in the target language. The backbone of SM2 is the Transformer Transducer, which has strong streaming capability. Instead of human-labeled speech translation (ST) data, SM2 models are trained on weakly supervised data generated by translating the transcriptions in speech recognition corpora with a machine translation service. Trained on 351 thousand hours of anonymized speech from 25 languages, SM2 models achieve comparable or even better ST quality than some recent popular large-scale non-streaming speech models. More importantly, we show that SM2 has truly zero-shot capability when expanding to new target languages, yielding high-quality ST results for {source-speech, target-text} pairs that are not seen during training.
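
The weak-supervision step described in the abstract (pairing existing speech recognition audio with machine-translated transcripts) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the names AsrExample, StExample, make_weak_st_pairs, and toy_translate are hypothetical, the translate callable stands in for whatever machine translation service is used, and keeping the original transcript when source and target language match is an assumption made here for the transcription case.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class AsrExample:
    """One speech recognition training example (hypothetical schema)."""
    audio_path: str    # location of the speech recording
    transcript: str    # human transcription in the source language
    source_lang: str   # language code of the speech, e.g. "de"


@dataclass(frozen=True)
class StExample:
    """One weakly supervised speech translation example (hypothetical schema)."""
    audio_path: str
    target_text: str
    source_lang: str
    target_lang: str


def make_weak_st_pairs(
    asr_corpus: Iterable[AsrExample],
    translate: Callable[[str, str, str], str],
    target_lang: str,
) -> Iterator[StExample]:
    """Pair each recording with a machine-translated version of its transcript.

    `translate(text, src, tgt)` is a placeholder for an MT service call.
    """
    for ex in asr_corpus:
        if ex.source_lang == target_lang:
            # Assumption: same-language pairs keep the transcript as-is,
            # so the example serves as a transcription target.
            text = ex.transcript
        else:
            text = translate(ex.transcript, ex.source_lang, target_lang)
        yield StExample(ex.audio_path, text, ex.source_lang, target_lang)


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for an MT service: a tiny lookup table, for illustration only."""
    table = {("de", "en", "hallo welt"): "hello world"}
    return table[(src, tgt, text)]


corpus = [
    AsrExample("a.wav", "hallo welt", "de"),
    AsrExample("b.wav", "good morning", "en"),
]
pairs = list(make_weak_st_pairs(corpus, toy_translate, "en"))
```

No human ST labels are involved: the audio is reused unchanged, and only the text side of each pair is replaced by machine output, which is what makes the supervision weak.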

