A Dual-Decoder Conformer for Multilingual Speech Recognition

08/22/2021
by Krishna D N, et al.

Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition of Indian languages. Our proposed model consists of a Conformer [1] encoder, two parallel transformer decoders, and a language classifier. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence along with language information. Phoneme recognition and language identification serve as auxiliary tasks in our multi-task learning framework, and we jointly optimize the network for the phoneme recognition, grapheme recognition, and language identification tasks with joint CTC-attention training [2]. Our experiments show a significant reduction in word error rate (WER) over the baseline approaches, and the dual-decoder approach also yields a significant improvement over a single-decoder approach.
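To make the architecture concrete, the following PyTorch code is a minimal sketch, not the authors' implementation: it stands in a vanilla Transformer encoder for the Conformer encoder, omits padding masks, causal decoder masks, and SOS/EOS handling, and uses made-up dimensions, vocabulary sizes, and loss weights. Only the overall structure mirrors the description above: a shared encoder, two parallel decoders (PHN-DEC and GRP-DEC), a CTC branch on the encoder output, a language classifier, and a combined multi-task loss.

```python
# Minimal sketch of a dual-decoder, multi-task ASR model (illustrative only).
import torch
import torch.nn as nn


class DualDecoderASR(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_heads=4,
                 phoneme_vocab=64, grapheme_vocab=128, num_languages=4):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        # Stand-in for the Conformer encoder (assumption: a plain Transformer
        # encoder is used here purely for illustration).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=6)
        # Two parallel decoders: phonemes (PHN-DEC) and graphemes (GRP-DEC).
        self.phn_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=4)
        self.grp_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=4)
        self.phn_emb = nn.Embedding(phoneme_vocab, d_model)
        self.grp_emb = nn.Embedding(grapheme_vocab, d_model)
        self.phn_out = nn.Linear(d_model, phoneme_vocab)
        self.grp_out = nn.Linear(d_model, grapheme_vocab)
        # CTC branch on the encoder frames (joint CTC-attention training).
        self.ctc_out = nn.Linear(d_model, grapheme_vocab)
        # Utterance-level language classifier (auxiliary task).
        self.lang_out = nn.Linear(d_model, num_languages)

    def forward(self, feats, phn_in, grp_in):
        enc = self.encoder(self.input_proj(feats))            # (B, T, D)
        phn_logits = self.phn_out(self.phn_dec(self.phn_emb(phn_in), enc))
        grp_logits = self.grp_out(self.grp_dec(self.grp_emb(grp_in), enc))
        ctc_logits = self.ctc_out(enc)                         # per-frame logits
        lang_logits = self.lang_out(enc.mean(dim=1))           # pooled encoder
        return phn_logits, grp_logits, ctc_logits, lang_logits


def joint_loss(phn_logits, grp_logits, ctc_logits, lang_logits,
               phn_tgt, grp_tgt, lang_tgt, feat_lens, grp_lens,
               ctc_weight=0.3, phn_weight=0.3, lang_weight=0.1):
    """Combine the four objectives; the weights are illustrative assumptions."""
    ce = nn.CrossEntropyLoss()
    # Attention (cross-entropy) losses for both decoders.
    l_grp = ce(grp_logits.transpose(1, 2), grp_tgt)
    l_phn = ce(phn_logits.transpose(1, 2), phn_tgt)
    # CTC loss over encoder frames; log-probs must be time-major (T, B, C).
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)
    l_ctc = ctc(log_probs, grp_tgt, feat_lens, grp_lens)
    # Language identification loss (auxiliary task).
    l_lang = ce(lang_logits, lang_tgt)
    return ((1 - ctc_weight) * l_grp + ctc_weight * l_ctc
            + phn_weight * l_phn + lang_weight * l_lang)
```

The interpolation between the CTC term and the grapheme cross-entropy term follows the usual joint CTC-attention recipe, with the phoneme and language-ID losses added as weighted auxiliary terms; target preparation (token shifting, padding, length bookkeeping) is left out for brevity.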

Related research

- 08/22/2021: Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task Conformer. Transformers have recently become very popular for sequence-to-sequence ...
- 06/12/2018: Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages. Sequence-to-sequence attention-based models integrate an acoustic, pronu...
- 11/02/2020: Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation. We introduce dual-decoder Transformer, a new model architecture that joi...
- 12/03/2020: Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition. One crucial challenge of real-world multilingual speech recognition is t...
- 11/16/2022: Streaming Joint Speech Recognition and Disfluency Detection. Disfluency detection has mainly been solved in a pipeline approach, as p...
- 09/24/2022: A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion. Cyrillic and Traditional Mongolian are the two main members of the Mongo...
- 04/11/2023: Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition. In recent years, a great deal of attention has been paid to the Transfor...
