3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

04/07/2022
by   Zhao You, et al.
0

Recently, Conformer based CTC/AED model has become a mainstream architecture for ASR. In this paper, based on our prior work, we identify and integrate several approaches to achieve further improvements for ASR tasks, which we denote as multi-loss, multi-path and multi-level, summarized as "3M" model. Specifically, multi-loss refers to the joint CTC/AED loss and multi-path denotes the Mixture-of-Experts(MoE) architecture which can effectively increase the model capacity without remarkably increasing computation cost. Multi-level means that we introduce auxiliary loss at multiple level of a deep model to help training. We evaluate our proposed method on the public WenetSpeech dataset and experimental results show that the proposed method provides 12.2 toolkit. On our large scale dataset of 150k hours corpus, the 3M model has also shown obvious superiority over the baseline Conformer model. Code is publicly available at https://github.com/tencent-ailab/3m-asr.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/22/2019

Multi-Level Mesa

Multi-level Mesa is an extension to support the Python based Agents Base...
research
08/08/2018

End-to-end Speech Recognition with Word-based RNN Language Models

This paper investigates the impact of word-based RNN language models (RN...
research
10/14/2021

CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Automatic Speech recognition (ASR) is a complex and challenging task. In...
research
02/24/2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Multilingual Automatic Speech Recognition (ASR) models have extended the...
research
03/08/2018

From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation

In crowdsourced preference aggregation, it is often assumed that all the...
research
05/29/2023

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

State-of-the-art ASR systems have achieved promising results by modeling...
research
11/23/2021

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Mixture-of-experts based acoustic models with dynamic routing mechanisms...

Please sign up or login with your details

Forgot password? Click here to reset