CarneliNet: Neural Mixture Model for Automatic Speech Recognition

07/22/2021
by   Aleksei Kalinov, et al.

End-to-end automatic speech recognition systems have achieved high accuracy by using increasingly deep models. However, the increased depth comes with a larger receptive field, which can negatively impact model performance in streaming scenarios. We propose an alternative approach that we call the Neural Mixture Model. The basic idea is to replace a very deep network with a parallel mixture of shallow networks. To validate this idea, we design CarneliNet, a CTC-based neural network composed of three mega-blocks. Each mega-block consists of multiple parallel shallow sub-networks based on 1D depthwise-separable convolutions. We evaluate the model on the LibriSpeech, MLS and AISHELL-2 datasets and achieve close to state-of-the-art results for CTC-based models. Finally, we demonstrate that the number of parallel sub-networks can be reconfigured dynamically to meet computational requirements without retraining.
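The mega-block idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the parallel outputs are combined, so averaging is an assumption here, and the names `depthwise_separable_conv1d`, `make_subnetwork`, `mega_block` and the `active` parameter are hypothetical. Each sub-network is reduced to one depthwise-separable 1D convolution followed by a ReLU, with random untrained weights.

```python
import random


def depthwise_separable_conv1d(x, depthwise, pointwise):
    """x: list of C channels, each a list of T floats.
    depthwise: one odd-length kernel per channel ('same' zero padding).
    pointwise: C_out x C matrix acting as a 1x1 convolution."""
    channels, steps = len(x), len(x[0])
    half = len(depthwise[0]) // 2
    # Depthwise step: every channel is filtered by its own kernel.
    filtered = []
    for c in range(channels):
        row = []
        for t in range(steps):
            acc = 0.0
            for k, w in enumerate(depthwise[c]):
                idx = t + k - half
                if 0 <= idx < steps:  # positions outside the signal count as zero
                    acc += w * x[c][idx]
            row.append(acc)
        filtered.append(row)
    # Pointwise step: mix channels independently at each time step.
    return [[sum(pointwise[o][c] * filtered[c][t] for c in range(channels))
             for t in range(steps)]
            for o in range(len(pointwise))]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def make_subnetwork(channels, kernel_size, rng):
    """Random, untrained weights for one shallow sub-network (illustration only)."""
    depthwise = [[rng.uniform(-1, 1) for _ in range(kernel_size)]
                 for _ in range(channels)]
    pointwise = [[rng.uniform(-1, 1) for _ in range(channels)]
                 for _ in range(channels)]
    return depthwise, pointwise


def mega_block(x, subnetworks, active=None):
    """Run the first `active` sub-networks in parallel and average their outputs.
    Averaging is an assumption; it keeps the output scale stable when
    sub-networks are dropped."""
    chosen = subnetworks if active is None else subnetworks[:active]
    if not chosen:
        raise ValueError("at least one sub-network must be active")
    outputs = [relu(depthwise_separable_conv1d(x, dw, pw)) for dw, pw in chosen]
    n = len(outputs)
    return [[sum(out[c][t] for out in outputs) / n for t in range(len(x[0]))]
            for c in range(len(outputs[0]))]


rng = random.Random(0)
subnetworks = [make_subnetwork(channels=2, kernel_size=3, rng=rng) for _ in range(4)]
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5]]  # 2 channels, 4 time steps

full = mega_block(x, subnetworks)             # all 4 parallel sub-networks
small = mega_block(x, subnetworks, active=2)  # cheaper configuration, same weights
```

In this sketch the receptive field of a mega-block equals the kernel size of a single sub-network no matter how many sub-networks run in parallel, which is the property that makes width a cheaper knob than depth for streaming. Reducing `active` changes the compute cost without touching the remaining weights.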

research
09/13/2017

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Neural models have become ubiquitous in automatic speech recognition sys...
research
11/05/2018

End-to-End Monaural Multi-speaker ASR System without Pretraining

Recently, end-to-end models have become a popular approach as an alterna...
research
01/16/2023

BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

Recent developments using End-to-End Deep Learning models have been show...
research
04/07/2021

Pushing the Limits of Non-Autoregressive Speech Recognition

We combine recent advancements in end-to-end speech recognition to non-a...
research
04/18/2019

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

We present SpecAugment, a simple data augmentation method for speech rec...
research
02/23/2023

Evaluating Automatic Speech Recognition in an Incremental Setting

The increasing reliability of automatic speech recognition has prolifera...
research
09/23/2020

FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning

Strong presentation skills are valuable and sought-after in workplace an...
