Deformable TDNN with adaptive receptive fields for speech recognition

04/30/2021
by   Keyu An, et al.
0

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics of speech are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive temporal dynamics modeling in end-to-end speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the temporal sampling locations with additional offsets and learn the offsets automatically based on the ASR criterion, without additional supervision. Experiments show that deformable TDNNs obtain state-of-the-art results on WSJ benchmarks (1.42%/3.45% WER on WSJ eval92/dev93 respectively), outperforming standard TDNNs significantly. Furthermore, we propose the latency control mechanism for deformable TDNNs, which enables deformable TDNNs to do streaming ASR without accuracy degradation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/12/2020

Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling

Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
research
11/02/2021

Recent Advances in End-to-End Automatic Speech Recognition

Recently, the speech community is seeing a significant trend of moving f...
research
11/13/2018

An Online Attention-based Model for Speech Recognition

Attention-based end-to-end (E2E) speech recognition models such as Liste...
research
11/20/2019

CAT: CRF-based ASR Toolkit

In this paper, we present a new open source toolkit for automatic speech...
research
03/17/2017

Deformable Convolutional Networks

Convolutional neural networks (CNNs) are inherently limited to model geo...
research
12/17/2014

Deep Speech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using ...
research
05/27/2020

CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

In this paper, we present a new open source toolkit for speech recogniti...

Please sign up or login with your details

Forgot password? Click here to reset