Light Gated Recurrent Units for Speech Recognition

03/26/2018
by Mirco Ravanelli, et al.

A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is two-fold: First, we analyze the role played by the reset gate, showing that it is significantly redundant with the update gate. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30%, but also consistently improves the recognition accuracy across different tasks, input features, and noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models.
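As a rough illustration of the architecture described above, the sketch below shows a single Li-GRU time step: the reset gate is removed, the candidate state uses a ReLU activation, and batch normalization is applied to the input-to-hidden projections. This is a minimal, illustrative PyTorch-style implementation; the class and variable names (LiGRUCell, wz, uh, etc.) are assumptions for the example, not the authors' released code.

```python
import torch
import torch.nn as nn


class LiGRUCell(nn.Module):
    """Illustrative single-gate Li-GRU cell (sketch, not the reference code):
    no reset gate, ReLU candidate state, batch norm on feed-forward paths."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Input-to-hidden weights for the update gate (z) and candidate state
        self.wz = nn.Linear(input_size, hidden_size, bias=False)
        self.wh = nn.Linear(input_size, hidden_size, bias=False)
        # Hidden-to-hidden (recurrent) weights
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)
        # Batch normalization applied to the input-to-hidden contributions only
        self.bn_z = nn.BatchNorm1d(hidden_size)
        self.bn_h = nn.BatchNorm1d(hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # Update gate: z_t = sigmoid(BN(W_z x_t) + U_z h_{t-1})
        z_t = torch.sigmoid(self.bn_z(self.wz(x_t)) + self.uz(h_prev))
        # Candidate state with ReLU: h~_t = relu(BN(W_h x_t) + U_h h_{t-1})
        h_cand = torch.relu(self.bn_h(self.wh(x_t)) + self.uh(h_prev))
        # Linear interpolation between previous state and candidate (no reset gate)
        return z_t * h_prev + (1.0 - z_t) * h_cand


if __name__ == "__main__":
    # Toy usage: a batch of 8 frames with 40-dim acoustic features, 64 hidden units
    cell = LiGRUCell(input_size=40, hidden_size=64)
    x = torch.randn(8, 40)
    h = torch.zeros(8, 64)
    h = cell(x, h)
    print(h.shape)  # torch.Size([8, 64])
```

Compared with a standard GRU, dropping the reset gate removes one set of input-to-hidden and recurrent weight matrices, which is where the reported per-epoch training-time savings come from.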


