Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

10/15/2020
by Ludwig Kürzinger, et al.

Many end-to-end Automatic Speech Recognition (ASR) systems still rely on pre-processed frequency-domain features that are handcrafted to emulate human hearing. Our work is motivated by recent advances in integrated learnable feature extraction. We propose Lightweight Sinc-Convolutions (LSC), which combine Sinc-convolutions with depthwise convolutions into a low-parameter, machine-learnable feature extraction front end for end-to-end ASR systems. We integrated LSC into the hybrid CTC/attention architecture for evaluation. The resulting end-to-end model shows smooth convergence behaviour, which is further improved by applying SpecAugment in the time domain. We also discuss filter-level improvements, such as using log-compression as the activation function. Our model achieves a word error rate of 10.7% on the test dataset, surpassing the corresponding architecture with log-mel filterbank features by an absolute 1.9%.
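To make the idea of a Sinc-convolution concrete, here is a minimal sketch in plain Python. It assumes the commonly used parametrization in which each filter is a band-pass kernel defined only by a low and a high cutoff frequency, built as the difference of two windowed sinc low-pass filters. The function names, the Hamming window, and the log(|x| + 1) form of log-compression are illustrative assumptions, not details taken from the abstract. In a trained model the two cutoffs would be learnable parameters; the depthwise convolutions and the rest of the LSC front end are omitted.

```python
import math


def sinc_bandpass_kernel(f_low, f_high, kernel_size, sample_rate):
    """Band-pass FIR kernel: difference of two sinc low-pass filters, Hamming-windowed.

    Only f_low and f_high (in Hz) define the filter shape, which is what keeps
    the parameter count low. All names here are hypothetical.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    f1 = f_low / sample_rate   # normalized cutoffs in cycles per sample
    f2 = f_high / sample_rate
    half = kernel_size // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            # Limit of the sinc difference at n = 0
            ideal = 2.0 * (f2 - f1)
        else:
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Hamming window to reduce ripple from truncating the ideal filter
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def convolve_valid(signal, kernel):
    """1-D sliding dot product without padding (kernel is symmetric, so this equals convolution)."""
    k = len(kernel)
    return [sum(signal[t + j] * kernel[j] for j in range(k))
            for t in range(len(signal) - k + 1)]


def log_compress(x):
    """Assumed form of log-compression used as an activation: log(|x| + 1)."""
    return math.log(abs(x) + 1.0)


if __name__ == "__main__":
    sr = 16000
    kernel = sinc_bandpass_kernel(1000.0, 2000.0, 101, sr)
    raw_audio = [math.sin(2 * math.pi * 1500.0 * t / sr) for t in range(800)]
    features = [log_compress(v) for v in convolve_valid(raw_audio, kernel)]
    print(len(features))
```

A bank of such kernels with different cutoff pairs, applied directly to the raw waveform, plays the role that a fixed mel filterbank plays in a conventional front end.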


research
06/15/2020

Exploration of End-to-End ASR for OpenSTT – Russian Open Speech-to-Text Dataset

This paper presents an exploration of end-to-end automatic speech recogn...
research
11/05/2021

Conformer-based Hybrid ASR System for Switchboard Dataset

The recently proposed conformer architecture has been successfully used ...
research
03/03/2023

End-to-End Speech Recognition: A Survey

In the last decade of automatic speech recognition (ASR) research, the i...
research
03/22/2023

Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network

In recent years, End-to-End speech recognition technology based on deep ...
research
01/27/2020

Scaling Up Online Speech Recognition Using ConvNets

We design an online end-to-end speech recognition system based on Time-D...
research
09/10/2021

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

When a sufficiently large far-field training data is presented, jointly ...
research
10/26/2020

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

We propose a novel decentralized feature extraction approach in federate...
