Word-level Speech Recognition with a Dynamic Lexicon

06/10/2019
by Ronan Collobert et al.

We propose a direct-to-word sequence model with a dynamic lexicon. Our word network constructs word embeddings dynamically from character-level tokens. The word network can be integrated seamlessly with arbitrary sequence models, including Connectionist Temporal Classification (CTC) and encoder-decoder models with attention. Sub-word units are commonly used in speech recognition yet are generated without the use of acoustic context. We show that our direct-to-word model can achieve word error rate gains over sub-word-level models for speech recognition. Furthermore, we empirically validate that the word-level embeddings we learn contain significant acoustic information, making them more suitable for use in speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining.
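The core idea, building a word embedding on the fly from its character spelling and scoring it against acoustic representations, can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch-style illustration, not the authors' exact architecture: the names (CharToWordEmbedder, word_scores, char_dim, word_dim) are assumptions, and the character encoder here is a simple bidirectional LSTM with mean pooling rather than whatever the paper uses.

# Minimal sketch (assumption: PyTorch-style code, not the paper's exact model).
# A "word network" maps a word's character spelling to a word embedding, so the
# lexicon can be extended at test time by embedding new spellings on the fly.
import torch
import torch.nn as nn


class CharToWordEmbedder(nn.Module):
    """Hypothetical word network: character tokens -> word embedding."""

    def __init__(self, num_chars, char_dim=64, word_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.encoder = nn.LSTM(char_dim, word_dim // 2, batch_first=True,
                               bidirectional=True)

    def forward(self, char_ids):
        # char_ids: (num_words, max_word_len) integer character tokens, 0-padded
        x = self.char_emb(char_ids)                   # (W, L, char_dim)
        h, _ = self.encoder(x)                        # (W, L, word_dim)
        mask = (char_ids != 0).unsqueeze(-1).float()
        # Mean-pool over non-padding characters to get one vector per word.
        return (h * mask).sum(1) / mask.sum(1).clamp(min=1.0)


def word_scores(acoustic_frames, word_embeddings):
    """Score every acoustic frame against every word in the dynamic lexicon.

    acoustic_frames: (T, D) acoustic encoder outputs; word_embeddings: (W, D).
    The resulting (T, W) score matrix can feed a CTC loss or an attention
    decoder defined over words instead of sub-word units.
    """
    return acoustic_frames @ word_embeddings.t()


# Usage: embed an arbitrary lexicon (including words unseen at training time)
# and compute frame-level word scores.
embedder = CharToWordEmbedder(num_chars=30)
lexicon = torch.randint(1, 30, (1000, 12))            # 1000 spellings, <=12 chars
words = embedder(lexicon)                             # (1000, 256)
frames = torch.randn(200, 256)                        # 200 acoustic encoder frames
scores = word_scores(frames, words)                   # (200, 1000)

Because the lexicon embedding is a function of spelling rather than a fixed output layer, adding a new word only requires running its characters through the word network, which is what allows the model to predict words not seen at training time without retraining.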


