Many real-life applications of automatic speech recognition (ASR) requir...
We study a streamable attention-based encoder-decoder model in which eit...
Automatic speech recognition (ASR) systems typically use handcrafted fea...
Multi-speaker automatic speech recognition (ASR) is crucial for many rea...
Building competitive hybrid hidden Markov model (HMM) systems for automa...
Modern public ASR tools usually provide rich support for training variou...
In the last decade of automatic speech recognition (ASR) research, the i...
Neural speaker embeddings encode the speaker's speech characteristics th...
Recently, RNN-Transducers have achieved remarkable results on various au...
Automatic speech recognition (ASR) has been established as a well-perfor...
We introduce a novel segmental-attention model for automatic speech reco...
Language barriers present a great challenge in our increasingly connecte...
In this work, we compare from-scratch sequence-level cross-entropy (full...
Speaker adaptation is important to build robust automatic speech recogni...
As one of the most popular sequence-to-sequence modeling approaches for ...
In this work, we show that a factored hybrid hidden Markov model (FH-HMM...
One of the key communicative competencies is the ability to maintain flu...
To mitigate the problem of having to traverse over the full vocabulary i...
The recently proposed conformer architecture has been successfully used ...
To improve the performance of state-of-the-art automatic speech recognit...
Sequence discriminative training is a great tool to improve the performa...
The mismatch between an external language model (LM) and the implicitly ...
The peaky behavior of CTC models is well known experimentally. However, ...
As the vocabulary size of modern word-based language models becomes ever...
Subword units are commonly used for end-to-end automatic speech recognit...
In recent years, automated approaches to assessing linguistic complexity...
With the advent of direct models in automatic speech recognition (ASR), ...
Attention-based encoder-decoder (AED) models learn an implicit internal ...
Recent publications on automatic speech recognition (ASR) have a strong ...
Acoustic modeling of raw waveform and learning feature extractors as par...
We present our transducer model on Librispeech. We study variants to inc...
High-performance hybrid automatic speech recognition (ASR) systems are o...
End-to-end models reach state-of-the-art performance for speech recognit...
A cascaded speech translation model relies on discrete and non-different...
To join the advantages of classical and end-to-end approaches for speech...
To encourage intra-class compactness and inter-class separability among ...
Sequence-to-sequence models with an implicit alignment mechanism (e.g. a...
Common end-to-end models like CTC or encoder-decoder-attention models us...
The RNN transducer is a promising end-to-end model candidate. We compare...
Phoneme-based acoustic modeling of large vocabulary automatic speech rec...
In hybrid HMM based speech recognition, LSTM language models have been w...
We present a complete training pipeline to build a state-of-the-art hybr...
Recent advances in text-to-speech (TTS) led to the development of flexib...
Attention-based sequence-to-sequence models have shown promising results...
This work investigates a simple data augmentation technique, SpecAugment...
LSTM based language models are an important part of modern LVCSR systems...
Sequence discriminative training criteria have long been a standard tool...
This paper addresses the robust speech recognition problem as an adaptat...
We explore multi-layer autoregressive Transformer models in language mod...
Significant performance degradation of automatic speech recognition (ASR...