In recent years, Large Language Models (LLMs) have garnered significant
...
Large language models have proven themselves highly flexible, able to so...
Recently reported state-of-the-art results in visual speech recognition ...
Recognizing a word shortly after it is spoken is an important requiremen...
The two most popular loss functions for streaming end-to-end automatic s...
With 4.5 million hours of English speech from 10 different sources acros...
Measuring automatic speech recognition (ASR) system quality is critical ...
Transfer learning is critical for efficient information transfer across
...
Often, the storage and computational constraints of embeddeddevices dema...
As speech-enabled devices such as smartphones and smart speakers become
...
How to leverage dynamic contextual information in end-to-end speech
reco...
We propose a dynamic encoder transducer (DET) for on-device speech
recog...
Word Error Rate (WER) has been the predominant metric used to evaluate t...
Pseudo-labeling is the most adopted method for pre-training automatic sp...
Recurrent transducer models have emerged as a promising solution for spe...
End-to-end models in general, and Recurrent Neural Network Transducer (R...
There is a growing interest in the speech community in developing Recurr...
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech...
Transformers, originally proposed for natural language processing (NLP)
...
Many semi- and weakly-supervised approaches have been investigated for
o...
We introduce a new collection of spoken English audio suitable for train...
In this paper, we introduce spatial attention for refining the informati...
Neural transducer-based systems such as RNN Transducers (RNN-T) for auto...
We explore options to use Transformer networks in neural transducer for
...
Grapheme-based acoustic modeling has recently been shown to outperform
p...
We propose and evaluate transformer-based acoustic models (AMs) for hybr...
End-to-end modeling (E2E) of automatic speech recognition (ASR) blends a...
Spoken language understanding system is traditionally designed as a pipe...