We introduce O-1, a new self-training objective to reduce training bias ...
In this work, we study the impact of Large-scale Language Models (LLM) o...
This work studies knowledge distillation (KD) and addresses its constrai...
Text-only adaptation of a transducer model remains challenging for end-t...
Attention layers are an integral part of modern end-to-end automatic spe...
Training an end-to-end (E2E) neural network speech-to-intent (S2I) syste...
An essential component of spoken language understanding (SLU) is slot filling...
Current methods for learning visually grounded language from videos ofte...
It is generally believed that direct sequence-to-sequence (seq2seq) spee...
There has been huge progress in speech recognition over the last several...
Conventional automatic speech recognition (ASR) systems trained from fra...
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition...
The performance of automatic speech recognition systems degrades with in...
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have re...
Recent work on end-to-end automatic speech recognition (ASR) has shown t...
One of the most difficult speech recognition tasks is accurate recogniti...
End-to-end (E2E) systems have achieved competitive results compared to c...
Modern automatic speech recognition (ASR) systems need to be robust unde...
We propose Diverse Embedding Neural Network (DENN), a novel architecture...
Diversity or complementarity of experts in ensemble pattern recognition ...