In the last decade of automatic speech recognition (ASR) research, the
i...
This work studies the use of attention masking in transformer transducer...
Graph-based temporal classification (GTC), a generalized form of the
con...
The recurrent neural network transducer (RNN-T) objective plays a major ...
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (...
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a se...
Video captioning is an essential technology to understand scenes and des...
Attention-based end-to-end automatic speech recognition (ASR) systems ha...
Pseudo-labeling (PL) has been shown to be effective in semi-supervised
a...
This paper addresses end-to-end automatic speech recognition (ASR) for l...
Self-attention has become an important and widely used neural network
co...
This paper describes the recent development of ESPnet
(https://github.co...
The performance of automatic speech recognition (ASR) systems typically
...
Semi-supervised learning has demonstrated promising results in automatic...
In contrast with previous approaches where information flows only toward...
We propose an unsupervised speaker adaptation method inspired by the neu...
Encoder-decoder based sequence-to-sequence models have demonstrated
stat...
Sequence-to-sequence models have been widely used in end-to-end speech
p...
Attention-based methods and Connectionist Temporal Classification (CTC)
...
Sequence-to-sequence ASR models require large quantities of data to atta...
Automatic Speech Recognition (ASR) using multiple microphone arrays has
...
Attention-based methods and Connectionist Temporal Classification (CTC)
...
Attention-based encoder decoder network uses a left-to-right beam search...
This paper investigates the applications of various multilingual approac...
In this paper, we present promising accurate prefix boosting (PAPB), a
d...
Casual conversations involving multiple speakers and noises from surroun...
In this paper, we explore several new schemes to train a seq2seq model t...
This paper presents a method to train end-to-end automatic speech recogn...
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relati...
This paper investigates the impact of word-based RNN language models
(RN...
In this paper we propose a novel data augmentation method for attention-...
Dialog systems need to understand dynamic visual scenes in order to have...
Recently, there has been growing interest in multi-speaker speech
recogn...
This paper introduces a new open source platform for end-to-end speech
p...
End-to-end training of neural networks is a promising approach to automa...
We present a state-of-the-art end-to-end Automatic Speech Recognition (A...
The field of speech recognition is in the midst of a paradigm shift:
end...
Currently successful methods for video description are based on
encoder-...