What does it take to create the Babel Fish, a tool that can help individ...
Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...
Transducer and Attention based Encoder-Decoder (AED) are two widely used...
It has been known that direct speech-to-speech translation (S2ST) models...
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...
Direct speech-to-speech translation (S2ST), in which all components can ...
We study speech-to-speech translation (S2ST) that translates speech from...
In a sentence, certain words are critical for its semantic. Among them, ...
The amount of labeled data to train models for speech tasks is limited f...
Connectionist temporal classification (CTC) -based models are attractive...
Connectionist temporal classification (CTC) -based models are attractive...
In this study, we present recent developments of models trained with the...
Non-autoregressive (NAR) models simultaneously generate multiple outputs...
In automatic speech recognition (ASR) rescoring, the hypothesis with the...
The multi-decoder (MD) end-to-end speech translation model has demonstra...
This article describes an efficient end-to-end speech translation (E2E-S...
In this work, we propose novel decoding algorithms to enable streaming
a...
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the
...
While attention-based encoder-decoder (AED) models have been successfull...
A conventional approach to improving the performance of end-to-end speec...
This article describes an efficient training method for online streaming...
This paper describes the recent development of ESPnet
(https://github.co...
In this study, we present recent developments on ESPnet: End-to-End Spee...
For real-world deployment of automatic speech recognition (ASR), the sys...
Fast inference speed is an important goal towards real-world deployment ...
Attention-based sequence-to-sequence (seq2seq) models have achieved prom...
We investigate a monotonic multihead attention (MMA) by extending hard
m...
Monotonic chunkwise attention (MoChA) has been studied for the online
st...
Spoken language understanding, which extracts intents and/or semantic
co...
We present ESPnet-ST, which is designed for the quick development of
spe...
Recently, a few novel streaming attention-based sequence-to-sequence (S2...
In this paper, we propose a simple yet effective framework for multiling...
Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) sys...
Sequence-to-sequence models have been widely used in end-to-end speech
p...
In this paper, we explore several new schemes to train a seq2seq model t...
This work explores better adaptation methods to low-resource languages u...