The CHiME challenges have played a significant role in the development a...
We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-E...
The Streaming Unmixing and Recognition Transducer (SURT) model was propo...
This paper presents a novel algorithm for building an automatic speech r...
Language development experts need tools that can automatically identify ...
Guided source separation (GSS) is a type of target-speaker extraction me...
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EU...
Self-supervised learning (SSL) methods which learn representations of da...
Code-switching (CS) refers to the phenomenon that languages switch withi...
Adversarial attacks are a threat to automatic speech recognition (ASR) s...
In this paper, we propose to employ a dual-mode framework on the x-vecto...
Speech data is notoriously difficult to work with due to a variety of co...
Self-supervised model pre-training has recently garnered significant int...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
We recently proposed DOVER-Lap, a method for combining overlap-aware spe...
The ubiquitous presence of machine learning systems in our lives necessi...
We introduce an asynchronous dynamic decoder, which adopts an efficient A* ...
Large web-crawled corpora represent an excellent resource for improving ...
Low-resource Multilingual Neural Machine Translation (MNMT) is typically...
This paper proposes a parallel computation strategy and a posterior-base...
This paper provides a detailed description of the Hitachi-JHU system tha...
In this paper we address the task of recognizing assembly actions as a s...
This paper describes a method for overlap-aware speaker diarization. Giv...
Environmental noises and reverberation have a detrimental effect on the ...
Several advances have been made recently towards handling overlapping sp...
As the performance of single-channel speech separation systems has impro...
This paper presents an efficient algorithm for n-gram language model ada...
This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CH...
We present PyChain, a fully parallelized PyTorch implementation of end-t...
Always-on spoken language interfaces, e.g. personal digital assistants, ...
Speaker diarization is an important pre-processing step for many speech ...
We present Espresso, an open-source, modular, extensible end-to-end neur...
Deep neural network based speaker embeddings, such as x-vectors, have be...
We explore training attention-based encoder-decoder ASR for low-resource...
To date, the bulk of research on single-channel speech separation has be...
In topic identification (topic ID) on real-world unstructured audio, an ...
We describe initial work on an extension of the Kaldi toolkit that suppo...
We describe the system our team used during NIST's LoReHLT (Low Resource...
Developing speech technologies for low-resource languages has become a v...
Speech recognition systems for irregularly-spelled languages like Englis...
The paper summarizes the development of the LVCSR system built as a part...
Modern topic identification (topic ID) systems for speech use automatic ...
Acoustic unit discovery (AUD) is a process of automatically identifying ...
In this paper, we extend the deep long short-term memory (DLSTM) recurre...
We describe the neural-network training framework used in the Kaldi spee...