Discrete audio representation, also known as audio tokenization, has seen renewed
...
This paper presents an overview and evaluation of some of the end-to-end...
We study speech intent classification and slot filling (SICSF) by propos...
The number of end-to-end speech recognition models grows every year. The...
Multilingual Automatic Speech Recognition (ASR) models are capable of
tr...
Contextual spelling correction models are an alternative to shallow fusi...
Conformer-based models have become the most dominant end-to-end architec...
This paper introduces a novel Token-and-Duration Transducer (TDT)
archit...
This paper presents a framework based on Weighted Finite-State Transduce...
We propose an end-to-end ASR system that can be trained on transcribed s...
In this work, we propose a zero-shot voice conversion method using speec...
This paper presents a class of new fast non-trainable entropy-based
conf...
In this paper, we extend previous self-supervised approaches for languag...
This paper proposes a modification to RNN-Transducer (RNN-T) models for
...
Fine-tuning is a popular method for adapting text-to-speech (TTS) models...
We present AmberNet, a compact end-to-end neural network for Spoken Lang...
Automatic speech recognition models are often adapted to improve their
a...
Inverse text normalization (ITN) is an essential post-processing step in...
Despite recent progress in generative adversarial network (GAN)-based
voc...
Speaker diarization systems are challenged by a trade-off between the
te...
Training neural text-to-speech (TTS) models for a new speaker typically
...
In this paper, we propose TitaNet, a novel neural network architecture f...
This paper presents novel Weighted Finite-State Transducer (WFST) topolo...
Text normalization (TN) and inverse text normalization (ITN) are essenti...
End-to-end automatic speech recognition systems have achieved great accu...
Dialogue state tracking is an essential part of goal-oriented dialogue
s...
We propose TalkNet, a non-autoregressive convolutional neural model for
...
Inverse text normalization (ITN) converts spoken-domain automatic speech...
In this paper, we introduce a new toolbox for constructing speech datase...
In the English speech-to-text (STT) machine learning task, acoustic mode...
We present MarbleNet, an end-to-end neural network for Voice Activity
De...
We analyze the training dynamics for deep linear networks using a new me...
In this work, we introduce a simple yet efficient post-processing model ...
NeMo (Neural Modules) is a Python framework-agnostic toolkit for creatin...
We propose NovoGrad, a first-order stochastic gradient method with layer...
In this paper, we report state-of-the-art results on LibriSpeech among
e...
Building an accurate automatic speech recognition (ASR) system requires ...
We present OpenSeq2Seq -- an open-source toolkit for training
sequence-t...
Deep neural networks have enabled progress in a wide variety of applicat...
Batch normalization (BN) has become a de facto standard for training dee...
A common way to speed up training of large convolutional networks is to ...
This paper proposes a novel model for the rating prediction task in
reco...
We present two simple ways of reducing the number of parameters and
acce...
We present SEBOOST, a technique for boosting the performance of existing...