Large pre-trained speech models are widely used as the de facto paradigm...
In this work, we introduce a “score-based assessment” framework for esti...
We introduce the Universal Speech Model (USM), a single large model that...
We introduce Noise2Music, where a series of diffusion models is trained ...
In this work, we propose a new parameter-efficient learning framework ba...
We propose a quantum kernel learning (QKL) framework to address the inhe...
Adapting a neural text-to-speech (TTS) model to a target speaker typical...
Training state-of-the-art Automated Speech Recognition (ASR) models typi...
A neural vocoder using a denoising diffusion probabilistic model (DDPM) has ...
Non-autoregressive (NAR) models simultaneously generate multiple outputs...
This paper introduces WaveGrad 2, a non-autoregressive generative model ...
This paper introduces a novel method to diagnose the source-target atten...
This paper introduces WaveGrad, a conditional model for waveform generat...
Although recent progress in speaker verification has generated powerful models...
We present Mask CTC, a novel non-autoregressive end-to-end automatic spe...
In this paper we demonstrate methods for reliable and efficient training...
In this work, we explore the dependencies between speaker recognition an...
Spoken language identification (LID) technologies have improved in recen...
Recently, very deep transformers have started to show superior performance t...
Speaker verification still suffers from the challenge of generalization ...
Sequence-to-sequence models have been widely used in end-to-end speech p...
We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-...
This paper introduces a new open source platform for end-to-end speech p...