Transformer-based speech self-supervised learning (SSL) models, such as
...
Many recent loss functions in deep metric learning are expressed with
lo...
Large-scale speech self-supervised learning (SSL) has emerged as the mai...
Recent advances in sophisticated synthetic speech generated from
text-to...
Acoustic word embeddings (AWEs) are discriminative representations of sp...
The recently proposed self-attentive pooling (SAP) has shown good perfor...
Several fast text-to-speech (TTS) models have been proposed for real-tim...
Speaker verification (SV) has recently attracted considerable research
i...
While deep learning has made impressive progress in speech synthesis and...
Several papers have proposed deep-learning-based models to predict the m...
In this paper, we explore the possibility of speech synthesis from low
q...
In this paper, we explore prosody transfer for audiobook generation unde...
Keyword spotting (KWS) and speaker verification (SV) have been studied
i...
Currently, the most widely used approach for speaker verification is the...
Currently, the most widely used approach for speaker verification is the...
In realistic settings, a speaker recognition system needs to identify a
...
We propose a novel transductive inference framework for metric-based
met...
Acoustic word embeddings — fixed-dimensional vector representations of
a...
Voice activity detection (VAD), which classifies frames as speech or
non...
In this paper, we propose a new pooling method called spatial pyramid
en...
Previous research on acoustic word embeddings used in query-by-example...