In this paper, we explored how to boost speech emotion recognition (SER)...
End-to-end models, such as the neural Transducer, have been successful i...
In spite of the excellent strides made by end-to-end (E2E) models in spe...
Self-supervised learning (SSL) proficiency in speech-related tasks has d...
Although diffusion models in text-to-speech have become a popular choice...
In recent years, speech-based self-supervised learning (SSL) has made
si...
Although high-fidelity speech can be obtained for intralingual speech
sy...
Current ASR systems are mainly trained and evaluated at the utterance le...
The excellent generalization ability of self-supervised learning (SSL) f...
Recently, end-to-end (E2E) automatic speech recognition (ASR) models hav...
The utilization of discrete speech tokens, divided into semantic tokens ...
Audio-driven talking face has attracted broad interest from academia and...
Recent years have witnessed a boom in self-supervised learning (SSL) in
...
Self-supervised learning (SSL) has achieved great success in various are...
Although current neural text-to-speech (TTS) models are able to generate...
Traditional automatic speech recognition (ASR) systems usually focus on
...
In this paper, we provide a new perspective on self-supervised speech mo...
Recent years have witnessed great strides in self-supervised learning (S...
The mainstream neural text-to-speech(TTS) pipeline is a cascade system,
...
The high memory consumption and computational costs of Recurrent neural
...
Text-only adaptation of an end-to-end (E2E) model remains a challenging ...
In recent years, end-to-end (E2E) based automatic speech recognition (AS...
Integrating external language models (LMs) into end-to-end (E2E) models
...
The efficacy of external language model (LM) integration with existing
e...
The external language models (LM) integration remains a challenging task...
Recently, Transformer based end-to-end models have achieved great succes...
LSTM language models (LSTM-LMs) have been proven to be powerful and yiel...
Many state-of-the-art results in domains such as NLP and computer vision...
We explore neural language modeling for speech recognition where the con...
State-of-the-art English automatic speech recognition systems typically ...
Recently, bidirectional recurrent network language models (bi-RNNLMs) ha...