Computational complexity is critical when deploying deep learning-based ...
Benefiting from the development of deep learning, text-to-speech (TTS) t...
Echo cancellation and noise reduction are essential for full-duplex comm...
Automatic speech recognition (ASR) based on transducers is widely used. ...
Various applications of voice synthesis have been developed independentl...
Audio codec models are widely used in audio communication as a crucial t...
Expressive text-to-speech (TTS) aims to synthesize different speaking st...
Expressive text-to-speech (TTS) can synthesize a new speaking style by i...
Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a ...
This paper is the system description of the DKU-Tencent System for the V...
Generating sound effects that humans want is an important topic. However...
Automatic speaker verification has achieved remarkable progress in recen...
Despite the rapid progress in automatic speech recognition (ASR) researc...
Target sound extraction (TSE) aims to extract the sound part of a target...
In automatic speech recognition (ASR) research, discriminative criteria ...
This paper describes our speaker diarization system submitted to the Mul...
Despite the rapid progress of end-to-end (E2E) automatic speech recognit...
Human beings can perceive a target sound that we are interested in from ...
Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
Conversational bilingual speech encompasses three types of utterances: t...
Recently, the attention mechanism such as squeeze-and-excitation module ...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
For conversational text-to-speech (TTS) systems, it is vital that the sy...
Multi-source localization is an important and challenging technique for ...
This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-spee...
In this study, we investigate self-supervised representation learning fo...
Target-speaker speech recognition aims to recognize target-speaker speec...
This paper proposes a new paradigm for handling far-field multi-speaker ...
Non-autoregressive (NAR) transformer models have achieved significantly ...
Existing approaches for replay and synthetic speech detection still lack...
Peking Opera has been the most dominant form of Chinese performing art s...
Singing voice conversion converts the timbre in the source singing ...
Purely neural network (NN) based speech separation and enhancement metho...
This paper presents a method that generates expressive singing voice of ...
We propose an algorithm that is capable of synthesizing high quality tar...
Singing voice conversion converts a singer's voice to another one's...
In this work, we propose minimum Bayes risk (MBR) training of RNN-Transd...
Self-attention networks (SAN) have been introduced into automatic speech...
In this paper, we present a generic and robust multimodal synthesis syst...
In this work, three lattice-free (LF) discriminative training criteria f...