Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech ...
This paper presents a novel task, zero-shot voice conversion based on fa...
Phase information has a significant impact on speech perceptual quality ...
Speech phase prediction, which is a significant research focus in the fi...
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech ...
Modeling multi-party conversations (MPCs) with graph neural networks has...
Zero-shot cross-lingual information extraction (IE) aims at constructing ...
Diffusion models have emerged as the new state-of-the-art family of deep...
Time-domain single-channel speech enhancement (SE) remains challen...
Addressing the issues of who says what to whom in multi-party conversa...
This paper presents a novel neural vocoder named APNet which reconstruct...
Lip-to-Speech (Lip2Speech) synthesis, which predicts corresponding speec...
This paper describes the system developed by the USTC-NELSLIP team for S...
This paper proposes a source-filter-based generative adversarial neural ...
Zero-shot cross-lingual named entity recognition (NER) aims at transferr...
In this work, we present a novel method, named AV2vec, for learning audi...
This paper presents a novel speech phase prediction model which predicts...
This paper proposes a multilingual speech synthesis method which combine...
Generating natural and informative texts has been a long-standing proble...
Recently, various response generation models for two-party conversations...
This paper describes the system developed by the USTC-NELSLIP team for S...
Neural network models have achieved state-of-the-art performance on grap...
Transformer-based models have achieved great success in various NLP, vis...
Personas are useful for dialogue response prediction. However, the perso...
Recently, various neural models for multi-party conversation (MPC) have ...
Persona can function as the prior knowledge for maintaining the consiste...
This paper presents an emotion-regularized conditional variational autoe...
Task-oriented conversational modeling with unstructured knowledge access...
The task of multi-turn text-to-SQL semantic parsing aims to translate na...
With the development of automatic speech recognition (ASR) and text-to-s...
This paper presents an adversarial learning method for recognition-synth...
This paper presents a reverberation module for source-filter-based neura...
The challenges of building knowledge-grounded retrieval-based chatbots l...
In our previous work, we proposed a neural vocoder called HiNet whi...
Disentanglement is a problem in which multiple conversations occur in th...
In this paper, we study the problem of employing pre-trained language mo...
The NOESIS II challenge, as Track 2 of the 8th Dialogue System Techn...
We present our work on Track 4 in the Dialogue System Technology Challen...
This paper proposes an utterance-to-utterance interactive matching netwo...
Automatic speaker verification (ASV) is one of the most natural and conv...
Neural language representation models such as Bidirectional Encoder Repr...
This paper proposes a dually interactive matching network (DIM) for pres...
This paper proposes an end-to-end emotional speech synthesis (ESS) metho...
In this paper, a method for non-parallel sequence-to-sequence (seq2seq) ...
This paper presents a neural vocoder named HiNet which reconstructs spee...
This paper presents a method of using autoregressive neural networks for...
This paper presents a multi-level matching and aggregation network (MLMA...
This paper proposes a new model, called condition-transforming variation...
The Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in te...
This paper presents a neural relation extraction method to deal with the...