Zhiyong Wu

09/21/2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...

Shun Lei, et al.

09/13/2023

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

The automatic co-speech gesture generation draws much attention in compu...

Sicheng Yang, et al.

09/04/2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Mapping two modalities, speech and text, into a shared representation sp...

Jiaxu Zhu, et al.

09/04/2023

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Recently, excellent progress has been made in speech recognition. Howeve...

Jiaxu Zhu, et al.

09/01/2023

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

The single-speaker singing voice synthesis (SVS) usually underperforms a...

Shaohuan Zhou, et al.

08/31/2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

This paper presents an end-to-end high-quality singing voice synthesis (...

Shaohuan Zhou, et al.

08/31/2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

The spontaneous behavior that often occurs in conversations makes speech...

Weiqin Li, et al.

08/31/2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...

Jie Chen, et al.

08/31/2023

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Recent advances in neural text-to-speech (TTS) models bring thousands of...

Jie Chen, et al.

08/26/2023

The DiffuseStyleGesture+ entry to the GENEA Challenge 2023

In this paper, we introduce the DiffuseStyleGesture+, our solution for t...

Sicheng Yang, et al.

08/09/2023

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Current talking face generation methods mainly focus on speech-lip synch...

Liyang Chen, et al.

07/29/2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Expressive speech synthesis is crucial for many human-computer interacti...

Shun Lei, et al.

06/28/2023

MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

The previous SpEx+ has yielded outstanding performance in speaker extrac...

Jun Chen, et al.

06/28/2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...

Jiuxin Lin, et al.

06/25/2023

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Visual information can serve as an effective cue for target speaker extr...

Jiuxin Lin, et al.

05/22/2023

Can We Edit Factual Knowledge by In-Context Learning?

Previous studies have shown that large language models (LLMs) like GPTs ...

Ce Zheng, et al.

05/18/2023

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

Speech-driven gesture generation is highly challenging due to the random...

Sicheng Yang, et al.

05/18/2023

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

In this paper, we present ZeroPrompt (Figure 1-(a)) and the correspondin...

Xingchen Song, et al.

05/16/2023

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Nowadays, recognition-synthesis-based methods have been quite popular wi...

Xintao Zhao, et al.

05/09/2023

Inter-SubNet: Speech Enhancement with Subband Interaction

Subband-based approaches process subbands in parallel through the model ...

Jun Chen, et al.

05/09/2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

Automatic dubbing, which generates a corresponding version of the input ...

Jingbei Li, et al.

05/08/2023

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

The art of communication beyond speech there are gestures. The automatic...

Sicheng Yang, et al.

04/25/2023

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Music-driven 3D dance generation has become an intensive research topic ...

Haolin Zhuang, et al.

04/19/2023

CB-Conformer: Contextual biasing Conformer for biased word recognition

Due to the mismatch between the source and target domains, how to better...

Yaoxun Xu, et al.

04/13/2023

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Recent advances in text-to-speech have significantly improved the expres...

Shun Lei, et al.

03/06/2023

OpenICL: An Open-Source Framework for In-context Learning

In recent years, In-context Learning (ICL) has gained increasing attenti...

Zhenyu Wu, et al.

02/11/2023

Compositional Exemplars for In-context Learning

Large pretrained language models (LMs) have shown impressive In-Context ...

Jiacheng Ye, et al.

12/31/2022

A Survey for In-context Learning

With the increasing ability of large language models (LLMs), in-context ...

Qingxiu Dong, et al.

12/20/2022

Self-adaptive In-context Learning

Despite the surprising few-shot performance of in-context learning (ICL)...

Zhiyong Wu, et al.

12/19/2022

Explanation Regeneration via Information Bottleneck

Explaining the black-box predictions of NLP models naturally and accurat...

Qintong Li, et al.

11/10/2022

Speech Enhancement with Fullband-Subband Cross-Attention Network

FullSubNet has shown its promising performance on speech enhancement by ...

Jun Chen, et al.

11/01/2022

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

In this paper, we present TrimTail, a simple but effective emission regu...

Xingchen Song, et al.

10/31/2022

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

The recently proposed Conformer architecture which combines convolution ...

Xingchen Song, et al.

10/25/2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE

We propose an unsupervised learning method to disentangle speech into co...

Hui Lu, et al.

10/22/2022

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

Recently, dataset-generation-based zero-shot learning has shown promisin...

Jiacheng Ye, et al.

10/17/2022

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Recently, diffusion models have emerged as a new paradigm for generative...

Shansan Gong, et al.

09/29/2022

COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization

Traditional training paradigms for extractive and abstractive summarizat...

Chenxin An, et al.

08/25/2022

The ReprGesture entry to the GENEA Challenge 2022

This paper describes the ReprGesture entry to the Generation and Evaluat...

Sicheng Yang, et al.

08/18/2022

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

One-shot voice conversion (VC) with only a single target speaker's speec...

Sicheng Yang, et al.

07/06/2022

Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives

Ordinal regression with anchored reference samples (ORARS) has been prop...

Bin Su, et al.

05/25/2022

ZeroGen^+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

Nowadays, owing to the superior capacity of the large pre-trained langua...

Jiahui Gao, et al.

04/06/2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis focus on modelling the mon...

Shun Lei, et al.

03/31/2022

Neural Architecture Search for Speech Emotion Recognition

Deep neural networks have brought significant advancements to speech emo...

Xixin Wu, et al.

03/31/2022

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

The accuracy of prosodic structure prediction is crucial to the naturaln...

Xueyuan Chen, et al.

03/31/2022

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and s...

Jingbei Li, et al.

03/24/2022

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

Non-parallel data voice conversion (VC) have achieved considerable break...

Xintao Zhao, et al.

03/23/2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis mainly focus on current se...

Shun Lei, et al.

03/23/2022

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

Previously proposed FullSubNet has achieved outstanding performance in D...

Jun Chen, et al.

02/16/2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

There is a growing interest in dataset generation recently due to the su...

Jiacheng Ye, et al.

11/18/2021

Transformer-S2A: Robust and Efficient Speech-to-Animation

We propose a novel robust and efficient Speech-to-Animation (S2A) approa...

Liyang Chen, et al.
