Xiang Yin

research

∙ 08/29/2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Co-speech gesture generation is crucial for automatic digital avatar ani...

0 Longbin Ji, et al. ∙

research

∙ 07/25/2023

Argument Attribution Explanations in Quantitative Bipolar Argumentation Frameworks (Technical Report)

Argumentative explainable AI has been advocated by several in recent yea...

0 Xiang Yin, et al. ∙

research

∙ 07/14/2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Zero-shot text-to-speech aims at synthesizing voices with unseen speech ...

0 Ziyue Jiang, et al. ∙

research

∙ 06/27/2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims t...

0 Yahuan Cong, et al. ∙

research

∙ 06/14/2023

Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects

Conversational recommender systems (CRSs) have become crucial emerging r...

0 Xinghua Qu, et al. ∙

research

∙ 06/06/2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

We are interested in a novel task, namely low-resource text-to-talking a...

0 Zhenhui Ye, et al. ∙

research

∙ 06/04/2023

Detector Guidance for Multi-Object Text-to-Image Generation

Diffusion models have demonstrated impressive performance in text-to-ima...

0 Luping Liu, et al. ∙

research

∙ 05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

0 Jiawei Huang, et al. ∙

research

∙ 05/28/2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Direct speech-to-speech translation (S2ST) has gradually become popular ...

0 Kun Song, et al. ∙

research

∙ 05/24/2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) aims to convert speech from o...

0 Rongjie Huang, et al. ∙

research

∙ 05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...

0 Zhenhui Ye, et al. ∙

research

∙ 05/01/2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Generating talking person portraits with arbitrary speech audio is a cru...

8 Zhenhui Ye, et al. ∙

research

∙ 03/02/2023

LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion

As a key component of automated speech recognition (ASR) and the front-e...

0 Chunfeng Wang, et al. ∙

research

∙ 01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

1 Rongjie Huang, et al. ∙

research

∙ 12/12/2022

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Speech-to-speech translation directly translates a speech utterance to a...

0 Junhui Zhang, et al. ∙

research

∙ 11/21/2022

Explaining Random Forests using Bipolar Argumentation and Markov Networks (Technical Report)

Random forests are decision tree ensembles that can be used to solve a v...

0 Nico Potyka, et al. ∙

research

∙ 11/08/2022

Abstraction-Based Verification of Approximate Pre-Opacity for Control Systems

In this paper, we consider the problem of verifying pre-opacity for disc...

0 Junyao Hou, et al. ∙

research

∙ 08/15/2022

Unsupervised Video Domain Adaptation: A Disentanglement Perspective

Unsupervised video domain adaptation is a practical yet challenging task...

10 Pengfei Wei, et al. ∙

research

∙ 06/10/2022

A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation

Chinese dialect text-to-speech (TTS) system usually can only be utilized ...

0 Wudi Bao, et al. ∙

research

∙ 05/19/2022

Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data

There is broad agreement in the literature that explanation methods shou...

0 Nico Potyka, et al. ∙

research

∙ 04/01/2022

To Explore or Not to Explore: Regret-Based LTL Planning in Partially-Known Environments

In this paper, we investigate the optimal robot path planning problem fo...

0 Jianing Zhao, et al. ∙

research

∙ 02/14/2022

Secure-by-Construction Synthesis of Cyber-Physical Systems

Correct-by-construction synthesis is a cornerstone of the confluence of ...

0 Siyuan Liu, et al. ∙

research

∙ 11/09/2021

Data privacy protection in microscopic image analysis for material data mining

Recent progress in material data mining has been driven by high-capacity...

5 Boyuan Ma, et al. ∙

research

∙ 10/14/2021

Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation

Clothes style transfer for person video generation is a challenging task...

0 Jingning Xu, et al. ∙

research

∙ 10/10/2021

Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

Recently, phonetic posteriorgrams (PPGs) based methods have been quite p...

1 Chao Wang, et al. ∙

research

∙ 10/08/2021

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

In expressive speech synthesis, there are high requirements for emotion ...

0 Pengfei Wu, et al. ∙

research

∙ 10/28/2020

PPG-based singing voice conversion with adversarial representation learning

Singing voice conversion (SVC) aims to convert the voice of one singer t...

0 Zhonghao Li, et al. ∙

research

∙ 10/17/2020

Gradient Aware Cascade Network for Multi-Focus Image Fusion

The general aim of multi-focus image fusion is to gather focused regions...

0 Boyuan Ma, et al. ∙

research

∙ 07/12/2020

Xiaomingbot: A Multilingual Robot News Reporter

This paper proposes the building of Xiaomingbot, an intelligent, multili...

0 Runxin Xu, et al. ∙

research

∙ 05/19/2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a n...

0 Wenjie Li, et al. ∙

research

∙ 04/23/2020

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) sy...

0 Yu Gu, et al. ∙

research

∙ 11/11/2019

A hybrid text normalization system using multi-head self-attention for mandarin

In this paper, we propose a hybrid text normalization system using multi...

0 Junhui Zhang, et al. ∙

research

∙ 11/11/2019

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

In Mandarin text-to-speech (TTS) system, the front-end text processing m...

0 Junjie Pan, et al. ∙

research

∙ 02/09/2018

Opacity of nondeterministic transition systems: A (bi)simulation relation approach

In this paper, we propose several opacity-preserving (bi)simulation rela...

0 Kuize Zhang, et al. ∙

research

∙ 02/06/2018

Deciding Detectability for Labeled Petri Nets

Detectability of discrete event systems (DESs) is a property to determin...

0 Tomas Masopust, et al. ∙

