Xulong Zhang

research

∙ 09/20/2023

Research on the Impact of Executive Shareholding on New Investment in Enterprises Based on Multivariable Linear Regression Model

Based on principal-agent theory and optimal contract theory, companies u...

Shanyi Zhou, et al.

research

∙ 09/19/2023

A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students

Blended learning is generally defined as the combination of traditional ...

Yu Ye, et al.

research

∙ 09/16/2023

Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

Cross-modal retrieval (CMR) has been extensively applied in various doma...

Kaiyi Luo, et al.

research

∙ 09/16/2023

AOSR-Net: All-in-One Sandstorm Removal Network

Most existing sandstorm image enhancement methods are based on tradition...

Yazhong Si, et al.

research

∙ 09/16/2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

This paper integrates graph-to-sequence into an end-to-end text-to-speec...

Jianzong Wang, et al.

research

∙ 09/14/2023

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks

Generating realistic talking faces is a complex and widely discussed tas...

Zipeng Qi, et al.

research

∙ 08/28/2023

Machine Unlearning Methodology base on Stochastic Teacher Network

The rise of the phenomenon of the "right to be forgotten" has prompted r...

Xulong Zhang, et al.

research

∙ 08/28/2023

Voice Conversion with Denoising Diffusion Probabilistic GAN Models

Voice conversion is a method that allows for the transformation of speak...

Xulong Zhang, et al.

research

∙ 08/28/2023

Symbolic Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music

Music Emotion Recognition involves the automatic identification of emoti...

Kexin Zhu, et al.

research

∙ 08/24/2023

Sparks of Large Audio Models: A Survey and Outlook

This survey paper provides a comprehensive overview of the recent advanc...

Siddique Latif, et al.

research

∙ 08/21/2023

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

Voice conversion as the style transfer task applied to speech, refers to...

Yimin Deng, et al.

research

∙ 06/01/2023

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

There has been significant progress in emotional Text-To-Speech (TTS) sy...

Haobin Tang, et al.

research

∙ 04/23/2023

SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

In recent Text-to-Speech (TTS) systems, a neural vocoder often generates...

Jianzong Wang, et al.

research

∙ 03/14/2023

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

Because of predicting all the target tokens in parallel, the non-autoreg...

Xulong Zhang, et al.

research

∙ 03/14/2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Recent expressive text to speech (TTS) models focus on synthesizing emot...

Haobin Tang, et al.

research

∙ 03/14/2023

Improving Music Genre Classification from multi-modal properties of music and genre correlations Perspective

Music genre classification has been widely studied in past few years for...

Ganghui Ru, et al.

research

∙ 10/25/2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-Attention model shows significant impr...

Xulong Zhang, et al.

research

∙ 10/25/2022

Improving Imbalanced Text Classification with Dynamic Curriculum Learning

Recent advances in pre-trained language models have improved the perform...

Xulong Zhang, et al.

research

∙ 10/25/2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Most previous neural text-to-speech (TTS) methods are mainly based on su...

Xulong Zhang, et al.

research

∙ 10/25/2022

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse

Metaverse expands the physical world to a new dimension, and the physica...

Xulong Zhang, et al.

research

∙ 10/25/2022

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Recovering the masked speech frames is widely applied in speech represen...

Xulong Zhang, et al.

research

∙ 10/25/2022

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

In this paper, we proposed Adapitch, a multi-speaker TTS method that mak...

Xulong Zhang, et al.

research

∙ 10/13/2022

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

Since the beginning of the COVID-19 pandemic, remote conferencing and sc...

Aolan Sun, et al.

research

∙ 09/21/2022

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

Nonparallel multi-domain voice conversion methods such as the StarGAN-VC...

Shijing Si, et al.

research

∙ 08/08/2022

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

Non-parallel many-to-many voice conversion remains an interesting but ch...

Huaizhen Tang, et al.

research

∙ 06/28/2022

Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Time-domain Transformer neural networks have proven their superiority in...

Jian Luo, et al.

research

∙ 05/24/2022

SUSing: SU-net for Singing Voice Synthesis

Singing voice synthesis is a generative task that involves multi-dimensi...

Xulong Zhang, et al.

research

∙ 05/24/2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

Recently, synthesizing personalized speech by text-to-speech (TTS) appli...

Xulong Zhang, et al.

research

∙ 05/24/2022

MetaSID: Singer Identification with Domain Adaptation for Metaverse

Metaverse has stretched the real world into unlimited space. There will ...

Xulong Zhang, et al.

research

∙ 05/24/2022

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Metaverse is an interactive world that combines reality and virtuality, ...

Xulong Zhang, et al.

research

∙ 02/22/2022

DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning

Any-to-any voice conversion problem aims to convert voices for source an...

Qiqi Wang, et al.

research

∙ 02/22/2022

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech

Multi-speaker text-to-speech (TTS) using a few adaption data is a challe...

Botao Zhao, et al.

research

∙ 02/21/2022

AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

Voice Conversion(VC) refers to changing the timbre of a speech while ret...

Huaizhen Tang, et al.

research

∙ 02/20/2021

Singer Identification Using Deep Timbre Feature Learning with KNN-Net

In this paper, we study the issue of automatic singer identification (SI...

Xulong Zhang, et al.

research

∙ 04/09/2020

Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data

Models for music artist classification usually were operated in the freq...

Xulong Zhang, et al.

research

∙ 04/08/2020

Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation

Singing voice detection is the task to identify the frames which contain...

Xulong Zhang, et al.

