Chao Weng

research

∙ 09/14/2023

Complexity Scaling for Speech Denoising

Computational complexity is critical when deploying deep learning-based ...

Hangting Chen, et al.

research

∙ 08/28/2023

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Benefiting from the development of deep learning, text-to-speech (TTS) t...

Qiushi Zhu, et al.

research

∙ 08/21/2023

Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

Echo cancellation and noise reduction are essential for full-duplex comm...

Hangting Chen, et al.

research

∙ 08/19/2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Automatic speech recognition (ASR) based on transducers is widely used. ...

Jinchuan Tian, et al.

research

∙ 05/30/2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Various applications of voice synthesis have been developed independentl...

Rongjie Huang, et al.

research

∙ 05/04/2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

Audio codec models are widely used in audio communication as a crucial t...

Dongchao Yang, et al.

research

∙ 01/31/2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Expressive text-to-speech (TTS) aims to synthesize different speaking st...

Dongchao Yang, et al.

research

∙ 11/04/2022

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Expressive text-to-speech (TTS) can synthesize a new speaking style by i...

Dongchao Yang, et al.

research

∙ 10/14/2022

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a ...

Jinchuan Tian, et al.

research

∙ 10/11/2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

This paper is the system description of the DKU-Tencent System for the V...

Xiaoyi Qin, et al.

research

∙ 07/20/2022

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Generating sound effects that humans want is an important topic. However...

Dongchao Yang, et al.

research

∙ 07/13/2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Automatic speaker verification has achieved remarkable progress in recen...

Xiaoyi Qin, et al.

research

∙ 06/05/2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Despite the rapid progress in automatic speech recognition (ASR) researc...

Jinchuan Tian, et al.

research

∙ 04/02/2022

Improving Target Sound Extraction with Timestamp Information

Target sound extraction (TSE) aims to extract the sound part of a target...

Helin Wang, et al.

research

∙ 03/29/2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition

In automatic speech recognition (ASR) research, discriminative criteria ...

Jinchuan Tian, et al.

research

∙ 02/04/2022

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our speaker diarization system submitted to the Mul...

Naijun Zheng, et al.

research

∙ 01/06/2022

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Despite the rapid progress of end-to-end (E2E) automatic speech recognit...

Jinchuan Tian, et al.

research

∙ 12/19/2021

Detect what you want: Target Sound Detection

Human beings can perceive a target sound that we are interested in from ...

Dongchao Yang, et al.

research

∙ 12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...

Jinchuan Tian, et al.

research

∙ 11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...

Brian Yan, et al.

research

∙ 10/13/2021

Simple Attention Module based Speaker Verification with Iterative noisy label detection

Recently, the attention mechanism such as squeeze-and-excitation module ...

Xiaoyi Qin, et al.

research

∙ 06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...

Guoguo Chen, et al.

research

∙ 06/11/2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis

For conversational text-to-speech (TTS) systems, it is vital that the sy...

Jingbei Li, et al.

research

∙ 02/16/2021

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for ...

Aswin Shanmugam Subramanian, et al.

research

∙ 02/12/2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-spee...

Peng Liu, et al.

research

∙ 12/13/2020

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

In this study, we investigate self-supervised representation learning fo...

Wei Xia, et al.

research

∙ 11/26/2020

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

Target-speaker speech recognition aims to recognize target-speaker speec...

Jiatong Shi, et al.

research

∙ 10/30/2020

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

This paper proposes a new paradigm for handling far-field multi-speaker ...

Aswin Shanmugam Subramanian, et al.

research

∙ 10/28/2020

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

Non-autoregressive (NAR) transformer models have achieved significantly ...

Xingchen Song, et al.

research

∙ 10/28/2020

Replay and Synthetic Speech Detection with Res2net Architecture

Existing approaches for replay and synthetic speech detection still lack...

Xu Li, et al.

research

∙ 08/07/2020

Peking Opera Synthesis via Duration Informed Attention Network

Peking Opera has been the most dominant form of Chinese performing art s...

Yusong Wu, et al.

research

∙ 08/07/2020

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion is converting the timbre in the source singing ...

Liqiang Zhang, et al.

research

∙ 05/08/2020

Neural Spatio-Temporal Beamformer for Target Speech Separation

Purely neural network (NN) based speech separation and enhancement metho...

Yong Xu, et al.

research

∙ 12/27/2019

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network

This paper presents a method that generates expressive singing voice of ...

Yusong Wu, et al.

research

∙ 12/20/2019

Learning Singing From Speech

We propose an algorithm that is capable of synthesizing high quality tar...

Liqiang Zhang, et al.

research

∙ 12/04/2019

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Singing voice conversion is to convert a singer's voice to another one's...

Chengqi Deng, et al.

research

∙ 11/28/2019

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transd...

Chao Weng, et al.

research

∙ 10/28/2019

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

Self-attention networks (SAN) have been introduced into automatic speech...

Zhao You, et al.

research

∙ 09/04/2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

In this paper, we present a generic and robust multimodal synthesis syst...

Chengzhu Yu, et al.

research

∙ 11/08/2018

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models

In this work, three lattice-free (LF) discriminative training criteria f...

Chao Weng, et al.
