Xie Chen

research

∙ 09/19/2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

In this paper, we explored how to boost speech emotion recognition (SER)...

0 Ziyang Ma, et al. ∙

research

∙ 09/18/2023

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

End-to-end models, such as the neural Transducer, have been successful i...

0 Junzhe Liu, et al. ∙

research

∙ 09/14/2023

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

In spite of the excellent strides made by end-to-end (E2E) models in spe...

0 Peng Wang, et al. ∙

research

∙ 09/14/2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Self-supervised learning (SSL) proficiency in speech-related tasks has d...

0 Yifan Yang, et al. ∙

research

∙ 09/10/2023

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

Although diffusion models in text-to-speech have become a popular choice...

0 Yiwei Guo, et al. ∙

research

∙ 08/28/2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

In recent years, speech-based self-supervised learning (SSL) has made si...

0 Zhisheng Zheng, et al. ∙

research

∙ 06/25/2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Although high-fidelity speech can be obtained for intralingual speech sy...

0 Sen Liu, et al. ∙

research

∙ 06/23/2023

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

Current ASR systems are mainly trained and evaluated at the utterance le...

0 Mingyu Cui, et al. ∙

research

∙ 06/15/2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

The excellent generalization ability of self-supervised learning (SSL) f...

0 Ziyang Ma, et al. ∙

research

∙ 06/14/2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Recently, end-to-end (E2E) automatic speech recognition (ASR) models hav...

0 Zheng Liang, et al. ∙

research

∙ 06/13/2023

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

The utilization of discrete speech tokens, divided into semantic tokens ...

0 Chenpeng Du, et al. ∙

research

∙ 03/09/2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

Audio-driven talking face has attracted broad interest from academia and...

0 Qi Chen, et al. ∙

research

∙ 02/18/2023

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

Recent years have witnessed a boom in self-supervised learning (SSL) in ...

0 Xie Chen, et al. ∙

research

∙ 12/20/2022

Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models

Self-supervised learning (SSL) has achieved great success in various are...

0 Changli Tang, et al. ∙

research

∙ 11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...

0 Yiwei Guo, et al. ∙

research

∙ 11/17/2022

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

Traditional automatic speech recognition (ASR) systems usually focus on ...

0 Xun Gong, et al. ∙

research

∙ 11/14/2022

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

In this paper, we provide a new perspective on self-supervised speech mo...

0 Ziyang Ma, et al. ∙

research

∙ 10/27/2022

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Recent years have witnessed great strides in self-supervised learning (S...

0 Yujin Wang, et al. ∙

research

∙ 04/02/2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

The mainstream neural text-to-speech (TTS) pipeline is a cascade system, ...

0 Chenpeng Du, et al. ∙

research

∙ 11/29/2021

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers

The high memory consumption and computational costs of Recurrent neural ...

0 Junhao Xu, et al. ∙

research

∙ 10/06/2021

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Text-only adaptation of an end-to-end (E2E) model remains a challenging ...

0 Zhong Meng, et al. ∙

research

∙ 09/27/2021

Factorized Neural Transducer for Efficient Language Model Adaptation

In recent years, end-to-end (E2E) based automatic speech recognition (AS...

0 Xie Chen, et al. ∙

research

∙ 06/04/2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

Integrating external language models (LMs) into end-to-end (E2E) models ...

0 Zhong Meng, et al. ∙

research

∙ 02/02/2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

The efficacy of external language model (LM) integration with existing e...

0 Zhong Meng, et al. ∙

research

∙ 11/03/2020

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

The external language models (LM) integration remains a challenging task...

0 Zhong Meng, et al. ∙

research

∙ 10/22/2020

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

Recently, Transformer based end-to-end models have achieved great succes...

0 Xie Chen, et al. ∙

research

∙ 10/21/2020

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

LSTM language models (LSTM-LMs) have been proven to be powerful and yiel...

0 Xie Chen, et al. ∙

research

∙ 06/16/2020

Memory-Efficient Pipeline-Parallel DNN Training

Many state-of-the-art results in domains such as NLP and computer vision...

25 Deepak Narayanan, et al. ∙

research

∙ 11/11/2019

Long-span language modeling for speech recognition

We explore neural language modeling for speech recognition where the con...

1 Sarangarajan Parthasarathy, et al. ∙

research

∙ 02/01/2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

State-of-the-art English automatic speech recognition systems typically ...

0 Yu Wang, et al. ∙

research

∙ 08/18/2017

Future Word Contexts in Neural Network Language Models

Recently, bidirectional recurrent network language models (bi-RNNLMs) ha...

0 Xie Chen, et al. ∙
