Helen Meng

research

∙ 09/21/2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...

0 Shun Lei, et al. ∙

research

∙ 09/04/2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Mapping two modalities, speech and text, into a shared representation sp...

0 Jiaxu Zhu, et al. ∙

research

∙ 09/04/2023

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Recently, excellent progress has been made in speech recognition. Howeve...

0 Jiaxu Zhu, et al. ∙

research

∙ 09/01/2023

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

The single-speaker singing voice synthesis (SVS) usually underperforms a...

0 Shaohuan Zhou, et al. ∙

research

∙ 08/31/2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

This paper presents an end-to-end high-quality singing voice synthesis (...

0 Shaohuan Zhou, et al. ∙

research

∙ 08/31/2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

The spontaneous behavior that often occurs in conversations makes speech...

0 Weiqin Li, et al. ∙

research

∙ 08/31/2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...

0 Jie Chen, et al. ∙

research

∙ 08/29/2023

Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Making moral judgments is an essential step toward developing ethical AI...

0 Jingyan Zhou, et al. ∙

research

∙ 07/29/2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Expressive speech synthesis is crucial for many human-computer interacti...

0 Shun Lei, et al. ∙

research

∙ 07/06/2023

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

Accurate recognition of cocktail party speech containing overlapping spe...

0 Guinan Li, et al. ∙

research

∙ 06/27/2023

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

Automatic recognition of disordered and elderly speech remains highly ch...

0 Tianzi Wang, et al. ∙

research

∙ 06/25/2023

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Visual information can serve as an effective cue for target speaker extr...

0 Jiuxin Lin, et al. ∙

research

∙ 05/25/2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Multi-talker overlapped speech poses a significant challenge for speech ...

0 Lingwei Meng, et al. ∙

research

∙ 05/22/2023

The defender's perspective on automatic speaker verification: An overview

Automatic speaker verification (ASV) plays a critical role in security-s...

0 Haibin Wu, et al. ∙

research

∙ 05/16/2023

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Nowadays, recognition-synthesis-based methods have been quite popular wi...

0 Xintao Zhao, et al. ∙

research

∙ 05/15/2023

SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting

Building end-to-end task bots and maintaining their integration with new...

0 Xiaoying Zhang, et al. ∙

research

∙ 05/09/2023

Inter-SubNet: Speech Enhancement with Subband Interaction

Subband-based approaches process subbands in parallel through the model ...

0 Jun Chen, et al. ∙

research

∙ 05/09/2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

Automatic dubbing, which generates a corresponding version of the input ...

0 Jingbei Li, et al. ∙

research

∙ 04/25/2023

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Music-driven 3D dance generation has become an intensive research topic ...

0 Haolin Zhuang, et al. ∙

research

∙ 04/19/2023

CB-Conformer: Contextual biasing Conformer for biased word recognition

Due to the mismatch between the source and target domains, how to better...

0 Yaoxun Xu, et al. ∙

research

∙ 04/13/2023

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Recent advances in text-to-speech have significantly improved the expres...

0 Shun Lei, et al. ∙

research

∙ 04/07/2023

Interpretable Unified Language Checking

Despite recent concerns about undesirable behaviors generated by large l...

0 Tianhua Zhang, et al. ∙

research

∙ 03/14/2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

As a common way of emotion signaling via non-linguistic vocalizations, v...

0 Jinchao Li, et al. ∙

research

∙ 03/14/2023

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

With the global population aging rapidly, Alzheimer's disease (AD) is pa...

0 Jinchao Li, et al. ∙

research

∙ 03/04/2023

Decision Support System for Chronic Diseases Based on Drug-Drug Interactions

Many patients with chronic diseases resort to multiple medications to re...

0 Tian Bian, et al. ∙

research

∙ 02/28/2023

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Automatic recognition of disordered and elderly speech remains a highly ...

0 Shujie Hu, et al. ∙

research

∙ 02/20/2023

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

Although automatic speech recognition (ASR) can perform well in common n...

0 Lingwei Meng, et al. ∙

research

∙ 02/02/2023

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

Homophone characters are common in tonal syllable-based languages, such ...

0 Holam Chung, et al. ∙

research

∙ 01/31/2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Expressive text-to-speech (TTS) aims to synthesize different speaking st...

0 Dongchao Yang, et al. ∙

research

∙ 01/29/2023

Learning Analytics from Spoken Discussion Dialogs in Flipped Classroom

The flipped classroom is a new pedagogical strategy that has been gainin...

0 Hang Su, et al. ∙

research

∙ 11/10/2022

Speech Enhancement with Fullband-Subband Cross-Attention Network

FullSubNet has shown its promising performance on speech enhancement by ...

0 Jun Chen, et al. ∙

research

∙ 10/29/2022

Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating p...

0 Yi Wang, et al. ∙

research

∙ 10/25/2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE

We propose an unsupervised learning method to disentangle speech into co...

0 Hui Lu, et al. ∙

research

∙ 10/07/2022

Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation

This paper investigates an unsupervised approach towards deriving a univ...

0 Liping Tang, et al. ∙

research

∙ 10/03/2022

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Audio-visual active speaker detection (AVASD) is well-developed, and now...

0 Xuanjun Chen, et al. ∙

research

∙ 08/28/2022

Bayesian Neural Network Language Modeling for Speech Recognition

State-of-the-art neural network language models (NNLMs) represented by l...

0 Boyang Xue, et al. ∙

research

∙ 08/18/2022

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

One-shot voice conversion (VC) with only a single target speaker's speec...

5 Sicheng Yang, et al. ∙

research

∙ 06/28/2022

Exploring linguistic feature and model combination for speech recognition based automatic AD detection

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating p...

2 Yi Wang, et al. ∙

research

∙ 06/24/2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

A key challenge for automatic speech recognition (ASR) systems is to mod...

0 Jiajun Deng, et al. ∙

research

∙ 06/23/2022

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating p...

2 Tianzi Wang, et al. ∙

research

∙ 06/23/2022

Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus

State-of-the-art automatic speech recognition (ASR) systems are bec...

0 Junhao Xu, et al. ∙

research

∙ 06/23/2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Fundamental modelling differences between hybrid and end-to-end (E2E) au...

0 Mingyu Cui, et al. ∙

research

∙ 06/15/2022

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Articulatory features are inherently invariant to acoustic signal distor...

0 Shujie Hu, et al. ∙

research

∙ 05/04/2022

Cross-lingual Word Embeddings in Hyperbolic Space

Cross-lingual word embeddings can be applied to several natural language...

0 Chandni Saxena, et al. ∙

research

∙ 04/06/2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis focus on modelling the mon...

0 Shun Lei, et al. ∙

research

∙ 04/05/2022

Audio-visual multi-channel speech separation, dereverberation and recognition

Despite the rapid advance of automatic speech recognition (ASR) technolo...

0 Guinan Li, et al. ∙

research

∙ 03/31/2022

Neural Architecture Search for Speech Emotion Recognition

Deep neural networks have brought significant advancements to speech emo...

0 Xixin Wu, et al. ∙

research

∙ 03/31/2022

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

The accuracy of prosodic structure prediction is crucial to the naturaln...

0 Xueyuan Chen, et al. ∙

research

∙ 03/31/2022

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and s...

0 Jingbei Li, et al. ∙

research

∙ 03/29/2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Recently, many novel techniques have been introduced to deal with spoofi...

0 Haibin Wu, et al. ∙
