Lei Xie

research

∙ 09/02/2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech – A Study between English and Mandarin

While the performance of cross-lingual TTS based on monolingual corpora ...

0 Tao Li, et al. ∙

research

∙ 08/05/2023

Surrogate Empowered Sim2Real Transfer of Deep Reinforcement Learning for ORC Superheat Control

The Organic Rankine Cycle (ORC) is widely used in industrial waste heat ...

0 Runze Lin, et al. ∙

research

∙ 07/11/2023

Model-Driven Sensing-Node Selection and Power Allocation for Tracking Maneuvering Targets in Perceptive Mobile Networks

Maneuvering target tracking will be an important service of future wirel...

0 Lei Xie, et al. ∙

research

∙ 07/10/2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-sp...

0 Kun Song, et al. ∙

research

∙ 07/06/2023

Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation

Tractography traces the peak directions extracted from fiber orientation...

0 Yuanjing Feng, et al. ∙

research

∙ 06/28/2023

Robo-centric ESDF: A Fast and Accurate Whole-body Collision Evaluation Tool for Any-shape Robotic Planning

For letting mobile robots travel flexibly through complicated environmen...

0 Shuang Geng, et al. ∙

research

∙ 06/21/2023

MSW-Transformer: Multi-Scale Shifted Windows Transformer Networks for 12-Lead ECG Classification

Automatic classification of electrocardiogram (ECG) signals plays a cruc...

0 Renjie Cheng, et al. ∙

research

∙ 05/28/2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Direct speech-to-speech translation (S2ST) has gradually become popular ...

0 Kun Song, et al. ∙

research

∙ 05/23/2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

The recently proposed serialized output training (SOT) simplifies multi-...

0 Yuhao Liang, et al. ∙

research

∙ 05/21/2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

Voice conversion is an increasingly popular technology, and the growing ...

0 Ziqian Ning, et al. ∙

research

∙ 05/21/2023

DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

Real-world complex acoustic environments especially the ones with a low ...

0 Shubo Lv, et al. ∙

research

∙ 03/14/2023

Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

In ICASSP 2023 speech signal improvement challenge, we developed a dual-...

0 Mingshuai Liu, et al. ∙

research

∙ 01/17/2023

Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

It is difficult for an end-to-end (E2E) ASR system to recognize words su...

0 Zhanheng Yang, et al. ∙

research

∙ 12/03/2022

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating...

0 Yi Lei, et al. ∙

research

∙ 11/30/2022

MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages

This report describes the NPU-HC speaker verification system submitted t...

0 Yue Li, et al. ∙

research

∙ 11/24/2022

TESSP: Text-Enhanced Self-Supervised Speech Pre-training

Self-supervised speech pre-training empowers the model with the contextu...

0 Zhuoyuan Yao, et al. ∙

research

∙ 11/19/2022

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling

This paper aims to synthesize target speaker's speech with desired speak...

0 Xinfa Zhu, et al. ∙

research

∙ 11/06/2022

Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

Speech data on the Internet are proliferating exponentially because of t...

0 Jixun Yao, et al. ∙

research

∙ 11/06/2022

Preserving background sound in noise-robust voice conversion via multi-task learning

Background sound is an informative form of art that is helpful in provid...

0 Jixun Yao, et al. ∙

research

∙ 11/05/2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

End-to-end singing voice synthesis (SVS) model VISinger can achieve bett...

0 Yongmao Zhang, et al. ∙

research

∙ 11/03/2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...

0 Ao Zhang, et al. ∙

research

∙ 11/02/2022

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Recent development of neural vocoders based on the generative adversaria...

0 Kun Song, et al. ∙

research

∙ 10/31/2022

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

In current two-stage neural text-to-speech (TTS) paradigm, it is ideal t...

0 Kun Song, et al. ∙

research

∙ 10/30/2022

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

Keyword spotting (KWS) enables speech-based user interaction and gradual...

0 Jie Wang, et al. ∙

research

∙ 10/26/2022

TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

This paper describes the TSUP team's submission to the ISCSLP 2022 conve...

0 Bowen Pang, et al. ∙

research

∙ 10/17/2022

spatial-dccrn: dccrn equipped with frame-level angle feature and hybrid filtering for multi-channel speech enhancement

Recently, multi-channel speech enhancement has drawn much interest due t...

0 Shubo Lv, et al. ∙

research

∙ 10/11/2022

MFCCA:Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

Recently cross-channel attention, which better leverages multi-channel s...

0 Fan Yu, et al. ∙

research

∙ 09/24/2022

NWPU-ASLP System for the VoicePrivacy 2022 Challenge

This paper presents the NWPU-ASLP speaker anonymization system for Voice...

0 Jixun Yao, et al. ∙

research

∙ 09/14/2022

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS

Recent advancements in neural end-to-end TTS models have shown high-qual...

0 Liumeng Xue, et al. ∙

research

∙ 08/02/2022

OLLIE: Derivation-based Tensor Program Optimizer

Boosting the runtime performance of deep neural networks (DNNs) is criti...

0 Liyan Zheng, et al. ∙

research

∙ 07/05/2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

The zero-shot scenario for speech generation aims at synthesizing a nove...

0 Yi Lei, et al. ∙

research

∙ 07/05/2022

Backend Ensemble for Speaker Verification and Spoofing Countermeasure

This paper describes the NPU system submitted to Spoofing Aware Speaker ...

0 Li Zhang, et al. ∙

research

∙ 07/04/2022

CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer

Customized keyword spotting (KWS) has great potential to be deployed on ...

0 Zhanheng Yang, et al. ∙

research

∙ 07/04/2022

Minimizing Sequential Confusion Error in Speech Command Recognition

Speech command recognition (SCR) has been commonly used on resource cons...

0 Zhanheng Yang, et al. ∙

research

∙ 07/03/2022

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Leveraging context information is an intuitive idea to improve performan...

0 Kun Wei, et al. ∙

research

∙ 07/02/2022

Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

Building a voice conversion system for noisy target speakers, such as us...

0 Liumeng Xue, et al. ∙

research

∙ 06/15/2022

End-to-End Voice Conversion with Information Perturbation

The ideal goal of voice conversion is to convert the source speaker's sp...

0 Qicong Xie, et al. ∙

research

∙ 06/01/2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pr...

0 Kun Song, et al. ∙

research

∙ 05/31/2022

Collaborative Sensing in Perceptive Mobile Networks: Opportunities and Challenges

With the development of innovative applications that demand accurate env...

0 Lei Xie, et al. ∙

research

∙ 05/30/2022

Personalized Acoustic Echo Cancellation for Full-duplex Communications

Deep neural networks (DNNs) have shown promising results for acoustic ec...

0 Shimin Zhang, et al. ∙

research

∙ 05/23/2022

Networked Sensing with AI-Empowered Environment Estimation: Exploiting Macro-Diversity and Array Gain in Perceptive Mobile Networks

Sensing will be an important service for future wireless networks to ass...

0 Lei Xie, et al. ∙

research

∙ 04/07/2022

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

General accent recognition (AR) models tend to directly extract low-leve...

0 Qijie Shao, et al. ∙

research

∙ 03/31/2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

In this paper, we conduct a comparative study on speaker-attributed auto...

0 Fan Yu, et al. ∙

research

∙ 03/30/2022

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher

Building a high-quality singing corpus for a person who is not good at s...

0 Heyang Xue, et al. ∙

research

∙ 03/29/2022

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

Recently, we made available WeNet, a production-oriented end-to-end spee...

0 BinBin Zhang, et al. ∙

research

∙ 03/10/2022

An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

DeepFake based digital facial forgery is threatening the public media se...

8 Ganglai Wang, et al. ∙

research

∙ 03/08/2022

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

Talking face generation with great practical significance has attracted ...

1 Ganglai Wang, et al. ∙

research

∙ 03/05/2022

Audio-visual speech separation based on joint feature representation with cross-modal attention

Multi-modal based speech separation has exhibited a specific advantage o...

0 Junwen Xiong, et al. ∙

research

∙ 03/04/2022

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Active speaker detection and speech enhancement have become two increasi...

55 Junwen Xiong, et al. ∙

research

∙ 02/16/2022

Conversational Speech Recognition By Learning Conversation-level Characteristics

Conversational automatic speech recognition (ASR) is a task to recognize...

0 Kun Wei, et al. ∙

