Longbiao Wang

research

∙ 09/01/2023

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

For fine-grained generation and recognition tasks such as minimally-supe...

0 Chunyu Qiang, et al. ∙

research

∙ 07/28/2023

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

Recently, there has been a growing interest in text-to-speech (TTS) meth...

0 Chunyu Qiang, et al. ∙

research

∙ 06/05/2023

Rethinking the visual cues in audio-visual speaker extraction

The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel vi...

0 Junjie Li, et al. ∙

research

∙ 05/29/2023

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

In recent years, the joint training of speech enhancement front-end and ...

0 Haoyu Lu, et al. ∙

research

∙ 03/26/2023

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Time-domain speech enhancement (SE) has recently been intensively invest...

0 Hao Shi, et al. ∙

research

∙ 12/07/2022

MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Recently, many deep learning based beamformers have been proposed for mu...

0 Yanjie Fu, et al. ∙

research

∙ 11/03/2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...

0 Ao Zhang, et al. ∙

research

∙ 11/02/2022

Monolingual Recognizers Fusion for Code-switching Speech Recognition

The bi-encoder structure has been intensively investigated in code-switc...

0 Tongtong Song, et al. ∙

research

∙ 10/09/2022

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

Speaker extraction seeks to extract the target speech in a multi-talker ...

0 Junjie Li, et al. ∙

research

∙ 07/15/2022

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

Recent neural network based Direction of Arrival (DoA) estimation algori...

0 Haoran Yin, et al. ∙

research

∙ 06/29/2022

Language-specific Characteristic Assistance for Code-switching Speech Recognition

Dual-encoder structure successfully utilizes two language-specific encod...

0 Tongtong Song, et al. ∙

research

∙ 06/24/2022

Iterative Sound Source Localization for Unknown Number of Sources

Sound source localization aims to seek the direction of arrival (DOA) of...

1 Yanjie Fu, et al. ∙

research

∙ 04/27/2022

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Talking head generation is to synthesize a lip-synchronized talking head...

0 Sen Chen, et al. ∙

research

∙ 02/21/2022

L-SpEx: Localized Target Speaker Extraction

Speaker extraction aims to extract the target speaker's voice from a mul...

0 Meng Ge, et al. ∙

research

∙ 10/19/2021

Talking Head Generation with Audio and Speech Related Facial Action Units

The task of talking head generation is to synthesize a lip synchronized ...

0 Sen Chen, et al. ∙

research

∙ 10/09/2021

Using multiple reference audios and style embedding constraints for speech synthesis

The end-to-end speech synthesis model can directly take an utterance as ...

0 Cheng Gong, et al. ∙

research

∙ 08/04/2021

Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis

Expressive neural text-to-speech (TTS) systems incorporate a style encod...

0 Xudong Dai, et al. ∙

research

∙ 11/19/2020

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

Speaker extraction uses a pre-recorded reference speech as the reference...

0 Meng Ge, et al. ∙

research

∙ 05/10/2020

SpEx+: A Complete Time Domain Speaker Extraction Network

Speaker extraction aims to extract the target speech signal from a multi...

0 Meng Ge, et al. ∙

research

∙ 05/02/2020

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

Spikes are the currency in central nervous systems for information trans...

7 Qiang Yu, et al. ∙

research

∙ 10/23/2019

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

Most existing AU detection works considering AU relationships are relyin...

0 Zhilei Liu, et al. ∙

research

∙ 02/04/2019

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

The capability for environmental sound recognition (ESR) can determine t...

0 Qiang Yu, et al. ∙

research

∙ 03/21/2018

Speech Emotion Recognition Considering Local Dynamic Features

Recently, increasing attention has been directed to the study of the spe...

0 Haotian Guan, et al. ∙

research

∙ 04/12/2016

Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting

In this paper, we study several microphone channel selection and weighti...

0 Zhaofeng Zhang, et al. ∙
