Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net

11/04/2022
by Sefik Emre Eskimez, et al.

Personalized speech enhancement (PSE), the process of estimating a clean target speech signal in real time by leveraging a speaker embedding vector of the target talker, has garnered much attention from the research community due to the recent surge of online meetings across the globe. For practical full-duplex communication, PSE models also require an acoustic echo cancellation (AEC) capability. In this work, we take a recently proposed causal end-to-end enhancement network (E3Net) and modify it to obtain a joint PSE-AEC model. We dedicate the early layers to the AEC task and encourage the later layers to perform personalization by adding a bypass connection from the early layers to the mask prediction layer. This design allows us to employ a multi-task learning framework for joint PSE and AEC training. We evaluate the model extensively on test scenarios with both simulated and real-world recordings. The results show that our joint model comes close to the expert models on each individual task and performs significantly better in the combined PSE-AEC scenario.
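
The bypass connection and the multi-task objective described above can be illustrated with a toy sketch. The abstract does not specify the layers, the loss functions, or how the task losses are weighted, so everything here is an assumption made for illustration: the function names (`predict_mask`, `multitask_loss`), the scalar weights, the use of mean squared error, and the weighted-sum combination are all hypothetical and are not taken from the E3Net paper.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1), as is common for mask values."""
    return 1.0 / (1.0 + math.exp(-x))


def mse(estimate, target):
    """Mean squared error between two equal-length sequences (assumed loss)."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def predict_mask(early_feats, late_feats, w_early=0.5, w_late=0.5):
    """Toy mask prediction with a bypass connection.

    The mask layer sees the late-layer features (assumed to carry the
    personalization) and, through the bypass, the early-layer features
    (assumed to carry the echo-cancelled signal). The scalar weights are
    hypothetical stand-ins for learned parameters.
    """
    return [sigmoid(w_early * e + w_late * l)
            for e, l in zip(early_feats, late_feats)]


def multitask_loss(aec_estimate, aec_target, pse_estimate, pse_target,
                   aec_weight=0.5):
    """Weighted sum of an AEC loss and a PSE loss (assumed combination)."""
    aec_loss = mse(aec_estimate, aec_target)
    pse_loss = mse(pse_estimate, pse_target)
    return aec_weight * aec_loss + (1.0 - aec_weight) * pse_loss


if __name__ == "__main__":
    mask = predict_mask([0.2, -1.0, 3.0], [0.1, 0.4, -2.0])
    print(mask)
    print(multitask_loss([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]))
```

In this sketch, the bypass means the mask depends directly on the early features, so a loss on the early-layer output can supervise echo cancellation while the combined objective still trains the later layers.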

research
10/18/2021

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, su...
research
10/23/2020

Speech enhancement aided end-to-end multi-task learning for voice activity detection

Robust voice activity detection (VAD) is a challenging task in low signa...
research
02/23/2023

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

In this study, we present an approach to train a single speech enhanceme...
research
04/02/2022

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

This paper investigates how to improve the runtime speed of personalized...
research
11/05/2022

Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Personalized speech enhancement (PSE) models achieve promising results c...
research
05/08/2021

Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation

In realistic speech enhancement settings for end-user devices, we often ...
