Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

01/07/2021
by Chiang-Jen Peng, et al.

Multi-task learning (MTL) and attention mechanisms have been shown to extract robust acoustic features effectively for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that integrates MTL with an attention-weighting mechanism to realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI) simultaneously. The proposed ATM system consists of three parts: SE, SI, and attention-Net (AttNet). The SE part is a long short-term memory (LSTM) model, while the SI and AttNet parts are built from deep neural network (DNN) models. The overall ATM system first extracts representative features, then enhances the speech signals in LSTM-SE and identifies the speaker in DNN-SI. The AttNet computes weights based on DNN-SI to prepare better representative features for LSTM-SE. We tested the proposed ATM system on Taiwan Mandarin hearing in noise test sentences. The evaluation results confirmed that the proposed system can effectively enhance the speech quality and intelligibility of a given noisy input. Moreover, the SI accuracy is also notably improved by the proposed ATM system.
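The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer sizes, activation functions, or the training objective, so the sigmoid gating in the AttNet, the weighted sum of losses with factor `alpha`, and all names and dimensions below are assumptions made for illustration. The LSTM-SE model itself is omitted; the sketch stops at the reweighted features that would be fed to it.

```python
import math
import random

def dense(x, W, b):
    """Affine layer W @ x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init(rows, cols, rng):
    """Small random weight matrix with zero bias (hypothetical initialisation)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return W, [0.0] * rows

def atm_forward(frame, params):
    """One feature frame through the assumed DNN-SI -> AttNet -> reweighting path."""
    # DNN-SI: hidden representation, then speaker posteriors.
    hidden = [math.tanh(v) for v in dense(frame, *params["si_hidden"])]
    speaker_probs = softmax(dense(hidden, *params["si_out"]))
    # AttNet (assumed form): one weight in (0, 1) per feature, computed from the SI hidden layer.
    weights = [sigmoid(v) for v in dense(hidden, *params["att"])]
    # Reweighted features; in the full system these would be the input to LSTM-SE.
    weighted = [w * f for w, f in zip(weights, frame)]
    return speaker_probs, weights, weighted

def multitask_loss(enhanced, clean, speaker_probs, speaker_id, alpha=0.5):
    """Assumed joint objective: MSE for SE plus alpha times cross-entropy for SI."""
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    ce = -math.log(speaker_probs[speaker_id])
    return mse + alpha * ce

rng = random.Random(0)
D, H, S = 8, 4, 3  # feature dim, SI hidden size, number of speakers (illustrative values)
params = {
    "si_hidden": init(H, D, rng),
    "si_out": init(S, H, rng),
    "att": init(D, H, rng),
}
frame = [rng.random() for _ in range(D)]
speaker_probs, weights, weighted = atm_forward(frame, params)
loss = multitask_loss(weighted, frame, speaker_probs, speaker_id=0)
```

In this sketch the SI branch influences the SE branch only through the attention weights, which is one plausible reading of "the AttNet computes weights based on DNN-SI"; the paper's full text should be consulted for the actual coupling.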


