Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

08/18/2023
by Ryandhimas E. Zezario, et al.

This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss function. Experimental results first demonstrate the advantages of MPL compared to training a model from scratch and using a direct knowledge transfer mechanism. Second, the benefit of the Huber loss for improving the predictive ability of MTQ-Net is verified. Finally, the MTQ-Net with the MPL approach exhibits higher overall predictive power compared to other SSL-based speech assessment models.
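
The combined objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`huber`, `mean_huber`, `mpl_loss`), the Huber threshold `delta`, and the weighting factor `pseudo_weight` are all assumptions introduced here, since the abstract does not say how the two loss terms are weighted or what threshold is used.

```python
def huber(pred, target, delta=1.0):
    """Huber loss for one prediction: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def mean_huber(preds, targets, delta=1.0):
    """Average Huber loss over paired predictions and targets."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    return sum(huber(p, t, delta) for p, t in zip(preds, targets)) / len(preds)


def mpl_loss(pred_mos, true_mos, pred_aux, pseudo_aux,
             pseudo_weight=1.0, delta=1.0):
    """Supervised term plus weighted semi-supervised term.

    pred_mos / true_mos:   estimated vs. ground-truth scores (e.g. S-MOS, N-MOS, G-MOS)
    pred_aux / pseudo_aux: estimated vs. pseudo-label scores (e.g. PESQ, STOI, SDI)
    pseudo_weight:         hypothetical weighting factor, not taken from the paper
    """
    supervised = mean_huber(pred_mos, true_mos, delta)
    semi_supervised = mean_huber(pred_aux, pseudo_aux, delta)
    return supervised + pseudo_weight * semi_supervised


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    loss = mpl_loss(
        pred_mos=[3.2, 2.9, 3.0], true_mos=[3.0, 3.1, 3.0],
        pred_aux=[2.4, 0.85, 0.10], pseudo_aux=[2.5, 0.80, 0.12],
    )
    print(f"combined loss: {loss:.4f}")
```

Because the Huber loss grows only linearly once the error exceeds `delta`, a badly wrong pseudo label pulls the total less than it would under a squared-error loss, which is one plausible reason to prefer it when some targets are estimated rather than measured.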
