STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

11/09/2020
by Ryandhimas E. Zezario, et al.

Most objective speech intelligibility assessment metrics require clean speech as a reference, which may limit their applicability in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. STOI-Net takes speech spectral features as input and outputs predicted STOI scores. The model combines a convolutional neural network with a bidirectional long short-term memory network (CNN-BLSTM) and a multiplicative attention mechanism. Experimental results show that the STOI scores estimated by STOI-Net correlate well with the actual STOI scores when tested on noisy and enhanced speech utterances. The correlation values are 0.97 for the seen test condition (the test speakers and noise types are included in the training set) and 0.83 for the unseen test condition (the test speakers and noise types are not included in the training set). The results confirm that STOI-Net can accurately predict STOI scores without reference to clean speech.
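The correlation figures above compare predicted STOI scores with reference STOI scores over a set of test utterances. As a minimal sketch of how such a figure could be computed, the snippet below assumes Pearson's linear correlation coefficient; the abstract does not state which correlation measure is used. The function name `pearson_correlation` and the score lists are hypothetical placeholders for illustration, not values from the paper.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson linear correlation between two equal-length score lists."""
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two utterances")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)

    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


# Made-up placeholder scores (NOT results from the paper): one entry per utterance.
reference_stoi = [0.62, 0.71, 0.80, 0.88, 0.93]  # intrusive STOI, computed with clean speech
predicted_stoi = [0.60, 0.74, 0.78, 0.90, 0.91]  # hypothetical non-intrusive model output

print(f"correlation: {pearson_correlation(predicted_stoi, reference_stoi):.3f}")
```

A value close to 1 means the non-intrusive predictions follow the reference scores closely, which is how the reported 0.97 (seen) and 0.83 (unseen) figures would be read under this assumption.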

