Speaker Embeddings as Individuality Proxy for Voice Stress Detection

06/09/2023
by Zihan Wu, et al.

Because a speaker's mental state modulates speech, stress induced by cognitive or physical load can be detected in the voice. The existing voice stress detection benchmark has shown that audio embeddings extracted from the Hybrid BYOL-S self-supervised model perform well. However, the benchmark evaluates performance on each dataset separately and does not evaluate performance across different types of stress or different languages. Moreover, previous studies have found strong individual differences in stress susceptibility. This paper presents the design and development of a voice stress detection system trained on more than 100 speakers from nine language groups and five different types of stress. We address individual variability in voice stress analysis by adding speaker embeddings to the Hybrid BYOL-S features. The proposed method significantly improves voice stress detection performance with an input audio length of only 3-5 seconds.
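
As a rough illustration of what "adding speaker embeddings" to audio features could look like, the sketch below concatenates a per-utterance audio embedding with a speaker embedding to form one classifier input. The abstract does not say how the two are combined, so the concatenation, the L2 normalization, the function names, and the toy vector sizes are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse_embeddings(audio_emb: Sequence[float],
                    speaker_emb: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: normalize each embedding, then concatenate.

    Normalizing first keeps one embedding from dominating the other
    purely because of its scale (an assumption, not from the paper).
    """
    return l2_normalize(audio_emb) + l2_normalize(speaker_emb)


# Toy stand-ins for real model outputs (real embeddings have hundreds of dims).
audio_emb = [3.0, 4.0]          # e.g. features for a 3-5 second clip
speaker_emb = [0.0, 2.0, 0.0]   # e.g. output of a speaker-verification model

fused = fuse_embeddings(audio_emb, speaker_emb)
print(len(fused))  # length is the sum of the two input lengths
```

The fused vector would then be passed to a downstream stress classifier, so that the classifier can condition its decision on who is speaking.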


research
03/30/2022

Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

As a neurophysiological response to threat or adverse conditions, stress...
research
06/03/2021

An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis

Multi-speaker spoken datasets enable the creation of text-to-speech synt...
research
10/07/2021

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Detection of common events and scenes from audio is useful for extractin...
research
06/26/2019

Stress-SGX: Load and Stress your Enclaves for Fun and Profit

The latest generation of Intel processors supports Software Guard Extens...
research
10/31/2022

Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection

The rapid spread of media content synthesis technology and the potential...
research
05/14/2022

Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice

With the availability of voice-enabled devices such as smart phones, men...
research
05/09/2023

Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings

The virtual world is being established in which digital humans are creat...
