Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

03/30/2022
by Gasser Elbanna, et al.

As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour, with potentially detrimental effects on health under sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the paralinguistic correlates of stress-inducing task load. Historically, voice stress analysis (VSA) has been conducted using conventional digital signal processing (DSP) techniques. Despite the development of modern methods based on deep neural networks (DNNs), accurately detecting stress in speech remains difficult because of the wide variety of stressors and the considerable variability in how individuals perceive stress. To that end, we introduce a set of five datasets for task load detection in speech. The voice recordings were collected while either cognitive or physical stress was induced in a cohort of volunteers, with more than a hundred speakers in total. We used the datasets to design and evaluate a novel self-supervised audio representation that leverages the effectiveness of handcrafted (DSP-based) features and the complexity of data-driven DNN representations. Notably, the proposed approach outperformed both extensive handcrafted feature sets and novel DNN-based audio representation learning approaches.
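
To make the idea of pairing handcrafted and data-driven features concrete, here is a minimal sketch. It is not the method from the paper, whose details the abstract does not give. It assumes the simplest possible combination, plain concatenation, of two classic DSP descriptors (RMS energy and zero-crossing rate) with a stand-in for a learned embedding. The function names and the random-projection "embedding" are hypothetical placeholders; a real system would use a trained encoder.

```python
import math
import random


def handcrafted_features(signal):
    """Two classic DSP descriptors: RMS energy and zero-crossing rate."""
    n = len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (n - 1)
    return [rms, zcr]


def placeholder_embedding(signal, dim=4, seed=0):
    """Hypothetical stand-in for a learned DNN embedding.

    A fixed random linear projection of the waveform; it only
    illustrates the shape of the data, not a trained model.
    """
    rng = random.Random(seed)
    n = len(signal)
    return [
        sum(rng.uniform(-1.0, 1.0) * x for x in signal) / n
        for _ in range(dim)
    ]


def hybrid_representation(signal, dim=4):
    """Concatenate handcrafted and (placeholder) learned features."""
    return handcrafted_features(signal) + placeholder_embedding(signal, dim)


# Example input: 0.1 s of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(800)
]
features = hybrid_representation(signal)
```

The resulting vector has two handcrafted entries followed by `dim` embedding entries and could be passed to any downstream classifier for task load detection.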

research
06/09/2023

Speaker Embeddings as Individuality Proxy for Voice Stress Detection

Since the mental states of the speaker modulate speech, stress introduce...
research
05/09/2022

Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features

Stress is a major threat to well-being that manifests in a variety of ph...
research
11/20/2021

Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy

Human speech production encompasses physiological processes that natural...
research
06/24/2022

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

Methods for extracting audio and speech features have been studied since...
research
12/29/2020

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

This paper describes two novel complementary techniques that improve the...
research
10/13/2021

DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

Conventional vocoders are commonly used as analysis tools to provide int...
research
08/26/2021

StressNAS: Affect State and Stress Detection Using Neural Architecture Search

Smartwatches have rapidly evolved towards capabilities to accurately cap...
