BWSNet: Automatic Perceptual Assessment of Audio Signals

This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst Scaling (BWS) experiment. It maps sound samples into an embedding space that represents the perception of a studied attribute. To this end, we propose a set of cost functions and constraints that interpret trial-wise ordinal relations as distance comparisons in a metric learning task. We tested our proposal on data from two BWS studies investigating the perception of speech social attitudes and timbral qualities. For both datasets, our results show that the structure of the latent space is faithful to human judgements.
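The core idea of reading a BWS trial as distance comparisons can be sketched as follows. The paper does not publish its exact cost functions in this abstract, so the code below is a minimal illustrative interpretation, not the authors' implementation: for a trial in which one item was judged "best" and one "worst" with respect to the studied attribute, the pair (best, worst) should be the most distant pair among the trial's items in the embedding space, enforced with a hinge loss. The function name, the margin value, and the use of Euclidean distance are all assumptions.

```python
import numpy as np

def bws_trial_loss(embeddings, best, worst, margin=0.1):
    """Hinge loss for a single Best-Worst Scaling trial (illustrative sketch).

    embeddings: (n_items, dim) array of embedded sound samples in the trial.
    best, worst: indices of the items judged best / worst on the attribute.

    Reads the ordinal judgement as a set of distance comparisons: every
    other pair of items in the trial should be closer together than the
    (best, worst) pair, by at least `margin`.
    """
    def dist(i, j):
        return np.linalg.norm(embeddings[i] - embeddings[j])

    d_bw = dist(best, worst)
    n = len(embeddings)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if {i, j} == {best, worst}:
                continue
            # Violated comparison: this pair is not sufficiently
            # closer than the (best, worst) pair.
            loss += max(0.0, dist(i, j) - d_bw + margin)
    return loss
```

In training, such a per-trial loss would be summed over all trials and minimised with respect to the network producing the embeddings; a zero loss means the trial's ordinal relations are fully respected by the metric.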
