I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch

12/08/2020
by Joseph Turian, et al.

Growing research demonstrates that synthetic failure modes imply poor generalization. We compare commonly used audio-to-audio losses on a synthetic benchmark that measures the pitch distance between two stationary sinusoids. The results are surprising: many of these losses have a poor sense of pitch direction. These shortcomings are exposed using simple rank assumptions. Our task is trivial for humans but difficult for these audio distances, which suggests that significant progress can be made in self-supervised audio learning by improving current losses.
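The failure mode described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' benchmark or any of the losses they evaluate. Several assumptions are not from the paper: the "spectrally-based distance" is a plain L1 over naive-DFT magnitude spectra, both sinusoids sit exactly on DFT bins, and the rank assumption is read as "a larger pitch gap must give a strictly larger distance". All function names (`sine`, `magnitude_spectrum`, `spectral_l1`, `peak_log_freq_distance`, `respects_ranking`) are hypothetical. Once two spectral peaks stop overlapping, the L1 distance is the same constant no matter how far apart the pitches are, so it cannot rank pitch differences. A pitch-aware baseline that compares peak frequencies in octaves can.

```python
import cmath
import math


def sine(freq_hz, sample_rate, n):
    """Hypothetical helper: stationary unit-amplitude sinusoid with n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..n/2 (O(n^2), fine for tiny n)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_l1(a, b):
    """Assumed stand-in for a spectrally-based loss: L1 between magnitude spectra."""
    return sum(abs(p - q) for p, q in zip(magnitude_spectrum(a), magnitude_spectrum(b)))


def peak_log_freq_distance(a, b, sample_rate):
    """Pitch-aware baseline: |log2| ratio of the two peak-bin frequencies, in octaves."""
    n = len(a)

    def peak_hz(x):
        mags = magnitude_spectrum(x)
        k = max(range(1, len(mags)), key=mags.__getitem__)  # skip the DC bin
        return k * sample_rate / n

    return abs(math.log2(peak_hz(b) / peak_hz(a)))


def respects_ranking(distances, tol=1e-6):
    """Assumed rank check: each larger pitch gap must give a strictly larger distance."""
    return all(d2 > d1 + tol for d1, d2 in zip(distances, distances[1:]))


if __name__ == "__main__":
    SR, N = 256, 256  # 1 Hz per DFT bin, so integer frequencies land exactly on bins
    ref = sine(40, SR, N)
    # Candidates ordered by increasing pitch distance from the 40 Hz reference.
    candidates = [sine(f, SR, N) for f in (41, 44, 60, 80)]

    l1 = [spectral_l1(ref, c) for c in candidates]
    pitch = [peak_log_freq_distance(ref, c, SR) for c in candidates]

    # Each L1 value is approximately N: peak of N/2 in one spectrum, N/2 in the other.
    print("spectral L1:", [round(d, 3) for d in l1])
    print("L1 respects ranking:", respects_ranking(l1))
    print("octave distance respects ranking:", respects_ranking(pitch))
```

Off-bin frequencies would smear energy into neighboring bins and make the L1 curve slightly informative near the reference, but it still flattens once the peaks are far apart. Exact-bin sinusoids are chosen here only to make the plateau easy to see.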



research
09/19/2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Audio-visual representation learning aims to develop systems with human-...
research
03/11/2021

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Inspired by the recent progress in self-supervised learning for computer...
research
03/02/2022

Audio Self-supervised Learning: A Survey

Inspired by the humans' cognitive ability to generalise knowledge and sk...
research
05/12/2023

Hear to Segment: Unmixing the Audio to Guide the Semantic Segmentation

In this paper, we focus on a recently proposed novel task called Audio-V...
research
02/13/2020

Self-supervised learning for audio-visual speaker diarization

Speaker diarization, which is to find the speech segments of specific sp...
research
03/02/2019

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

In this paper, we focus on the challenging perception problem in robotic...
