Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

04/10/2022
by Alejandro Delgado, et al.

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is increasing interest in building applications that let artists efficiently pick target samples from large sound libraries just by imitating them vocally. In this study, we investigated the potential of conditional autoencoder models to learn informative features for Drum Sample Retrieval by Vocalisation (DSRV). We assessed the usefulness of their embeddings using four evaluation metrics: two relating to acoustic properties and two relating to perceptual properties, the latter derived from human listeners' similarity ratings. Results suggest that models conditioned on both sound-type labels (drum vs. imitation) and drum-type labels (kick vs. snare vs. closed hi-hat vs. open hi-hat) learn the most informative embeddings for DSRV. Finally, we examined individual differences in vocal imitation style using the Mantel test and found salient differences among participants, highlighting the importance of user information when designing DSRV systems.
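
For readers unfamiliar with the Mantel test mentioned in the abstract, the following is a minimal sketch of the general technique: it correlates the off-diagonal entries of two distance matrices and estimates significance by jointly permuting the rows and columns of one of them. The abstract does not describe how the distance matrices were built or which variant of the test was used, so the function names, the toy data, and the two-sided permutation scheme here are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mantel_test(dist_a, dist_b, n_permutations=999, seed=0):
    """Two-sided Mantel test between two symmetric distance matrices.

    Returns (observed correlation, permutation p-value).
    """
    n = len(dist_a)
    if len(dist_b) != n or any(len(row) != n for row in dist_a + dist_b):
        raise ValueError("both inputs must be square matrices of equal size")

    vec_a = upper_triangle(dist_a)
    r_obs = pearson(vec_a, upper_triangle(dist_b))

    rng = random.Random(seed)
    order = list(range(n))
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        # Permute rows and columns together so the matrix stays a valid
        # distance matrix over relabelled items.
        permuted = [[dist_b[i][j] for j in order] for i in order]
        r_perm = pearson(vec_a, upper_triangle(permuted))
        if abs(r_perm) >= abs(r_obs):
            exceed += 1

    # The +1 terms count the observed arrangement as one of the permutations.
    p_value = (exceed + 1) / (n_permutations + 1)
    return r_obs, p_value


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


# Toy data (hypothetical): eight items in a 2-D embedding space, plus a
# slightly perturbed copy standing in for a second, similar arrangement.
toy_rng = random.Random(1)
points = [(toy_rng.random(), toy_rng.random()) for _ in range(8)]
noisy = [(x + toy_rng.gauss(0, 0.01), y + toy_rng.gauss(0, 0.01))
         for x, y in points]

dist_one = pairwise_distances(points)
dist_two = pairwise_distances(noisy)
r, p = mantel_test(dist_one, dist_two)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In a setting like the one the abstract describes, a low correlation between two participants' distance matrices would indicate that they organise their imitations differently. That reading is an interpretation of the general method rather than a detail taken from the paper.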

research
04/07/2022

Musical Information Extraction from the Singing Voice

Music information retrieval is currently an active research area that ad...
research
04/15/2020

Musical Features for Automatic Music Transcription Evaluation

This technical report gives a detailed, formal description of the featur...
research
07/19/2019

Sound Search by Text Description or Vocal Imitation?

Searching sounds by text labels is often difficult, as text descriptions...
research
12/15/2021

Chimpanzee voice prints? Insights from transfer learning experiments from human voices

Individual vocal differences are ubiquitous in the animal kingdom. In hu...
research
04/10/2022

Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

Vocal Percussion Transcription (VPT) is concerned with the automatic det...
research
09/04/2023

AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning

Voice, as input, has progressively become popular on mobiles and seems t...
research
10/19/2022

Modeling Animal Vocalizations through Synthesizers

Modeling real-world sound is a fundamental problem in the creative use o...
