Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

04/10/2022
by Alejandro Delgado, et al.

Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restrict modelling to small input feature sets to avoid overfitting. This study explores several deep supervised learning strategies to obtain informative feature sets for amateur vocal percussion classification. We evaluated the performance of these feature sets on regular vocal percussion classification tasks and compared them with several baseline approaches, including feature selection methods and a speech recognition engine. The proposed learning models were supervised with several label sets containing information from four different levels of abstraction: instrument-level, syllable-level, phoneme-level, and boxeme-level. Results suggest that convolutional neural networks supervised with syllable-level annotations produced the most informative embeddings for classification, which can be used as input representations for fitting classifiers. Finally, we used back-propagation-based saliency maps to investigate the importance of different spectrogram regions for feature learning.
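As a rough illustration of the last step described above (fitting a classifier on a small user-specific set of low-dimensional embeddings), the sketch below uses a nearest-centroid classifier. The abstract does not name the classifier used in the study, so this choice is an assumption. The three-dimensional vectors, the instrument labels, and the function names `fit_centroids` and `predict` are likewise hypothetical stand-ins for embeddings that a trained network would produce.

```python
from collections import defaultdict
from math import dist


def fit_centroids(embeddings, labels):
    """Average the embedding vectors of each class (nearest-centroid fitting)."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    # One mean vector per class, computed dimension by dimension.
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical embeddings for one user's recordings (not data from the study).
user_embeddings = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1],  # kick imitations
    [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],  # snare imitations
    [0.0, 0.2, 0.9], [0.1, 0.1, 0.8],  # hi-hat imitations
]
user_labels = ["kick", "kick", "snare", "snare", "hihat", "hihat"]

centroids = fit_centroids(user_embeddings, user_labels)
prediction = predict(centroids, [0.85, 0.15, 0.05])
print(prediction)
```

A classifier this simple has very few parameters, which reflects the abstract's point that small user-specific datasets favour compact input representations.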


