Raw Waveform-based Audio Classification Using Sample-level CNN Architectures

12/04/2017
by Jongpil Lee, et al.

Music, speech, and acoustic scene sounds are often handled separately in the audio domain because of their different signal characteristics. However, as the image domain has advanced rapidly thanks to versatile image classification models, it is necessary to study extensible classification models in the audio domain as well. In this study, we approach this problem using two types of sample-level deep convolutional neural networks that take raw waveforms as input and use filters with small granularity. One is a basic model that consists of convolution and pooling layers. The other is an improved model that additionally has residual connections, squeeze-and-excitation modules, and multi-level concatenation. We show that the sample-level models reach state-of-the-art performance levels for the three different categories of sound. We also visualize the filters along the layers and compare the characteristics of the learned filters.
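To make the two building blocks named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The filter size of 3, the pooling size of 3, the channel counts, the random weights, and every function name (`conv1d`, `basic_block`, `squeeze_excite`, `random_filters`) are illustrative assumptions; the abstract only says the filters have "small granularity". Residual connections and multi-level concatenation are left out to keep the example short.

```python
import math
import random


def conv1d(x, weights, biases):
    """Stride-1 convolution with zero 'same' padding.

    x is a list of channels (each a list of floats);
    weights[o][i] is the filter from input channel i to output channel o.
    """
    length = len(x[0])
    pad = len(weights[0][0]) // 2
    out = []
    for filters, bias in zip(weights, biases):
        row = []
        for t in range(length):
            acc = bias
            for taps, channel in zip(filters, x):
                for j, w in enumerate(taps):
                    idx = t + j - pad
                    if 0 <= idx < length:  # samples outside the signal count as zero
                        acc += w * channel[idx]
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in channel] for channel in x]


def max_pool1d(x, size):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    return [
        [max(channel[i:i + size]) for i in range(0, len(channel) - size + 1, size)]
        for channel in x
    ]


def basic_block(x, weights, biases, pool=3):
    """Convolution -> ReLU -> max pooling (assumed layout of the basic model's block)."""
    return max_pool1d(relu(conv1d(x, weights, biases)), pool)


def squeeze_excite(x, w_down, w_up):
    """Rescale each channel by a gate computed from its global average."""
    squeezed = [sum(channel) / len(channel) for channel in x]            # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]                                            # excitation
    return [[g * v for v in channel] for g, channel in zip(gates, x)]


def random_filters(rng, out_ch, in_ch, size=3):
    """Hypothetical helper: untrained filters, only for checking shapes."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(in_ch)]
            for _ in range(out_ch)]


rng = random.Random(0)
waveform = [[math.sin(0.3 * n) for n in range(81)]]  # 1 channel, 3**4 raw samples

stages = [waveform]
for _ in range(4):
    w = random_filters(rng, 4, len(stages[-1]))
    stages.append(basic_block(stages[-1], w, [0.0] * 4))

# Channel gating applied to the first block's output (4 channels, 2 hidden units).
w_down = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
w_up = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
gated = squeeze_excite(stages[1], w_down, w_up)

print([len(s[0]) for s in stages])
```

With these assumed sizes, each block shortens the time axis by a factor of three, so an input of 3**4 samples collapses to a single frame after four blocks. The squeeze-and-excitation step leaves the shape unchanged and only scales each channel by a value between 0 and 1.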

research
06/21/2017

Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification

Music tag words that describe music audio by text have different levels ...
research
10/28/2017

Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

Recent work has shown that the end-to-end approach using convolutional n...
research
08/28/2019

Environment Sound Classification using Multiple Feature Channels and Deep Convolutional Neural Networks

In this paper, we propose a model for the Environment Sound Classificati...
research
11/21/2019

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

Audio classification can distinguish different kinds of sounds, which is...
research
05/24/2018

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Motivated by the fact that characteristics of different sound classes ar...
research
05/24/2018

Environmental Sound Classification Based on Multi-temporal Resolution CNN Network Combining with Multi-level Features

Motivated by the fact that characteristics of different sound classes ar...
research
07/30/2021

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

In this work, we propose a multi-head relevance weighting framework to l...
