Learning Environmental Sounds with Multi-scale Convolutional Neural Network

03/25/2018
by Boqing Zhu, et al.

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform remains challenging. Current waveform-based models generally use time-domain convolutional layers to extract features, but features extracted by filters of a single size are insufficient for building a discriminative representation of audio. In this paper, we propose a multi-scale convolution operation, which yields a better audio representation by improving the frequency resolution and learning filters across the whole frequency range. To leverage waveform-based and spectrogram-based features in a single model, we introduce a two-phase method to fuse the different features. Finally, we propose a novel end-to-end network called WaveMsNet, based on the multi-scale convolution operation and the two-phase method. On the environmental sound classification datasets ESC-10 and ESC-50, the classification accuracies of WaveMsNet reach 93.75 respectively, a significant improvement over previous methods.
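The multi-scale idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that several 1-D filters of different lengths run over the same raw waveform and that each output is pooled to a common length so the branches can be stacked as channels. The helper names (`conv1d`, `max_pool_to_length`, `multi_scale_conv`), the kernel sizes, the strides, and the random filter weights are all hypothetical, chosen only for illustration; in a real network the filters would be learned.

```python
import math
import random


def conv1d(signal, kernel, stride):
    """Valid (no padding) 1-D cross-correlation of a waveform with one filter."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def max_pool_to_length(feature, target_len):
    """Adaptive max pooling: split the feature into target_len bins, keep each bin's max."""
    n = len(feature)
    pooled = []
    for b in range(target_len):
        lo = (b * n) // target_len
        hi = max(lo + 1, ((b + 1) * n) // target_len)
        pooled.append(max(feature[lo:hi]))
    return pooled


def multi_scale_conv(signal, branches, target_len):
    """Run each (kernel, stride) branch, pool to a common length, stack as channels."""
    channels = []
    for kernel, stride in branches:
        feature = conv1d(signal, kernel, stride)
        channels.append(max_pool_to_length(feature, target_len))
    return channels


# Toy waveform: 1000 samples of a sine wave (assumed input, not real audio).
waveform = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]

# Hypothetical branches: short, medium, and long filters with matching strides.
# Random weights stand in for filters that would normally be learned.
rng = random.Random(0)
branches = [
    ([rng.uniform(-1, 1) for _ in range(size)], stride)
    for size, stride in [(11, 1), (51, 5), (101, 10)]
]

channels = multi_scale_conv(waveform, branches, target_len=32)
print(len(channels), [len(c) for c in channels])  # 3 [32, 32, 32]
```

Short filters give fine time resolution, while long filters cover more periods of a low-frequency component, so pooling the branches to a shared length lets a later layer combine both views of the signal.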

research
10/01/2016

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from the raw waveform data with minima...
research
05/25/2020

End-to-End Auditory Object Recognition via Inception Nucleus

Machine learning approaches to auditory object recognition are tradition...
research
04/06/2021

MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms

In this work, we aim to improve the expressive capacity of waveform-base...
research
12/27/2019

Deep progressive multi-scale attention for acoustic event classification

Convolutional neural network (CNN) is an indispensable building block fo...
research
11/03/2017

Learning Filterbanks from Raw Speech for Phone Recognition

We train a bank of complex filters that operates on the raw waveform and...
research
02/25/2019

Forecasting intracranial hypertension using multi-scale waveform metrics

Objective: Intracranial hypertension is an important risk factor of seco...
research
11/04/2022

Seismic-phase detection using multiple deep learning models for global and local representations of waveforms

The detection of earthquakes is a fundamental prerequisite for seismolog...
