Very Deep Convolutional Neural Networks for Raw Waveforms

10/01/2016
by Wei Dai, et al.

Learning acoustic models directly from raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (~2) convolutional layers, which might be insufficient for building high-level discriminative features. In this work, we propose very deep convolutional neural networks (CNNs) that directly use time-domain waveforms as inputs. Our CNNs, with up to 34 weight layers, are efficient to optimize over very long sequences (e.g., a vector of size 32000), which is necessary for processing acoustic waveforms. This is achieved through batch normalization, residual learning, and a careful design of down-sampling in the initial layers. Our networks are fully convolutional, without the use of fully connected layers and dropout, to maximize representation learning. We use a large receptive field in the first convolutional layer to mimic bandpass filters, but very small receptive fields subsequently to control the model capacity. We demonstrate the performance gains with the deeper models. Our evaluation shows that the CNN with 18 weight layers outperforms the CNN with 3 weight layers by over 15% absolute accuracy on an environmental sound recognition task and matches the performance of models using log-mel features.
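To make the role of early down-sampling concrete, the following minimal sketch traces how sequence length and receptive field evolve through a stack of 1D convolution and pooling layers applied to a 32000-sample input. The helper `trace_stack` and every kernel size and stride in `layers` are hypothetical values chosen for illustration; the abstract does not state them. The sketch assumes no padding.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride). All values below are illustrative
# assumptions, not hyperparameters stated in the abstract.
Layer = Tuple[int, int]


def trace_stack(input_len: int, layers: List[Layer]) -> List[Tuple[int, int]]:
    """Return (output_length, receptive_field) after each layer, no padding."""
    length = input_len
    rf = 1    # receptive field, measured in input samples
    jump = 1  # spacing, in input samples, between adjacent outputs
    trace = []
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1  # standard unpadded output length
        rf += (kernel - 1) * jump                 # each layer widens the field
        jump *= stride                            # strides compound across layers
        trace.append((length, rf))
    return trace


layers = [
    (80, 4),  # wide first convolution (assumed size), strided
    (4, 4),   # pooling (assumed)
    (3, 1),   # small-kernel convolution
    (3, 1),   # small-kernel convolution
    (4, 4),   # pooling (assumed)
]

trace = trace_stack(32000, layers)
for i, (length, rf) in enumerate(trace, start=1):
    print(f"layer {i}: length={length}, receptive_field={rf}")
```

Under these assumed settings the sequence shrinks from 32000 to 497 positions after five layers, while each output position covers 204 input samples. This illustrates why aggressive down-sampling in the first few layers keeps the cost of the many small-kernel layers that follow manageable.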

