ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition

06/03/2021
by Sergey Verbitskiy, et al.

We present a new convolutional neural network (CNN) architecture based on ResNet for audio pattern recognition tasks. The main modification is a new hyper-parameter, which we call "the decreasing temporal size parameter", that reduces the temporal sizes of tensors by increasing stride sizes. Optimal values of this parameter reduce the number of multi-adds, which makes the system faster. This approach not only lowers computational complexity but can also preserve, and for the AudioSet dataset even improve, performance on audio pattern recognition tasks. Experiments on three datasets confirm this observation: the AudioSet dataset, the ESC-50 dataset, and RAVDESS. Our best system achieves state-of-the-art performance on the AudioSet dataset with an mAP of 0.450. We also transfer a model pre-trained on the AudioSet dataset to the ESC-50 dataset and RAVDESS and obtain state-of-the-art results with accuracies of 0.961 and 0.748, respectively. We call our system "ERANN" (Efficient Residual Audio Neural Network).
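The link between temporal stride and multi-adds can be illustrated with a rough sketch. The abstract does not say how the decreasing temporal size parameter maps to concrete stride values, so the code below is not the authors' configuration. It only applies the standard convolution output-size formula to a single 2-D convolution over a (time, frequency) map. The function names, the 1000 x 128 input, and the channel counts are hypothetical and chosen for illustration.

```python
def conv_out_len(n, kernel, stride, padding):
    """Output length along one axis for a standard convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def conv2d_multi_adds(t_in, f_in, c_in, c_out, kernel=3, stride_t=1, stride_f=1):
    """Multiply-add count of one square-kernel 2-D convolution.

    Returns (multi_adds, t_out, f_out). Bias terms are ignored.
    """
    pad = kernel // 2  # "same"-style padding for odd kernels (assumption)
    t_out = conv_out_len(t_in, kernel, stride_t, pad)
    f_out = conv_out_len(f_in, kernel, stride_f, pad)
    multi_adds = t_out * f_out * c_out * c_in * kernel * kernel
    return multi_adds, t_out, f_out


if __name__ == "__main__":
    # Hypothetical input: 1000 time frames x 128 frequency bins, 1 -> 32 channels.
    for stride_t in (1, 2, 4):
        macs, t_out, f_out = conv2d_multi_adds(1000, 128, 1, 32, stride_t=stride_t)
        print(f"temporal stride {stride_t}: output {t_out} x {f_out}, multi-adds {macs:,}")
```

In this toy setting, doubling the temporal stride halves both the temporal output size and the layer's multi-add count, and every later layer then operates on the smaller tensor. Whether accuracy holds up at a given stride is an empirical question that the paper addresses with its experiments.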

12/21/2019
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Audio pattern recognition is an important research topic in the machine ...

08/04/2019
Efficient training and design of photonic neural network through neuroevolution
Recently, optical neural networks (ONNs) integrated in photonic chips ha...

05/23/2019
The Convolutional Tsetlin Machine
Deep neural networks have obtained astounding successes for important pa...

06/18/2017
3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
Audio-visual recognition (AVR) has been considered as a solution for spe...

12/28/2019
A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network
Deep neural network-based architectures give promising results in variou...

04/23/2021
DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data
Deep neural speech and audio processing systems have a large number of t...

05/31/2019
Design Light-weight 3D Convolutional Networks for Video Recognition Temporal Residual, Fully Separable Block, and Fast Algorithm
Deep 3-dimensional (3D) Convolutional Network (ConvNet) has shown promis...