ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition

by   Sergey Verbitskiy, et al.

We present a new architecture of convolutional neural networks (CNNs) based on ResNet for audio pattern recognition tasks. The main modification is introducing a new hyper-parameter for decreasing temporal sizes of tensors with increased stride sizes which we call "the decreasing temporal size parameter". Optimal values of this parameter decrease the number of multi-adds that make the system faster. This approach not only decreases computational complexity but it can save and even increase (for the AudioSet dataset) the performance for audio pattern recognition tasks. This observation can be confirmed by experiments on three datasets: the AudioSet dataset, the ESC-50 dataset, and RAVDESS. Our best system achieves the state-of-the-art performance on the AudioSet dataset with mAP of 0.450. We also transfer a model pre-trained on the AudioSet dataset to the ESC-50 dataset and RAVDESS and obtain the state-of-the-art results with accuracies of 0.961 and 0.748, respectively. We call our system "ERANN" (Efficient Residual Audio Neural Network).



page 1

page 2

page 3

page 4


PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Audio pattern recognition is an important research topic in the machine ...

Efficient training and design of photonic neural network through neuroevolution

Recently, optical neural networks (ONNs) integrated in photonic chips ha...

The Convolutional Tsetlin Machine

Deep neural networks have obtained astounding successes for important pa...

3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition

Audio-visual recognition (AVR) has been considered as a solution for spe...

A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network

Deep neural network-based architectures give promising results in variou...

DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

Deep neural speech and audio processing systems have a large number of t...

Design Light-weight 3D Convolutional Networks for Video Recognition Temporal Residual, Fully Separable Block, and Fast Algorithm

Deep 3-dimensional (3D) Convolutional Network (ConvNet) has shown promis...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.