Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

02/24/2017
by Yong Xu, et al.

Environmental audio tagging is a newly proposed task: predicting the presence or absence of specific audio events in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in the domestic audio scene. In this paper, we propose using a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms, or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn from the spatial features of stereo recordings. We evaluate the proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. End-to-end learning on raw waveforms achieves comparable performance. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, compared with 0.15 for the best existing system.
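The abstract reports all of its results as equal error rate (EER), the error rate at the decision threshold where the false positive rate and the false negative rate coincide. As a rough illustration of how such a figure can be computed for a single tag, here is a minimal pure-Python sketch. The function name `equal_error_rate`, the threshold sweep, and the toy scores are assumptions made for illustration; this is not the challenge's official scoring code, which may interpolate or aggregate across tags differently.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for one tag (hypothetical helper, not official scoring).

    scores: predicted tag probabilities, one per audio chunk
    labels: ground-truth presence (1) or absence (0) of the tag per chunk
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative chunk")

    best_gap, eer = float("inf"), None
    # Candidate thresholds: every distinct score, plus one above the maximum.
    thresholds = sorted(set(scores)) + [float("inf")]
    for t in thresholds:
        # A chunk is predicted to contain the tag when its score is >= t.
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr = false_pos / negatives
        fnr = false_neg / positives
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy example: eight chunks, four of which contain the tag.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # 0.25
```

In the toy example, a threshold of 0.6 misclassifies one of the four positive chunks and one of the four negative chunks, so both error rates equal 0.25. A lower EER means better tagging performance, which is the sense in which the reductions quoted in the abstract are improvements.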

research
07/13/2016

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

Environmental audio tagging aims to predict only the presence or absence...
research
08/20/2018

R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

This paper proposes a Region-based Convolutional Recurrent Neural Networ...
research
06/24/2016

Fully DNN-based Multi-label regression for audio tagging

Acoustic event detection for content analysis in most cases relies on lo...
research
05/18/2018

Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network

Audio scene classification, the problem of predicting class labels of au...
research
11/27/2019

GLA in MediaEval 2018 Emotional Impact of Movies Task

The visual and audio information from movies can evoke a variety of emot...
research
03/07/2017

Convolutional Recurrent Neural Networks for Bird Audio Detection

Bird sounds possess distinctive spectral structure which may exhibit sma...
research
12/22/2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generatio...
