Learning Audio Sequence Representations for Acoustic Event Classification

07/27/2017
by Zixing Zhang, et al.

Acoustic Event Classification (AEC) has become a significant task for machines that need to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of acoustic events remains challenging. Previous methods mainly focused on designing audio features in a 'hand-crafted' manner. Interestingly, data-learnt features have recently been reported to perform better, but so far they have only been considered at the frame level. In this paper, we propose an unsupervised learning framework that learns a vector representation of an audio sequence for AEC. The framework consists of a Recurrent Neural Network (RNN) encoder and an RNN decoder, which respectively transform the variable-length audio sequence into a fixed-length vector and reconstruct the input sequence from the generated vector. After training the encoder-decoder, we feed the audio sequences to the encoder and take the learnt vectors as the audio sequence representations. Compared with previous methods, the proposed method can not only handle audio streams of arbitrary length, but also learn the salient information of the sequence. Extensive evaluation on a large acoustic event database is performed, and the empirical results demonstrate that the learnt audio sequence representation outperforms other state-of-the-art hand-crafted sequence features for AEC by a large margin.
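
The encoder-decoder data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plain tanh recurrent cell, the class name `SeqAutoencoder`, the dimensions, and the randomly initialised (untrained) weights are all assumptions made for illustration, and training is omitted. The sketch only shows how sequences of different lengths map to a vector of the same size, and how a reconstruction error could be computed from that vector.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    """One step of a plain tanh recurrent cell (an assumption; the paper may use a gated cell)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


class SeqAutoencoder:
    """Illustrative, untrained RNN encoder-decoder over frame-level feature vectors."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        self.hidden_dim = hidden_dim
        self.enc_in = make_matrix(hidden_dim, feat_dim, rng)
        self.enc_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = make_matrix(hidden_dim, feat_dim, rng)
        self.dec_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_out = make_matrix(feat_dim, hidden_dim, rng)

    def encode(self, frames):
        """Fold a variable-length sequence into one fixed-length vector."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            h = rnn_step(self.enc_in, self.enc_rec, x, h)
        return h

    def decode(self, code, length):
        """Unroll the decoder from the code vector, feeding back its own output."""
        h = list(code)
        prev = [0.0] * self.feat_dim
        out = []
        for _ in range(length):
            h = rnn_step(self.dec_in, self.dec_rec, prev, h)
            prev = matvec(self.dec_out, h)
            out.append(prev)
        return out

    def reconstruction_error(self, frames):
        """Mean squared error between the input and its reconstruction."""
        recon = self.decode(self.encode(frames), len(frames))
        total = sum((a - b) ** 2 for f, r in zip(frames, recon) for a, b in zip(f, r))
        return total / (len(frames) * self.feat_dim)


model = SeqAutoencoder(feat_dim=3, hidden_dim=4)
short_clip = [[0.1, 0.2, 0.3] for _ in range(5)]
long_clip = [[0.3, 0.1, 0.2] for _ in range(12)]
short_vec = model.encode(short_clip)
long_vec = model.encode(long_clip)
```

Here `short_vec` and `long_vec` both have `hidden_dim` entries even though the clips differ in length. In a trained model, minimising `reconstruction_error` over unlabelled clips would be the unsupervised objective, and the encoder output would then serve as the sequence-level feature for a downstream classifier.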

research
04/18/2022

Automated Audio Captioning using Audio Event Clues

Audio captioning is an important research area that aims to generate mea...
research
06/15/2020

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Audio representation learning based on deep neural networks (DNNs) emerg...
research
03/22/2022

CT-SAT: Contextual Transformer for Sequential Audio Tagging

Sequential audio event tagging can provide not only the type information...
research
06/03/2014

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

In this paper, we propose a novel neural network model called RNN Encode...
research
11/05/2017

Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Sequ...
research
05/26/2017

Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based Models

Biomedical events describe complex interactions between various biomedic...
research
04/03/2018

Unsupervised Learning of Sequence Representations by Autoencoders

Traditional machine learning models have problems with handling sequence...
