A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition

03/16/2022
by Jisheng Bai, et al.

Environmental sound recognition (ESR) is an emerging research topic in audio pattern recognition. Many tasks have been proposed that rely on computational systems for ESR in real-life applications. However, current systems are usually designed for individual tasks and are neither robust nor readily applicable to other tasks. Cross-task systems, which promote unified knowledge modeling across various tasks, have not been thoroughly investigated. In this paper, we propose a cross-task system for three different ESR tasks: acoustic scene classification, urban sound tagging, and anomalous sound detection. We present an architecture named SE-Trans that uses attention-based Squeeze-and-Excitation and Transformer encoder modules to learn channel-wise relationships and temporal dependencies of the acoustic features. FMix is employed as a data augmentation method to improve ESR performance. Evaluations for the three tasks are conducted on recent DCASE challenge datasets. The experimental results show that the proposed cross-task system achieves state-of-the-art performance on all tasks. Further analysis demonstrates that the proposed cross-task system can effectively utilize acoustic knowledge across different ESR tasks.
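The abstract names two attention mechanisms: Squeeze-and-Excitation for channel-wise relationships and a Transformer encoder for temporal dependencies. Below is a minimal, stdlib-only Python sketch of both ideas on a toy feature map; it is not the authors' SE-Trans implementation. The tensor layout (channels × time × frequency), the sizes `C`, `T`, `F`, the reduction ratio `R`, the random weights, and the flattening of the SE output into one vector per time frame are all illustrative assumptions. The attention step is also simplified: it omits learned query/key/value projections, multiple heads, residual connections, LayerNorm, and the feed-forward sublayer of a full Transformer encoder layer. FMix is not shown.

```python
import math
import random

# Hypothetical toy sizes: C channels, T time frames, F frequency bins,
# reduction ratio R for the SE bottleneck (all chosen for illustration).
C, T, F, R = 4, 3, 2, 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def squeeze_excite(fmap, w1, w2):
    """Squeeze-and-Excitation on a feature map indexed fmap[c][t][f]."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC (C -> C/R) + ReLU, then FC (C/R -> C) + sigmoid.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: reweight every value of channel c by its gate.
    out = [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]
    return out, gates


def self_attention(frames):
    """Single-head scaled dot-product self-attention over time frames.

    Simplification: queries, keys, and values are the frames themselves
    (no learned projections, residuals, LayerNorm, or feed-forward layer).
    """
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * fr[j] for w, fr in zip(weights, frames)) for j in range(d)])
    return out


random.seed(0)
# Random stand-ins for a CNN feature map and for learned SE weights.
fmap = [[[random.uniform(-1, 1) for _ in range(F)] for _ in range(T)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]

se_out, gates = squeeze_excite(fmap, w1, w2)
# Assumed flattening: one vector per time frame, concatenating channels x frequency.
frames = [[se_out[c][t][f] for c in range(C) for f in range(F)] for t in range(T)]
attended = self_attention(frames)
# Mean-pool over time to get a clip-level embedding for a classifier head.
clip_embedding = [sum(fr[j] for fr in attended) / T for j in range(C * F)]
print([round(g, 3) for g in gates])
```

In a trained model, `w1` and `w2` would be learned, so the gates let the network emphasize informative channels before the attention step mixes information across time frames.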

research
04/11/2019

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2...
research
11/02/2018

Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

In this paper, we present a novel deep fusion architecture for audio cla...
research
01/08/2021

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

In this paper, we propose a novel four-stage data augmentation approach ...
research
08/14/2023

Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers

We propose a shift towards end-to-end learning in bird sound monitoring ...
research
04/10/2019

A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

One of the biggest challenges of acoustic scene classification (ASC) is ...
research
09/26/2022

Multi-encoder attention-based architectures for sound recognition with partial visual assistance

Large-scale sound recognition data sets typically consist of acoustic re...