Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Network

10/21/2020
by Qiang Huang, et al.

Anomalous audio in speech recordings is often caused by speaker voice distortion, external noise, or electrical interference. Such anomalies have become a serious problem in fields such as high-quality music recording and speech processing. In this paper, a novel approach using a temporal convolutional attention network (TCAN) is proposed to address this problem. A temporal convolutional network (TCN) can capture long-range patterns using a hierarchy of temporal convolutional filters. To improve its handling of audio anomalies under different acoustic conditions, an attention mechanism is added to the TCN: a self-attention block follows each temporal convolutional layer. This is intended to highlight target-related features and reduce interference from irrelevant information. To evaluate the proposed model, audio recordings are taken from the TIMIT dataset and modified by adding five types of audio distortion: Gaussian noise, magnitude drift, random dropout, reduction of temporal resolution, and time warping. The distortions are mixed in at six signal-to-noise ratios (SNRs): 5 dB, 10 dB, 15 dB, 20 dB, 25 dB, and 30 dB. The experimental results show that the proposed model yields good classification performance and outperforms strong baselines, such as LSTM-based and TCN-based models, by about 3% to 10% relative.
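The abstract does not say how the distortions are scaled to a given SNR, so the following is only a minimal sketch of one common way to do it, assuming SNR is defined as the ratio of mean signal power to mean distortion power. The function names (`power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, and a synthetic sine tone stands in for a TIMIT recording, with Gaussian noise as the distortion.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, distortion, snr_db):
    """Add `distortion` to `signal`, scaled so the power ratio equals `snr_db`.

    Assumption: SNR(dB) = 10 * log10(P_signal / P_distortion).
    """
    p_sig = power(signal)
    p_dist = power(distortion)
    if p_sig == 0 or p_dist == 0:
        raise ValueError("signal and distortion must have non-zero power")
    # Required distortion power is P_signal / 10^(SNR/10); take the square
    # root of the power ratio to get an amplitude scale factor.
    scale = math.sqrt(p_sig / (p_dist * 10 ** (snr_db / 10)))
    return [s + scale * d for s, d in zip(signal, distortion)]


def measured_snr_db(clean, mixed):
    """Recover the SNR of `mixed` given the clean reference signal."""
    residual = [m - c for c, m in zip(clean, mixed)]
    return 10 * math.log10(power(clean) / power(residual))


# Hypothetical stand-in for a speech recording: 1 s of a 440 Hz tone at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(len(clean))]

# The six SNR levels listed in the abstract.
for snr_db in (5, 10, 15, 20, 25, 30):
    mixed = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, mixed), 2))
```

The same scaling could in principle be applied to any additive distortion. Distortions that are not additive, such as time warping or reduced temporal resolution, would need a different definition of the distortion signal, which the abstract does not give.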
