Multiclass Language Identification using Deep Learning on Spectral Images of Audio Signals

05/10/2019
by Shauna Revay, et al.

The first step in any voice recognition system is determining which language a speaker is using, and ideally this step is automated. The technique described in this paper, language identification for audio spectrograms (LIFAS), feeds spectrograms generated from audio signals into a convolutional neural network (CNN) for language identification. LIFAS requires minimal pre-processing of the audio signals because the spectrograms are generated batch by batch as they are input to the network during training. LIFAS takes deep learning tools that have proven successful on image processing tasks and applies them to audio signal classification. LIFAS performs binary language classification with 97% accuracy and six-language multi-class classification with 89% accuracy on 3.75-second audio clips.
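The per-batch spectrogram generation described in the abstract could look roughly like the sketch below. It is only an illustration, not the authors' pipeline: the 16 kHz sample rate, the 512-sample Hann window, the 256-sample hop, and the function names `log_spectrogram` and `spectrogram_batches` are all assumptions, since the abstract specifies only the 3.75-second clip length.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed; the abstract does not state a sample rate
CLIP_SECONDS = 3.75    # clip length quoted in the abstract
N_FFT = 512            # assumed analysis window length (samples)
HOP = 256              # assumed hop between windows (samples)


def log_spectrogram(clip, n_fft=N_FFT, hop=HOP):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    # Slice the clip into overlapping, windowed frames.
    frames = np.stack(
        [clip[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def spectrogram_batches(clips, labels, batch_size):
    """Yield (spectrograms, labels), computing spectrograms only when a batch is drawn."""
    for start in range(0, len(clips), batch_size):
        batch = clips[start : start + batch_size]
        specs = np.stack([log_spectrogram(clip) for clip in batch])
        # Add a trailing channel axis so each spectrogram looks like a grayscale image.
        yield specs[..., np.newaxis], labels[start : start + batch_size]
```

Under these assumed settings, each 3.75-second clip holds 60,000 samples and becomes a 257 x 233 single-channel image, which a CNN can consume like any other image input. Only raw audio needs to be stored, because the spectrograms are computed when a batch is requested.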

