Deep Learning Approaches for Understanding Simple Speech Commands

10/04/2018
by Roman A. Solovyev, et al.

Automatic classification of sound commands is becoming increasingly important, especially for mobile and embedded devices. Many of these devices contain both cameras and microphones, and the companies that develop them would like to use the same technology for both classification tasks. One way to achieve this is to represent sound commands as images, so that the same kind of convolutional neural network can classify both images and sounds. In this paper we consider several approaches to sound classification that we applied in the TensorFlow Speech Recognition Challenge, organized by the Google Brain team on the Kaggle platform. We compare different representations of sound (wave frames, spectrograms, mel-spectrograms, MFCCs) and apply several 1D and 2D convolutional neural networks to each in order to get the best performance. Our experiments identified a suitable sound representation and corresponding convolutional neural networks. As a result, we achieved a classification accuracy that placed us 8th among 1315 teams in the challenge.
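To make the representations named in the abstract concrete, the following is a minimal NumPy sketch of how a one-second waveform might be turned into a power spectrogram, a log-mel-spectrogram, and MFCCs. It is not the authors' pipeline: the sample rate (16 kHz), frame length (400 samples), hop (160 samples), FFT size (512), number of mel bands (40), number of cepstral coefficients (13), and all function names are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(wave, frame_len, hop):
    # Slice the waveform into overlapping frames: shape (n_frames, frame_len).
    n_frames = 1 + (len(wave) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return wave[idx]


def power_spectrogram(wave, frame_len=400, hop=160, n_fft=512):
    # Hann-windowed frames -> squared FFT magnitude, shape (n_fft//2 + 1, n_frames).
    frames = frame_signal(wave, frame_len, hop) * np.hanning(frame_len)
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return (np.abs(spectrum) ** 2).T


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1).
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        low, center, high = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(power, sr=16000, n_fft=512, n_mels=40):
    # Project the power spectrogram onto mel bands and take the log.
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return np.log(mel + 1e-10)


def mfcc(log_mel, n_coeffs=13):
    # Unnormalized DCT-II along the mel axis, keeping the first n_coeffs rows.
    n_mels = log_mel.shape[0]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return basis @ log_mel


# Synthetic stand-in for a one-second command clip: a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

spec = power_spectrogram(wave)
logmel = log_mel_spectrogram(spec, sr=sr)
coeffs = mfcc(logmel)
print(spec.shape, logmel.shape, coeffs.shape)
```

Each 2D array has frequency-like bins along the first axis and time frames along the second, so it can be passed to a 2D convolutional network as a single-channel image, while the raw `wave` array is the natural input for a 1D convolutional network.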
