Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition

01/11/2022
by   Jun Qi, et al.
0

This work aims to design a low complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SRC pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces fully connected (FC) layers with TT ones and it can substantially reduce the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model in a randomized manner or based on a well-trained CNN+DNN, and assess the CNN+(TT-DNN) models on the Google Speech Command Dataset. Our experimental results show that the proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31 parameters than the CNN model. Furthermore, the CNN+(TT-DNN) model can obtain a 97.2

READ FULL TEXT
research
03/11/2022

Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing

This work focuses on designing low complexity hybrid tensor networks by ...
research
07/25/2020

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

This paper investigates different trade-offs between the number of model...
research
03/07/2017

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

This study proposes a fully convolutional network (FCN) model for raw wa...
research
10/31/2020

Efficient Arabic emotion recognition using deep neural networks

Emotion recognition from speech signal based on deep learning is an acti...
research
04/16/2021

Implementing CNN Layers on the Manticore Cluster-Based Many-Core Architecture

This document presents implementations of fundamental convolutional neur...
research
09/06/2017

Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art

Automatic speech recognition is used to assess spoken English learner pr...
research
06/25/2019

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

In recent years, deep learning based machine lipreading has gained promi...

Please sign up or login with your details

Forgot password? Click here to reset