Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network

03/22/2023
by Zeyu Ren, et al.

In recent years, End-to-End speech recognition based on deep learning has developed rapidly. Because Turkish speech data is scarce, Turkish speech recognition systems still perform poorly. First, this paper studies a series of tuning techniques for speech recognition. The results show that the model performs best when data augmentation combining speed perturbation with noise addition is adopted and the beam search width is set to 16. Second, to make maximal use of effective feature information and improve the accuracy of feature extraction, this paper proposes a new feature extractor, LSPC. LSPC and a LiGRU network are combined to form a shared encoder structure, which also achieves model compression. The results show that LSPC outperforms MSPC and VGGnet when only Fbank features are used, and the WER is improved by 1.01 Finally, based on the above two points, a new multi-feature fusion network is proposed as the main structure of the encoder. The results show that the WER of the proposed feature fusion network based on LSPC is improved by 0.82 1.94 feature) extraction using LSPC. Our model achieves performance comparable to that of advanced End-to-End models.
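
The data augmentation named in the abstract, speed perturbation combined with noise addition, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the speed factors (0.9, 1.0, 1.1), the 15 dB signal-to-noise ratio, and the use of white Gaussian noise are all assumptions chosen for illustration, and the resampling uses plain linear interpolation rather than a production-quality resampler.

```python
import math
import random


def speed_perturb(wave, factor):
    """Resample `wave` by linear interpolation so it plays `factor` times faster.

    A factor above 1 shortens the signal, a factor below 1 lengthens it.
    """
    n_in = len(wave)
    n_out = max(1, round(n_in / factor))
    out = []
    for i in range(n_out):
        # Fractional position in the input that output sample i reads from.
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * wave[lo] + frac * wave[hi])
    return out


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise scaled to an approximate target SNR in dB."""
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def augment(wave, rng, factors=(0.9, 1.0, 1.1), snr_db=15.0):
    """Apply a randomly chosen speed factor, then add noise (assumed settings)."""
    factor = rng.choice(factors)
    return add_noise(speed_perturb(wave, factor), snr_db, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at an assumed 16 kHz sampling rate.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    augmented = augment(tone, rng)
    print(len(tone), len(augmented))
```

Applying the speed change before the noise keeps the noise spectrum unaffected by resampling; the opposite order is equally possible and the abstract does not say which one the paper uses.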
