Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment

by Aref Farhadipour, et al.

Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, standard speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities, so a system that can perform tasks in a smart home by receiving voice commands would be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as the input to a convolutional neural network (CNN). In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning from the pre-trained AlexNet. In this research, the efficiency of the proposed system is evaluated for speech recognition, speaker identification, and intelligibility assessment. According to the results on the UA dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system achieved 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a multi-network speech recognition system that works fully automatically. This system is placed in a cascade arrangement with the two-class intelligibility assessment system, and the output of that system determines which of the speech recognition networks is activated. This architecture achieves an accuracy of 92.3%. The source code of this paper is available.
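The gammatonegram step (turning a speech file into an image-like matrix) can be illustrated with a minimal NumPy sketch. It assumes a 4th-order gammatone filterbank whose center frequencies are equally spaced on the Glasberg and Moore ERB-rate scale, time-domain FIR filtering, and frame-wise log energy. The channel count (64), lowest frequency (50 Hz), 25 ms window and 10 ms hop are illustrative assumptions rather than values from the paper, and the function names are hypothetical; this is not the authors' released code.

```python
import numpy as np


def erb_space(f_min: float, f_max: float, n_channels: int) -> np.ndarray:
    """Center frequencies equally spaced on the Glasberg & Moore ERB-rate scale."""
    def hz_to_erb_rate(f):
        return 21.4 * np.log10(1.0 + 0.00437 * f)

    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437

    e = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_channels)
    return erb_rate_to_hz(e)


def gammatone_ir(fc: float, sr: int, duration: float = 0.05, order: int = 4) -> np.ndarray:
    """Sampled gammatone impulse response centred on fc (Hz), unit energy."""
    t = np.arange(int(duration * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth (Hz)
    b = 1.019 * erb                           # commonly used bandwidth scaling
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))


def gammatonegram(x, sr, n_channels=64, f_min=50.0, win=0.025, hop=0.010):
    """Return a (n_channels, n_frames) log-energy gammatonegram (dB) of signal x."""
    x = np.asarray(x, dtype=float)
    fcs = erb_space(f_min, sr / 2.0, n_channels)
    win_n, hop_n = int(win * sr), int(hop * sr)
    n_frames = 1 + max(0, (len(x) - win_n) // hop_n)
    out = np.empty((n_channels, n_frames))
    for i, fc in enumerate(fcs):
        # Filter with the channel's impulse response; keep the first len(x) samples.
        y = np.convolve(x, gammatone_ir(fc, sr))[: len(x)]
        for j in range(n_frames):
            frame = y[j * hop_n : j * hop_n + win_n]
            out[i, j] = np.mean(frame ** 2)  # mean power in this frame
    return 10.0 * np.log10(out + 1e-10)      # dB, floored to avoid log(0)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    G = gammatonegram(np.sin(2 * np.pi * 1000 * t), sr)
    print(G.shape)  # (64, 98)
```

For a 1 s, 1 kHz tone sampled at 16 kHz this yields a 64 × 98 matrix whose most energetic rows lie near 1 kHz. To feed a pre-trained image CNN such as AlexNet, the matrix would still have to be scaled to pixel values and resized to the network's input size; that step is omitted here.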




Streaming Multi-talker Speech Recognition with Joint Speaker Identification

In multi-talker scenarios such as meetings and conversations, speech pro...

Speaker Identification using Speech Recognition

The audio data is increasing day by day throughout the globe with the in...

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Wav2vec2 has achieved success in applying Transformer architecture and s...

Speech Recognition: Keyword Spotting Through Image Recognition

The problem of identifying voice commands has always been a challenge du...

The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems

Speech and speaker recognition systems are employed in a variety of appl...
