LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity

07/01/2020
by   Jordan J. Bird, et al.

In speech recognition problems, data scarcity often poses an issue due to the unwillingness of humans to provide large amounts of data for learning and classification. In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes. Using character-level LSTMs (supervised learning) and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated by learning from the data provided on a per-subject basis. A neural network is trained to classify each subject's data against a large dataset of Flickr8k speakers, and is then compared to a transfer learning network performing the same task but with an initial weight distribution dictated by learning from the synthetic data generated by the two models. For all 7 subjects, the best results came from networks that had been exposed to synthetic data: the model pre-trained with LSTM-produced data achieved the best result 3 times and the GPT-2 equivalent 5 times (one subject's best result was a draw between the two models). Through these results, we argue that speaker classification can be improved by utilising only a small amount of user data alongside exposure to synthetically generated MFCCs, which allows the networks to achieve near-maximum classification scores.
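
As a rough illustration of the pipeline described in the abstract, the sketch below pre-trains a small Keras classifier on synthetic MFCC frames (as would be produced beforehand by the per-subject LSTM or GPT-2 generators) together with Flickr8k background speakers, then fine-tunes the same weights on the subject's real Harvard-sentence recordings. The file names, layer sizes, number of MFCC coefficients and two-class setup are assumptions for illustration, not the authors' published code.

```python
# Minimal sketch: MFCC extraction, pre-training on synthetic data, fine-tuning on real data.
# All file names are placeholders.
import numpy as np
import librosa
from tensorflow import keras

def extract_mfcc(wav_path, n_mfcc=26):
    """Return MFCC frames (time frames x coefficients) for one recording."""
    signal, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def build_classifier(n_features, n_classes=2):
    """Small dense network that classifies single MFCC frames by speaker."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# 1) Pre-train on synthetic MFCCs produced by the per-subject LSTM or GPT-2
#    generator, against Flickr8k background speakers.
X_syn = np.load("synthetic_subject_mfcc.npy")   # assumed: (N, 26) synthetic frames
X_bg = np.load("flickr8k_mfcc.npy")             # assumed: background-speaker frames
X_pre = np.vstack([X_syn, X_bg])
y_pre = np.concatenate([np.ones(len(X_syn)), np.zeros(len(X_bg))])

model = build_classifier(n_features=X_pre.shape[1])
model.fit(X_pre, y_pre, epochs=10, batch_size=128)   # provides the initial weight distribution

# 2) Fine-tune the same weights on the subject's few real Harvard sentences.
real_frames = np.vstack([extract_mfcc(f) for f in
                         ["subject_harvard_1.wav", "subject_harvard_2.wav"]])
X_fine = np.vstack([real_frames, X_bg])
y_fine = np.concatenate([np.ones(len(real_frames)), np.zeros(len(X_bg))])
model.fit(X_fine, y_fine, epochs=10, batch_size=128)
```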

Related research:

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks (08/28/2022)
Transfer tasks in text-to-speech (TTS) synthesis - where one or more asp...

Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning (03/08/2020)
Development of Automatic Speech Recognition system for Kazakh language i...

Transfer Learning of Transformer-based Speech Recognition Models from Czech to Slovak (06/07/2023)
In this paper, we are comparing several methods of training the Slovak s...

Pre-Finetuning for Few-Shot Emotional Speech Recognition (02/24/2023)
Speech models have long been known to overfit individual speakers for ma...

Transfer Learning From Sound Representations For Anger Detection in Speech (02/06/2019)
In this work, we train fully convolutional networks to detect anger in s...

Deep Convolutional Neural Network and Transfer Learning for Locomotion Intent Prediction (09/26/2022)
Powered prosthetic legs must anticipate the user's intent when switching...

Constructing a Highlight Classifier with an Attention Based LSTM Neural Network (02/12/2020)
Data is being produced in larger quantities than ever before in human hi...
