Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M

07/06/2023
by Öykü Berfin Mercan, et al.

In this study, the performance of Whisper-Small and Wav2Vec2-XLS-R-300M, two pre-trained multilingual speech-to-text models, was examined for Turkish. The study used Mozilla Common Voice version 11.0, an open-source Turkish dataset. Both multilingual models were fine-tuned on this dataset, which contains only a small amount of data, and their speech-to-text performance was compared. The word error rate (WER) was 0.28 for Wav2Vec2-XLS-R-300M and 0.16 for Whisper-Small. The models were also evaluated on a separate test set built from call center recordings that were not included in the training or validation data.
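The results above are reported as word error rate (WER). WER is the word-level edit distance between a model's transcript and the reference transcript. That distance counts substitutions S, deletions D and insertions I, and it is divided by the number of reference words N, giving WER = (S + D + I) / N. Lower is better. The sketch below is a minimal illustration under stated assumptions. It uses plain whitespace tokenization with no casing or punctuation normalization, and the abstract does not say which normalization the authors applied. The function name `wer` is hypothetical and not taken from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate (S + D + I) / N via word-level Levenshtein distance.

    Assumption: whitespace tokenization only; no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between the first
    # i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical Turkish example: one substituted word out of four.
    print(wer("bugün hava çok güzel", "bugün hava çok soğuk"))  # 0.25
```

Read this way, a WER of 0.16 for Whisper-Small corresponds to roughly 16 word errors per 100 reference words, compared with about 28 for Wav2Vec2-XLS-R-300M.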

research
04/03/2020

Testing pre-trained Transformer models for Lithuanian news clustering

A recent introduction of Transformer deep learning architecture made bre...
research
06/23/2023

Abstractive Text Summarization for Resumes With Cutting Edge NLP Transformers and LSTM

Text summarization is a fundamental task in natural language processing ...
research
11/02/2020

Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets

The availability of different pre-trained semantic models enabled the qu...
research
06/02/2021

Lightweight Adapter Tuning for Multilingual Speech Translation

Adapter modules were recently introduced as an efficient alternative to ...
research
02/25/2021

Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

Development of language proficiency models for non-native learners has b...
research
02/02/2023

idT5: Indonesian Version of Multilingual T5 Transformer

Indonesian language is spoken by almost 200 million people and is the 10...
research
09/29/2022

Facial Landmark Predictions with Applications to Metaverse

This research aims to make metaverse characters more realistic by adding...
