Speech2Phone: A Multilingual and Text Independent Speaker Identification Model

02/25/2020
by   Edresson Casanova, et al.

Voice recognition is an area with wide application potential. Speaker identification is useful in several voice recognition tasks, such as voice-based authentication, transcription systems and intelligent personal assistants. Some tasks benefit from open-set models, which can handle new speakers without retraining. Audio embeddings for speaker identification have been proposed to address this need. However, choosing a suitable model is difficult, especially when training resources are scarce. Moreover, it is not always clear whether embeddings are as good as more traditional methods. In this work, we propose Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models. The models were investigated in a small-dataset scenario, which makes them more applicable to languages for which data scarcity is an issue. The results show that embeddings generated by artificial neural networks are competitive with classical approaches for the task. Considering a testing dataset composed of 20 speakers, the best models reach accuracies of 100 scenarios, respectively. Results suggest that the models can perform language-independent speaker identification. Among the tested models, a fully connected one, presented here as Speech2Phone, achieved the highest accuracy. Furthermore, the models were tested on different languages, showing that the learned knowledge transferred successfully to languages both close to and distant from Portuguese (in terms of vocabulary). Finally, the models can scale and can handle more speakers than they were trained for, identifying 150 while still maintaining 55
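
To illustrate what open-set identification with embeddings means in practice, the following is a minimal sketch, not the Speech2Phone implementation. It assumes each speaker is already represented by a fixed-length embedding vector (the toy three-dimensional vectors below are invented for illustration and do not come from any model), and it assumes cosine similarity as the comparison measure. The names `cosine_similarity`, `identify` and `enrolled` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, enrolled):
    """Return the enrolled speaker whose reference embedding is closest to the query."""
    return max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))


# Toy reference embeddings (assumption: in a real system these would be
# produced by an embedding model from each speaker's enrollment audio).
enrolled = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.1, 0.8, 0.2],
}

# Open-set behaviour: a new speaker is added by storing one more reference
# embedding; nothing is retrained.
enrolled["speaker_c"] = [0.0, 0.2, 0.9]

query = [0.05, 0.25, 0.85]  # toy embedding of an unlabeled utterance
predicted = identify(query, enrolled)
print(predicted)
```

The design point is that the embedding model and the speaker inventory are decoupled: enrolling a speaker only changes the lookup table, which is what lets such a system handle speakers it was never trained on.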


research
10/22/2020

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapp...
research
08/08/2020

JukeBox: A Multilingual Singer Recognition Dataset

A text-independent speaker recognition system relies on successfully enc...
research
09/28/2022

MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

A recent trend in speech processing is the use of embeddings created thr...
research
06/18/2021

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

By implicitly recognizing a user based on his/her speech input, speaker ...
research
11/01/2022

Disentangled representation learning for multilingual speaker recognition

The goal of this paper is to train speaker embeddings that are robust to...
research
04/15/2020

Speaker Recognition in Bengali Language from Nonlinear Features

At present Automatic Speaker Recognition system is a very important issu...
research
07/22/2018

Unified Hypersphere Embedding for Speaker Recognition

Incremental improvements in accuracy of Convolutional Neural Networks ar...