Joint gender and age estimation based on speech signals using x-vectors and transfer learning

12/02/2020
by Damian Kwasny, et al.

In this paper we extend the x-vector framework to speaker age estimation and gender classification. In particular, we replace the baseline multilayer TDNN architecture with QuartzNet, a convolutional architecture that has proven successful in speech recognition. We further propose a two-stage transfer learning scheme that uses two large-scale speech datasets, VoxCeleb and Common Voice, together with multitask learning, so that a single system performs joint age estimation and gender classification. We train and evaluate the system on the TIMIT dataset. The proposed transfer learning scheme yields successive performance improvements in both age estimation error and gender classification accuracy. The best-performing system achieves new state-of-the-art results for age estimation on the TIMIT TEST set, with an MAE of 5.12 and 5.29 years and an RMSE of 7.24 and 8.12 years for male and female speakers respectively, while maintaining a gender classification accuracy of 99.6%.
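To make the reported quantities concrete, the following is a minimal sketch of how per-gender MAE and RMSE can be computed, and of one possible way to combine an age regression term with a gender classification term into a single multitask objective. It is not the authors' implementation: the function names, the choice of L1 loss for age, binary cross-entropy for gender, and the `age_weight` / `gender_weight` parameters are all assumptions made for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences of ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of ages."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def binary_cross_entropy(label, prob, eps=1e-12):
    """Cross-entropy for one binary label (0 or 1) and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(age_true, age_pred, gender_true, gender_prob,
                   age_weight=1.0, gender_weight=1.0):
    """Weighted sum of an age term and a gender term.

    Assumption: L1 loss for age and binary cross-entropy for gender, with
    hypothetical weights; the abstract does not specify the loss used.
    """
    age_term = mae(age_true, age_pred)
    gender_term = sum(
        binary_cross_entropy(g, p) for g, p in zip(gender_true, gender_prob)
    ) / len(gender_true)
    return age_weight * age_term + gender_weight * gender_term


def per_gender_metrics(age_true, age_pred, gender_labels):
    """Report MAE and RMSE separately for each gender label."""
    results = {}
    for g in sorted(set(gender_labels)):
        idx = [i for i, x in enumerate(gender_labels) if x == g]
        t = [age_true[i] for i in idx]
        p = [age_pred[i] for i in idx]
        results[g] = {"mae": mae(t, p), "rmse": rmse(t, p)}
    return results


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    ages = [30, 40, 25, 50]
    preds = [32, 37, 25, 46]
    genders = ["male", "male", "female", "female"]
    print(per_gender_metrics(ages, preds, genders))
```

Reporting the metrics per gender, as the abstract does, keeps a difference in error between male and female speakers visible instead of averaging it away.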


