The IBM 2016 Speaker Recognition System

02/23/2016
by Seyed Omid Sadjadi, et al.

In this paper we describe recent advancements in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify the key techniques that contribute to significant improvements in the performance of our system, and we quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach, formulated to alleviate some of the limitations of conventional linear discriminant analysis (LDA), which assumes Gaussian class-conditional distributions, 2) the application of speaker- and channel-adapted features, derived from an automatic speech recognition (ASR) system, to speaker recognition, and 3) the use of a deep neural network (DNN) acoustic model with a large number of output units (10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 speaker recognition evaluation (SRE) extended core conditions, which involve telephone and microphone trials. Experimental results indicate that: 1) the NDA is more effective (up to 35% [...]) than parametric LDA for speaker recognition, 2) compared to raw acoustic features (e.g., MFCCs), the ASR speaker-adapted features provide gains in speaker recognition performance, and 3) increasing the number of output units in the DNN acoustic model (i.e., increasing the senone set size from 2k to 10k) provides consistent improvements in performance (for example, from 37% [...] relative EER gains over our baseline GMM i-vector system). To our knowledge, the results reported in this paper represent the best performance published to date on the NIST SRE 2010 extended core tasks.
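The abstract reports its gains as relative improvements in equal error rate (EER). As a minimal sketch of how such a figure can be computed from trial scores, the following example uses hypothetical function names (`equal_error_rate`, `relative_gain`) and a simple threshold sweep; it is an illustration under those assumptions, not the NIST scoring tool or the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold, so:
      miss rate        = fraction of target trials scored below the threshold
      false-alarm rate = fraction of non-target trials scored at or above it
    The EER is taken at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, 0.5 * (miss + fa))
    return best[1]


def relative_gain(baseline_eer, new_eer):
    """Relative EER reduction over a baseline, in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [1.0, 3.0]
    nontargets = [0.0, 2.0]
    print(equal_error_rate(targets, nontargets))
    print(relative_gain(2.0, 1.0))
```

With this definition, a system whose EER drops from 2.0% to 1.0% shows a 50% relative gain, even though the absolute change is only one percentage point.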
