Unsupervised Phoneme and Word Discovery from Multiple Speakers using Double Articulation Analyzer and Neural Network with Parametric Bias

06/21/2019
by Ryo Nakashima, et al.

This paper describes a new unsupervised machine learning method for simultaneous phoneme and word discovery from multiple speakers. Human infants can acquire knowledge of phonemes and words through interactions with their mother as well as with other people around them. From a computational perspective, discovering phonemes and words from multiple speakers is more challenging than discovering them from a single speaker, because speech signals from different speakers exhibit different acoustic features. This paper proposes an unsupervised phoneme and word discovery method that combines the nonparametric Bayesian double articulation analyzer (NPB-DAA) with a deep sparse autoencoder with parametric bias in the hidden layer (DSAE-PBHL). We assume that an infant can recognize and distinguish speakers based on other cues, e.g., visual face recognition. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features. An experiment demonstrated that DSAE-PBHL can subtract speaker-dependent components from distributed representations of acoustic signals, so that the extracted features reflect phoneme types rather than speaker identity. Another experiment demonstrated that the combination of NPB-DAA and DSAE-PBHL outperformed available methods in phoneme and word discovery tasks involving speech signals of Japanese vowel sequences from multiple speakers.
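
The abstract does not give DSAE-PBHL's equations, so the following Python sketch illustrates only the general idea under stated assumptions: speaker identity, assumed to be known from another modality, is supplied to the network as a one-hot "parametric bias" vector, so that the remaining hidden units are not needed to encode who is speaking. For simplicity it is written as a speaker-conditioned autoencoder in which the one-hot vector is concatenated to the code before decoding. This placement, the class name `SpeakerConditionedAE`, the layer sizes, and the L1 sparsity term are all hypothetical and may differ from the paper's architecture; training is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class SpeakerConditionedAE:
    """Hypothetical sketch, not the paper's DSAE-PBHL.

    The encoder sees only acoustic features; the decoder sees the code
    concatenated with a one-hot speaker vector (the assumed "parametric bias").
    """

    def __init__(self, n_in, n_code, n_speakers, seed=0):
        rng = np.random.default_rng(seed)
        self.n_speakers = n_speakers
        self.W_enc = rng.normal(0.0, 0.1, (n_in, n_code))
        self.b_enc = np.zeros(n_code)
        # Decoder weights cover both the code units and the speaker units.
        self.W_dec = rng.normal(0.0, 0.1, (n_code + n_speakers, n_in))
        self.b_dec = np.zeros(n_in)

    def encode(self, x):
        # Code units: intended (after training) to carry speaker-independent
        # information such as phoneme identity.
        return relu(x @ self.W_enc + self.b_enc)

    def decode(self, code, speaker_ids):
        # Speaker identity is supplied from outside the network
        # (e.g., from face recognition, as the abstract assumes).
        pb = np.eye(self.n_speakers)[speaker_ids]
        return np.concatenate([code, pb], axis=1) @ self.W_dec + self.b_dec

    def loss(self, x, speaker_ids, sparsity=1e-3):
        code = self.encode(x)
        recon = self.decode(code, speaker_ids)
        mse = np.mean((recon - x) ** 2)
        # L1 penalty on the code: a simple stand-in for a sparsity constraint.
        return mse + sparsity * np.mean(np.abs(code))


model = SpeakerConditionedAE(n_in=12, n_code=4, n_speakers=3)
x = np.random.default_rng(1).normal(size=(5, 12))
code = model.encode(x)
recon_speaker0 = model.decode(code, np.zeros(5, dtype=int))
recon_speaker2 = model.decode(code, np.full(5, 2))
```

Because the decoder receives the speaker one-hot directly and is linear in this sketch, changing only `speaker_ids` shifts the reconstruction by a fixed speaker-specific offset while the code stays the same. Whether a trained code actually becomes speaker-independent depends on training, which this sketch does not show.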
