Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

by Juliette Millet, et al.

In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from corpora of natural running read speech, used for the 2017 Zero Resource Speech Challenge. We provide a method to compare humans' perceptual space with models' representational space, and we apply it to models previously submitted to the Challenge. We show that, unlike unsupervised models and supervised multilingual models, a standard supervised monolingual HMM-GMM phone recognition system, while good at discriminating phones, yields a representational space very different from that of human native listeners.






