Self-supervised models of audio effectively explain human cortical responses to speech

05/27/2022
by Aditya R. Vaidya, et al.

Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either hand-constructed acoustic filters or representations from supervised audio neural networks. In this work, we capitalize on the progress of self-supervised speech representation learning (SSL) to create new state-of-the-art models of the human auditory system. Compared against acoustic baselines, phonemic features, and supervised models, representations from the middle layers of self-supervised models (APC, wav2vec, wav2vec 2.0, and HuBERT) consistently yield the best prediction performance for fMRI recordings within the auditory cortex (AC). Brain areas involved in low-level auditory processing exhibit a preference for earlier SSL model layers, whereas higher-level semantic areas prefer later layers. We show that these trends are due to the models' ability to encode information at multiple linguistic levels (acoustic, phonetic, and lexical) along their representation depth. Overall, these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
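The abstract does not spell out the encoding-model pipeline, but the standard approach in this line of work is to extract layer-wise hidden states from a pretrained self-supervised speech model and regress them onto voxel responses. Below is a minimal sketch, assuming the HuggingFace transformers wav2vec 2.0 checkpoint ("facebook/wav2vec2-base") and scikit-learn ridge regression; the waveform, fMRI matrix, TR downsampling, and regularization settings are placeholders for illustration, not the paper's actual configuration.

```python
# Sketch of a layer-wise encoding-model analysis (assumed pipeline, not the
# paper's exact method): extract hidden states from one transformer layer of
# wav2vec 2.0, downsample to fMRI time resolution, and fit a ridge regression
# that predicts voxel responses.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

SAMPLE_RATE = 16_000

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()


def layer_features(waveform: np.ndarray, layer: int) -> np.ndarray:
    """Return hidden states of one layer, one row per model frame."""
    inputs = extractor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        out = model(inputs.input_values, output_hidden_states=True)
    # hidden_states[0] is the projected convolutional front end;
    # indices 1..num_hidden_layers are the transformer layer outputs.
    return out.hidden_states[layer].squeeze(0).numpy()


def downsample_to_trs(features: np.ndarray, n_trs: int) -> np.ndarray:
    """Crudely average model frames into TR-sized bins (a real analysis
    would handle stimulus alignment and the hemodynamic response)."""
    bins = np.array_split(features, n_trs, axis=0)
    return np.stack([b.mean(axis=0) for b in bins])


# Hypothetical data: 60 s of audio and matching fMRI responses (TRs x voxels).
waveform = np.random.randn(SAMPLE_RATE * 60).astype(np.float32)
fmri = np.random.randn(30, 500)

X = downsample_to_trs(layer_features(waveform, layer=6), n_trs=fmri.shape[0])
X_tr, X_te, y_tr, y_te = train_test_split(X, fmri, test_size=0.2, shuffle=False)

enc = Ridge(alpha=10.0).fit(X_tr, y_tr)
pred = enc.predict(X_te)

# Voxel-wise prediction performance: correlation with held-out responses.
r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(fmri.shape[1])]
print(f"median voxel correlation (random data): {np.median(r):.3f}")
```

Repeating this fit for each layer index and each voxel is what would yield the layer-preference maps described above, with lower-level auditory regions best predicted by earlier layers and higher-level semantic regions by later ones.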
