Toward a realistic model of speech processing in the brain with self-supervised learning

06/03/2022
by Juliette Millet et al.

Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts of data, (2) unobtainable supervised labels, (3) textual rather than raw sensory input, and/or (4) implausibly large memory (e.g., thousands of contextual words). These elements highlight the need to identify algorithms that, under these limitations, would suffice to account for both behavioral and brain responses. Focusing on speech processing, we here hypothesize that self-supervised algorithms trained on the raw waveform constitute a promising candidate. Specifically, we compare a recent self-supervised architecture, Wav2Vec 2.0, to the brain activity of 412 English, French, and Mandarin speakers recorded with functional Magnetic Resonance Imaging (fMRI) while they listened to one hour of audio books. Our results are four-fold. First, we show that this algorithm learns brain-like representations with as little as 600 hours of unlabelled speech, a quantity comparable to what infants can be exposed to during language acquisition. Second, its functional hierarchy aligns with the cortical hierarchy of speech processing. Third, different training regimes reveal a functional specialization akin to that of the cortex: Wav2Vec 2.0 learns sound-generic, speech-specific, and language-specific representations similar to those of the prefrontal and temporal cortices. Fourth, we confirm the similarity of this specialization with the behavior of 386 additional participants. These elements, resulting from the largest neuroimaging benchmark to date, show how self-supervised learning can account for a rich organization of speech processing in the brain, and thus delineate a path toward identifying the laws of language acquisition which shape the human brain.
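The model-to-brain comparison described in the abstract can be illustrated with a short, self-contained sketch. The snippet below (a minimal illustration, not the authors' actual pipeline) uses the HuggingFace transformers library to extract layer-wise Wav2Vec 2.0 activations from a waveform, then scores how well a layer linearly predicts a voxel's fMRI time course with cross-validated ridge regression, in the spirit of standard encoding ("brain-score") analyses. The checkpoint name, the 5-fold cross-validation, and the placeholder inputs are assumptions; in practice the activations must also be downsampled and convolved with a hemodynamic response function to match the fMRI sampling rate.

    # Minimal sketch (assumed setup, not the authors' pipeline): layer-wise
    # Wav2Vec 2.0 activations are regressed onto one fMRI voxel's time course,
    # in the spirit of standard encoding / "brain-score" analyses.
    import numpy as np
    import torch
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_score
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    MODEL = "facebook/wav2vec2-base"  # assumed self-supervised checkpoint
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
    model = Wav2Vec2Model.from_pretrained(MODEL, output_hidden_states=True)
    model.eval()

    def layer_activations(waveform, sr=16_000):
        """Return one (time, features) activation matrix per layer."""
        inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        # hidden_states: one (1, time, dim) tensor per layer (plus the
        # initial convolutional embedding)
        return [h.squeeze(0).numpy() for h in out.hidden_states]

    def brain_score(X, y):
        """Cross-validated R^2 of a ridge regression from activations X
        (n_samples, n_features) to one voxel's BOLD time course y (n_samples,).
        X is assumed to be already downsampled and convolved with a
        hemodynamic response function to match the fMRI sampling rate."""
        ridge = RidgeCV(alphas=np.logspace(-3, 3, 7))
        return cross_val_score(ridge, X, y, cv=5, scoring="r2").mean()

Repeating this score for every layer and every voxel yields a layer-to-region mapping, which is the kind of analysis that underlies claims such as the alignment between the model's functional hierarchy and the cortical hierarchy of speech processing.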

Related research

02/15/2022
Don't stop the training: continuously-updating self-supervised algorithms best account for auditory responses in the cortex
Over the last decade, numerous studies have shown that deep neural netwo...

02/25/2021
Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech
Our ability to comprehend speech remains, to date, unrivaled by deep lea...

10/12/2021
Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects
A popular approach to decompose the neural bases of language consists in...

02/11/2023
Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation
Auditory attention decoding (AAD) is a technique used to identify and am...

03/09/2021
Wav2vec-C: A Self-supervised Model for Speech Representation Learning
Wav2vec-C introduces a novel representation learning technique combining...

03/02/2021
Decomposing lexical and compositional syntax and semantics with deep language models
The activations of language transformers like GPT2 have been shown to li...

11/03/2021
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity
Meaningful and simplified representations of neural activity can yield i...
