Brain encoding models based on multimodal transformers can transfer across language and vision

05/20/2023
by Jerry Tang et al.

Encoding models have been used to assess how the human brain represents concepts in language and vision. While language and vision rely on similar concept representations, current encoding models are typically trained and tested on brain responses to each modality in isolation. Recent advances in multimodal pretraining have produced transformers that can extract aligned representations of concepts in language and vision. In this work, we used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies. We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality, particularly in cortical regions that represent conceptual meaning. Further analysis of these encoding models revealed shared semantic dimensions that underlie concept representations in language and vision. Comparing encoding models trained using representations from multimodal and unimodal transformers, we found that multimodal transformers learn more aligned representations of concepts in language and vision. Our results demonstrate how multimodal transformers can provide insights into the brain's capacity for multimodal processing.
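To make the cross-modal transfer setup concrete, below is a minimal sketch of the standard encoding-model recipe: fit a regularized linear map from stimulus features to voxel responses on one modality, then evaluate its predictions on the other. This is not the authors' code; it assumes generic multimodal-transformer feature matrices and scikit-learn ridge regression, uses random placeholder data, and omits details such as hemodynamic-delay modeling and per-voxel regularization that a real fMRI pipeline would include.

```python
# Sketch of cross-modal encoding-model transfer (illustrative, not the paper's pipeline).
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

# Placeholder data: (time points x feature dims) and (time points x voxels).
# In practice, features come from a multimodal transformer applied to the story
# transcript or movie frames, and responses from preprocessed fMRI data.
n_story_trs, n_movie_trs, n_features, n_voxels = 1000, 800, 768, 5000
story_features = rng.standard_normal((n_story_trs, n_features))
story_responses = rng.standard_normal((n_story_trs, n_voxels))
movie_features = rng.standard_normal((n_movie_trs, n_features))
movie_responses = rng.standard_normal((n_movie_trs, n_voxels))

# Fit a ridge encoding model on language (story) data.
encoding_model = RidgeCV(alphas=np.logspace(0, 4, 10))
encoding_model.fit(story_features, story_responses)

# Transfer: predict vision (movie) responses with the language-trained weights,
# then score each voxel by the correlation between predicted and actual responses.
predicted = encoding_model.predict(movie_features)

def voxelwise_correlation(pred, actual):
    """Pearson correlation between predicted and actual time courses, per voxel."""
    pred = (pred - pred.mean(0)) / pred.std(0)
    actual = (actual - actual.mean(0)) / actual.std(0)
    return (pred * actual).mean(0)

transfer_scores = voxelwise_correlation(predicted, movie_responses)
print("mean cross-modal prediction r:", transfer_scores.mean())
```

With real data, voxels in regions that represent conceptual meaning would be expected to show the highest transfer correlations, which is how the cross-modal claim in the abstract is quantified.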


Related research

- Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers (01/31/2021)
- Visio-Linguistic Brain Encoding (04/18/2022)
- Investigating Inner Properties of Multimodal Representation and Semantic Compositionality with Brain-based Componential Semantics (11/15/2017)
- Parameter Efficient Multimodal Transformers for Video Representation Learning (12/08/2020)
- Scaling laws for language encoding models in fMRI (05/19/2023)
- An Empirical Study of Multimodal Model Merging (04/28/2023)
- Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models (06/06/2023)
