BrainCLIP: Bridging Brain and Visual-Linguistic Representation via CLIP for Generic Natural Visual Stimulus Decoding from fMRI

02/25/2023
by Yulong Liu, et al.

Reconstructing perceived natural images and decoding their categories from fMRI signals are challenging tasks of great scientific significance. Because paired samples are scarce, most existing methods fail to generate semantically recognizable reconstructions and generalize poorly to novel classes. In this work, we propose, for the first time, a task-agnostic brain decoding model that unifies the visual stimulus classification and reconstruction tasks in a shared semantic space. We call it BrainCLIP; it leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activity, images, and text. Specifically, BrainCLIP is a VAE-based architecture that maps fMRI patterns into the CLIP embedding space by combining visual and textual supervision. Previous works have rarely used multi-modal supervision for visual stimulus decoding. Our experiments demonstrate that adding textual supervision significantly improves decoding performance over image supervision alone. BrainCLIP can be applied to multiple scenarios, such as fMRI-to-image generation, fMRI-image matching, and fMRI-text matching. Compared with BraVL, a recently proposed multi-modal method for fMRI-based brain decoding, BrainCLIP achieves significantly better performance on the novel class classification task. BrainCLIP also establishes a new state of the art for fMRI-based natural image reconstruction in terms of high-level image features.
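
The idea of supervising an fMRI embedding with both image and text embeddings, then decoding by matching in the shared space, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss, so the InfoNCE-style contrastive objective, the `text_weight` mixing parameter, the temperature value, and all function names below are assumptions chosen for illustration. Random vectors stand in for CLIP embeddings and for the output of the fMRI encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to (approximately) unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(queries, keys, temperature=0.07):
    """Contrastive loss: queries[i] should match keys[i] among all keys.

    Assumption: an InfoNCE-style objective; the abstract does not say
    which loss BrainCLIP actually uses.
    """
    logits = l2_normalize(queries) @ l2_normalize(keys).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def combined_loss(fmri_emb, image_emb, text_emb, text_weight=0.5):
    """Mix visual and textual supervision (text_weight is hypothetical)."""
    visual = info_nce(fmri_emb, image_emb)
    textual = info_nce(fmri_emb, text_emb)
    return (1.0 - text_weight) * visual + text_weight * textual

def match(fmri_emb, candidate_emb):
    """fMRI-image or fMRI-text matching: index of the most similar candidate."""
    sims = l2_normalize(fmri_emb) @ l2_normalize(candidate_emb).T
    return sims.argmax(axis=1)

# Toy data: 4 stimuli with 16-dimensional stand-in embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 16))
text_emb = image_emb + 0.1 * rng.normal(size=(4, 16))   # captions near images
fmri_emb = image_emb + 0.1 * rng.normal(size=(4, 16))   # a well-trained encoder

aligned = combined_loss(fmri_emb, image_emb, text_emb)
shuffled = combined_loss(fmri_emb[::-1], image_emb, text_emb)
matched = match(fmri_emb, image_emb)
```

In this toy setup the loss is lower when each fMRI embedding is paired with its own image and caption than when the pairing is shuffled, and `match` recovers the correct stimulus by cosine similarity. Because candidates are only compared in the embedding space, the same matching step can in principle be applied to image or text embeddings of classes never seen during training.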


research
10/18/2021

Natural Image Reconstruction from fMRI using Deep Learning: A Survey

With the advent of brain imaging techniques and machine learning tools, ...
research
01/09/2017

Deep driven fMRI decoding of visual categories

Deep neural networks have been developed drawing inspiration from the br...
research
03/26/2023

Semantic Neural Decoding via Cross-Modal Generation

Semantic neural decoding aims to elucidate the cognitive processes of th...
research
04/18/2022

Visio-Linguistic Brain Encoding

Enabling effective brain-computer interfaces requires understanding how ...
research
07/03/2019

From voxels to pixels and back: Self-supervision in natural-image reconstruction from fMRI

Reconstructing observed images from fMRI brain recordings is challenging...
research
06/09/2021

More than meets the eye: Self-supervised depth reconstruction from brain activity

In the past few years, significant advancements were made in reconstruct...
research
10/13/2022

Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features

Decoding human visual neural representations is a challenging task with ...
