
Language models and brain alignment: beyond word-level semantics and prediction

Pretrained language models that have been trained to predict the next word over billions of text documents have been shown to also significantly predict brain recordings of people comprehending language. Understanding the reasons behind the observed similarities between language in machines and language in the brain can lead to more insight into both systems. Recent works suggest that the prediction of the next word is a key mechanism that contributes to the alignment between the two. What is not yet understood is whether prediction of the next word is necessary for this observed alignment or simply sufficient, and whether there are other shared mechanisms or information that are similarly important. In this work, we take a first step towards a better understanding via two simple perturbations in a popular pretrained language model. The first perturbation is to improve the model's ability to predict the next word in the specific naturalistic stimulus text that the brain recordings correspond to. We show that this indeed improves the alignment with the brain recordings. However, this improved alignment may also be due to improved word-level or multi-word semantics of the specific world described by the stimulus narrative. We aim to disentangle the contributions of next word prediction and semantic knowledge via our second perturbation: scrambling the word order at inference time, which reduces the ability to predict the next word but maintains any newly learned word-level semantics. By comparing the alignment with brain recordings of these differently perturbed models, we show that improvements in alignment with brain recordings are due to more than improvements in next word prediction and word-level semantics.


1 Introduction

Language models that have been pretrained to predict the next word over billions of text documents have been shown to also significantly predict brain recordings of people comprehending language (Wehbe et al., 2014b; Jain and Huth, 2018; Toneva and Wehbe, 2019; Caucheteux and King, 2020; Schrimpf et al., 2021; Goldstein et al., 2022). Understanding the reasons behind the observed similarities between language in machines and language in the brain can lead to more insight into both systems. Recent works suggest that the prediction of the next word is a key mechanism that contributes to the alignment between the two (Goldstein et al., 2022). What is not yet understood is whether prediction of the next word is necessary for this observed alignment or simply sufficient, and whether there are other shared mechanisms or information that is similarly important.

Figure 1: An illustration of additional types of linguistic information that may be important for alignment between language models and brain recordings. One of the simple perturbations that we employ (word scrambling) aims to affect the multi-word semantics, while preserving the word-level semantics.

One difficulty in investigating other shared mechanisms is that a model that is able to predict the next word well is also likely able to build good representations of word-level and multi-word semantics, which may also be important for brain alignment. Here, we use "word-level semantics" to refer to the non-contextualized meaning of an individual word, i.e. the lexical meaning of a word without considering a specific context, and "multi-word semantics" as the meaning that emerges at the phrasal level (see Fig. 1). For example, in the phrase "Harry throws the broom", each word has a non-contextualized meaning, and the phrase has a different meaning depending on the word order ("Harry throws the broom" vs. "The broom throws Harry"). Both multi-word semantics and word-level semantics may be affected by improvements in language modeling performance.

In this work, we aim to disentangle the contributions of these different types of information to brain alignment via simple perturbations in a popular pretrained language model. The key idea behind our approach is to measure how these perturbations affect the alignment of a pretrained model with brain recordings. We do this for two reasons: 1) comparing a model's brain alignment with that of a perturbed version of the same model enables stronger inferences than would be possible when comparing models that have different architectures and have been trained on different types or amounts of data, and 2) controlling what information is added to or eliminated from the model allows us to interpret the reasons for brain alignment far better than when using a model pretrained on an unreleased web corpus (e.g. GPT-2 (Radford et al., 2019)). We further track the effect of these perturbations on next word prediction performance to verify that the perturbations in fact affect next word prediction as we expect. Using these perturbations, we can control for the effect of word-level semantics and next word prediction on brain alignment.

We show that when controlling for the word-level semantics and next word prediction, a strong brain alignment is still observable, in particular in two specific brain areas that are thought to process language (Fedorenko et al., 2010; Fedorenko and Thompson-Schill, 2014)–the inferior frontal gyrus (IFG) and the angular gyrus (AG)–suggesting that the brain alignment between the language model and these brain regions is due to more than next word prediction and word-level semantics. We speculate that this alignment is due to multi-word semantics, which is consistent with previous findings about processing in these regions (Friederici, 2012; Humphreys et al., 2021).

Our main contributions are as follows:

  1. propose perturbations to pretrained language models that, when combined, can control for the effects of next word prediction and word-level semantics on the alignment with brain recordings

  2. demonstrate that tuning a language model on a validation stimulus text can increase the alignment with brain recordings that correspond to a heldout text

  3. reveal that alignment between brain recordings in two specific regions and language models is due to more than next word prediction and word-level semantics

2 Methods

2.1 Baseline Model

We use GPT-2 (Radford et al., 2019) as the baseline pretrained language model. GPT-2 achieves strong results on a variety of natural language processing tasks such as question answering, summarization, and translation, without any task-specific training beyond next word prediction (Radford et al., 2019). GPT-2 is a causal (unidirectional) transformer pretrained with a language modeling objective on 40 GB of text scraped from outbound links on Reddit. The training objective of GPT-2 is to predict the next word given the previous words within some context. We use the small pretrained version provided by Huggingface (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py), which has 117M parameters, 12 layers, and an embedding size of 768.
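For concreteness, the baseline can be loaded with the Huggingface transformers API as follows (a minimal sketch; the variable names are ours and the paper does not prescribe this exact snippet):

```python
# Minimal sketch: load the pretrained GPT-2 small checkpoint used as the
# baseline model (117M parameters, 12 layers, embedding size 768).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()  # the unperturbed baseline is used for inference only
```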

2.2 fMRI Data

To evaluate the brain alignment of GPT-2 and of its perturbations, we use publicly available fMRI data provided by Wehbe et al., 2014a, one of the largest publicly available fMRI datasets in terms of samples per subject. fMRI data were obtained from eight participants as they read chapter 9 of Harry Potter and the Sorcerer’s Stone (Rowling et al., 1998) word-by-word. The time resolution used for acquiring brain signals (TR) was 2 seconds. The chapter was divided into four runs of approximately equal length, and participants were allowed a short break at the end of each run. Each word of the chapter was presented for 0.5 seconds, after which a new word was presented immediately.

2.3 Evaluation Tasks

We use two tasks to evaluate all baseline and perturbed models: next word prediction and brain alignment. Importantly, both tasks are evaluated using the same text, which corresponds to the fMRI stimulus. This text contains 5174 words, which we split into a combined train and validation set (75%) and a test set (25%).

Next word prediction

To generate the next token, we follow best practices for generation with GPT-2-based models, which use a prediction head consisting of a linear layer with weights tied to the input embeddings (Wolf et al., 2020). For consistency, we use the same number of words to evaluate both next word prediction and brain alignment: 20 consecutive words. We evaluate the next word prediction performance using the perplexity of a probability model, defined as:

$$\mathrm{PPL}(W) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_1, \ldots, w_{i-1})\right)$$

where $p$ is a probability model (in our case GPT-2), $W = (w_1, \ldots, w_N)$ is the test set, and $N$ is the size of the test set.
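As an illustration, a sketch of this perplexity computation with a Huggingface causal LM; scoring the text in 20-word windows mirrors the granularity used for brain alignment, though the paper's exact batching is our assumption:

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str, window: int = 20) -> float:
    """Perplexity over the heldout text, scored in windows of `window`
    consecutive words."""
    words = text.split()
    nlls, n_tokens = [], 0
    for i in range(0, len(words), window):
        ids = tokenizer(" ".join(words[i:i + window]),
                        return_tensors="pt").input_ids
        if ids.shape[1] < 2:  # need at least one next-token target
            continue
        # Labels equal inputs; the model shifts them internally to compute
        # the next-token cross-entropy (mean NLL per predicted token).
        out = model(ids, labels=ids)
        nlls.append(out.loss * (ids.shape[1] - 1))  # back to summed NLL
        n_tokens += ids.shape[1] - 1
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()
```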

Brain alignment

To measure the brain alignment between a GPT-2-based model and the fMRI recordings, we employ a standard linear prediction head. This prediction head is akin to an encoding model, which learns a function that maps input stimulus representations to output brain recordings and is frequently used to measure how well word representations obtained from a language model can predict brain recordings (Jain and Huth, 2018; Toneva and Wehbe, 2019; Schrimpf et al., 2021). Similarly to previous work that uses the same fMRI dataset (Toneva and Wehbe, 2019), to predict the fMRI recording corresponding to a given TR, the prediction head averages the embeddings of the words presented within that TR and concatenates them with the averaged embeddings from the preceding TRs. The averaging is done in order to down-sample the word embeddings (words presented at 0.5 seconds) to the TR rate (2 seconds). The features of the words presented in the previous TRs are included in order to account for the lag in the hemodynamic response that fMRI records. Because the response measured by fMRI is an indirect consequence of brain activity that peaks about 6 seconds after stimulus onset, predictive methods commonly include preceding timepoints (Nishimoto et al., 2011; Wehbe et al., 2014a; Huth et al., 2016). This allows for a data-driven estimation of the hemodynamic response function (HRF) for each voxel, which is preferable to assuming one because different voxels may exhibit different HRFs.
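A sketch of this feature construction, under the assumption (derivable from the presentation rate) that each 2-second TR covers 4 words; the exact number of preceding TRs concatenated is a placeholder:

```python
import numpy as np

def tr_features(word_emb: np.ndarray, words_per_tr: int = 4,
                n_prev: int = 4) -> np.ndarray:
    """Down-sample word embeddings (one word / 0.5 s) to the TR rate (2 s) by
    averaging within each TR, then concatenate each TR with its n_prev
    predecessors (zero-padded at the start) to absorb the hemodynamic lag.
    n_prev is a placeholder; the paper does not state the exact count."""
    d = word_emb.shape[1]
    n_tr = len(word_emb) // words_per_tr
    tr_emb = word_emb[: n_tr * words_per_tr].reshape(n_tr, words_per_tr, d).mean(axis=1)
    padded = np.vstack([np.zeros((n_prev, d)), tr_emb])
    # Row t holds [TR t-n_prev, ..., TR t] flattened into one feature vector.
    return np.stack([padded[t : t + n_prev + 1].ravel() for t in range(n_tr)])
```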

The weights of this linear prediction head are trained using the training set and the MSE objective, and the final brain alignment is evaluated on the heldout test set. The training hyperparameters (weight decay, learning rate, and number of training epochs) are selected using a cross-validated random search (see Appendix: Model selection hyperparameters for more details about the range of hyperparameter values). We use a batch size of 32 or 16, depending on the number of voxels in the fMRI recording of each subject, and AdamW (Loshchilov and Hutter, 2018) as the optimizer with a linear learning rate schedule. For each participant, we train four different models, where each model has a different test set corresponding to one of the four runs in the fMRI recordings. We observed that a lower MSE did not always lead to a higher correlation, which we attribute to the noisy nature of fMRI data. Thus, to select the best model on the validation set, we use the skew value (Zwillinger and Kokoska, 1999), which measures the deviation of a random variable's distribution from the normal distribution. At inference time, each model predicts the one of the four runs that is heldout from its training and validation sets, and the final brain alignment results are averaged. This approach produces 4 models (one per heldout run) × 8 participants = 32 models in total.
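A simplified sketch of this training loop follows; the hyperparameter values shown are placeholders for those found by the random search, and the skew-based model selection and per-run splits are omitted:

```python
import torch

def fit_prediction_head(X: torch.Tensor, Y: torch.Tensor,
                        lr: float = 1e-3, wd: float = 0.01,
                        epochs: int = 40, batch_size: int = 32) -> torch.nn.Linear:
    """X: (n_TRs, n_features) stimulus features; Y: (n_TRs, n_voxels) fMRI data.
    Trains a linear head with MSE and AdamW, as in the paper; the values of
    lr, wd, and epochs here are illustrative placeholders."""
    head = torch.nn.Linear(X.shape[1], Y.shape[1])
    opt = torch.optim.AdamW(head.parameters(), lr=lr, weight_decay=wd)
    for _ in range(epochs):
        for i in range(0, len(X), batch_size):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(head(X[i:i + batch_size]),
                                                Y[i:i + batch_size])
            loss.backward()
            opt.step()
    return head
```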

We evaluate the brain alignment using Pearson correlation, computed between the predictions of heldout fMRI recordings and the true corresponding data. Specifically, for a model $f$ and voxel $v$ with corresponding heldout fMRI recordings $y_v$, the brain alignment is computed as follows:

$$r_v = \mathrm{Pearson}(\hat{y}_v, y_v)$$

where $\hat{y}_v = f(x)\,w_v$, $x$ is the input text sample to model $f$, and $w_v$ are the learned prediction weights corresponding to this voxel.
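A voxel-wise implementation of this evaluation (a sketch; the array shapes are our assumption):

```python
import numpy as np

def voxelwise_pearson(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """pred, true: (n_heldout_TRs, n_voxels). Returns one Pearson r per voxel,
    computed as the mean product of z-scored predictions and recordings."""
    p = (pred - pred.mean(0)) / pred.std(0)
    t = (true - true.mean(0)) / true.std(0)
    return (p * t).mean(0)
```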

All voxel-wise brain alignment scores are visualized on the corresponding participant’s brain surface using PyCortex (Gao et al., 2015).

2.4 Perturbations

Stimulus-tuning

The first perturbation is designed to improve the next-word prediction capabilities of the baseline model for the specific stimulus text that has corresponding brain recordings. Our hypothesis is that this perturbation will increase the baseline model's alignment with brain areas that process information related to prediction of upcoming words, word-level semantics, and multi-word semantics. We achieve this by finetuning the baseline model with the language modeling objective on a portion of the stimulus text that the brain recordings correspond to. The training samples consist of non-overlapping sequences of 80 consecutive words. The training hyperparameters are selected using a cross-validated random search, which selects the best weight decay, learning rate, and number of training epochs on a validation set (see Appendix: Model selection hyperparameters for more details about the range of hyperparameter values). We use a batch size of 16 and AdamW (Loshchilov and Hutter, 2018) as the optimizer with a linear learning rate schedule.
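A condensed sketch of this finetuning step; the hyperparameter values are placeholders for those found by the random search, and the single full-batch update per epoch simplifies the paper's mini-batch training:

```python
import torch
from transformers import get_linear_schedule_with_warmup

def stimulus_tune(model, tokenizer, train_text: str,
                  lr: float = 5e-5, wd: float = 0.01, epochs: int = 3):
    """Finetune with the language modeling objective on non-overlapping
    80-word sequences from the training portion of the stimulus text.
    lr, wd, and epochs are illustrative placeholders."""
    words = train_text.split()
    chunks = [" ".join(words[i:i + 80]) for i in range(0, len(words), 80)]
    tokenizer.pad_token = tokenizer.eos_token
    enc = tokenizer(chunks, return_tensors="pt", padding=True)
    # Ignore pad positions in the next-word loss.
    labels = enc.input_ids.masked_fill(enc.attention_mask == 0, -100)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    sched = get_linear_schedule_with_warmup(opt, 0, epochs)  # linear decay
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = model(**enc, labels=labels).loss
        loss.backward()
        opt.step()
        sched.step()
    return model
```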

By comparing the performances of the stimulus-tuned model versus the baseline on a given task, we can observe the effect of stimulus-tuning. Because we hypothesize that stimulus-tuning leads to both improved next-word prediction capabilities and improved representations of stimulus-specific semantics, stimulus-tuning itself is not sufficient to investigate the independent effect of either type of information on brain alignment.

Input scrambling

The purpose of the second perturbation is to control for the effect of word-level semantics on brain alignment. This perturbation consists of scrambling the words at inference time in each text sequence that we use to predict one fMRI TR image (i.e. 20 consecutive words) and observing the performance of a model after this perturbation. Scrambling the words at inference time may impact the next-word prediction capabilities and the multi-word semantic knowledge, but it does not affect the word-level semantics. Our hypothesis is that this perturbation will decrease a model’s alignment with brain areas that process information related to prediction of upcoming words and multi-word semantics, and that this decrease will not be related to word-level semantic information.
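A sketch of the inference-time scrambling, assuming the stimulus is handled as a flat word list (the function name is ours):

```python
import random

def scramble_windows(words: list[str], window: int = 20,
                     seed: int = 0) -> list[str]:
    """Shuffle word order inside each 20-word span used to predict one fMRI
    TR image. Word identity is preserved, so non-contextualized (word-level)
    semantics are untouched, while word order (and with it multi-word
    structure) is destroyed."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(words), window):
        chunk = words[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```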

Figure 2: Performances of the baseline and perturbed models at the two evaluation tasks: brain alignment (A-E) and next word prediction (F). Stimulus-tuning improves both the next word prediction (stimulus-tuned vs baseline in (F)) and brain alignment (stimulus-tuned in (B) vs baseline in (A)). In contrast, scrambling reduces the next word prediction (baseline vs baseline scrambled in (F)) and reduces the brain alignment (baseline in (A) vs baseline scrambled in (D)). Despite the reduction in alignment due to the scrambling perturbation, all four models (A,B,D,E) exhibit significant alignment in language processing regions, which are visualized in (C). These observations are quantified across the language ROI in Appendix Figure 7.

3 Results

To investigate the contributions to the alignment between language in machines and language in the brain, we test the baseline and perturbed models on both brain alignment and next word prediction.

3.1 Perturbation I: stimulus-tuning

We evaluate the impact of the stimulus-tuning perturbation by comparing the performances of the stimulus-tuned and the baseline models on both evaluation tasks.

Next word prediction

In Figure 2F, we report the next word prediction performances of the stimulus-tuned and baseline models and observe that the stimulus-tuned model performs better on this task. This verifies that stimulus-tuning indeed improves the model's ability to predict the next word in the stimulus text, as it was designed to do.

Brain alignment

Figures 2A and 2B show the brain alignment performances (i.e. Pearson correlation) for the baseline and stimulus-tuned models for one representative subject. Only voxels that are significantly predicted are shown in the figure (one sample t-test, FDR corrected for multiple comparisons across voxels at significance level 0.05). The results for the remaining subjects are largely consistent and are shown in Appendix Figure 6. We observe that the stimulus-tuned models better align with the brain recordings, particularly in many brain areas that have been previously implicated in language-specific processing (Fedorenko et al., 2010; Fedorenko and Thompson-Schill, 2014) and word semantics (Binder et al., 2009), which are visualized in Figure 2C. These observations are quantified across the language ROIs in Appendix Figure 7. Furthermore, we quantify the improvement in brain alignment due to stimulus-tuning across language processing regions in Figure 3. Here, we show the average gain provided by finetuning on stimulus-specific text versus the baseline in each ROI, measured using the average performance of each model across voxels within an ROI that are significantly predicted by the stimulus-tuned model. The voxels that are significantly predicted by the stimulus-tuned model show a gain over the baseline model across language ROIs. We observe similar results when all voxels are included in this analysis (see Appendix Figure 10). We focus on the significantly predicted voxels in the main paper because not all voxels in the same brain region need to be affected by the language stimulus in the same way (i.e. the brain regions are not necessarily homogeneous). For example, previous work has found that representations from pretrained GPT-2 significantly predict 20-50% of voxels across the same language regions (Toneva et al., 2020). This is especially important when comparing the predictive performance of two models: if a large number of voxels in a brain region are not predicted well by either model, then it would be difficult to detect any existing difference between the two models on the remaining voxels.

While we show that stimulus-tuning leads to both an improved ability to predict the next word and an improved alignment with fMRI recordings, we are not yet able to conclude that the improvement in alignment with the brain is due to the improved prediction of the next word. The reason is that improving a model’s ability to predict the next word may also improve other aspects of the model that are brain-relevant, such as its ability to represent word-level or multi-word semantics that are specific to the stimulus narrative.

Figure 3: Improvement in brain alignment due to stimulus-tuning in language processing regions (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of the stimulus-tuned model over the baseline). We present the percentage change in brain alignment of the stimulus-tuned model over the baseline model. Each bar corresponds to the average change across the voxels within each ROI that are significantly predicted by the stimulus-tuned model. We display the mean percentage change and standard error of the mean across the 8 participants. The voxels that are significantly predicted by the stimulus-tuned model show a large gain over the baseline model across language ROIs.

3.2 Perturbation II: input scrambling

We aim to disentangle the contribution of word-level semantics to brain alignment using an additional perturbation–input scrambling.

Next word prediction

In Figure 2F, we report the next word prediction performance of the baseline model under the scrambling perturbation. As expected, the next word prediction performance is worse than that of the baseline model.

Brain alignment

In Figure 2D, we report the brain alignment performance for one representative subject. The results for the remaining subjects are largely consistent and are shown in Appendix Figure 6. Only voxels that are significantly predicted are shown in the figure (one sample t-test, FDR corrected for multiple comparisons across voxels at significance level 0.05). Despite the scrambling procedure, we observe that the model still significantly aligns with the fMRI recordings, particularly in the ROIs related to language processing. We show the performance gain of the baseline over the scrambled baseline in Appendix Figure 8. We observe that all language ROIs align better with the baseline model than with the scrambled baseline. This suggests that even when controlling for word-level semantics, a language model is able to strongly align with brain areas that are thought to process language. While we show that scrambling the text input to a model decreases both its ability to predict the next word and its brain alignment, we are not yet able to conclude that the decrease in alignment with the brain is due to the decreased ability to predict the next word. The reason is that the scrambling procedure can only help control for word-level semantics, but not for any possible changes in multi-word semantics, which may also contribute to the decrease of alignment with the language processing brain areas.

3.3 Joint perturbations: input scrambling of stimulus-tuned models

Lastly, we make use of both perturbations at the same time to disentangle the effects of next word prediction and semantics on brain alignment.

Next word prediction

In Figure 2F, we report the next word prediction performance of the stimulus-tuned model under the scrambling perturbation. As expected, the next word prediction performance is worse than that of the stimulus-tuned model. However, the next word prediction performance is still better than that of the baseline model, indicating that the information gained by stimulus-tuning is not entirely counteracted by the scrambling perturbation. In addition, when compared to their unperturbed counterparts, the scrambled stimulus-tuned model and the scrambled baseline model deteriorate by a similar magnitude (2.4 points).

Brain alignment

In Figure 2E, we report the performance on the brain alignment task, showing the correlation of each voxel for one representative subject. Only voxels that are significantly predicted are shown in the figure (one sample t-test, FDR corrected for multiple comparisons across voxels at significance level 0.05). The results for the remaining subjects are largely consistent and are shown in Appendix Figure 6. Similarly to the results of applying the scrambling perturbation to the baseline model, the results from the scrambled stimulus-tuned model are also significant in many language regions. We show the performance gain of the stimulus-tuned model over the scrambled stimulus-tuned model in Appendix Figure 9. We observe that all language ROIs align better with the stimulus-tuned model than with the scrambled stimulus-tuned model. Similarly to the (baseline - baseline scrambled) comparison above, the (stimulus-tuned - stimulus-tuned scrambled) comparison controls for word-level semantics, but not for the difference in next word prediction or any possible changes in multi-word semantics, and therefore cannot isolate the independent brain alignment due to next word prediction or multi-word semantics.

Figure 4: Voxel-wise brain alignment for each participant from contrast that controls for effect of next word prediction and word-level semantics on brain alignment: (baseline - baseline scrambled) vs. (stimulus-tuned - stimulus-tuned scrambled). We only display the performance in voxels that were originally significantly predicted by the stimulus-tuned model. Voxels that appear in blue are better predicted by the stimulus-tuned model, even when accounting for next word prediction and word-level semantics. Voxels that appear in red are better predicted by the baseline model. Despite some variation across participants, several language regions appear in blue. We quantify these observations in Figure 5.

Cross-perturbation contrasts

To disentangle the effects of multi-word semantics and next word prediction on brain alignment, we capitalize on the fact that two contrasts–(baseline - baseline scrambled) and (stimulus-tuned - stimulus-tuned scrambled)–have a very similar next word prediction drop (2.4 points). Because previous work has shown that next-word prediction performance, similarly measured using perplexity, has a strong positive linear relationship with brain alignment (Schrimpf et al., 2021), a comparison of the brain alignment differences between these two contrasts is largely controlled for the effect of next word prediction. Additionally, the scrambling perturbation controls for the word-level semantics in each of the individual contrasts as it does not alter the non-contextualized representation of each word. Therefore, a comparison of the brain alignment differences of (baseline - baseline scrambled) and (stimulus-tuned - stimulus-tuned scrambled) is controlled for both word-level semantics and next word prediction. Any observed difference in brain alignment between these two contrasts would then be due to more than word-level semantics and next word prediction, and may be related to multi-word semantics. In Figure 4, we report the (baseline - baseline scrambled) vs. (stimulus-tuned - stimulus-tuned scrambled) contrast for each subject (also see Appendix Figure 12 for the same comparison when constrained only to the language ROI). We only show the voxels that are significantly predicted by the stimulus-tuned model (one sample t-test, FDR corrected for multiple comparisons across voxels at significance level 0.05). Voxels that appear in blue are better predicted by the stimulus-tuned model, even when controlling for next word prediction and word-level semantics. Despite some variation across participants, several language regions appear in blue. Specifically, two of the regions that were predicted better by the stimulus-tuned model (see Fig. 5)–IFG and Angular Gyrus–are also predicted better when controlling for next word prediction and word-level semantics, suggesting that the alignment with the language model in these areas is potentially due to multi-word semantics.
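In code, this cross-perturbation contrast amounts to a voxel-wise difference of differences between the four alignment maps (a sketch over hypothetical per-voxel correlation arrays):

```python
import numpy as np

def cross_perturbation_contrast(r_tuned: np.ndarray, r_tuned_scr: np.ndarray,
                                r_base: np.ndarray, r_base_scr: np.ndarray) -> np.ndarray:
    """Per-voxel (stimulus-tuned - stimulus-tuned scrambled) minus
    (baseline - baseline scrambled). Because both subtractions remove a
    similar next-word-prediction drop and scrambling spares word-level
    semantics, positive values indicate alignment gains beyond those two
    factors (plotted blue in Figure 4)."""
    return (r_tuned - r_tuned_scr) - (r_base - r_base_scr)
```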

4 Related Works

A number of previous works have investigated the alignment between pretrained language models and brain recordings of people comprehending language. Wehbe et al. (2014b) aligned MEG brain recordings with a Recurrent Neural Network (RNN) trained on an online archive of Harry Potter fan fiction. Jain and Huth (2018) aligned layers from a Long Short-Term Memory (LSTM) model to fMRI recordings of subjects listening to stories. Toneva and Wehbe (2019) aligned layers from several transformer-based and recurrence-based pretrained language models with fMRI recordings of people reading a chapter of a book. Schrimpf et al. (2021) investigated the alignment of fMRI and ECoG recordings of people reading and listening to language with representations obtained from more recent NLP systems. Similarly, Caucheteux and King (2020) find that the representations from a large number of neural networks align well with MEG recordings of people reading, specifically when the representations are obtained from the middle layers of these networks. This finding replicates the results of Jain and Huth (2018) and Toneva and Wehbe (2019) that were obtained using fMRI data. Goldstein et al. (2022) align representations from a neural network with ECoG recordings of people listening to stories and find that ECoG electrodes can predict the neural network representation of upcoming words in the narrative.

Figure 5: Quantification of the impact of the scrambling perturbation on the stimulus-tuned model versus the impact of this perturbation on the baseline model in language processing regions (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of (stimulus-tuned - stimulus-tuned scrambled) over (baseline - baseline scrambled)). Each bar corresponds to the average change across the voxels significantly predicted by the stimulus-tuned model within each ROI. We display the mean percentage change and standard error of the mean across the 8 participants. Two of the four language regions that were predicted better by the stimulus-tuned model (see Fig. 3), the IFG and Angular Gyrus, are also predicted better when controlling for next word prediction and word-level semantics.

Some works have also investigated the alignment with brain recordings of fine-tuned language models. Schwartz et al. (2019) finetuned a pretrained BERT to predict fMRI and MEG recordings of people reading a chapter of a book. The resulting fine-tuned model leads to improved prediction of previously unseen brain recordings, specifically in regions that are known to support language processing. However, it is not clear what type of information has been induced in the fine-tuned BERT model that has contributed to the improved alignment with brain recordings. Oota et al. (2022) investigate the alignment between fMRI recordings of people comprehending language and BERT that has been fine-tuned on a set of NLP tasks, such as co-reference resolution and question-answering. The authors find that the best aligning task-tuned BERT varies according to whether the brain recordings correspond to reading or listening to the language stimuli. Our approach and research questions are complementary to these previous works.

Our work also relates to a growing body of research on disentangling the contributions of different types of information to the alignment between brain recordings and language models. Toneva et al. (2020) present an approach to disentangle supra-word meaning from lexical meaning in language models and show that the supra-word meaning is predictive of fMRI recordings in two language regions (anterior and posterior temporal lobes). Caucheteux et al. (2021) and Reddy and Wehbe (2021) aim to disentangle alignment due to syntactic and semantic processing. Toneva et al. (2022) examine whether representations obtained from a language model align with different language processing regions in similar or different ways.

Similarly, some works use word-order perturbations to investigate properties of language models. Pandia and Ettinger (2021) introduced distracting content to test how robustly language models retain and use information for prediction, finding that language models appear particularly susceptible to factors of semantic similarity and word position. Papadimitriou et al. (2022) applied a scrambling perturbation to investigate where semantic and syntactic processing take place in BERT, revealing that early layers rely more on lexical information while later layers rely more on word order. Our current work contributes to this research direction by examining the effects of scrambling on both brain alignment and language modeling performance.

5 Discussion and Conclusion

This work aims to deepen our understanding of the existing alignment between language models and brain recordings. We proposed two perturbations to pretrained language models that, when used together, can control for the effects of next word prediction and word-level semantics on the alignment with brain recordings.

We showed that the first perturbation that we termed stimulus-tuning (i.e. finetuning a pretrained model on a validation stimulus text) can increase the alignment with brain recordings that correspond to a heldout text, particularly in several language processing brain areas. We quantified this improvement by comparing the stimulus-tuned model and the baseline in these brain areas. Stimulus-tuning may improve brain alignment due to improvements in next word prediction or improved ability to represent word-level or multi-word semantics that are specific to the stimulus narrative.

Using the second perturbation, which we termed text scrambling, we showed that the improved next-word prediction capability of the stimulus-tuned model is not the only reason for its improved brain alignment. We showed that applying the scrambling perturbation to the pretrained baseline and stimulus-tuned models leads to a very similar drop in next word prediction performance, but to different reductions in brain alignment. Specifically, we show that improvements in alignment with brain recordings in two language processing regions, the Inferior Frontal Gyrus (IFG) and Angular Gyrus (AG), are due to more than improvements in next word prediction and word-level semantics. One possible reason for this improvement in brain alignment is an improved capability to represent multi-word semantics that are specific to the stimulus text. This hypothesis aligns with previous work that has found the IFG to be sensitive to syntax (Friederici et al., 2003; Friederici, 2012) and the AG to multi-word event structure (Ramanan et al., 2018; Humphreys et al., 2021). Note that the fact that we do not find strong effects in other language regions does not necessarily mean that they do not process multi-word semantics. Future work can build on the approach presented here to further investigate this hypothesis and gain a deeper understanding of the type of multi-word semantics that best aligns with each brain region. Furthermore, despite the strong positive correlation between next-word prediction and brain alignment reported by Schrimpf et al. (2021), this relationship is not perfectly linear, so it is possible that the subtraction we employ does not perfectly control for the effect of next word prediction capabilities. However, our analyses are informative as they take a step towards a deeper understanding of the reasons behind the significant alignment between brain recordings and recent NLP models.

Our findings are also relevant to the research direction in NLP that examines what language models can learn from text only. We show that finetuning a language model with a small amount of text can significantly increase its alignment with never-before-seen brain recordings, and that this improvement in brain alignment is not purely due to next word prediction or word-level meaning. This finding suggests that training a language model with little additional text can improve its representations of multi-word semantics in a brain-relevant way.

Acknowledgments

The authors would like to thank Shailee Jain for helpful feedback on an earlier version of this manuscript.

References

  • J. R. Binder, R. H. Desai, W. W. Graves, and L. L. Conant (2009) Where is the semantic system? a critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral cortex 19 (12), pp. 2767–2796. Cited by: §3.1.
  • C. Caucheteux, A. Gramfort, and J. King (2021) Decomposing lexical and compositional syntax and semantics with deep language models. arXiv preprint arXiv:2103.01620. Cited by: §4.
  • C. Caucheteux and J. King (2020) Language processing in brains and deep neural networks: computational convergence and its limits. BioRxiv. Cited by: §1, §4.
  • E. Fedorenko, P.-J. Hsieh, A. Nieto-Castanon, S. Whitfield-Gabrieli, and N. Kanwisher (2010) New method for fMRI investigations of language: defining ROIs functionally in individual subjects. Journal of Neurophysiology 104 (2), pp. 1177–1194. Cited by: §1, §3.1.
  • E. Fedorenko and S. L. Thompson-Schill (2014) Reworking the language network. Trends in cognitive sciences 18 (3), pp. 120–126. Cited by: §1, §3.1.
  • A. D. Friederici, S. Rüschemeyer, A. Hahne, and C. J. Fiebach (2003) The role of left inferior frontal and superior temporal cortex in sentence comprehension: localizing syntactic and semantic processes. Cerebral cortex 13 (2), pp. 170–177. Cited by: §5.
  • A. D. Friederici (2012) The cortical language circuit: from auditory perception to sentence comprehension. Trends in cognitive sciences 16 (5), pp. 262–268. Cited by: §1, §5.
  • J. S. Gao, A. G. Huth, M. D. Lescroart, and J. L. Gallant (2015) PyCortex: an interactive surface visualizer for fMRI. Frontiers in Neuroinformatics, pp. 23. Cited by: §2.3.
  • A. Goldstein, Z. Zada, E. Buchnik, M. Schain, A. Price, B. Aubrey, S. A. Nastase, A. Feder, D. Emanuel, A. Cohen, et al. (2022) Shared computational principles for language processing in humans and deep language models. Nature neuroscience 25 (3), pp. 369–380. Cited by: §1, §4.
  • G. F. Humphreys, M. A. L. Ralph, and J. S. Simons (2021) A unifying account of angular gyrus contributions to episodic and semantic cognition. Trends in Neurosciences 44 (6), pp. 452–463. Cited by: §1, §5.
  • A. G. Huth, W. A. De Heer, T. L. Griffiths, F. E. Theunissen, and J. L. Gallant (2016) Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532 (7600), pp. 453–458. Cited by: §2.3.
  • S. Jain and A. Huth (2018) Incorporating context into language encoding models for fMRI. In Advances in Neural Information Processing Systems, pp. 6628–6637. Cited by: §1, §2.3, §4.
  • I. Loshchilov and F. Hutter (2018) Decoupled weight decay regularization. In International Conference on Learning Representations, Cited by: §2.3, §2.4.
  • S. Nishimoto, A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant (2011) Reconstructing visual experiences from brain activity evoked by natural movies. Current biology 21 (19), pp. 1641–1646. Cited by: §2.3.
  • S. R. Oota, J. Arora, V. Agarwal, M. Marreddy, M. Gupta, and B. R. Surampudi (2022) Neural language taskonomy: which NLP tasks are the most predictive of fMRI brain activity? arXiv preprint arXiv:2205.01404. Cited by: §4.
  • L. Pandia and A. Ettinger (2021) Sorting through the noise: testing robustness of information processing in pre-trained language models. arXiv preprint arXiv:2109.12393. Cited by: §4.
  • I. Papadimitriou, R. Futrell, and K. Mahowald (2022) When classifying grammatical role, BERT doesn't care about word order… except when it matters. arXiv preprint arXiv:2203.06204. Cited by: §4.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. (2019) Language models are unsupervised multitask learners. OpenAI blog 1 (8), pp. 9. Cited by: §1, §2.1.
  • S. Ramanan, O. Piguet, and M. Irish (2018) Rethinking the role of the angular gyrus in remembering the past and imagining the future: the contextual integration model. The Neuroscientist 24 (4), pp. 342–352. Cited by: §5.
  • A. J. Reddy and L. Wehbe (2021) Can fMRI reveal the representation of syntactic structure in the brain? Advances in Neural Information Processing Systems 34, pp. 9843–9856. Cited by: §4.
  • J. K. Rowling (1998) Harry Potter and the Sorcerer's Stone. Illustrated by M. GrandPré. A. A. Levine Books. ISBN 9780590353403. Cited by: §2.2.
  • M. Schrimpf, I. A. Blank, G. Tuckute, C. Kauf, E. A. Hosseini, N. Kanwisher, J. B. Tenenbaum, and E. Fedorenko (2021) The neural architecture of language: integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences 118 (45). Cited by: §1, §2.3, §3.3, §4, §5.
  • D. Schwartz, M. Toneva, and L. Wehbe (2019) Inducing brain-relevant bias in natural language processing models. Advances in neural information processing systems 32. Cited by: §4.
  • M. Toneva, T. M. Mitchell, and L. Wehbe (2020) Combining computational controls with natural text reveals new aspects of meaning composition. bioRxiv. Cited by: §3.1, §4.
  • M. Toneva and L. Wehbe (2019) Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). Advances in Neural Information Processing Systems 32. Cited by: §1, §2.3, §4.
  • M. Toneva, J. Williams, A. Bollu, C. Dann, and L. Wehbe (2022) Same cause; different effects in the brain. Causal Learning and Reasoning. Cited by: §4.
  • L. Wehbe, B. Murphy, P. Talukdar, A. Fyshe, A. Ramdas, and T. Mitchell (2014a) Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses. PloS one 9 (11), pp. e112575. Cited by: §2.2, §2.3.
  • L. Wehbe, A. Vaswani, K. Knight, and T. Mitchell (2014b) Aligning context-based statistical models of language with brain activity during reading. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 233–243. Cited by: §1, §4.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush (2020) Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, pp. 38–45. External Links: Link, Document Cited by: §2.3.
  • D. Zwillinger and S. Kokoska (1999) CRC standard probability and statistics tables and formulae. Crc Press. Cited by: §2.3.

Appendix

Model selection hyperparameters

The training hyperparameters for both types of models are selected using random-search cross-validation with AdamW as the optimizer. For each subject and each model we tested 100 hyperparameter combinations sampled at random. In particular, the learning rate and the weight decay were each sampled from a fixed range of values. Depending on the fMRI dimensionality of each subject and the available memory, we use a batch size of 32 or 16. For each trial we set a limit of 40 epochs, with a linear learning rate scheduler: the learning rate decreases linearly from its initial value to 0. However, we use early stopping to interrupt a trial if the validation loss starts increasing. The best parameters are used to train the final model using the validation set.
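A minimal sketch of this random search loop; the sampling ranges below are hypothetical placeholders, since the original ranges are not recoverable from the source:

```python
import random

def sample_hyperparameters(n_trials: int = 100, seed: int = 0):
    """Yield random hyperparameter combinations for the search.
    The ranges here are illustrative placeholders, not the paper's values."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {
            "lr": 10 ** rng.uniform(-5, -3),             # placeholder range
            "weight_decay": 10 ** rng.uniform(-4, -1),   # placeholder range
            "max_epochs": 40,                            # with early stopping
        }
```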

Additional results

Figure 6: Performances of the baseline and perturbed models of all participants at the brain alignment task. Stimulus-tuning improves the brain alignment (stimulus-tuned in (.b) vs baseline in (.a)) for almost all participants. In contrast, scrambling reduces the brain alignment (baseline in (.a) vs baseline scrambled in (.c)). Despite the reduction in alignment due to the scrambling perturbation, all four models (.a,.b,.c,.d) exhibit significant alignment in language processing regions.

Figure 7: Average Pearson correlations and standard errors of the four models (baseline, baseline scrambled, stimulus-tuned, stimulus-tuned scrambled) in each language ROI (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc) across all subjects. The averages are computed across only those voxels within each ROI that are significantly predicted by the corresponding model (one sample t-test, FDR corrected for multiple comparisons across voxels at significance level 0.05). When considering only voxels that are significantly predicted by each model, we find that the stimulus-tuned model has a higher correlation than the baseline in a number of language ROIs (IFG, Anterior Temporal, Posterior Temporal, and Angular Gyrus). Scrambling reduces the correlation of both the baseline and the stimulus-tuned models.

Figure 8: Quantification of the impact of the scrambling perturbation on the baseline model in language processing regions (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of the baseline model over the scrambled baseline). Each bar corresponds to the average change across the voxels within each ROI. We display the mean percentage change and standard error of the mean across the 8 participants. All the ROIs are predicted better by the baseline model (see Fig. 10).

Figure 9: Quantification of the impact of the scrambling perturbation on the stimulus-tuned model in language processing regions (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of the stimulus-tuned model over the stimulus-tuned scrambled model). Each bar corresponds to the average change across the voxels within each ROI. We display the mean percentage change and standard error of the mean across the 8 participants. All the ROIs are predicted better by the stimulus-tuned model (see Fig. 10).

Figure 10: Quantification of the improvement in brain alignment due to stimulus-tuning in language processing regions, computed across all voxels (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of the stimulus-tuned model over the baseline). We present the percentage change in brain alignment of the stimulus-tuned model over the baseline model. Each bar corresponds to the average change across all the voxels within each ROI. We display the mean percentage change and standard error of the mean across the 8 participants. The stimulus-tuned model predicts almost all the language ROIs better, with a large variance relative to the comparison restricted to the voxels significantly predicted by the stimulus-tuned model (see Fig. 3).

Figure 11: Quantification of the impact of the scrambling perturbation on the stimulus-tuned model versus the impact of this perturbation on the baseline model in language processing regions, computed across all voxels (MFG, IFG, IFGorb, AntTemp, PostTemp, AngularG, pCingulate, dmpfc; y-axis: percentage gain of (stimulus-tuned - stimulus-tuned scrambled) over (baseline - baseline scrambled)). Each bar corresponds to the average change across the voxels within each ROI. We display the mean percentage change and standard error of the mean across the 8 participants. Two of the four language regions that were predicted better by the stimulus-tuned model (see Fig. 3), the IFG and Angular Gyrus, are also predicted better when controlling for next word prediction and word-level semantics, with a large variance relative to the comparison restricted to the voxels significantly predicted by the stimulus-tuned model (see Fig. 5).
Figure 12: Qualitative comparison of the impact of scrambling on brain alignment for all 8 participants using two different models: stimulus-tuned and baseline. We only display the performance in voxels that were originally significantly predicted by the stimulus-tuned model belonging to the language ROIs. Voxels that appear in blue are better predicted by the stimulus-tuned model, even when accounting for next word prediction and word-level semantics. Voxels that appear in red are better predicted by the baseline model. Despite some variation across participants, several language regions appear in blue.