The Fluidity of Concept Representations in Human Brain Signals

02/20/2020 ∙ Eva Hendrikx et al., University of Amsterdam

Cognitive theories of human language processing often distinguish between concrete and abstract concepts. In this work, we analyze the discriminability of concrete and abstract concepts in fMRI data using a range of analysis methods. We find that the distinction can be decoded from the signal with an accuracy significantly above chance, but it is not found to be a relevant structuring factor in clustering and relational analyses. From our detailed comparison, we obtain the impression that human concept representations are more fluid than dichotomous categories can capture. We argue that fluid concept representations lead to more realistic models of human language processing because they better capture the ambiguity and underspecification present in natural language use.




1 Introduction

Language researchers often group words into categories. Lexicographers categorize words by their syntactic category, historical linguists categorize them by their ancestors, computational linguists categorize by frequency, and psycholinguists distinguish words by categories such as concreteness, imageability, meaningfulness, and age of acquisition Coltheart (1981).

The distributional word representations most commonly used by computer scientists today are high-dimensional vectors. The dimensions of these vectors cannot easily be interpreted as linguistic categories Lewis and Steedman (2013). Relations between words are instead characterized by their representational similarity, which is measured as proximity in the vector space. Several gold-standard datasets for similarity assessment exist and provide a fundamental resource for the development and evaluation of computational word representations. A limitation of similarity judgments is their low inter-dataset and inter-annotator agreement Batchkarov et al. (2016). This arises because similarity between words is a highly fluid notion that ranges over multiple properties (e.g., shape, semantic category, the emotion associated with a word).

In cognitive science research, psycholinguistic categories play an important role, and the low interpretability of distributional representations can pose a challenge for interdisciplinary research. Therefore, recent work by faruqui2015sparse aims to project distributional representations back onto interpretable linguistic dimensions. To what extent these dichotomous linguistic categories of words are reflected in the human brain remains a topic of debate, as various studies come to different conclusions. For example, rapp2002 claim that patient data convincingly shows different underlying representations for verbs and nouns, whereas bird2003 find no differences in processing when controlling for confounding factors such as imageability.

In this work, we focus on the concreteness category because it has been vividly discussed in light of the embodiment debate Barsalou (2008); Pecher et al. (2011). It is often assumed that concrete and abstract words are represented differently in the brain, but topological analyses have not yet reached consensus on the involved brain areas Mestres-Missé et al. (2009); Tettamanti et al. (2008); Wallentin et al. (2005); Wise et al. (2000). We use brain activation data from fMRI analyses because they provide high spatial resolution. Previous studies that examined linking hypotheses between computational representations and fMRI patterns of concrete and abstract words yielded contradictory results Anderson et al. (2017); Bulat et al. (2017). We aim to consolidate their analyses by using multiple context paradigms.

Previous work indicates that the interpretation of fMRI data strongly depends on the analysis method and the evaluation metrics, even within a consistent experimental paradigm Beinborn et al. (2019). To increase the transparency of our results, we apply a range of computational analysis methods and make our code publicly available. To the best of our knowledge, our analysis of the concreteness category is the first to examine relational effects of various computational representations for different context paradigms. Furthermore, we compare data-driven searchlight analysis with more traditional region-of-interest selections.

2 Computational Analysis of Cognitive Concept Representations

We briefly sketch previous computational analyses with cognitive data and then focus on the distinction between concrete and abstract concepts.

2.1 Analyzing Cognitive Data

Computational analyses of cognitive data provide an interdisciplinary bridge between computational and cognitive models of language. On the one hand, experimental data such as response times or subjective ratings have a strong influence on the development and evaluation of computational models of language Fernandez Monsalve et al. (2012); Resnik and Lin (2010). Eye-tracking measures are used to investigate human attention and to guide the development of models for sentence understanding Barrett et al. (2018), sentiment analysis Hollenstein et al. (2019), and multi-modal processing Sugano and Bulling (2016). Fine-grained syntactic and semantic processing is often modeled using electroencephalography data Hale et al. (2018); Fyshe et al. (2016); Frank et al. (2013); Sudre et al. (2012).

On the other hand, cognitive researchers conduct computational analyses to detect patterns in experimental data. This is particularly important when dealing with high-dimensional data from magnetoencephalography and functional magnetic resonance imaging (fMRI) scans to study the functional localization of language processing. We focus on fMRI data in this paper and distinguish between two types of computational analyses: discriminative and relational analyses.

Discriminative analyses investigate whether it is possible to group the representational patterns into classes. A popular discriminative analysis is known as decoding. For this task, a computational model learns to discriminate the fMRI patterns for different linguistic categories, e.g., abstract and concrete concepts Anderson et al. (2017), various syntactic classes Bingel et al. (2016); Li et al. (2018), or levels of syntactic complexity Brennan et al. (2016). Another discriminative analysis is clustering. For clustering analyses, the categories of interest are not defined a priori, but the data is automatically grouped according to shared characteristics in the representations.

Relational analyses aim at establishing a link between the fMRI signal and a computational representation of the stimulus. The results by Mitchell2008 indicate that it is not only possible to distinguish between semantic categories, but that a model can even learn to directly encode which word a participant is reading. These encoding analyses provide an interesting evaluation paradigm for the cognitive plausibility of computational representations of language Abnar et al. (2018); Anderson et al. (2017); Bulat et al. (2017); Xu et al. (2016); Fyshe et al. (2014).

Recent studies using natural speech stimuli find more complex and widespread activation patterns, which indicates that fine-grained categorical and topological differences observed for controlled language stimuli cannot be generalized Huth et al. (2016); Hamilton and Huth (2018). The results of computational studies using brain data from humans processing full sentences Pereira et al. (2018) and even full stories Jain and Huth (2018); Dehghani et al. (2017); Brennan et al. (2016); Wehbe et al. (2014) are hard to generalize because the interpretation of language–brain modelling experiments strongly depends on the chosen evaluation metric Beinborn et al. (2019) and on the differences between models Gauthier and Ivanova (2018). We therefore also choose to simplify the encoding task and apply representational similarity analysis, which makes it possible to compare relations between concepts in computational and cognitive representations more directly Kriegeskorte et al. (2008).

Figure 1: Three context paradigms for the concept tree. The sentence is shown on top, the picture in the middle, and the word cloud at the bottom. The examples are extracted from the stimulus set that was used by pereira2018 during the collection of the fMRI data.

2.2 Concreteness of Words

Concreteness describes the extent to which a word can be embodied by perceptual experiences Walker and Hulme (1999). Concrete words refer to concepts that are easily perceivable by the senses; for example, a banana has a recognizable appearance, feel, and taste. Abstract words describe theoretical concepts that cannot be directly grounded in perception, for example democracy. Psycholinguistic research on lexical access indicates that concrete words are usually processed more rapidly and accurately than abstract words Kroll and Merves (1986). However, some patients show a reversed pattern Breedin et al. (1994). This hints at different underlying brain representations for abstract and concrete concepts and led to the development of two main theories.

The dual coding theory implies that all concepts are stored verbally (as linguistic relations between related concepts) in the brain, while only concrete concepts are simultaneously stored non-verbally (according to their perceptual information; Paivio 1991). In contrast, the context availability theory disregards non-verbal storage and attributes the effects to concrete concepts having richer semantic relations Schwanenflugel and Shoben (1983). It has been shown that the recognition of abstract concepts is improved in a supporting context Schwanenflugel et al. (1988).

Computational representations of words can be based on textual or visual distributional characteristics. Several researchers assume that visual representations are beneficial for modeling concrete concepts, while textual representations are better at reflecting relations between abstract concepts Bruni et al. (2014); Hill et al. (2014); Lazaridou et al. (2015); Beinborn et al. (2018). Recent encoding analyses led to inconsistent results: bulat2017 find that visual representations are better for predicting brain activation patterns of concrete concepts, whereas anderson2017 do not find a difference between modalities. Interestingly, the stimuli used in the two studies differ (words + drawings vs words, respectively). In the current work, we analyze the same concepts presented in different contexts to further explore the influence of experimental paradigms.

3 Data

In this section, we provide details on the fMRI dataset and the preprocessing methods.

3.1 FMRI Dataset

pereira2018 presented 180 concepts (128 nouns, 22 verbs, 23 adjectives, 6 adverbs, 1 function word) to 16 participants and measured their fMRI response. Participants were instructed to think of the target concept in terms of the corresponding context. In line with pereira2018, we use the term concept instead of word to account for this particular experimental setting.


The concepts were presented to the participants within varying context paradigms: sentences with the target concept in a bold font; pictures presented alongside the target concept; and word clouds with the target concept in the center, surrounded by semantically related words. Figure 1 provides examples for the different presentation contexts for the concept tree.


The concepts are annotated with concreteness ratings Brysbaert et al. (2014). We categorize concepts as concrete if their concreteness score lies at least half a standard deviation above the mean (mean = 3.49, std = 1.08), and as abstract if it lies at least half a standard deviation below the mean. This results in 69 concrete and 63 abstract concepts.
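The thresholding described above can be sketched as follows; the function name and the toy ratings are our own illustrations, not part of the original study:

```python
import numpy as np

def split_by_concreteness(scores, n_std=0.5):
    """Categorize concepts as concrete/abstract relative to the rating mean.

    Concepts within half a standard deviation of the mean stay
    uncategorized, mirroring the selection described in the text.
    Returns the indices of concrete and abstract concepts.
    """
    scores = np.asarray(scores, dtype=float)
    mean, std = scores.mean(), scores.std()
    concrete = np.where(scores >= mean + n_std * std)[0]
    abstract = np.where(scores <= mean - n_std * std)[0]
    return concrete, abstract

# Toy ratings on the 1-5 Brysbaert scale (hypothetical values)
ratings = [4.9, 4.7, 1.3, 1.5, 3.0, 3.2]
concrete, abstract = split_by_concreteness(ratings)
```

Applied to the real ratings (mean = 3.49, std = 1.08), this selection yields the 69 concrete and 63 abstract concepts reported above.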

Figure 2: Accuracy scores for the decoding analysis for different regions. Each color represents a subject. Significant results are indicated by filled circles. The abbreviations stand for the regions Inferior Frontal Gyrus (IFG), Middle Temporal Gyrus (MTG), Fusiform Gyrus (FFG), Posterior Cingulate Cortex (PCC), Precuneus (PCUN), and Parahippocampal Gyrus (PHG), and for the stable voxel selection across paradigms (stable).

3.2 Pre-processing

The fMRI dataset has already undergone standard pre-processing. During the experiment by pereira2018, each concept was presented to the participants four to six times in each context paradigm. The scans were averaged over these instances to obtain a generalized concept scan for each paradigm.

FMRI scans are commonly represented by a matrix of voxels. A voxel can conceptually be understood as a 3-dimensional fixed-size pixel in a brain scan. In the data by pereira2018, the dimensions of these voxels are 2×2×2 mm and the number of voxels per participant ranges from 145,303 to 212,742. As not all voxels are expected to represent relevant information, computational analyses are usually performed on a subset of the total voxels Wehbe et al. (2014); Brennan et al. (2016). In our analysis, we compare different subsets by selecting regions of interest, determining stable voxels, and performing a searchlight analysis.

Regions of Interest (ROI)

A common reduction method restricts the brain response to voxels that fall within a pre-selected set of brain regions. For the discrimination between abstract and concrete concepts, the regions of interest found in previous studies vary Mestres-Missé et al. (2009); Tettamanti et al. (2008); Wallentin et al. (2005); Wise et al. (2000). A large meta-analysis finds that abstract concepts elicit more activation in the inferior frontal gyrus (IFG) and middle temporal gyrus (MTG), whereas concrete concepts elicit more activation in the posterior cingulate cortex (PCC), precuneus (PCUN), fusiform gyrus (FFG), and parahippocampal gyrus (PHG) Wang et al. (2010). All discriminating regions were located in the left hemisphere. For our analyses, we select these regions.

Stable Voxels

Selecting stable voxels is a more data-driven approach to voxel selection. This strategy aims to select voxels that consistently capture the representation of a concept. The concepts have been presented to the participants in three paradigms, which require different processing steps, e.g., sentences require reading skills to combine characters into a meaning representation, pictures require visual processing to combine pixels into an image representation, and word clouds require spatial understanding to parse the word cloud. To approximate the joint underlying conceptual representations, we determine 500 voxels with the most stable activation pattern across presentation paradigms. This selection method is inspired by the pre-processing applied in Mitchell2008 to detect stable voxels across experimental runs.
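One way to implement this selection is to score each voxel by how well its activation profile over concepts correlates between paradigm pairs; a minimal sketch under that assumption (the function and its exact scoring are ours, not the original implementation):

```python
import numpy as np

def select_stable_voxels(scans, n_voxels=500):
    """Rank voxels by the summed pairwise Pearson correlation of their
    activation profiles (over concepts) across presentation paradigms.

    scans: array of shape (n_paradigms, n_concepts, n_total_voxels)
    Returns the indices of the n_voxels most stable voxels.
    """
    n_paradigms, n_concepts, n_total = scans.shape
    stability = np.zeros(n_total)
    for i in range(n_paradigms):
        for j in range(i + 1, n_paradigms):
            # center each voxel's profile within a paradigm
            a = scans[i] - scans[i].mean(axis=0)
            b = scans[j] - scans[j].mean(axis=0)
            # column-wise Pearson correlation between paradigms i and j
            num = (a * b).sum(axis=0)
            den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + 1e-12
            stability += num / den
    order = np.argsort(stability)[::-1]
    return order[:n_voxels]
```

A voxel whose concept profile is identical in all three paradigms receives the maximal stability score and is selected first.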


Searchlight

KriegeskorteSearchLight proposed another type of data-driven voxel selection: only a small sphere of neighbouring voxels is analyzed at a time, and the result is assigned to the center voxel of the sphere. The sphere moves through the entire brain, so that each voxel serves as the center once. We use a 4 mm radius, resulting in a sphere of 33 voxels (except at the margins of the brain) for each analysis step.
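With 2 mm isotropic voxels, a 4 mm radius indeed encloses 33 voxel centers; a small sketch (our own helper, for illustration) that enumerates the sphere offsets:

```python
import numpy as np

def sphere_offsets(radius_mm=4, voxel_size_mm=2):
    """Integer voxel offsets (dx, dy, dz) within a searchlight sphere.

    With 2 mm isotropic voxels and a 4 mm radius this yields the
    33-voxel sphere mentioned in the text.
    """
    r = radius_mm / voxel_size_mm  # radius in voxel units
    span = int(np.floor(r))
    return [
        (dx, dy, dz)
        for dx in range(-span, span + 1)
        for dy in range(-span, span + 1)
        for dz in range(-span, span + 1)
        if dx * dx + dy * dy + dz * dz <= r * r
    ]

offsets = sphere_offsets()  # 33 offsets, including the center (0, 0, 0)
```

For each center voxel, adding these offsets to its coordinates (and discarding positions outside the brain mask) gives the voxels entering that sphere's analysis.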

4 Discriminative Analyses

Discriminative analyses such as decoding and clustering operate only on the fMRI scans. The algorithms identify structure in the high-dimensional data, and the output is evaluated with respect to our linguistic category concreteness. Decoding is a common analysis method for fMRI data Poldrack et al. (2011). The data is split into training and test data, and the algorithm learns to find a hyperplane separating the representations in the training data according to the corresponding class annotations (concrete/abstract in our case). Based on this separation, the class for the representations in the test data is predicted. In clustering, the goal is to automatically find a separation of the data into homogeneous groups without any prior class bias. Our experimental code is based on the fMRI evaluation framework by beinborn2019. For classification and clustering, we use standard algorithms from the Python library scikit-learn Pedregosa et al. (2011).

Figure 3: Decoding results of the searchlight analyses for the three context paradigms: (a) sentence, (b) picture, (c) word cloud. We highlight areas with an average decoding accuracy above a fixed threshold. The colors indicate the average rank of the area (relative to all brain areas) over all subjects.

4.1 Decoding

We split the dataset into 11 folds of 12 concepts and perform cross-validation using a support vector machine with hyperparameter settings as recommended by song2011 (radial basis function kernel with the gamma coefficient set to scale).
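A rough sketch of this setup with scikit-learn; the fold layout and SVM settings follow the text, while the random stand-in matrices merely take the place of the real voxel data and labels:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data: 132 concept scans over 500 (stable) voxels with
# random abstract/concrete labels -- placeholders for the real fMRI data.
rng = np.random.default_rng(0)
X = rng.normal(size=(132, 500))
y = rng.integers(0, 2, size=132)  # 0 = abstract, 1 = concrete

# RBF-kernel SVM with gamma="scale"; 11 folds of 12 concepts each
clf = SVC(kernel="rbf", gamma="scale")
cv = KFold(n_splits=11, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
mean_accuracy = scores.mean()
```

With random labels the mean accuracy hovers around chance; the real analysis compares the observed score against such a chance distribution.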

ROI & Stable Voxel Results

The accuracy scores for the decoding task for different regions are visualized in Figure 2. The three subplots refer to the three presentation paradigms: sentences (left), pictures (middle), and word clouds (right). Significance of results is determined by comparing to the average score of a baseline on randomly permuted labels, repeated 1,000 times (the significance level was set to 0.05; note that the smallest p-value obtainable with this distribution is 0.001). Filled circles visualize subjects with significant decoding accuracy; empty circles indicate insignificant results.
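The permutation baseline can be sketched as follows; `decode_fn` stands in for any decoding routine (e.g. the cross-validated SVM), and the function itself is our illustration rather than the original code:

```python
import numpy as np

def permutation_p_value(observed_accuracy, decode_fn, X, y,
                        n_permutations=1000, seed=0):
    """Empirical p-value: fraction of label permutations whose decoding
    accuracy matches or exceeds the observed score.

    The smallest attainable p-value is 1 / (n_permutations + 1), i.e.
    roughly 0.001 for 1,000 permutations, as noted in the text.
    """
    rng = np.random.default_rng(seed)
    null_scores = [
        decode_fn(X, rng.permutation(y)) for _ in range(n_permutations)
    ]
    n_better = sum(s >= observed_accuracy for s in null_scores)
    return (n_better + 1) / (n_permutations + 1)
```

A result is then called significant when this empirical p-value falls below the 0.05 threshold.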

We can see that the model learns to discriminate between the two categories for more than 75% of the subjects in the regions IFG, MTG, and FFG. This finding is consistent across context paradigms, with the best results obtained for the picture paradigm. As in previous work, the variance between subjects is quite high, with accuracy values of over 80% for some subjects. The results for the remaining regions are close to random. It should be noted that some of these regions contain fewer voxels, which might degrade the expressivity of the model. The data-driven approach of selecting stable voxels across paradigms yields accuracy scores that are competitive with the best regions.

Searchlight results

In order to abstract away from the size of a region, we conduct the searchlight analysis. For each sphere, we calculate the decoding accuracy. To compare the sphere results across participants, we calculate the average decoding accuracy of all spheres within a brain area (brain areas are determined by a mapping according to the automated anatomical labeling atlas, as indicated by pereira2018). We then rank the areas from highest to lowest accuracy for all participants. In Figure 3, we visualize the average ranks of the areas with a decoding accuracy above a fixed threshold.

We identify a larger number of decoding regions in the picture context than in the sentence and word cloud contexts. Strikingly, we consistently obtain the highest ranks for the middle temporal gyrus (MTG) in all context paradigms. However, in the paradigms that only use linguistic stimuli (i.e., sentence and word cloud), the left MTG is ranked on top, while it is the right MTG for the picture context. In line with previous work stating that linguistic processing mainly elicits activity in a left-lateralized network (e.g., Frost et al. 1999; Knecht et al. 2000), the majority of the highly ranked areas in the linguistic paradigms are located in the left hemisphere.

4.2 Clustering

We have seen that it is generally possible to distinguish between concrete and abstract concepts to a certain extent. This indicates that the representational patterns for the two categories differ in at least one dimension. The clustering analysis provides further information on whether the concreteness distinction could be considered a "natural" class for the fMRI representations of semantic concepts. We set k to 2 and run a k-means algorithm to categorize the fMRI representations for all concepts. We then analyze to what extent these two clusters correspond to our abstract/concrete distinction.


Table 1 provides the results of the clustering averaged over subjects. We investigate the proportion of abstract concepts in each cluster. For readability, we only report the first cluster and only the three regions that worked well in decoding; the tendencies remain consistent for the other cluster and regions. We see that across regions and paradigms, the proportion of abstract concepts in the cluster always reflects the proportion in the dataset (0.48). This shows that concreteness is not the most prevalent category for grouping the fMRI patterns of the stimuli; another latent factor seems to be more dominant. A first manual inspection of the clustering results did not yet yield a hypothesis regarding the characteristics of this latent factor.
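The quantity reported in Table 1 can be computed as in the following sketch (the helper function is ours; in the real analysis `X` holds the voxel vectors for one region and paradigm):

```python
import numpy as np
from sklearn.cluster import KMeans

def abstract_proportion_per_cluster(X, is_abstract, seed=0):
    """Cluster concept vectors into two groups with k-means (k=2) and
    report the proportion of abstract concepts in each cluster."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    is_abstract = np.asarray(is_abstract, dtype=bool)
    return [float(is_abstract[labels == c].mean()) for c in (0, 1)]
```

If concreteness were the dominant structuring factor, one cluster would be mostly abstract and the other mostly concrete; proportions near the dataset base rate (0.48) in both clusters indicate it is not.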

We have seen that the concreteness distinction can to a certain extent be decoded from the fMRI signal. However, this does not mean that it is the predominant distinctive feature for the stimuli. In order to get a better idea about the structure of the representations, we conduct relational analyses.

5 Relational Analyses

While discriminative analyses examine whether two classes of concepts can generally be separated, relational analyses provide more information on the representations of individual concepts. They indicate to which extent computational models simulate the relations between concepts that are observed in the human cognitive signal.

Computational Representations

We use GloVe embeddings as a textual representation of the concepts Pennington et al. (2014) because they are recommended for psycholinguistic experiments Pereira et al. (2016); we use the publicly available 300-dimensional model trained on 42 billion tokens. We generate visual representations of each concept by running a pre-trained ResNet from the keras library on the six images representing the concept in the picture paradigm and averaging over the six resulting image vectors.
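The aggregation step for the visual representations can be sketched as follows; the feature-extraction call itself is omitted, and the function name and dimensionality (2048, a typical ResNet pooling size) are our assumptions:

```python
import numpy as np

def concept_visual_vector(image_features):
    """Average per-image feature vectors into one visual concept vector.

    image_features: array-like of shape (n_images, dim), e.g. the six
    activation vectors a pre-trained ResNet produces for the six
    pictures of a concept.
    """
    feats = np.asarray(image_features, dtype=float)
    return feats.mean(axis=0)

# Six hypothetical ResNet feature vectors for one concept
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 2048))
vector = concept_visual_vector(features)
```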

5.1 Encoding

For encoding, a linear regression model learns to predict the corresponding fMRI pattern for a given computational representation of a concept in the training data. We split the data into 11 folds of 12 concepts and perform cross-validation. The trained model is evaluated by predicting the fMRI patterns for the concepts in the test data. We calculate the pairwise accuracy based on the implementation in beinborn2019, using the single-match definition and cosine distance. This evaluation metric tests whether the model prediction for a test concept, e.g., tree, is more similar to the observed scan for tree than to the observed scan for another concept. This comparison is performed against all other concepts in the dataset, and the accuracy for the test concept is averaged. Accuracy is calculated separately for abstract and concrete concepts.
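The single-match comparison described above can be sketched as follows (the helper names are ours; the real implementation lives in the beinborn2019 framework):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def single_match_accuracy(prediction, true_scan, other_scans):
    """Fraction of competitor scans for which the predicted pattern is
    closer (in cosine distance) to the true scan than to the competitor."""
    d_true = cosine_distance(prediction, true_scan)
    wins = [cosine_distance(prediction, other) > d_true for other in other_scans]
    return float(np.mean(wins))
```

An accuracy of 1.0 means the prediction is closer to its own scan than to every competitor; chance level is 0.5.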

Region   sentence       picture        word cloud
IFG      0.49 (± 0.05)  0.46 (± 0.11)  0.49 (± 0.05)
MTG      0.48 (± 0.07)  0.45 (± 0.14)  0.46 (± 0.05)
FFG      0.50 (± 0.09)  0.49 (± 0.07)  0.48 (± 0.05)
Stable   0.52 (± 0.13)  0.45 (± 0.15)  0.47 (± 0.11)
Table 1: Proportion of abstract concepts in the first cluster learned for different regions, averaged over all subjects. The results for the second cluster are similar. Note that the results reflect the proportion of abstract concepts in the whole dataset (0.48).
Figure 4: Accuracy results for the encoding analysis as a density estimation over all subjects for the Inferior Frontal Gyrus (IFG), Middle Temporal Gyrus (MTG), Fusiform Gyrus (FFG), and the selected stable voxels (Stable), shown for the three context paradigms: (a) sentence, (b) picture, (c) word cloud. Accuracy is calculated separately for abstract and concrete concepts. The stimuli are encoded by the model using textual, visual, and random computational representations.


Figure 4 shows the encoding accuracy for all participants with significant results in decoding as violin plots. For clarity, we only plot the regions for which we obtained good decoding results in the previous analysis. The average accuracy is calculated separately for concrete and abstract concepts. We compare the encoding results for the textual and visual representations to randomly initialized vectors. The random results are averaged over 1,000 different initializations.

We see that all encoding results are close to random, and we do not find a strong difference between textual and visual representations. The accuracy is slightly higher in the picture paradigm for the FFG area, but the variance of the results is also higher. Interestingly, abstract concepts seem to be slightly easier to encode than concrete concepts. The effect is very small, but it is consistent across all experimental settings. However, we even observe it for the random representations, which indicates that the effect cannot be explained linguistically, because the random representations do not distinguish between concrete and abstract concepts. We have not yet converged on a convincing hypothesis for this effect; given the small dataset, it might simply be a distributional artifact.

5.2 Representational Similarity Analysis

For conducting representational similarity analysis Kriegeskorte et al. (2008), we do not need to learn an intermediate mapping model. Instead, the relations between the fMRI activation vectors are directly compared to the relations that can be observed in the computational representations of the stimulus Anderson et al. (2013). We measure the relations between vectors within cognitive and computational representations by cosine distance and compare between fMRI data and the computational representations using Spearman correlation.
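A minimal sketch of this procedure, assuming one row per concept in each representation space (the helper name is ours):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_correlation(brain_vectors, model_vectors):
    """Representational similarity analysis: build cosine-distance
    representational dissimilarity matrices (RDMs) for both spaces and
    correlate their condensed upper triangles with Spearman's rho.

    Both inputs have shape (n_concepts, dim); the dimensionalities of
    the two spaces may differ.
    """
    rdm_brain = pdist(brain_vectors, metric="cosine")
    rdm_model = pdist(model_vectors, metric="cosine")
    rho, _ = spearmanr(rdm_brain, rdm_model)
    return rho
```

Because only the rank order of pairwise distances is compared, no mapping model between the two spaces has to be learned.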

Figure 5: Spearman correlation between computational representations of the stimuli (textual, visual, random) and the fMRI activation patterns in the Inferior Frontal Gyrus (IFG), Middle Temporal Gyrus (MTG), Fusiform Gyrus (FFG), and the selected stable voxels (stable), shown for the three context paradigms: (a) sentence, (b) picture, (c) word cloud. Correlation is calculated separately for abstract and concrete concepts.


Figure 5 visualizes the Spearman correlation results in violin plots for participants with significant results in decoding. We can see that the correlation values are very low in all cases. The higher variance and slightly higher correlations of the visual representations could be related to the perception of object form in the image stimulus, rather than to semantic processing of the context (which has been shown to occur in the FFG by, e.g., Whatmough et al. 2002).

The results for the encoding task and the representational similarity analysis indicate that the way relations between words are modeled in the computational representations differs from the relations observed in human language processing. In contrast to previous analyses, we do not find a difference between the modelling quality of concrete and abstract concepts in textual and visual representations. While the decoding analysis supported the assumption that words can be categorized based on the concreteness distinction, the relational analyses reveal that the problem is more complex.

6 Discussion

We have seen that different analysis techniques have a strong influence on the interpretation of the results. Based on the decoding results alone, it can be tempting to draw overly simplistic conclusions regarding the categorizability of words, conclusions that are not supported by the relational analyses. Our comparison represents only a first exploratory analysis. In order to come to more stable conclusions, we plan to explore a wider range of linguistic categories and to conduct more fine-grained error analyses. We believe that it is important to focus less on the significance of results in a single experimental paradigm and instead to explore a range of analysis techniques to test alternative explanations.

The robustness of results should be stress-tested by conducting sensitive sanity checks and by comparing to more reasonable baselines. Seemingly positive results can sometimes even be obtained with a scrambled signal, or turn out not to be significantly different from results obtained by a carefully fine-tuned random baseline involving no linguistic knowledge. The interpretation of the results is further complicated by the fact that fMRI scans produce a high-dimensional and noisy signal that needs to be pre-processed using several statistical correction techniques. Such pre-processing steps have a strong influence on the results of an fMRI study Strother (2006), and their effect on modelling analyses has not yet been sufficiently studied. Furthermore, the high inter-subject variance in fMRI analyses poses an additional challenge.

In line with the idea by hamilton2018revolution, we have seen that the kind of context in which a word is presented also plays an important role. To develop a better understanding of the differences and commonalities of visual and textual semantic processing, grounded multimodal language models are a promising development Baroni (2016). We hypothesize that even within one modality, context can determine whether a word is experienced as more abstract or more concrete. Contextualized language models like ELMo Peters et al. (2018) and BERT Devlin et al. (2018) might therefore provide more suitable representations.

From our detailed analysis, we obtain the impression that human concept representations are more fluid than dichotomous categories can capture. We are dealing with high-dimensional data and one-dimensional explanations can only provide one perspective. Recent analyses on the interpretability of computational language models attempt to decode linguistic features such as syntactic structure from computational models Alishahi et al. (2019). The challenges for interpretation are similar: the observation that we can decode a category from the signal does not necessarily imply that the signal is structured according to the category. We believe that embracing the many shades of grey between concepts will lead to more realistic models of cognitive processing because natural language is fluid, ambiguous, and multi-faceted.

7 Conclusion

We analyzed to which extent the distinction between concrete and abstract concepts can be extracted from fMRI data using a range of discriminative and relational analysis methods. We find that the distinction can be decoded from the signal with an accuracy significantly above chance, but it is not found to be a relevant structuring factor in the clustering and relational analyses.

We do not discourage the use of dichotomous categories for analysis because they might be a useful explanatory simplification for linguistic phenomena. However, our exploratory analyses indicate that researchers should be aware that the cognitive processing structure is more complex. Language meaning is constantly evolving and susceptible to manipulation. In the long run, accepting the fluidity of concept representations seems more fruitful than artificially mapping the high-dimensional probabilistic representations back into binary categories.


Acknowledgments

The work by L. Beinborn was funded by the Netherlands Organisation for Scientific Research (NWO), through a Gravitation Grant (024.001.006) to the Language in Interaction Consortium.


  • S. Abnar, R. Ahmed, M. Mijnheer, and W. Zuidema (2018) Experiential, distributional and dependency-based word embeddings have complementary roles in decoding brain activity. In Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018), pp. 57–66. External Links: Link Cited by: §2.1.
  • A. Alishahi, G. Chrupała, and T. Linzen (2019)

    Analyzing and interpreting neural networks for nlp: a report on the first blackboxnlp workshop

    Natural Language Engineering 25, pp. 543–557. External Links: Link Cited by: §6.
  • A. J. Anderson, E. Bruni, U. Bordignon, M. Poesio, and M. Baroni (2013) Of words, eyes and brains: correlating image-based distributional semantic models with neural representations of concepts. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1960–1970. External Links: Link Cited by: §5.2.
  • A. J. Anderson, D. Kiela, S. Clark, and M. Poesio (2017) Visually grounded and textual semantic models differentially decode brain activity associated with concrete and abstract nouns. Transactions of the Association for Computational Linguistics 5, pp. 17–30. External Links: Link Cited by: §1, §2.1, §2.1.
  • M. Baroni (2016) Grounding distributional semantics in the visual world. Language and Linguistics Compass 10, pp. 3–13. External Links: Link Cited by: §6.
  • M. Barrett, J. Bingel, N. Hollenstein, M. Rei, and A. Søgaard (2018) Sequence classification with human attention. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pp. 302–312. External Links: Link Cited by: §2.1.
  • L. W. Barsalou (2008) Grounded cognition. Annu. Rev. Psychol. 59, pp. 617–645. External Links: Link Cited by: §1.
  • M. Batchkarov, T. Kober, J. Reffin, J. Weeds, and D. Weir (2016) A critique of word similarity as a method for evaluating distributional semantic models. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pp. 7–12. External Links: Link Cited by: §1.
  • L. Beinborn, S. Abnar, and R. Choenni (2019) Robust evaluation of language-brain encoding experiments. arXiv preprint arXiv:1904.02547. External Links: Link Cited by: §1, §2.1.
  • L. Beinborn, T. Botschen, and I. Gurevych (2018) Multimodal grounding for language processing. In Proceedings of the 27th International Conference on Computational Linguistics, pp. 2325–2339. External Links: Link Cited by: §2.2.
  • J. Bingel, M. Barrett, and A. Søgaard (2016) Extracting token-level signals of syntactic processing from fMRI - with an application to PoS induction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 747–755. External Links: Link Cited by: §2.1.
  • S. D. Breedin, E. M. Saffran, and H. B. Coslett (1994) Reversal of the concreteness effect in a patient with semantic dementia. Cognitive neuropsychology 11, pp. 617–660. External Links: Link Cited by: §2.2.
  • J. R. Brennan, E. P. Stabler, S. E. Van Wagenen, W. Luh, and J. T. Hale (2016) Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and language 157, pp. 81–94. External Links: Link Cited by: §2.1, §2.1, §3.2.
  • E. Bruni, N. Tran, and M. Baroni (2014) Multimodal distributional semantics. Journal of Artificial Intelligence Research 49, pp. 1–47. External Links: Link Cited by: §2.2.
  • M. Brysbaert, A. B. Warriner, and V. Kuperman (2014) Concreteness ratings for 40 thousand generally known English word lemmas. Behavior research methods 46, pp. 904–911. External Links: Link Cited by: §3.1.
  • L. Bulat, S. Clark, and E. Shutova (2017) Speaking, seeing, understanding: correlating semantic models with conceptual representation in the brain. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1081–1091. External Links: Link Cited by: §1, §2.1.
  • M. Coltheart (1981) The MRC Psycholinguistic Database. The Quarterly Journal of Experimental Psychology Section A 33, pp. 497–505. External Links: Link Cited by: §1.
  • M. Dehghani, R. Boghrati, K. Man, J. Hoover, S. I. Gimbel, A. Vaswani, J. D. Zevin, M. H. Immordino-Yang, A. S. Gordon, A. Damasio, et al. (2017) Decoding the neural representation of story meanings across languages. Human brain mapping 38, pp. 6096–6106. External Links: Link Cited by: §2.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. External Links: Link Cited by: §6.
  • I. Fernandez Monsalve, S. L. Frank, and G. Vigliocco (2012) Lexical surprisal as a general predictor of reading time. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 398–408. External Links: Link Cited by: §2.1.
  • S. L. Frank, L. J. Otten, G. Galli, and G. Vigliocco (2013) Word surprisal predicts N400 amplitude during reading. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 878–883. External Links: Link Cited by: §2.1.
  • J. A. Frost, J. R. Binder, J. A. Springer, T. A. Hammeke, P. S. F. Bellgowan, S. M. Rao, and R. W. Cox (1999) Language processing is strongly left lateralized in both sexes: evidence from functional MRI. Brain 122, pp. 199–208. External Links: Link Cited by: §4.1.
  • A. Fyshe, G. Sudre, L. Wehbe, N. Rafidi, and T. M. Mitchell (2016) The semantics of adjective noun phrases in the human brain. bioRxiv, pp. 089615. External Links: Link Cited by: §2.1.
  • A. Fyshe, P. P. Talukdar, B. Murphy, and T. M. Mitchell (2014) Interpretable semantic vectors from a joint model of brain- and text- based meaning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 489–499. External Links: Link Cited by: §2.1.
  • J. Gauthier and A. Ivanova (2018) Does the brain represent words? An evaluation of brain decoding studies of language understanding. arXiv:1806.00591. External Links: Link Cited by: §2.1.
  • J. Hale, C. Dyer, A. Kuncoro, and J. R. Brennan (2018) Finding syntax in human encephalography with beam search. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Volume 1 (Long Papers), pp. 2727–2736. External Links: Link Cited by: §2.1.
  • L. S. Hamilton and A. G. Huth (2018) The revolution will not be controlled: natural stimuli in speech neuroscience. Language, Cognition and Neuroscience, pp. 1–10. External Links: Link Cited by: §2.1.
  • F. Hill, R. Reichart, and A. Korhonen (2014) Multi-modal models for concrete and abstract concept meaning. Transactions of the Association for Computational Linguistics 2, pp. 285–296. External Links: Link Cited by: §2.2.
  • N. Hollenstein, M. Barrett, M. Troendle, F. Bigiolli, N. Langer, and C. Zhang (2019) Advancing NLP with cognitive language processing signals. arXiv preprint arXiv:1904.02682. External Links: Link Cited by: §2.1.
  • A. G. Huth, W. A. de Heer, T. L. Griffiths, F. E. Theunissen, and J. L. Gallant (2016) Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, pp. 453. External Links: Link Cited by: §2.1.
  • S. Jain and A. G. Huth (2018) Incorporating context into language encoding models for fMRI. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 6628–6637. External Links: Link Cited by: §2.1.
  • S. Knecht, M. Deppe, B. Dräger, L. Bobe, H. Lohmann, E. B. Ringelstein, and H. Henningsen (2000) Language lateralization in healthy right-handers. Brain 123, pp. 74–81. External Links: Link Cited by: §4.1.
  • N. Kriegeskorte, M. Mur, and P. A. Bandettini (2008) Representational similarity analysis-connecting the branches of systems neuroscience. Frontiers in systems neuroscience 2, pp. 4. External Links: Link Cited by: §2.1, §5.2.
  • J. F. Kroll and J. S. Merves (1986) Lexical access for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition 12, pp. 92. External Links: Link Cited by: §2.2.
  • A. Lazaridou, N. T. Pham, and M. Baroni (2015) Combining language and vision with a multimodal skip-gram model. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 153–163. External Links: Link Cited by: §2.2.
  • M. Lewis and M. Steedman (2013) Combined distributional and logical semantics. Transactions of the Association for Computational Linguistics 1, pp. 179–192. External Links: Link Cited by: §1.
  • J. Li, M. Fabre, W. Luh, and J. Hale (2018) The role of syntax during pronoun resolution: evidence from fMRI. In Proceedings of the Eighth Workshop on Cognitive Aspects of Computational Language Learning and Processing, pp. 56–64. External Links: Link Cited by: §2.1.
  • A. Mestres-Missé, T. F. Münte, and A. Rodriguez-Fornells (2009) Functional neuroanatomy of contextual acquisition of concrete and abstract words. Journal of Cognitive neuroscience 21, pp. 2154–2171. External Links: Link Cited by: §1, §3.2.
  • A. Paivio (1991) Dual coding theory: retrospect and current status. Canadian Journal of Psychology/Revue canadienne de psychologie 45, pp. 255. External Links: Link Cited by: §2.2.
  • D. Pecher, I. Boot, and S. van Dantzig (2011) Abstract concepts: sensory-motor grounding, metaphors, and beyond. In Psychology of learning and motivation, B. Ross (Ed.), pp. 217–248. External Links: Link Cited by: §1.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. External Links: Link Cited by: §4.
  • J. Pennington, R. Socher, and C. Manning (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. External Links: Link Cited by: §5.
  • F. Pereira, S. Gershman, S. Ritter, and M. Botvinick (2016) A comparative evaluation of off-the-shelf distributed semantic representations for modelling behavioural data. Cognitive neuropsychology 33, pp. 175–190. External Links: Link Cited by: §5.
  • F. Pereira, B. Lou, B. Pritchett, S. Ritter, S. J. Gershman, N. Kanwisher, M. Botvinick, and E. Fedorenko (2018) Toward a universal decoder of linguistic meaning from brain activation. Nature communications 9, pp. 963. External Links: Link Cited by: §2.1.
  • M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2227–2237. External Links: Link Cited by: §6.
  • R. A. Poldrack, J. A. Mumford, and T. E. Nichols (2011) Handbook of functional MRI data analysis. Cambridge University Press. External Links: Link Cited by: §4.
  • P. Resnik and J. Lin (2010) Evaluation of NLP systems. The handbook of computational linguistics and natural language processing 57, pp. 271–295. External Links: Link Cited by: §2.1.
  • P. J. Schwanenflugel, K. K. Harnishfeger, and R. W. Stowe (1988) Context availability and lexical decisions for abstract and concrete words. Journal of Memory and Language 27, pp. 499–520. External Links: Link Cited by: §2.2.
  • P. J. Schwanenflugel and E. J. Shoben (1983) Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition 9, pp. 82. External Links: Link Cited by: §2.2.
  • S. C. Strother (2006) Evaluating fMRI preprocessing pipelines. IEEE Engineering in Medicine and Biology Magazine 25, pp. 27–41. External Links: Link Cited by: §6.
  • G. Sudre, D. Pomerleau, M. Palatucci, L. Wehbe, A. Fyshe, R. Salmelin, and T. Mitchell (2012) Tracking neural coding of perceptual and semantic features of concrete nouns. NeuroImage 62, pp. 451–463. External Links: Link Cited by: §2.1.
  • Y. Sugano and A. Bulling (2016) Seeing with humans: gaze-assisted neural image captioning. arXiv preprint arXiv:1608.05203. External Links: Link Cited by: §2.1.
  • M. Tettamanti, R. Manenti, P. A. Della Rosa, A. Falini, D. Perani, S. F. Cappa, and A. Moro (2008) Negation in the brain: modulating action representations. NeuroImage 43, pp. 358–367. External Links: Link Cited by: §1, §3.2.
  • I. Walker and C. Hulme (1999) Concrete words are easier to recall than abstract words: evidence for a semantic contribution to short-term serial recall. Journal of Experimental Psychology: Learning, Memory, and Cognition 25, pp. 1256. External Links: Link Cited by: §2.2.
  • M. Wallentin, S. Østergaard, T. E. Lund, L. Østergaard, and A. Roepstorff (2005) Concrete spatial language: See what I mean? Brain and language 92, pp. 221–233. External Links: Link Cited by: §1, §3.2.
  • J. Wang, J. A. Conder, D. N. Blitzer, and S. V. Shinkareva (2010) Neural representation of abstract and concrete concepts: a meta-analysis of neuroimaging studies. Human brain mapping 31, pp. 1459–1468. External Links: Link Cited by: §3.2.
  • L. Wehbe, B. Murphy, P. Talukdar, A. Fyshe, A. Ramdas, and T. Mitchell (2014) Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses. PloS one 9, pp. e112575. External Links: Link Cited by: §2.1, §3.2.
  • C. Whatmough, H. Chertkow, S. Murtha, and K. Hanratty (2002) Dissociable brain regions process object meaning and object structure during picture naming. Neuropsychologia 40, pp. 174–186. External Links: Link Cited by: §5.2.
  • R. J. S. Wise, D. H. Howard, C. J. Mummery, P. C. Fletcher, A. P. Leff, C. Büchel, and S. K. Scott (2000) Noun imageability and the temporal lobes. Neuropsychologia 38, pp. 985–994. External Links: Link Cited by: §1, §3.2.
  • H. Xu, B. Murphy, and A. Fyshe (2016) BrainBench: a brain-image test suite for distributional semantic models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2017–2021. External Links: Link Cited by: §2.1.