Are words equally surprising in audio and audio-visual comprehension?

07/14/2023
by Pranava Madhyastha, et al.

We report a controlled study investigating the effect of visual information (i.e., seeing the speaker) on spoken language comprehension. We compare the ERP signature (N400) associated with each word in audio-only and audio-visual presentations of the same verbal stimuli. We assess the extent to which surprisal measures, which quantify the predictability of a word in its lexical context and are derived from different types of language models (specifically n-gram and Transformer models), predict the N400 response for each word. Our results indicate that cognitive effort differs significantly between the multimodal and unimodal settings. Moreover, while Transformer-based models, which have access to a larger lexical context, provide a better fit in the audio-only setting, 2-gram language models are more effective in the multimodal setting. This highlights the strong influence of local lexical context on cognitive processing in a multimodal environment.
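To illustrate the kind of surprisal measure compared here, the following is a minimal sketch of a 2-gram (bigram) surprisal estimate, where surprisal(w_i) = -log2 P(w_i | w_{i-1}). The function name, toy corpus, and add-one smoothing are illustrative assumptions, not the paper's actual pipeline:

```python
import math
from collections import Counter

def bigram_surprisal(corpus, sentence):
    """Surprisal of each word under an add-one-smoothed 2-gram model.

    Surprisal(w_i) = -log2 P(w_i | w_{i-1}); higher values mean the word
    is less predictable from its immediate lexical context.
    """
    tokens = corpus.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))

    words = sentence.lower().split()
    scores = []
    for prev, word in zip(words, words[1:]):
        # Add-one (Laplace) smoothing so unseen bigrams get nonzero mass.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        scores.append((word, -math.log2(p)))
    return scores
```

A Transformer-based surprisal would instead condition on the full preceding context via the model's next-token log-probabilities; the study's finding is that, under visual co-presence of the speaker, the local bigram estimate tracks N400 amplitude better.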

Related research

- Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude? (07/20/2021). Despite being designed for performance rather than cognitive plausibility...
- Controlling for Stereotypes in Multimodal Language Model Evaluation (02/03/2023). We propose a methodology and design two benchmark sets for measuring to...
- A Visual Tour Of Current Challenges In Multimodal Language Models (10/22/2022). Transformer models trained on massive text corpora have become the de facto...
- Evidence of Human-Like Visual-Linguistic Integration in Multimodal Large Language Models During Predictive Language Processing (08/11/2023). The advanced language processing abilities of large language models (LLMs)...
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models (05/29/2020). Lexical substitution in context is an extremely powerful technology that...
- Lexical Retrieval Hypothesis in Multimodal Context (05/28/2023). Multimodal corpora have become an essential language resource for language...
- Detection Threshold of Audio Haptic Asynchrony in a Driving Context (07/11/2023). In order to provide perceptually accurate multimodal feedback during driving...