A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

02/09/2023
by Uri Berger, et al.

We present a large-scale multilingual study of how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity and the use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict, from raw images, the probability of a property being expressed, and find evidence supporting the claim that linguistic properties are constrained by visual context across languages. We complement this investigation with a corpus study, taking numerals as a test case. Specifically, we use existing annotations (the number or type of objects) to investigate how different visual conditions affect the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We additionally discuss possible applications for language generation.
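To make the numeral corpus study more concrete, here is a minimal sketch of the kind of counting it involves: for each annotated object count, what fraction of captions contain a numeral, plus a per-image score (the fraction of an image's captions expressing the property) of the sort a classifier could be trained to predict. This is not the authors' pipeline. The function names (`has_numeral`, `numeral_rate_by_count`, `per_image_targets`), the input layout, and the tiny English-only number-word list are all hypothetical; a real multilingual study would need language-specific numeral detection, and this crude matcher also counts pronoun uses of "one".

```python
import re
from collections import defaultdict

# Hypothetical, English-only, illustrative numeral vocabulary (not from the paper).
NUMBER_WORDS = {"one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "twelve", "dozen"}


def has_numeral(caption: str) -> bool:
    """Crude detector: True if the caption contains a digit string or a listed number word."""
    tokens = re.findall(r"[a-z]+|\d+", caption.lower())
    return any(tok.isdigit() or tok in NUMBER_WORDS for tok in tokens)


def numeral_rate_by_count(images):
    """images: iterable of (object_count, [captions]) pairs.

    Returns {object_count: fraction of captions containing a numeral},
    sorted by object count.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for count, captions in images:
        for cap in captions:
            totals[count] += 1
            hits[count] += has_numeral(cap)  # bool is added as 0 or 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}


def per_image_targets(image_captions):
    """image_captions: {image_id: [captions]}.

    Returns {image_id: fraction of that image's captions containing a numeral};
    images with no captions are skipped.
    """
    return {
        img_id: sum(has_numeral(c) for c in caps) / len(caps)
        for img_id, caps in image_captions.items()
        if caps
    }


if __name__ == "__main__":
    toy = [
        (1, ["A dog on the grass.", "One dog running."]),
        (3, ["Three cats on a sofa.", "Cats sleeping.", "3 kittens nap together."]),
    ]
    print(numeral_rate_by_count(toy))  # {1: 0.5, 3: 0.666...}
```

On this toy input, one of the two captions for the single-object image contains a numeral, versus two of the three captions for the three-object image. Comparing such rates across object-count bins and across languages is the general shape of the analysis the abstract describes, under the assumptions stated above.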
