TraVLR: Now You See It, Now You Don't! Evaluating Cross-Modal Transfer of Visio-Linguistic Reasoning

11/21/2021
by   Keng Ji Chow, et al.

Numerous visio-linguistic (V+L) representation learning methods have been developed, yet existing datasets do not evaluate the extent to which they represent visual and linguistic concepts in a unified space. Inspired by the cross-lingual transfer and psycholinguistics literature, we propose a novel evaluation setting for V+L models: zero-shot cross-modal transfer. Existing V+L benchmarks also often report global accuracy scores on the entire dataset, making it difficult to pinpoint the specific reasoning tasks at which models fail and succeed. To address this issue and enable the evaluation of cross-modal transfer, we present TraVLR, a synthetic dataset comprising four V+L reasoning tasks. Each example encodes the scene bimodally, such that either modality can be dropped during training or testing with no loss of relevant information. TraVLR's training and testing distributions are also constrained along task-relevant dimensions, enabling the evaluation of out-of-distribution generalisation. We evaluate four state-of-the-art V+L models and find that although they perform well on the test set from the same modality, all models fail to transfer cross-modally and have limited success accommodating the addition or deletion of one modality. In alignment with prior work, we also find these models to require large amounts of data to learn simple spatial relationships. We release TraVLR as an open challenge for the research community.
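To make the evaluation setting concrete, the sketch below illustrates zero-shot cross-modal transfer as the abstract describes it: each example encodes the scene in both modalities, either modality can be dropped without losing task-relevant information, and a model trained on one modality is tested on the other. All names here (`Example`, `drop_modality`, `cross_modal_eval`, the toy `MajorityModel`) are hypothetical illustrations, not the authors' actual code or the TraVLR API.

```python
from dataclasses import dataclass

@dataclass
class Example:
    image: str   # scene rendered visually (stand-in: an image identifier)
    text: str    # equivalent textual encoding of the same scene
    label: int   # answer for the binary reasoning task

def drop_modality(ex: Example, keep: str) -> Example:
    """Keep only one modality; because the encoding is bimodal,
    the dropped modality carries no extra task-relevant information."""
    if keep == "image":
        return Example(image=ex.image, text="", label=ex.label)
    return Example(image="", text=ex.text, label=ex.label)

class MajorityModel:
    """Trivial stand-in for a V+L model, used only to make the
    evaluation loop runnable end to end."""
    def fit(self, examples):
        labels = [ex.label for ex in examples]
        self.majority = max(set(labels), key=labels.count)
    def predict(self, ex: Example) -> int:
        return self.majority

def cross_modal_eval(model, train_set, test_set) -> float:
    """Zero-shot cross-modal transfer: train on text only,
    then test on image only, and report accuracy."""
    model.fit([drop_modality(ex, keep="text") for ex in train_set])
    preds = [model.predict(drop_modality(ex, keep="image")) for ex in test_set]
    correct = sum(p == ex.label for p, ex in zip(preds, test_set))
    return correct / len(test_set)
```

A real evaluation would substitute a trained V+L model for `MajorityModel`; the point of the sketch is only the train-on-one-modality, test-on-the-other protocol.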


