ShapeWorld - A new test methodology for multimodal language understanding

04/14/2017
by   Alexander Kuhnle, et al.

We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities. In this approach, artificial data is automatically generated according to the experimenter's specifications. The content of the data, both during training and evaluation, can be controlled in detail, which enables tasks to be created that require true generalization abilities, in particular the combination of previously introduced concepts in novel ways. We demonstrate the potential of our methodology by evaluating various visual question answering models on four different tasks, and show how our framework gives us detailed insights into their capabilities and limitations. By open-sourcing our framework, we hope to stimulate progress in the field of multimodal language understanding.
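The idea of controlling data content so that evaluation requires novel combinations of known concepts can be illustrated with a minimal sketch. This is not the ShapeWorld API: the attribute lists, the `HELD_OUT` set, and the functions `make_split` and `sample_instance` are hypothetical names chosen for illustration, and the "world" here is a single object rather than an image. Each shape and each color appears in training, but the held-out shape-color pairs appear only at evaluation time.

```python
import itertools
import random

# Hypothetical attribute vocabularies (assumptions, not taken from the paper).
SHAPES = ["square", "circle", "triangle"]
COLORS = ["red", "green", "blue"]
ALL_COMBOS = list(itertools.product(SHAPES, COLORS))

# Hypothetical held-out pairs: every shape and color still occurs in training,
# but these specific combinations are reserved for evaluation.
HELD_OUT = {("square", "red"), ("circle", "blue")}


def sample_instance(rng, allowed):
    """Sample one (world, caption, agrees) triple.

    The world contains a single object drawn from `allowed`. The caption
    describes either that object (agrees=True) or a different combination
    (agrees=False), each with probability 0.5.
    """
    shape, color = rng.choice(allowed)
    world = {"shape": shape, "color": color}
    if rng.random() < 0.5:
        cap_shape, cap_color = shape, color
    else:
        others = [c for c in ALL_COMBOS if c != (shape, color)]
        cap_shape, cap_color = rng.choice(others)
    caption = f"There is a {cap_color} {cap_shape}."
    agrees = (cap_shape, cap_color) == (shape, color)
    return world, caption, agrees


def make_split(seed, n, train):
    """Generate n instances; training excludes HELD_OUT, evaluation uses only HELD_OUT."""
    rng = random.Random(seed)
    allowed = [c for c in ALL_COMBOS if (c not in HELD_OUT) == train]
    return [sample_instance(rng, allowed) for _ in range(n)]


train_data = make_split(seed=0, n=200, train=True)
eval_data = make_split(seed=1, n=200, train=False)
```

A model that scores well on `train_data` but poorly on `eval_data` has likely memorized attribute pairs instead of learning shape and color as separable concepts, which is the kind of diagnosis the abstract describes.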


