Im-Promptu: In-Context Composition from Image Prompts

05/26/2023
by Bhishma Dedhia, et al.

Large language models are few-shot learners that can solve diverse tasks from a handful of demonstrations. This implicit understanding of tasks suggests that the attention mechanisms over word tokens may play a role in analogical reasoning. In this work, we investigate whether analogical reasoning can enable in-context composition over composable elements of visual stimuli. First, we introduce a suite of three benchmarks to test the generalization properties of a visual in-context learner. We formalize the notion of an analogy-based in-context learner and use it to design a meta-learning framework called Im-Promptu. Whereas the requisite token granularity for language is well established, the appropriate compositional granularity for enabling in-context generalization in visual stimuli is usually unspecified. To this end, we use Im-Promptu to train multiple agents with different levels of compositionality, including vector representations, patch representations, and object slots. Our experiments reveal tradeoffs between extrapolation abilities and the degree of compositionality, with non-compositional representations extending learned composition rules to unseen domains but performing poorly on combinatorial tasks. Patch-based representations require patches to contain entire objects for robust extrapolation. At the same time, object-centric tokenizers coupled with a cross-attention module generate consistent and high-fidelity solutions, with these inductive biases being particularly crucial for compositional generalization. Lastly, we demonstrate a use case of Im-Promptu as an intuitive programming interface for image generation.
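To make the idea of an analogy-based in-context learner concrete, below is a minimal PyTorch sketch of the general recipe the abstract describes: a tokenizer turns images into composable tokens, a cross-attention module lets a query image attend over context demonstrations of the form A : B :: C : ?, and a decoder renders the predicted answer. All module names, shapes, and wiring here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AnalogyInContextLearner(nn.Module):
    """Predict D such that A : B :: C : D, given context pairs (A_i, B_i) and a query C.
    The tokenizer and decoder are placeholders for whatever representation is chosen."""

    def __init__(self, tokenizer, decoder, d_model=256, n_heads=4):
        super().__init__()
        self.tokenizer = tokenizer    # image -> (batch, n_tokens, d_model)
        self.decoder = decoder        # (batch, n_tokens, d_model) -> image
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, context_pairs, query):
        # Tokenize each demonstration pair and concatenate along the token axis,
        # so the transformation A_i -> B_i is visible to the attention module.
        ctx = torch.cat(
            [torch.cat([self.tokenizer(a), self.tokenizer(b)], dim=1)
             for a, b in context_pairs],
            dim=1,
        )
        q = self.tokenizer(query)

        # Query tokens read the composition rule off the context via cross-attention;
        # a small MLP then maps them to the tokens of the predicted answer.
        attended, _ = self.cross_attn(q, ctx, ctx)
        answer_tokens = self.mlp(attended) + q

        # Meta-training would minimize a reconstruction loss between this output
        # and the ground-truth answer image D.
        return self.decoder(answer_tokens)
```

The `tokenizer` argument is where the granularities compared in the paper would plug in: a single latent vector, a grid of patch embeddings, or object slots from an object-centric encoder.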
