Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

06/28/2023
by William Berrios, et al.

We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.
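The modular idea in the abstract can be sketched in a few lines: vision modules turn an image into text, and a frozen language model reasons over that text. This is a minimal illustration, not the code from the linked repository. The field names (`tags`, `attributes`, `captions`), the prompt layout, and the helpers `build_prompt`, `answer`, and `echo_llm` are all assumptions made for this example. The LLM is represented by any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VisionDescription:
    """Textual output of hypothetical vision modules for a single image."""
    tags: List[str]         # e.g. object names from a tagging module
    attributes: List[str]   # e.g. visual attributes from an attribute module
    captions: List[str]     # e.g. sentences from a captioning module


def build_prompt(desc: VisionDescription, question: str) -> str:
    """Serialize the module outputs and the question into one text prompt.

    The layout is an assumption for illustration; no image data is passed on,
    so the language model needs no multimodal training.
    """
    lines = [
        "Tags: " + ", ".join(desc.tags),
        "Attributes: " + ", ".join(desc.attributes),
        "Captions: " + " | ".join(desc.captions),
        "Question: " + question,
        "Short answer:",
    ]
    return "\n".join(lines)


def answer(desc: VisionDescription, question: str,
           llm: Callable[[str], str]) -> str:
    """Query any text-only LLM callable with the serialized description."""
    return llm(build_prompt(desc, question)).strip()


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM: returns the first tag listed in the prompt."""
    first_line = prompt.splitlines()[0]
    return first_line[len("Tags: "):].split(", ")[0]


if __name__ == "__main__":
    desc = VisionDescription(
        tags=["dog", "grass"],
        attributes=["brown fur"],
        captions=["a dog runs on grass"],
    )
    print(build_prompt(desc, "What animal is shown?"))
    print(answer(desc, "What animal is shown?", echo_llm))
```

Because the language model only ever sees a string, swapping in a different off-the-shelf LLM means replacing the `llm` callable and nothing else.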
