Towards General Purpose Vision Systems

04/01/2021
by Tanmay Gupta, et al.

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.
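
The interface described above (one image plus one natural language task description in; bounding boxes, confidences, and text out) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TaskOutput`, `StubGeneralPurposeModel`, and `run_tasks`, the box coordinate convention, and the nested-list image format are all hypothetical, and the stub returns placeholder values only to show that a single call signature can serve every task without a per-task output head.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TaskOutput:
    """The three output modalities named in the abstract."""
    boxes: List[Box]
    confidences: List[float]  # one score per box
    text: str


class StubGeneralPurposeModel:
    """Placeholder model (an assumption, not the paper's architecture)."""

    def predict(self, image: Sequence[Sequence[int]], task_description: str) -> TaskOutput:
        # A real system would condition on both inputs. The stub ignores the
        # task text and returns one full-image box with zero confidence.
        height = len(image)
        width = len(image[0]) if height else 0
        boxes = [(0.0, 0.0, float(width), float(height))]
        return TaskOutput(boxes=boxes, confidences=[0.0], text="")


def run_tasks(model: StubGeneralPurposeModel,
              image: Sequence[Sequence[int]],
              task_descriptions: Sequence[str]) -> Dict[str, TaskOutput]:
    """Query the same model with different task descriptions; no new heads."""
    return {task: model.predict(image, task) for task in task_descriptions}


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(3)]  # toy 3x4 grayscale image
    results = run_tasks(
        StubGeneralPurposeModel(),
        image,
        ["What is in this image?", "Locate the dog.", "Describe the image."],
    )
    for task, output in results.items():
        print(task, output)
```

Under these assumptions, a task is selected purely by the text passed to `predict`: a classification or captioning query would read the `text` field, while a localization query would read `boxes` and `confidences`.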

02/04/2022

Webly Supervised Concept Expansion for General Purpose Vision Models

General purpose vision (GPV) systems are models that are designed to sol...
03/07/2022

One Model, Multiple Tasks: Pathways for Natural Language Understanding

This paper presents a Pathways approach to handle many tasks at once. Ou...
04/28/2022

GRIT: General Robust Image Task Benchmark

Computer vision models excel at making predictions when the test distrib...
04/26/2022

SkillNet-NLG: General-Purpose Natural Language Generation with a Sparsely Activated Approach

We present SkillNet-NLG, a sparsely activated approach that handles many...
09/24/2021

Towards A Measure Of General Machine Intelligence

To build increasingly general-purpose artificial intelligence systems th...
06/17/2022

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

We propose Unified-IO, a model that performs a large variety of AI tasks...
06/15/2022

A Unified Sequence Interface for Vision Tasks

While language tasks are naturally expressed in a single, unified, model...