Can Language Models Understand Physical Concepts?

05/23/2023
by Lei Li, et al.

Language models (LMs) are gradually becoming general-purpose interfaces to the interactive and embodied world, where an understanding of physical concepts is an essential prerequisite. However, it is not yet clear whether LMs understand physical concepts of the human world. To investigate this, we design a benchmark, VEC, that covers tasks on (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned through interaction with the world, such as the temperature of objects. Our zero- and few-shot prompting results show that understanding of certain visual concepts emerges as LMs are scaled up, but there remain basic concepts to which scaling does not apply: for example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet performs at chance on the mass concept. In contrast, vision-augmented models such as CLIP and BLIP achieve a human-level understanding of embodied concepts. Analysis indicates that the rich semantics in visual representations can serve as a valuable source of embodied knowledge. Inspired by this, we propose a distillation method that transfers embodied knowledge from VLMs to LMs, achieving a performance gain comparable to scaling up the LM's parameters 134x. Our dataset is available at <https://github.com/TobiasLee/VEC>
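Zero-shot prompting evaluations of the kind the abstract describes are commonly run by scoring each candidate answer's likelihood under the LM and taking the argmax. A minimal sketch of that candidate-ranking protocol follows; the `toy_log_prob` scorer and the hammer/material example are stand-ins for illustration (the paper's exact prompts and scoring may differ), and a real run would query an actual LM for per-token log-likelihoods.

```python
# Zero-shot concept probing by candidate ranking: score each candidate
# completion's log-probability under the LM and pick the argmax.
# TOY_SCORES is a hypothetical stand-in for a real LM's scores.

TOY_SCORES = {
    ("A hammer is usually made of", "metal"): -1.2,
    ("A hammer is usually made of", "paper"): -7.5,
    ("A hammer is usually made of", "water"): -9.1,
}

def toy_log_prob(prompt: str, candidate: str) -> float:
    """Stand-in for an LM's log P(candidate | prompt)."""
    return TOY_SCORES.get((prompt, candidate), float("-inf"))

def zero_shot_answer(prompt: str, candidates: list[str]) -> str:
    """Rank candidate answers by their log-probability (zero-shot prompting)."""
    return max(candidates, key=lambda c: toy_log_prob(prompt, c))

print(zero_shot_answer("A hammer is usually made of",
                       ["metal", "paper", "water"]))  # → metal
```

Accuracy on a benchmark like VEC is then just the fraction of questions where the top-ranked candidate matches the gold answer; a few-shot variant prepends labeled examples to the prompt before scoring.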
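The distillation idea, transferring embodied knowledge from a VLM into an LM, can be illustrated as feature alignment: project the student LM's representations through a learned linear map and minimize the mean-squared error against frozen teacher (VLM) features. This is a toy sketch under assumed shapes and a plain MSE objective, not the paper's exact recipe; the random matrices stand in for real model features.

```python
import numpy as np

# Feature-distillation sketch: align a student LM's hidden states with a
# frozen teacher VLM's features via a learned linear projection W,
# minimizing mean-squared error with plain gradient descent.

rng = np.random.default_rng(0)
d_lm, d_vlm, n = 8, 4, 16            # toy dimensions and batch size
H = rng.normal(size=(n, d_lm))       # student LM hidden states (toy)
T = rng.normal(size=(n, d_vlm))      # frozen teacher VLM features (toy)
W = np.zeros((d_lm, d_vlm))          # learned projection

def distill_loss(W: np.ndarray) -> float:
    """MSE between projected student features and teacher features."""
    return float(np.mean((H @ W - T) ** 2))

lr = 0.05
for _ in range(200):
    # Gradient of the MSE loss w.r.t. the projection matrix W.
    grad = 2.0 / (n * d_vlm) * H.T @ (H @ W - T)
    W -= lr * grad

print(distill_loss(W))  # well below the initial loss at W = 0
```

In practice the student's weights (not just a projection head) would be updated, typically jointly with the language-modeling objective so that fluency is preserved while the embodied knowledge is absorbed.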


Related research

05/24/2023 · ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
09/12/2022 · Leveraging Large Language Models for Robot 3D Scene Understanding
06/12/2023 · Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
02/04/2022 · Webly Supervised Concept Expansion for General Purpose Vision Models
12/06/2018 · Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs
02/04/2020 · Visual Concept-Metaconcept Learning
07/25/2022 · On the Learnability of Physical Concepts: Can a Neural Network Understand What's Real?
