How to Tidy Up a Table: Fusing Visual and Semantic Commonsense Reasoning for Robotic Tasks with Vague Objectives

07/21/2023
by   Yiqing Xu, et al.
0

Vague objectives in many real-life scenarios pose long-standing challenges for robotics, as defining rules, rewards, or constraints for optimization is difficult. Tasks like tidying a messy table may appear simple for humans, but articulating the criteria for tidiness is complex due to the ambiguity and flexibility in commonsense reasoning. Recent advancement in Large Language Models (LLMs) offers us an opportunity to reason over these vague objectives: learned from extensive human data, LLMs capture meaningful common sense about human behavior. However, as LLMs are trained solely on language input, they may struggle with robotic tasks due to their limited capacity to account for perception and low-level controls. In this work, we propose a simple approach to solve the task of table tidying, an example of robotic tasks with vague objectives. Specifically, the task of tidying a table involves not just clustering objects by type and functionality for semantic tidiness but also considering spatial-visual relations of objects for a visually pleasing arrangement, termed as visual tidiness. We propose to learn a lightweight, image-based tidiness score function to ground the semantically tidy policy of LLMs to achieve visual tidiness. We innovatively train the tidiness score using synthetic data gathered using random walks from a few tidy configurations. Such trajectories naturally encode the order of tidiness, thereby eliminating the need for laborious and expensive human demonstrations. Our empirical results show that our pipeline can be applied to unseen objects and complex 3D arrangements.

READ FULL TEXT

page 1

page 2

page 3

page 4

page 5

research
10/13/2022

Large Language Models are few(1)-shot Table Reasoners

Recent literature has shown that large language models (LLMs) are genera...
research
04/01/2021

Commonsense Spatial Reasoning for Visually Intelligent Agents

Service robots are expected to reliably make sense of complex, fast-chan...
research
06/30/2023

Look, Remember and Reason: Visual Reasoning with Grounded Rationales

Large language models have recently shown human level performance on a v...
research
06/07/2018

A Simple Method for Commonsense Reasoning

Commonsense reasoning is a long-standing challenge for deep learning. Fo...
research
08/28/2022

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Building a conversational embodied agent to execute real-life tasks has ...
research
05/22/2022

Housekeep: Tidying Virtual Households using Commonsense Reasoning

We introduce Housekeep, a benchmark to evaluate commonsense reasoning in...
research
10/19/2021

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Geometric organization of objects into semantically meaningful arrangeme...

Please sign up or login with your details

Forgot password? Click here to reset