Semantic constraints to represent common sense required in household actions for multi-modal Learning-from-observation robot

by Katsushi Ikeuchi, et al.

The paradigm of learning-from-observation (LfO) enables a robot to learn how to perform actions by observing human-demonstrated actions. Previous research in LfO has mainly focused on the industrial domain, which only consists of observable physical constraints between a manipulating tool and the robot's working environment. In order to extend this paradigm to the household domain, which also contains non-observable constraints derived from a human's common sense, we introduce the idea of semantic constraints. Semantic constraints are represented similarly to physical constraints, by defining a contact with an imaginary semantic environment. We thoroughly investigate the necessary and sufficient set of contact states and state transitions needed to represent the different types of physical and semantic constraints. We then apply our constraint representation to analyze various actions in top-hit household YouTube videos and real home cooking recordings. We further categorize the frequently appearing constraint patterns into physical, semantic, and multistage task groups, and verify that these groups form not only a necessary but also a sufficient set for covering standard household actions. Finally, we conduct a preliminary experiment using textual input to explore the possibility of combining verbal and visual input for recognizing the task groups. Our results provide promising directions for incorporating common sense into the literature on robot teaching.
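The core idea above — that a semantic constraint is represented just like a physical one, as a contact state against an environment that happens to be imaginary — can be sketched in code. The following is a minimal toy model, not the paper's actual representation: the contact-state set, the one-step transition rule, and all names (`Constraint`, `ContactState`, `EnvironmentKind`, `is_valid_transition`) are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class ContactState(Enum):
    # A simplified set of contact states between a manipulated tool and a surface.
    NO_CONTACT = "no-contact"
    POINT_CONTACT = "point-contact"
    SURFACE_CONTACT = "surface-contact"

class EnvironmentKind(Enum):
    PHYSICAL = "physical"   # observable surface in the real workspace
    SEMANTIC = "semantic"   # imaginary surface implied by common sense

@dataclass(frozen=True)
class Constraint:
    """A constraint = a contact state against a (real or imaginary) environment."""
    state: ContactState
    environment: EnvironmentKind

# Toy transition rule: contact may change by at most one "level" at a time.
_ORDER = [ContactState.NO_CONTACT, ContactState.POINT_CONTACT,
          ContactState.SURFACE_CONTACT]

def is_valid_transition(a: Constraint, b: Constraint) -> bool:
    """Valid if both constraints reference the same environment kind and the
    contact state moves at most one step along the ordering above."""
    if a.environment is not b.environment:
        return False
    return abs(_ORDER.index(a.state) - _ORDER.index(b.state)) <= 1

# Example: pouring over a bowl maintains an imaginary (semantic) point contact
# with the bowl opening, even though the tool never touches the bowl.
hover = Constraint(ContactState.NO_CONTACT, EnvironmentKind.SEMANTIC)
pour = Constraint(ContactState.POINT_CONTACT, EnvironmentKind.SEMANTIC)
print(is_valid_transition(hover, pour))  # True: one step along the ordering
```

The point of the sketch is that, once semantic surfaces are treated as first-class environments, the same contact-state machinery covers both constraint types without special cases.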




Applying Learning-from-observation to household service robots: three common-sense formulation

Utilizing a robot in a new application requires the robot to be programm...

Learning to Act and Observe in Partially Observable Domains

We consider a learning agent in a partially observable environment, with...

Markerless Visual Robot Programming by Demonstration

In this paper we present an approach for learning to imitate human behav...

Learning the Effects of Physical Actions in a Multi-modal Environment

Large Language Models (LLMs) handle physical commonsense information ina...

Organization and Understanding of a Tactile Information Dataset TacAct During Physical Human-Robot Interactions

Advanced service robots require superior tactile intelligence to guarant...

Generalizable task representation learning from human demonstration videos: a geometric approach

We study the problem of generalizable task learning from human demonstra...
