Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach
We focus on the task of language-conditioned object placement, in which a robot must generate placements that satisfy all the spatial relational constraints in a language instruction. Prior work based on rule-based language parsing or scene-centric visual representations either restricts the form of instructions and reference objects or requires large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample-efficient and generalizable. Experiments indicate that our model can achieve a 97.75 parameters. Besides, our method generalizes better to both unseen objects and instructions. Moreover, with only 25 top competing approach.