CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation

09/17/2023
by   Chen Jiang, et al.

The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Neither method matches natural human communication or conveys rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr, a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while its “U-shaped” encoder-decoder architecture generates predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can assist real-world UIBVS control in an unstructured manipulation environment.
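To make the idea of feeding a segmentation mask into an uncalibrated servo loop more concrete, the following is a minimal sketch and not the pipeline from the paper. It assumes the image feature is simply the centroid of the predicted mask, that the robot has two controlled degrees of freedom, and that the unknown image Jacobian is estimated online with a Broyden rank-one update, a common choice in uncalibrated visual servoing. The names `mask_centroid`, `broyden_update`, and `servo_step` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = Tuple[float, float]


def mask_centroid(mask: Matrix, threshold: float = 0.5) -> Vector:
    """Return the (u, v) centroid of pixels whose score exceeds the threshold.

    Assumption: the segmentation network outputs a per-pixel score map,
    indexed as mask[row][column].
    """
    us, vs = [], []
    for v, row in enumerate(mask):
        for u, score in enumerate(row):
            if score > threshold:
                us.append(u)
                vs.append(v)
    if not us:
        raise ValueError("mask contains no pixels above the threshold")
    return (sum(us) / len(us), sum(vs) / len(vs))


def broyden_update(J: Matrix, dq: Vector, de: Vector, alpha: float = 1.0) -> Matrix:
    """Rank-one secant update of a 2x2 image Jacobian estimate.

    dq is the last joint displacement, de the observed change in the
    image feature. With alpha = 1 the result satisfies J_new * dq == de.
    """
    norm_sq = dq[0] ** 2 + dq[1] ** 2
    if norm_sq == 0.0:
        return [row[:] for row in J]  # no motion, nothing to learn
    residual = [de[i] - (J[i][0] * dq[0] + J[i][1] * dq[1]) for i in range(2)]
    return [
        [J[i][j] + alpha * residual[i] * dq[j] / norm_sq for j in range(2)]
        for i in range(2)
    ]


def servo_step(J: Matrix, error: Vector, gain: float = 0.5) -> Vector:
    """Proportional control step dq = -gain * J^{-1} * error for a 2x2 Jacobian."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < 1e-9:
        raise ValueError("Jacobian estimate is singular")
    inv = [
        [J[1][1] / det, -J[0][1] / det],
        [-J[1][0] / det, J[0][0] / det],
    ]
    return (
        -gain * (inv[0][0] * error[0] + inv[0][1] * error[1]),
        -gain * (inv[1][0] * error[0] + inv[1][1] * error[1]),
    )
```

In such a loop, each camera frame would be segmented from the referring expression, the centroid compared against a target pixel to form the error, the Jacobian estimate refined from the previous motion, and a new joint displacement commanded. A real system would likely use richer features than a single centroid.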


