Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers

12/02/2020
by Vladimir Iashin, et al.

Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although powerful methods exist for image processing and for audio processing individually, solving this problem requires processing data from multiple sensors together. The appearance of the container, the sound of the filling, and the depth data all provide essential information. We propose a multi-modal method to predict three key indicators of the filling mass: filling type, filling level, and container capacity. These indicators are then combined to estimate the filling mass of a container. Our method obtained Top-1 overall performance among all submissions to the CORSMAL 2020 Challenge on both the public and private subsets while showing no evidence of overfitting. Our source code is publicly available: https://github.com/v-iashin/CORSMAL
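
As a rough illustration of how the three predicted indicators could be combined into a single mass estimate, the sketch below multiplies the filled fraction of the container by its capacity and by a per-type density. This is a minimal sketch under stated assumptions, not the submission's actual code: the function name `estimate_filling_mass`, the set of filling types, and the density values are all hypothetical choices made for the example.

```python
# Assumed densities in grams per millilitre, for illustration only.
# Neither the filling types nor the values come from the abstract.
DENSITY_G_PER_ML = {
    "empty": 0.0,
    "pasta": 0.41,
    "rice": 0.85,
    "water": 1.0,
}


def estimate_filling_mass(filling_type: str, filling_level: float, capacity_ml: float) -> float:
    """Combine the three indicators into a filling mass in grams.

    filling_type  -- predicted class label (a key of DENSITY_G_PER_ML)
    filling_level -- predicted fraction of the container that is filled, in [0, 1]
    capacity_ml   -- predicted container capacity in millilitres
    """
    if filling_type not in DENSITY_G_PER_ML:
        raise ValueError(f"unknown filling type: {filling_type!r}")
    if not 0.0 <= filling_level <= 1.0:
        raise ValueError("filling_level must be a fraction between 0 and 1")
    if capacity_ml < 0.0:
        raise ValueError("capacity_ml must be non-negative")
    # mass = filled volume (level * capacity) * density of the content
    return filling_level * capacity_ml * DENSITY_G_PER_ML[filling_type]


if __name__ == "__main__":
    # A container predicted to hold 400 mL, half full of water.
    print(estimate_filling_mass("water", 0.5, 400.0))
```

Because the estimate is a product, an error in any one indicator propagates directly into the mass, which is one reason to predict each indicator from the modality best suited to it.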
