Learning Object-Action Relations from Bimanual Human Demonstration Using Graph Networks

08/22/2019
by Christian R. G. Dreher, et al.

Recognising human actions is a vital task for a humanoid robot, especially in domains like programming by demonstration. Previous approaches to action recognition primarily focused on the single overall action being executed, but we argue that bimanual human motion cannot always be described sufficiently with a single label. We therefore present a novel approach to action classification and segmentation that learns object-action relations while considering the actions executed by each hand individually. Interpreting the scene as a graph of symbolic spatial relations between the hands and objects enables us to train a neural network architecture specifically designed to operate on variable-sized graphs. To produce the scene graphs, we present a feature extraction pipeline that uses human pose estimation and object detection to compute the spatial relations from RGB-D videos. We evaluated the proposed classifier on a new RGB-D video dataset of daily action sequences focused on bimanual manipulation. It consists of 6 subjects performing 9 tasks with 10 repetitions each, yielding 540 video recordings with a total playtime of 2 hours and 18 minutes and per-hand ground-truth action labels for each frame. We show that our classifier reliably identifies (macro F1-score of 0.86) the true executed action of each hand within its top 3 predictions on a frame-by-frame basis without prior temporal action segmentation.
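To illustrate the core idea of classifying per-hand actions from a variable-sized scene graph of symbolic spatial relations, the sketch below builds a tiny scene graph and runs a few rounds of message passing to produce one action prediction per hand. This is not the authors' architecture; the relation vocabulary, action labels, layer sizes, and class names are all illustrative assumptions.

```python
# Minimal sketch, assuming a made-up relation vocabulary and action label set.
# Nodes are hands/objects; edges carry one-hot symbolic spatial relations.
import torch
import torch.nn as nn

RELATIONS = ["contact", "above", "below", "inside", "moving_together"]  # assumed
ACTIONS = ["idle", "approach", "hold", "pour", "cut"]                    # assumed

def one_hot(index, size):
    v = torch.zeros(size)
    v[index] = 1.0
    return v

class SceneGraph:
    """Variable-sized graph: node features, directed edges, edge relation features."""
    def __init__(self, node_feats, edges, edge_feats, hand_indices):
        self.x = node_feats          # (N, F_node)
        self.edges = edges           # list of (src, dst) index pairs
        self.edge_attr = edge_feats  # (E, F_edge)
        self.hands = hand_indices    # node indices of the two hands

class GraphNet(nn.Module):
    """Edge-conditioned message passing followed by a per-hand action readout."""
    def __init__(self, f_node, f_edge, hidden, n_actions):
        super().__init__()
        self.hidden = hidden
        self.msg = nn.Sequential(nn.Linear(2 * f_node + f_edge, hidden), nn.ReLU())
        self.upd = nn.GRUCell(hidden, f_node)
        self.readout = nn.Linear(f_node, n_actions)

    def forward(self, g, steps=2):
        h = g.x.clone()
        for _ in range(steps):
            # Aggregate messages along edges; works for any number of nodes/edges.
            agg = torch.zeros(h.size(0), self.hidden)
            for (s, d), e in zip(g.edges, g.edge_attr):
                agg[d] = agg[d] + self.msg(torch.cat([h[s], h[d], e]))
            h = self.upd(agg, h)
        return self.readout(h[g.hands])  # (num_hands, n_actions)

# Example scene: right hand in contact with a bottle, left hand above a bowl.
nodes = torch.stack([one_hot(i, 4) for i in range(4)])  # hand_L, hand_R, bottle, bowl
edges = [(1, 2), (2, 1), (0, 3), (3, 0)]
edge_feats = torch.stack([one_hot(RELATIONS.index(r), len(RELATIONS))
                          for r in ["contact", "contact", "above", "below"]])
g = SceneGraph(nodes, edges, edge_feats, hand_indices=[0, 1])

net = GraphNet(f_node=4, f_edge=len(RELATIONS), hidden=16, n_actions=len(ACTIONS))
logits = net(g)                 # per-hand action scores
print(logits.argmax(dim=1))     # predicted action index for each hand
```

Because the message function only ever sees one edge at a time and the readout indexes the hand nodes, the same untrained network runs unchanged on scenes with any number of objects, which is the property the graph formulation is meant to provide.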

