Egocentric Hand-object Interaction Detection
In this paper, we propose a method for jointly determining the status of hand-object interaction, which is crucial for egocentric human activity understanding and interaction. From a computer vision perspective, we argue that whether a hand is interacting with an object depends on two cues: whether the hand is in an interactive pose, and whether the hand is touching the object. We therefore extract the hand pose together with the hand and object masks, and use them jointly to determine the interaction status. To address the difficulty of estimating hand pose when the held object occludes the hand, we use a multi-camera system to capture hand pose data from multiple perspectives. We evaluate our method against the most recent work by Shan et al. <cit.> on selected images from the EPIC-KITCHENS <cit.> dataset and achieve 89% accuracy on HOI (hand-object interaction) detection, which is comparable to Shan's (92%). In terms of real-time performance, however, our method runs at over 30 FPS, which is much more efficient than Shan's (1∼2 FPS). A demo is available at https://www.youtube.com/watch?v=XVj3zBuynmQ
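The joint decision rule described above (interactive pose AND hand-object contact) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pose classifier is not shown and is treated as a boolean input, masks are assumed to be same-sized binary grids, and "touching" is approximated as the hand mask overlapping the object mask after dilating the object by `radius` pixels. The names `dilate`, `masks_in_contact`, `detect_interaction`, and `radius` are hypothetical.

```python
from typing import List

# Hypothetical mask type: row-major binary grid, True = pixel belongs to the region.
Mask = List[List[bool]]


def dilate(mask: Mask, radius: int) -> Mask:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Mark every in-bounds pixel within Chebyshev distance `radius`.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = True
    return out


def masks_in_contact(hand: Mask, obj: Mask, radius: int = 1) -> bool:
    """True if some hand pixel lies within `radius` pixels of an object pixel (assumed contact test)."""
    grown = dilate(obj, radius)
    return any(
        hp and gp
        for hand_row, grown_row in zip(hand, grown)
        for hp, gp in zip(hand_row, grown_row)
    )


def detect_interaction(pose_is_interactive: bool, hand: Mask, obj: Mask, radius: int = 1) -> bool:
    """Joint rule: report interaction only if the pose looks interactive AND the hand touches the object."""
    return pose_is_interactive and masks_in_contact(hand, obj, radius)


if __name__ == "__main__":
    hand = [[True, True, False, False, False],
            [True, True, False, False, False]]
    obj = [[False, False, True, True, False],
           [False, False, True, True, False]]
    print(detect_interaction(True, hand, obj))   # True: interactive pose, adjacent masks
    print(detect_interaction(False, hand, obj))  # False: non-interactive pose
```

Requiring both cues means a hand that merely passes next to an object (contact without an interactive pose) or that is posed as if grasping in empty space (interactive pose without contact) is not reported as interacting. A real system would typically use a vectorized dilation (for example from an image-processing library) rather than these pure-Python loops.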