Predicting Ergonomic Risks During Indoor Object Manipulation Using Spatiotemporal Convolutional Networks

02/14/2019
by Behnoosh Parsa, et al.

Automated real-time prediction of the ergonomic risks of manipulating objects is a key unsolved challenge in developing effective human-robot collaboration systems for logistics and manufacturing applications. We present a foundational paradigm to address this challenge by formulating the problem as one of action segmentation from RGB-D camera videos. A deep convolutional model first learns spatial features from the video frames. These features are then fed sequentially to temporal convolutional networks, which semantically segment the frames into a hierarchy of actions that are ergonomically safe, require monitoring, or need immediate attention. For performance evaluation, in addition to an open-source kitchen dataset, we collected a new dataset comprising twenty individuals picking up and placing objects of varying weights to and from cabinet and table locations at various heights. Results show very high F1 overlap scores (87-94%) between the ground-truth and predicted frame labels for videos lasting over two minutes and comprising a large number of actions.
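The idea of segmenting frames into risk-labelled actions by convolving over time can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame class scores below are hand-written stand-ins for the output of a deep spatial model, the fixed smoothing kernel stands in for learned temporal convolution weights, and the label names `safe`, `monitor`, and `attention` are hypothetical shorthand for the three risk categories named in the abstract.

```python
from itertools import groupby

# Hypothetical names for the three risk categories in the abstract.
RISK_LEVELS = ("safe", "monitor", "attention")


def temporal_conv(scores, kernel):
    """Slide a 1D kernel along the time axis of per-frame class scores.

    Output has the same number of frames as the input; frames outside
    the video are treated as zeros. As in most deep-learning libraries,
    the kernel is applied without flipping (cross-correlation).
    """
    half = len(kernel) // 2
    n_frames = len(scores)
    n_classes = len(scores[0])
    out = []
    for t in range(n_frames):
        row = [0.0] * n_classes
        for k, weight in enumerate(kernel):
            src = t + k - half
            if 0 <= src < n_frames:
                for c in range(n_classes):
                    row[c] += weight * scores[src][c]
        out.append(row)
    return out


def segment(scores):
    """Label each frame by its highest score, then merge runs of equal
    labels into (label, first_frame, last_frame) segments."""
    labels = [
        RISK_LEVELS[max(range(len(row)), key=row.__getitem__)]
        for row in scores
    ]
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Made-up per-frame scores (one row per frame, one column per risk level).
# Frame 2 is an isolated "monitor" flicker inside a "safe" action.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.3, 0.7, 0.0],
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
]
kernel = [0.25, 0.5, 0.25]  # fixed stand-in for learned temporal weights

raw_segments = segment(scores)
smoothed_segments = segment(temporal_conv(scores, kernel))
print(raw_segments)
print(smoothed_segments)
```

Labelling each frame independently splits the video into four segments because of the single-frame flicker, whereas aggregating evidence from neighbouring frames before labelling yields two contiguous segments. A trained temporal convolutional network would learn its kernels from data and stack several such layers, which this sketch does not attempt.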


