Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

10/01/2017
by Anh Nguyen, et al.

We present a new method to translate videos to commands for robotic manipulation using Deep Recurrent Neural Networks (RNN). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transition between the two RNN layers and by using a state-of-the-art feature extractor. The experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with a vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN.
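The data flow described in the abstract (per-frame features, an encoder RNN, then a decoder RNN that emits the command word by word) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain tanh recurrent cell in place of the recurrent units used in the paper, random untrained weights, hand-made feature vectors standing in for CNN features, and a hypothetical six-token vocabulary. The function name `translate` and all parameters are illustrative only, and the generated words are meaningless; only the structure is the point.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """One step of a plain tanh recurrent cell (assumed stand-in for LSTM/GRU)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def translate(frame_features, vocab, hidden=8, max_words=5, seed=0):
    """Encode per-frame features, then greedily decode a word sequence."""
    rng = random.Random(seed)
    feat_dim = len(frame_features[0])
    enc_in = make_matrix(hidden, feat_dim, rng)
    enc_rec = make_matrix(hidden, hidden, rng)
    dec_in = make_matrix(hidden, len(vocab), rng)
    dec_rec = make_matrix(hidden, hidden, rng)
    out = make_matrix(len(vocab), hidden, rng)

    # Encoder: consume one feature vector per video frame.
    h = [0.0] * hidden
    for feat in frame_features:
        h = rnn_step(enc_in, enc_rec, feat, h)

    # Decoder: starts from the encoder's final state and feeds back
    # the previously generated word as a one-hot vector.
    bos = vocab.index("<bos>")
    candidates = [i for i in range(len(vocab)) if i != bos]
    words = []
    prev = bos
    for _ in range(max_words):
        one_hot = [1.0 if i == prev else 0.0 for i in range(len(vocab))]
        h = rnn_step(dec_in, dec_rec, one_hot, h)
        scores = matvec(out, h)
        prev = max(candidates, key=lambda i: scores[i])  # greedy choice
        if vocab[prev] == "<eos>":
            break
        words.append(vocab[prev])
    return words


# Hypothetical vocabulary and stand-in "CNN features" for six frames.
VOCAB = ["<bos>", "<eos>", "right", "hand", "grasp", "bottle"]
frames = [[0.1 * (t + i) for i in range(4)] for t in range(6)]
command = translate(frames, VOCAB)
print(command)
```

In a trained system the feature vectors would come from a pretrained CNN and the weights would be learned from video-command pairs; the sketch fixes a random seed only so that repeated calls produce the same output.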


research
03/23/2019

V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation

We propose V2CNet, a new deep learning framework to automatically transl...
research
06/22/2016

Learning text representation using recurrent convolutional neural network with highway layers

Recently, the rapid development of word embedding and neural networks ha...
research
04/12/2016

Video Description using Bidirectional Recurrent Neural Networks

Although traditionally used in the machine translation field, the encode...
research
11/16/2017

An Encoder-Decoder Framework Translating Natural Language to Database Queries

Machine translation is going through a radical revolution, driven by the...
research
05/23/2023

FlowChroma – A Deep Recurrent Neural Network for Video Colorization

We develop an automated video colorization framework that minimizes the ...
research
06/28/2021

Recurrent neural network transducer for Japanese and Chinese offline handwritten text recognition

In this paper, we propose an RNN-Transducer model for recognizing Japane...
research
11/13/2017

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

With the rising number of interconnected devices and sensors, modeling d...
