Recurrent Models for Situation Recognition

03/18/2017
by Arun Mallya, et al.

This work proposes Recurrent Neural Network (RNN) models to predict structured 'image situations' -- actions and noun entities fulfilling semantic roles related to the action. In contrast to prior work relying on Conditional Random Fields (CRFs), we use a specialized action prediction network followed by an RNN for noun prediction. Our system obtains state-of-the-art accuracy on the challenging recent imSitu dataset, beating CRF-based models, including ones trained with additional data. Further, we show that specialized features learned from situation prediction can be transferred to the task of image captioning to more accurately describe human-object interactions.

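As an illustration only, and not the authors' released code, the two-stage idea described above can be sketched as follows: an action (verb) classifier on a pooled CNN image feature, followed by an LSTM that emits one noun per semantic role, conditioned on both the image and the predicted action. All layer sizes, the exact conditioning scheme, and the vocabulary sizes below are assumptions made for the example.

```python
import torch
import torch.nn as nn


class SituationRNN(nn.Module):
    """Illustrative two-stage model: classify the action, then decode one
    noun per semantic role with an RNN conditioned on image and action."""

    def __init__(self, num_actions, num_nouns, feat_dim=2048, hidden_dim=1024):
        super().__init__()
        # A pooled image feature from a pretrained CNN backbone is assumed as input.
        self.action_head = nn.Linear(feat_dim, num_actions)    # action / verb logits
        self.action_embed = nn.Embedding(num_actions, hidden_dim)
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.noun_head = nn.Linear(hidden_dim, num_nouns)       # noun logits per role step

    def forward(self, img_feat, num_roles):
        # img_feat: (batch, feat_dim)
        action_logits = self.action_head(img_feat)
        action = action_logits.argmax(dim=1)
        # Seed the RNN state with the predicted action; feed the image feature
        # at every role step so each noun depends on both image and action.
        h0 = self.action_embed(action).unsqueeze(0)             # (1, batch, hidden_dim)
        c0 = torch.zeros_like(h0)
        steps = img_feat.unsqueeze(1).repeat(1, num_roles, 1)   # (batch, num_roles, feat_dim)
        out, _ = self.rnn(steps, (h0, c0))
        return action_logits, self.noun_head(out)               # (batch, num_roles, num_nouns)


# Usage: imSitu verbs have up to 6 roles; the noun vocabulary size here is a placeholder.
model = SituationRNN(num_actions=504, num_nouns=2000)
action_logits, noun_logits = model(torch.randn(2, 2048), num_roles=6)
```

This mirrors the abstract's structure (specialized action prediction, then recurrent noun prediction) but is only a minimal sketch; the paper's actual feature extractor, conditioning, and training details may differ.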
