
Neural Baby Talk

by Jiasen Lu, et al.
Georgia Institute of Technology

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image. Our approach reconciles classical slot-filling approaches (which are generally better grounded in images) with modern neural captioning approaches (which are generally more natural sounding and accurate). Our approach first generates a sentence "template" with slot locations explicitly tied to specific image regions. These slots are then filled in by visual concepts identified in the regions by object detectors. The entire architecture (sentence template generation and slot filling with object detectors) is end-to-end differentiable. We verify the effectiveness of our proposed model on different image captioning tasks. On standard image captioning and novel object captioning, our model reaches state-of-the-art performance on both the COCO and Flickr30k datasets. We also demonstrate that our model has unique advantages when the train and test distributions of scene compositions (and hence the language priors of associated captions) are different. Code has been made available at:
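
To make the template-then-fill idea in the abstract concrete, here is a toy sketch in Python. It is not the authors' released code and it is not differentiable: in the paper both steps are learned jointly, whereas this example hard-codes a template and a detector output. The names `Slot`, `fill_template`, the region indices, the labels, and the `"object"` fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Slot:
    """Hypothetical placeholder in a template, tied to one image region."""
    region_id: int


# A template token is either an ordinary word or a slot.
Token = Union[str, Slot]


def fill_template(template: List[Token], detections: Dict[int, str]) -> str:
    """Replace each slot with the detector label of the region it points to."""
    words = []
    for token in template:
        if isinstance(token, Slot):
            # Assumed fallback: use a generic word if the region has no label.
            words.append(detections.get(token.region_id, "object"))
        else:
            words.append(token)
    return " ".join(words)


# Hypothetical detector output: region index -> category label.
detections = {0: "puppy", 1: "cake"}

# Hypothetical template: plain words plus slots tied to regions 0 and 1.
template = ["A", Slot(0), "is", "sitting", "next", "to", "a", Slot(1)]

caption = fill_template(template, detections)
print(caption)
```

Because each slot carries a region index, every filled word can be traced back to the image region that produced it, which is the sense in which the abstract calls the caption grounded.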




