DeepAI AI Chat
Log In Sign Up

Neural Baby Talk

03/27/2018
by   Jiasen Lu, et al.
Georgia Institute of Technology
0

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image. Our approach reconciles classical slot filling approaches (that are generally better grounded in images) with modern neural captioning approaches (that are generally more natural sounding and accurate). Our approach first generates a sentence `template' with slot locations explicitly tied to specific image regions. These slots are then filled in by visual concepts identified in the regions by object detectors. The entire architecture (sentence template generation and slot filling with object detectors) is end-to-end differentiable. We verify the effectiveness of our proposed model on different image captioning tasks. On standard image captioning and novel object captioning, our model reaches state-of-the-art on both COCO and Flickr30k datasets. We also demonstrate that our model has unique advantages when the train and test distributions of scene compositions -- and hence language priors of associated captions -- are different. Code has been made available at: https://github.com/jiasenlu/NeuralBabyTalk

READ FULL TEXT

page 2

page 6

page 7

page 8

09/26/2020

Neural Twins Talk

Inspired by how the human brain employs more neural pathways when increa...
09/25/2021

Learning Neural Templates for Recommender Dialogue System

Though recent end-to-end neural models have shown promising progress on ...
04/23/2018

Object Counts! Bringing Explicit Detections Back into Image Captioning

The use of explicit object detectors as an intermediate step to image ca...
11/26/2018

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Current captioning approaches can describe images using black-box archit...
06/14/2022

Comprehending and Ordering Semantics for Image Captioning

Comprehending the rich semantics in an image and ordering them in lingui...
02/21/2022

CaMEL: Mean Teacher Learning for Image Captioning

Describing images in natural language is a fundamental step towards the ...
08/13/2022

ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning

Expansion methods explore the possibility of performance bottlenecks in ...

Code Repositories

DCC

Implementation of CVPR 2016 paper


view repo