Neural Baby Talk

03/27/2018
by Jiasen Lu, et al.

We introduce a novel framework for image captioning that produces natural language explicitly grounded in the entities that object detectors find in the image. Our approach reconciles classical slot-filling approaches (which are generally better grounded in images) with modern neural captioning approaches (which are generally more natural sounding and accurate). It first generates a sentence "template" with slot locations explicitly tied to specific image regions. These slots are then filled in with the visual concepts that object detectors identify in those regions. The entire architecture (sentence template generation and slot filling with object detectors) is end-to-end differentiable. We verify the effectiveness of the proposed model on different image captioning tasks. On standard image captioning and novel object captioning, our model reaches state-of-the-art performance on both the COCO and Flickr30k datasets. We also demonstrate that our model has unique advantages when the train and test distributions of scene compositions, and hence the language priors of the associated captions, are different. Code is available at: https://github.com/jiasenlu/NeuralBabyTalk
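
The two-stage idea described above (a template whose slots point at image regions, followed by slot filling from detector output) can be illustrated with a toy, non-neural sketch. This is not the authors' implementation: the names `Region`, `Slot`, and `fill_template`, the example boxes, labels, and scores are all hypothetical, and the real model produces the template and the slot contents with a differentiable network rather than a lookup.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Region:
    """A detected image region (hypothetical structure for illustration)."""
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    label: str                      # visual concept named by the detector
    score: float                    # detector confidence


@dataclass
class Slot:
    """A template position tied to one specific image region."""
    region_index: int


# A template is a sequence of plain words and region-grounded slots.
Token = Union[str, Slot]


def fill_template(template: List[Token], regions: List[Region]) -> str:
    """Replace every slot with the label of the region it points to."""
    words = []
    for token in template:
        if isinstance(token, Slot):
            words.append(regions[token.region_index].label)
        else:
            words.append(token)
    return " ".join(words)


# Made-up detector output for a single image (assumption, not real data).
regions = [
    Region(box=(10, 20, 200, 220), label="puppy", score=0.92),
    Region(box=(150, 180, 300, 260), label="cake", score=0.88),
]

# Made-up template: the slots reference regions, not vocabulary words.
template = ["a", Slot(0), "sitting", "next", "to", "a", Slot(1)]

caption = fill_template(template, regions)
print(caption)
```

Because each slot stores a region index rather than a word, swapping in a detector that recognizes a new category changes the caption without touching the template, which is roughly the property the abstract appeals to for novel object captioning.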


