Dependent Multi-Task Learning with Causal Intervention for Image Captioning

05/18/2021
by   Wenqing Chen, et al.
0

Recent work for image captioning mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although promising, we observed two problems in generated captions: 1) content inconsistency where models would generate contradicting facts; 2) not informative enough where models would miss parts of important information. From a causal perspective, the reason is that models have captured spurious statistical correlations between visual features and certain expressions (e.g., visual features of "long hair" and "woman"). In this paper, we propose a dependent multi-task learning framework with the causal intervention (DMTCI). Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning. The intermediate task would help the model better understand the visual features and thus alleviate the content inconsistency problem. Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders and thus letting models focus on the causal visual features. Specifically, the high-frequency concept set is considered as the proxy confounders where the real confounders are inferred in the continuous space. Finally, we use a multi-agent reinforcement learning (MARL) strategy to enable end-to-end training and reduce the inter-task error accumulations. The extensive experiments show that our model outperforms the baseline models and achieves competitive performance with state-of-the-art models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/21/2020

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attent...
research
07/20/2022

GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

Current state-of-the-art methods for image captioning employ region-base...
research
04/23/2018

Object Counts! Bringing Explicit Detections Back into Image Captioning

The use of explicit object detectors as an intermediate step to image ca...
research
10/08/2020

Dense Relational Image Captioning via Multi-task Triple-Stream Networks

We introduce dense relational captioning, a novel image captioning task ...
research
11/02/2020

Dual Attention on Pyramid Feature Maps for Image Captioning

Generating natural sentences from images is a fundamental learning task ...
research
02/11/2022

Bench-Marking And Improving Arabic Automatic Image Captioning Through The Use Of Multi-Task Learning Paradigm

The continuous increase in the use of social media and the visual conten...
research
10/20/2022

Content-based Graph Privacy Advisor

People may be unaware of the privacy risks of uploading an image online....

Please sign up or login with your details

Forgot password? Click here to reset