RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER

02/05/2021
by Lin Sun, et al.

Recently, multimodal named entity recognition (MNER) has used images to improve the accuracy of NER in tweets. However, most multimodal methods use attention mechanisms to extract visual clues regardless of whether the text and image are relevant. In practice, irrelevant text-image pairs account for a large proportion of tweets, and visual clues that are unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we introduce text-image relation propagation into the multimodal BERT model. We integrate soft or hard gates to select visual clues and propose a multitask algorithm to train on the MNER datasets. In our experiments, we analyze in depth how visual attention changes before and after text-image relation propagation is applied. Our model achieves state-of-the-art performance on the MNER datasets.
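To make the soft/hard gating idea concrete, here is a minimal sketch in Python. It assumes, as an illustration rather than the paper's exact formulation, that a text-image relation classifier outputs a single logit that a sigmoid turns into a relevance probability r. A soft gate scales every visual feature vector by r; a hard gate keeps the visual features unchanged when r exceeds a threshold (0.5 here) and zeros them otherwise. The function names `relevance_probability`, `soft_gate`, and `hard_gate`, the threshold value, and the plain-list feature representation are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def relevance_probability(logit: float) -> float:
    """Map a hypothetical text-image relation logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def soft_gate(visual: List[Vector], r: float) -> List[Vector]:
    """Soft gate (assumed form): scale every visual feature by the relevance r."""
    return [[r * x for x in v] for v in visual]


def hard_gate(visual: List[Vector], r: float, threshold: float = 0.5) -> List[Vector]:
    """Hard gate (assumed form): pass visual features through only if r > threshold."""
    g = 1.0 if r > threshold else 0.0
    return [[g * x for x in v] for v in visual]


if __name__ == "__main__":
    # Two toy visual feature vectors (hypothetical values).
    visual = [[1.0, 2.0], [0.5, -1.0]]
    r = 0.3  # the pair is judged mostly irrelevant
    print(soft_gate(visual, r))  # features shrunk to 30% of their magnitude
    print(hard_gate(visual, r))  # features zeroed out entirely
```

For example, with r = 0.3 the hard gate removes every visual clue, whereas the soft gate still lets a down-weighted visual signal reach the multimodal model.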
