Named Entity and Relation Extraction with Multi-Modal Retrieval

12/03/2022
by Xinyu Wang, et al.

Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts have focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge-aware, so the extracted information may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and the input image, respectively, from a knowledge corpus. Next, the retrieval results are sent to the textual and visual models, respectively, for prediction. Finally, a Mixture of Experts (MoE) module combines the predictions from the two models to make the final decision. Our experiments show that both our textual model and our visual model achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, model performance improves further, and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
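
The final combination step can be illustrated with a minimal sketch. The abstract does not specify how the MoE module is built, so everything below is an assumption made for illustration: the function names (`softmax`, `mix_experts`), the gate logits, the label set, and the toy probabilities are hypothetical and are not taken from the paper. The sketch only shows the general idea of weighting two experts' label distributions with a softmax gate.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(text_probs, image_probs, gate_logits):
    """Combine two experts' label distributions with a softmax gate.

    gate_logits holds one score per expert (textual, visual). In a real
    system these would be produced by a learned gating network; here they
    are passed in directly (illustrative assumption).
    """
    w_text, w_image = softmax(gate_logits)
    return [w_text * t + w_image * v for t, v in zip(text_probs, image_probs)]

# Toy label distributions for a single token (made-up numbers).
labels = ["PER", "LOC", "ORG"]
text_probs = [0.6, 0.3, 0.1]   # hypothetical output of the textual model
image_probs = [0.2, 0.7, 0.1]  # hypothetical output of the visual model

# Equal gate logits give each expert the same weight.
mixed = mix_experts(text_probs, image_probs, gate_logits=[0.0, 0.0])
prediction = labels[max(range(len(mixed)), key=mixed.__getitem__)]
```

Because the gate weights sum to one, the mixed output remains a valid probability distribution. Raising one expert's gate logit shifts the final decision toward that expert's prediction.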


research
12/13/2021

ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

Recently, Multi-modal Named Entity Recognition (MNER) has attracted a lo...
research
05/16/2023

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Multi-modal visual understanding of images with prompts involves using v...
research
08/22/2022

Revising Image-Text Retrieval via Multi-Modal Entailment

An outstanding image-text retrieval model depends on high-quality labele...
research
11/21/2020

Deep learning for video game genre classification

Video game genre classification based on its cover and textual descripti...
research
02/15/2022

Towards Effective Multi-Task Interaction for Entity-Relation Extraction: A Unified Framework with Selection Recurrent Network

Entity-relation extraction aims to jointly solve named entity recognitio...
research
10/23/2018

How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval

Automatic art analysis has been mostly focused on classifying artworks i...
research
04/05/2023

Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck

This paper studies the multimodal named entity recognition (MNER) and mu...
