Multi-Grained Multimodal Interaction Network for Entity Linking

07/19/2023
by Pengfei Luo, et al.

The multimodal entity linking (MEL) task, which aims to resolve ambiguous mentions to a multimodal knowledge graph, has attracted wide attention in recent years. Although great efforts have been made to explore the complementary effect among multiple modalities, existing methods may fail to fully absorb the comprehensive expression of abbreviated textual context and implicit visual indication. Even worse, inevitable noisy data may cause inconsistency across modalities during the learning process, which severely degrades performance. To address these issues, in this paper we propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task. Specifically, the unified inputs of mentions and entities are first encoded by textual/visual encoders separately to extract global descriptive features and local detailed features. Then, to derive the similarity matching score for each mention-entity pair, we devise three interaction units that comprehensively explore intra-modal interaction and inter-modal fusion among the features of entities and mentions. In particular, three modules, namely the Text-based Global-Local interaction Unit (TGLU), the Vision-based DuaL interaction Unit (VDLU), and the Cross-Modal Fusion-based interaction Unit (CMFU), are designed to capture and integrate the fine-grained representations conveyed by abbreviated text and implicit visual cues. Afterwards, we introduce a unit-consistency objective function via contrastive learning to avoid inconsistency and model degradation. Experimental results on three public benchmark datasets demonstrate that our solution outperforms various state-of-the-art baselines, and ablation studies verify the effectiveness of the designed modules.
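The abstract does not give implementation details, but the described pipeline (separate textual/visual encoders producing global and local features, three interaction units scoring each mention-entity pair, and a unit-consistency objective) can be outlined in code. The following is a minimal PyTorch-style sketch under my own assumptions: the encoder interfaces, hidden size, the cross-attention internals of TGLU/VDLU/CMFU, the score aggregation, and the consistency term are illustrative placeholders, not the paper's actual design.

```python
# Hedged sketch of the MIMIC flow described in the abstract (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InteractionUnit(nn.Module):
    """Stand-in for TGLU / VDLU / CMFU: cross-attends mention features over
    entity features and pools them into a scalar matching score."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, mention_feats: torch.Tensor, entity_feats: torch.Tensor) -> torch.Tensor:
        # mention_feats: (B, Lm, D) local features; entity_feats: (B, Le, D)
        fused, _ = self.attn(mention_feats, entity_feats, entity_feats)
        return self.score(fused.mean(dim=1)).squeeze(-1)  # (B,)


class MIMICSketch(nn.Module):
    """High-level flow from the abstract: encode text/image of mention and
    entity, then score the pair with three interaction units."""

    def __init__(self, text_encoder: nn.Module, vision_encoder: nn.Module, dim: int = 768):
        super().__init__()
        self.text_encoder = text_encoder      # assumed to return (B, Lt, D) token features
        self.vision_encoder = vision_encoder  # assumed to return (B, Lv, D) patch features
        self.tglu = InteractionUnit(dim)      # Text-based Global-Local interaction Unit
        self.vdlu = InteractionUnit(dim)      # Vision-based DuaL interaction Unit
        self.cmfu = InteractionUnit(dim)      # Cross-Modal Fusion-based interaction Unit

    def forward(self, mention: dict, entity: dict):
        m_txt = self.text_encoder(mention["text"])
        e_txt = self.text_encoder(entity["text"])
        m_img = self.vision_encoder(mention["image"])
        e_img = self.vision_encoder(entity["image"])

        s_t = self.tglu(m_txt, e_txt)                                    # textual score
        s_v = self.vdlu(m_img, e_img)                                    # visual score
        s_c = self.cmfu(torch.cat([m_txt, m_img], dim=1),
                        torch.cat([e_txt, e_img], dim=1))                # cross-modal score
        unit_scores = torch.stack([s_t, s_v, s_c], dim=-1)               # (B, 3)
        return unit_scores.mean(dim=-1), unit_scores


def unit_consistency_loss(unit_scores: torch.Tensor) -> torch.Tensor:
    """Illustrative stand-in for the unit-consistency objective: penalize
    disagreement between each unit's softmax over candidate entities and
    their average distribution. unit_scores: (B, num_candidates, 3)."""
    probs = F.softmax(unit_scores, dim=1)             # per-unit candidate distribution
    mean_probs = probs.mean(dim=-1, keepdim=True)     # consensus distribution, (B, C, 1)
    return F.kl_div(probs.clamp_min(1e-8).log(),
                    mean_probs.expand_as(probs),
                    reduction="batchmean")
```

To use the consistency term, one would run the forward pass over each candidate entity and stack the per-unit scores along a candidate dimension before calling `unit_consistency_loss`. The mean-of-units final score and the KL-based consistency penalty are only stand-ins chosen to keep the sketch self-contained; the paper defines its own unit architectures, score combination, and contrastive objective.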

research 03/22/2023
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Text-to-image person retrieval aims to identify the target person based ...

research 06/28/2023
Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection
The explosive growth of rumors with text and images on social media plat...

research 10/01/2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Referring image segmentation aims at segmenting the foreground masks of ...

research 08/23/2022
Flat Multi-modal Interaction Transformer for Named Entity Recognition
Multi-modal named entity recognition (MNER) aims at identifying entity s...

research 08/11/2020
KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue
Visual dialogue is a challenging task that needs to extract implicit inf...

research 04/06/2022
Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension
Procedural Multimodal Documents (PMDs) organize textual instructions and...

research 05/27/2023
Towards Better Entity Linking with Multi-View Enhanced Distillation
Dense retrieval is widely used for entity linking to retrieve entities f...
