Unimodal and Multimodal Representation Training for Relation Extraction

11/11/2022
by Ciaran Cooney, et al.

Multimodal integration of text, layout and visual information has achieved state-of-the-art (SOTA) results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). Despite the importance of this integration, the relative predictive capacity of the individual modalities is rarely evaluated. Here, we demonstrate the value of shared representations for RE tasks through experiments in which each data type is excluded in turn during training. In addition, text and layout data are each evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Layout geometry is also highly predictive and may even be feasible as a unimodal approach. Although visual information is less effective, we highlight circumstances in which it can bolster performance. Overall, our results demonstrate the efficacy of training joint representations for RE.
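
The experimental design described in the abstract can be illustrated with a minimal sketch that enumerates the modality subsets being compared and computes F1 from relation-level counts. This is not the authors' code: the names `MODALITIES`, `ablation_configs` and `f1_score` are hypothetical, and the inclusion of a full trimodal run as a baseline is an assumption, since the abstract only states that each modality is excluded in turn and that text and layout are evaluated alone.

```python
# Hypothetical sketch of the ablation grid implied by the abstract.
MODALITIES = ("text", "layout", "visual")


def ablation_configs(modalities=MODALITIES):
    """Return the modality subsets to train on, as tuples of names."""
    # Assumed baseline: all three modalities together.
    full = [tuple(modalities)]
    # Leave-one-out runs: exclude each modality in turn.
    leave_one_out = [
        tuple(m for m in modalities if m != dropped) for dropped in modalities
    ]
    # Unimodal runs: the abstract mentions only text and layout in isolation.
    unimodal = [("text",), ("layout",)]
    return full + leave_one_out + unimodal


def f1_score(tp, fp, fn):
    """Standard F1 from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    # Define F1 as 0.0 when there are no predictions and no gold relations.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    for config in ablation_configs():
        print("+".join(config))
```

Under these assumptions the grid contains six runs, and the bimodal text and layout configuration that the abstract reports as best corresponds to the tuple `("text", "layout")`.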

