Neural Collaborative Graph Machines for Table Structure Recognition

11/26/2021
by Hao Liu, et al.

Table structure recognition has recently made impressive progress with the help of deep graph models. Most of these models either exploit only the visual cues of tabular elements or simply combine visual cues with other modalities via early fusion to reason about the elements' graph relationships. However, neither early fusion nor reasoning over each modality individually is appropriate for the full diversity of table structures. Instead, different modalities are expected to collaborate with each other in different patterns for different tables. The importance of intra- and inter-modality interactions for table structure reasoning remains unexplored in the community. In this paper, we define this as the heterogeneous table structure recognition (Hetero-TSR) problem. To fill this gap, we present novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks, which alternately extract intra-modality context and model inter-modality interactions in a hierarchical way. NCGM represents the intra- and inter-modality relationships of tabular elements more robustly, which significantly improves recognition performance. We also show that NCGM can modulate the collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diverse table cases. Experimental results on benchmarks demonstrate that NCGM achieves state-of-the-art performance and beats other contemporary methods by a large margin, especially in challenging scenarios.
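The alternation described in the abstract (intra-modality context first, then inter-modality interaction, repeated across stacked blocks) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the modality names `visual` and `geometry`, the use of plain scaled dot-product attention with no learned projections, and the residual additions are all assumptions made for illustration. It also assumes every modality uses the same feature dimension.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Scaled dot-product attention without learned projections (assumption).
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def add(xs, ys):
    # Element-wise residual addition of two lists of feature vectors.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def collaborative_block(modalities):
    # Step 1: intra-modality context. Each modality attends over its own elements.
    context = {name: add(feats, attend(feats, feats, feats))
               for name, feats in modalities.items()}
    # Step 2: inter-modality interaction. Each modality queries the others' context.
    updated = {}
    for name, feats in context.items():
        out = feats
        for other, other_feats in context.items():
            if other != name:
                out = add(out, attend(feats, other_feats, other_feats))
        updated[name] = out
    return updated


def stacked_blocks(modalities, depth=2):
    # Stacking blocks repeats the intra/inter alternation hierarchically.
    for _ in range(depth):
        modalities = collaborative_block(modalities)
    return modalities


# Hypothetical features for three tabular elements in two modalities.
features = {
    "visual": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "geometry": [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
}
fused = stacked_blocks(features, depth=2)
```

Unlike early fusion, which would concatenate the modalities once before any reasoning, this sketch keeps a separate representation per modality, so the cross-modality attention weights in each block depend on context that each modality has already gathered from its own elements.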


