Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach

08/11/2022
by Bin Xiao, et al.

Because of the characteristics of Information and Communications Technology (ICT) products, critical information about ICT devices is often summarized in big tabular data shared across supply chains. As the volume of electronic assets surges, interpreting these tabular structures automatically becomes essential. To transform tabular data in electronic documents into a machine-interpretable format, and to provide the layout and semantic information needed for information extraction and interpretation, we define two tasks: Table Structure Recognition (TSR) and Table Cell Type Classification (CTC). For the TSR task, we represent complex table structures as a graph. For the CTC task, we categorize table cells into three groups based on their functional roles: Header, Attribute, and Data. We then propose a multi-task model that solves both tasks simultaneously using features from the text and image modalities. Our experimental results show that the proposed method can outperform state-of-the-art methods on the ICDAR2013 and UNLV datasets.
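To make the two task outputs concrete, here is a minimal Python sketch of one way a table could be encoded as a graph with typed cells. The abstract does not specify the graph's node or edge definitions, so everything here is an assumption for illustration: cells are nodes, edges connect cells that share a row or a column, and the names `Cell`, `CellType`, and `build_table_graph` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class CellType(Enum):
    """The three functional roles named in the abstract (CTC labels)."""
    HEADER = "Header"
    ATTRIBUTE = "Attribute"
    DATA = "Data"


@dataclass(frozen=True)
class Cell:
    """A table cell; row/column ranges are inclusive so spanning cells fit."""
    text: str
    row_start: int
    row_end: int
    col_start: int
    col_end: int
    cell_type: CellType


def _overlap(a_start, a_end, b_start, b_end):
    """True when two inclusive integer ranges share at least one index."""
    return a_start <= b_end and b_start <= a_end


def build_table_graph(cells):
    """Return labeled edges (i, j, relation) between cell indices.

    Assumption: an edge means the two cells share a row or a column.
    The paper's actual graph definition may differ.
    """
    edges = set()
    for i, j in combinations(range(len(cells)), 2):
        a, b = cells[i], cells[j]
        if _overlap(a.row_start, a.row_end, b.row_start, b.row_end):
            edges.add((i, j, "same_row"))
        if _overlap(a.col_start, a.col_end, b.col_start, b.col_end):
            edges.add((i, j, "same_col"))
    return edges


# A made-up two-row table: one header spanning two columns,
# then an attribute cell and its data cell.
cells = [
    Cell("Specs", 0, 0, 0, 1, CellType.HEADER),
    Cell("Voltage", 1, 1, 0, 0, CellType.ATTRIBUTE),
    Cell("3.3 V", 1, 1, 1, 1, CellType.DATA),
]
edges = build_table_graph(cells)
```

In this toy table the spanning header is linked to both cells beneath it by `same_col` edges, and the attribute and data cells are linked by a `same_row` edge. That is the kind of layout information a flat grid cannot express for merged cells.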


research
11/03/2022

Efficient Information Sharing in ICT Supply Chain Social Network via Table Structure Recognition

The global Information and Communications Technology (ICT) supply chain ...
research
03/15/2023

An End-to-End Multi-Task Learning Model for Image-based Table Recognition

Image-based table recognition is a challenging task due to the diversity...
research
03/08/2022

Table Structure Recognition with Conditional Attention

Tabular data in digital documents is widely used to express compact and ...
research
05/25/2021

Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System

Table extraction is an important but still unsolved problem. In this pap...
research
11/25/2022

Semantic Table Detection with LayoutLMv3

This paper presents an application of the LayoutLMv3 model for semantic ...
research
07/14/2022

DEXTER: An end-to-end system to extract table contents from electronic medical health documents

In this paper, we propose DEXTER, an end to end system to extract inform...
research
05/30/2023

Table Detection for Visually Rich Document Images

Table Detection (TD) is a fundamental task towards visually rich documen...
