Incorporating Visual Layout Structures for Scientific Text Classification

06/01/2021
by Zejiang Shen, et al.

Classifying the core textual components of a scientific paper (title, author, body text, etc.) is a critical first step in automated scientific document understanding. Previous work has shown that using elementary layout information, i.e., each token's 2D position on the page, leads to more accurate classification. We introduce new methods for incorporating VIsual LAyout structures (VILA), e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance. We show that the I-VILA approach, which simply adds special tokens denoting boundaries between layout structures into model inputs, can lead to improvements of +1 to 4.5 F1 score in token classification tasks. Moreover, we design a hierarchical model, H-VILA, that encodes these layout structures and records an efficiency boost of up to 70% without hurting prediction accuracy. The experiments are conducted on a newly curated evaluation suite, S2-VLUE, with a novel metric measuring VILA awareness and a new dataset covering 19 scientific disciplines with gold annotations. Pre-trained weights, benchmark datasets, and source code will be available at https://github.com/allenai/VILA.
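The I-VILA idea described above, inserting a special token at each boundary between layout groups, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `insert_boundary_tokens`, the marker string `[BLK]`, and the `IGNORE` placeholder label are assumptions made for illustration, and the layout groups (text lines or text blocks) are assumed to have been detected already.

```python
from typing import List, Tuple

# Assumed marker string for a layout boundary; the actual token used by the
# released models may differ.
BOUNDARY_TOKEN = "[BLK]"
# Assumed placeholder label so boundary tokens can be skipped in the loss.
IGNORE_LABEL = "IGNORE"


def insert_boundary_tokens(
    groups: List[List[str]],
    labels: List[List[str]],
    boundary_token: str = BOUNDARY_TOKEN,
    ignore_label: str = IGNORE_LABEL,
) -> Tuple[List[str], List[str]]:
    """Flatten layout groups into one token sequence, placing a boundary
    token between consecutive groups (hypothetical helper)."""
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")

    flat_tokens: List[str] = []
    flat_labels: List[str] = []
    for index, (group_tokens, group_labels) in enumerate(zip(groups, labels)):
        if len(group_tokens) != len(group_labels):
            raise ValueError("each group needs one label per token")
        if index > 0:
            # Mark the transition from the previous layout group to this one.
            flat_tokens.append(boundary_token)
            flat_labels.append(ignore_label)
        flat_tokens.extend(group_tokens)
        flat_labels.extend(group_labels)
    return flat_tokens, flat_labels


# Two text blocks: a title block and an author block.
tokens, token_labels = insert_boundary_tokens(
    groups=[["Visual", "Layout"], ["Zejiang", "Shen"]],
    labels=[["title", "title"], ["author", "author"]],
)
```

The resulting sequence would then be passed to a token classifier as usual; the boundary positions carry the placeholder label so that they can be excluded from training and evaluation.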


research
06/01/2020

DocBank: A Benchmark Dataset for Document Layout Analysis

Document layout analysis usually relies on computer vision models to und...
research
01/25/2023

Generalizability in Document Layout Analysis for Scientific Article Figure Caption Extraction

The lack of generalizability – in which a model trained on one dataset c...
research
03/25/2023

Freestyle Layout-to-Image Synthesis

Typical layout-to-image synthesis (LIS) models generate images for a clo...
research
08/29/2023

Vision Grid Transformer for Document Layout Analysis

Document pre-trained models and grid-based models have proven to be very...
research
07/28/2022

Knowing Where and What: Unified Word Block Pretraining for Document Understanding

Due to the complex layouts of documents, it is challenging to extract in...
research
08/10/2021

BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents

Understanding documents from their visual snapshots is an emerging probl...
research
02/17/2023

Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories

When extracting structured data from repetitively organized documents, s...
