Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature

07/18/2023
by Shrey Mishra, et al.

Scholarly articles in mathematical fields feature mathematical statements such as theorems and propositions, as well as their proofs. Extracting them from the PDF representation of the articles requires an understanding of scientific text along with visual and font-based indicators. We pose this problem as a multimodal classification problem using text, font features, and bitmap image renderings of the PDF as different modalities. In this paper we propose a multimodal machine learning approach for the extraction of theorem-like environments and proofs, based on late fusion of features extracted by individual unimodal classifiers and taking into account the sequential succession of blocks in the document. For the text modality, we pretrain a new language model on an 11 GB scientific corpus; experiments show that it performs similarly on our task to a model (RoBERTa) pretrained on 160 GB, while converging faster and requiring much less fine-tuning data. Font-based information relies on training a 128-cell LSTM on the sequence of font names and sizes within each block. Bitmap renderings are handled by an EfficientNetV2 deep network tuned to classify each image block. Finally, a simple CRF-based approach uses the features of the multimodal model along with information on block sequences. Experimental results show the benefits of a multimodal approach over any single modality, as well as major performance improvements from the CRF modeling of block sequences.
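
The pipeline described in the abstract (late fusion of per-block unimodal features, followed by sequence-aware labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, the label set `LABELS`, the weight matrix, and the transition scores are all hypothetical placeholders, and only the decoding step of a linear-chain CRF (Viterbi) is shown, with hand-set scores rather than trained ones.

```python
import numpy as np

# Hypothetical label set for PDF blocks (an assumption for illustration).
LABELS = ["basic", "theorem", "proof"]


def late_fusion(text_feats, font_feats, image_feats):
    """Concatenate per-block feature vectors from the three unimodal models.

    Each argument has shape (n_blocks, d_modality); the result has shape
    (n_blocks, d_text + d_font + d_image).
    """
    return np.concatenate([text_feats, font_feats, image_feats], axis=1)


def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions:   (n_blocks, n_labels) per-block label scores
    transitions: (n_labels, n_labels) score of moving from label i to label j
    """
    n_blocks, n_labels = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n_blocks, n_labels), dtype=int)
    for t in range(1, n_blocks):
        # candidate[i, j]: best score ending in label i at t-1, then moving to j
        candidate = score[:, None] + transitions
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0) + emissions[t]
    # Follow back-pointers from the best final label to recover the path.
    path = [int(score.argmax())]
    for t in range(n_blocks - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


def label_blocks(text_feats, font_feats, image_feats, weights, transitions):
    """Fuse the modalities, score each block linearly, and decode the sequence."""
    fused = late_fusion(text_feats, font_feats, image_feats)
    emissions = fused @ weights  # (n_blocks, n_labels)
    return [LABELS[i] for i in viterbi(emissions, transitions)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_blocks = 6
    # Random stand-ins for the outputs of the text, font, and image models.
    text_f = rng.normal(size=(n_blocks, 8))
    font_f = rng.normal(size=(n_blocks, 4))
    image_f = rng.normal(size=(n_blocks, 5))
    weights = rng.normal(size=(8 + 4 + 5, len(LABELS)))
    # Hand-set transitions that favor staying in the same label.
    transitions = np.eye(len(LABELS))
    print(label_blocks(text_f, font_f, image_f, weights, transitions))
```

The transition matrix is what lets neighboring blocks influence each other: a block whose own scores are ambiguous can be pulled toward the label of its neighbors, which is the intuition behind modeling block sequences rather than classifying each block in isolation.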


