Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

04/13/2018
by Moritz Schubotz, et al.

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.

research
02/07/2020

Discovering Mathematical Objects of Interest – A Study of Mathematical Notations

Mathematical notation, i.e., the writing system used to communicate conc...
research
11/30/2020

Automatic Mathematical Information Retrieval to Perform Translations up to Computer Algebra Systems

In mathematics, LaTeX is the de facto standard to prepare documents, e.g...
research
09/17/2021

MathTools: An Open API for Convenient MathML Handling

Mathematical formulae carry complex and essential semantic information i...
research
03/03/2023

Discovery and Recognition of Formula Concepts using Machine Learning

Citation-based Information Retrieval (IR) methods for scientific documen...
research
10/27/2020

Semantic Search in Millions of Equations

Given the increase of publications, search for relevant papers becomes t...
research
12/04/2020

ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?

The zbMATH database contains more than 4 million bibliographic entries. ...