ArGoT: A Glossary of Terms extracted from the arXiv

09/07/2021
by Luis Berlioz, et al.

We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the article's source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. The terms can then be organized in a dependency graph using their definitions and the arXiv's metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the text's vector representation and how these embeddings capture relations of entailment between mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover (ITP) libraries of formally verified statements.
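
As a rough illustration of how terms might be organized in a dependency graph from their definitions, here is a minimal sketch. It assumes a tiny hand-written glossary and a simple rule: a term depends on every other glossary term that appears in its definition. The glossary entries and the name `build_dependency_graph` are hypothetical, and this is not the extraction or graph-building method used for ArGoT, which also draws on arXiv metadata.

```python
import re

# Hypothetical toy glossary: term -> definition text (not taken from ArGoT).
glossary = {
    "group": "a set with an associative binary operation, an identity and inverses",
    "abelian group": "a group whose operation is commutative",
    "ring": "an abelian group with a second associative operation "
            "that distributes over addition",
}


def build_dependency_graph(glossary):
    """Map each term to the set of other glossary terms found in its definition."""
    graph = {}
    for term, definition in glossary.items():
        deps = set()
        for other in glossary:
            if other == term:
                continue
            # Match on word boundaries so "ring" does not match inside "string".
            if re.search(r"\b" + re.escape(other) + r"\b", definition):
                deps.add(other)
        graph[term] = deps
    return graph


graph = build_dependency_graph(glossary)
for term, deps in graph.items():
    print(term, "->", sorted(deps))
```

In this toy example "ring" depends on both "abelian group" and "group", because the shorter term also matches inside the longer one. A real pipeline would need to decide whether to keep such nested matches or only the longest one.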

