Self-Supervised Pretraining of Graph Neural Network for the Retrieval of Related Mathematical Expressions in Scientific Articles

08/22/2022
by Lukas Pfahler, et al.

Given the growing number of publications, searching for relevant papers becomes tedious. In particular, search across disciplines or schools of thought is not supported. This is mainly because retrieval relies on keyword queries: technical terms differ between sciences and change over time. Relevant articles might be better identified by their mathematical problem descriptions. Just looking at the equations in a paper already gives a hint as to whether the paper is relevant. Hence, we propose a new approach, based on machine learning, for the retrieval of mathematical expressions. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks, we embed mathematical expressions into low-dimensional vector spaces that allow efficient nearest-neighbor queries. To train our models, we collect a large dataset with over 29 million mathematical expressions from over 900,000 publications on arXiv.org. The math is converted into an XML format, which we view as graph data. Our empirical evaluations, which involve a new dataset of manually annotated search queries, show the benefits of using embedding models for mathematical retrieval. This work was originally published at KDD 2020.
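
To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how an XML-encoded expression can be read as a graph of labeled nodes and parent-child edges, and how a nearest-neighbor query over embedding vectors might look. The MathML-like input string, the function names `xml_to_graph` and `nearest_neighbors`, the use of cosine similarity, and the two-dimensional toy vectors are all assumptions made for illustration. In the paper the embeddings come from a trained graph convolutional network, which this sketch does not include.

```python
import math
import xml.etree.ElementTree as ET


def xml_to_graph(xml_string):
    """Read an XML tree as a graph: a list of node labels plus
    (parent_index, child_index) edges. Hypothetical helper."""
    root = ET.fromstring(xml_string)
    labels, edges = [], []

    def visit(element, parent_index):
        index = len(labels)
        text = (element.text or "").strip()
        # Label a node by its tag, plus its text content if it has any.
        labels.append(f"{element.tag}:{text}" if text else element.tag)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in element:
            visit(child, index)

    visit(root, None)
    return labels, edges


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, embeddings, k=1):
    """Return the names of the k stored vectors most similar to the query.
    Brute-force scan; an assumption, not the paper's index structure."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Assumed MathML-like encoding of the expression "x + 1".
expression = "<math><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow></math>"
labels, edges = xml_to_graph(expression)

# Toy embeddings standing in for the output of a trained model.
toy_embeddings = {"eq_a": [1.0, 0.0], "eq_b": [0.0, 1.0], "eq_c": [0.9, 0.1]}
closest = nearest_neighbors([1.0, 0.05], toy_embeddings, k=2)
```

The brute-force scan is only meant to show the query itself; at the scale of millions of expressions, an approximate nearest-neighbor index would likely be needed.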
