Intelligent Arxiv: Sort daily papers by learning users topics preference

02/06/2020
by   Ezequiel Alvarez, et al.

Daily paper releases are becoming increasingly large, and areas of research are growing in diversity. This makes it harder for scientists to keep up with the current state of the art and to identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper as a combination of scientific knowledge from diverse topics applied to a new problem. In light of this, we apply the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) to the corpus of papers in a given field to: i) define and extract the underlying topics in the corpus; ii) get the topic weight vector for each paper in the corpus; and iii) get the topic weight vector for new papers. By registering the papers preferred by a user, we build a user vector of weights from the topic weight vectors of the selected papers. Hence, by taking the inner product between the user vector and the topic weight vector of each paper in the daily Arxiv release, we can sort the papers according to the user's preference for the underlying topics. We have created the website IArxiv.org, where users can read sorted daily Arxiv releases (and more) while the algorithm learns each user's preferences, yielding a more accurate sorting every day. The current version of IArxiv.org runs on the Arxiv categories astro-ph, gr-qc, hep-ph and hep-th, and we plan to extend it to others. We propose several useful new features to be developed, as well as Machine Learning techniques beyond LDA that could further improve the accuracy of this tool.
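The sorting step described above can be illustrated with a minimal sketch. It assumes the topic weight vectors have already been produced by an LDA model, and it assumes the user vector is the plain average of the vectors of the preferred papers; the abstract does not state the exact combination rule, so that choice is only an illustration. The function names, paper identifiers and numeric weights below are all hypothetical.

```python
from typing import Dict, List, Sequence


def build_user_vector(preferred: Sequence[Sequence[float]]) -> List[float]:
    """Average the topic weight vectors of the papers a user preferred.

    Averaging is an assumed rule for this sketch, not the documented one.
    """
    if not preferred:
        raise ValueError("need at least one preferred paper")
    n_topics = len(preferred[0])
    return [sum(vec[k] for vec in preferred) / len(preferred) for k in range(n_topics)]


def score(user: Sequence[float], paper: Sequence[float]) -> float:
    """Inner product between the user vector and a paper's topic weights."""
    return sum(u * p for u, p in zip(user, paper))


def sort_release(user: Sequence[float], release: Dict[str, Sequence[float]]) -> List[str]:
    """Return paper ids ordered from highest to lowest score."""
    return sorted(release, key=lambda pid: score(user, release[pid]), reverse=True)


# Hypothetical topic weights over three topics (made-up numbers).
preferred = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
user = build_user_vector(preferred)

release = {
    "paper_a": [0.1, 0.1, 0.8],
    "paper_b": [0.7, 0.2, 0.1],
    "paper_c": [0.2, 0.7, 0.1],
}

ranking = sort_release(user, release)
print(ranking)
```

In this toy release, the paper whose weight is concentrated on the user's dominant topic is ranked first. Registering more preferred papers changes the average and therefore the ordering, which is one simple way the sorting could improve over time.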

