Beyond original Research Articles Categorization via NLP

09/13/2023
by Rosanna Turrisi, et al.

This work proposes a novel approach to categorizing scientific literature when the categories are not known in advance, using Natural Language Processing techniques. The study uses a pre-trained language model, SciBERT, to extract meaningful representations of abstracts from the arXiv dataset. The abstracts are then grouped with the K-Means algorithm, and the number of clusters is chosen to maximize the Silhouette score. The results demonstrate that the proposed approach captures subject information more effectively than the traditional arXiv labeling system, leading to improved text categorization. The approach offers potential for better navigation and recommendation systems in the rapidly growing landscape of scientific research literature.
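The clustering step described in the abstract (K-Means, with the number of clusters chosen by Silhouette score) can be illustrated with a minimal sketch. This is not the authors' code, and several things in it are assumptions made for illustration: the 2-D points from the hypothetical `toy_embeddings` helper stand in for SciBERT abstract embeddings, the candidate range of 2 to 6 clusters is arbitrary, and K-Means is seeded with a deterministic farthest-point rule rather than whatever initialization the paper used. Only the Python standard library is needed.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centers(points, k):
    """Farthest-point seeding (an assumption, chosen to keep the demo deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Lloyd's K-Means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (closer to 1 is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # the score is undefined for one cluster; 0.0 is a placeholder
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def toy_embeddings(seed=0):
    """Three well-separated 2-D blobs standing in for abstract embeddings."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [
        [rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
        for cx, cy in blob_centers
        for _ in range(20)
    ]


def pick_k(points, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


points = toy_embeddings()
best_k, scores = pick_k(points, range(2, 7))
print("selected number of clusters:", best_k)
```

In a realistic setting the toy points would be replaced by one embedding vector per abstract, and a library implementation of K-Means and the Silhouette score would likely be preferable to this hand-rolled version for speed and numerical care.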
