Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation

by Diego Kozlowski, et al.

Over the last century, we have observed a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of research within and between fields impossible through manual inspection alone. Automatic techniques that support the literature review process are required to find the epistemic and social patterns embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques make it possible to build automated end-to-end models that project observations into a new, low-dimensional space in which the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by these techniques. Our results show that NLP can encode a semantic space of articles, while GNNs can build a relational space in which the social practices of a research community are also encoded.
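The contrast between a semantic space and a relational space can be illustrated with a toy example. The sketch below is not the authors' models, which the abstract does not specify; it assumes a hypothetical corpus of four articles described by bag-of-words counts (the semantic features) and a hypothetical undirected citation graph, and it applies a single untrained graph-convolution step in the style of the GCN propagation rule, ReLU(Â X W), with an identity matrix standing in for learned weights. The helper names `normalized_adjacency`, `gcn_layer` and `cosine_sim` are illustrative.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, the GCN-style normalised adjacency."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees including self-loops
    # Scale rows and columns by D^{-1/2} via broadcasting.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step, ReLU(Â X W): each article mixes in its neighbours."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy corpus: 4 articles described by 3 term counts (semantic features).
features = np.array([
    [1.0, 0.0, 0.0],  # article 0
    [0.0, 1.0, 0.0],  # article 1
    [1.0, 0.0, 0.0],  # article 2: same vocabulary as article 0
    [0.0, 0.0, 1.0],  # article 3
])

# Hypothetical undirected citation links: 0 <-> 1 and 2 <-> 3.
citations = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Identity weight stands in for a trained weight matrix (illustration only).
relational = gcn_layer(features, citations, np.eye(3))

print("semantic   sim(0,2) =", round(cosine_sim(features[0], features[2]), 3))
print("semantic   sim(0,1) =", round(cosine_sim(features[0], features[1]), 3))
print("relational sim(0,1) =", round(cosine_sim(relational[0], relational[1]), 3))
print("relational sim(0,2) =", round(cosine_sim(relational[0], relational[2]), 3))
```

In the semantic space, articles 0 and 2 are identical because they share vocabulary, while articles 0 and 1 are unrelated. After one propagation step over the citation links, articles 0 and 1 collapse onto the same vector and article 2 moves away from article 0, so proximity now reflects who cites whom. A real GNN would learn `weight`, stack several layers and typically start from richer text embeddings.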
