Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

12/01/2022
by Lennart Küll, et al.

The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions for analyzing those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, focus only on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends in quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this, we introduce the CS-Insights system, an interactive web application for analyzing CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis are that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) the preference of researchers for publishing in conferences over journals is dwindling, v) on average, journal articles receive twice as many citations as conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also receive more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than in journals.
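
To make the comparison in finding v) concrete, the following is a minimal sketch of how average citations per document type could be computed from publication metadata. The record layout (`type`, `citations`) and all values are hypothetical placeholders, not the D3 schema or results from the thesis.

```python
from collections import defaultdict

# Hypothetical records for illustration only; the field names and numbers
# are assumptions and are not taken from D3.
papers = [
    {"type": "journal", "citations": 40},
    {"type": "journal", "citations": 20},
    {"type": "conference", "citations": 10},
    {"type": "conference", "citations": 20},
]


def mean_citations_by_type(records):
    """Return the mean citation count for each document type."""
    sums = defaultdict(int)    # total citations per document type
    counts = defaultdict(int)  # number of papers per document type
    for record in records:
        sums[record["type"]] += record["citations"]
        counts[record["type"]] += 1
    return {doc_type: sums[doc_type] / counts[doc_type] for doc_type in sums}


means = mean_citations_by_type(papers)
# Ratio of journal to conference averages for the toy data above.
ratio = means["journal"] / means["conference"]
print(means, ratio)
```

With real metadata, the same grouping could be repeated per venue or per field of study to compare the most cited conferences and journals.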


research
10/13/2022

CS-Insights: A System for Analyzing Computer Science Research

This paper presents CS-Insights, an interactive web application to analy...
research
04/28/2022

D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research

DBLP is the largest open-access repository of scientific articles on com...
research
06/27/2018

Author-Based Analysis of Conference versus Journal Publication in Computer Science

Conference publications in computer science (CS) have attracted scholarl...
research
02/17/2020

Identifying the Development and Application of Artificial Intelligence in Scientific Text

We describe a strategy for identifying the universe of research publicat...
research
05/23/2023

Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines

This work conducts a comprehensive exploration into the proficiency of O...
research
12/22/2021

Machine Learning for Computational Science and Engineering – a brief introduction and some critical questions

Artificial Intelligence (AI) is now entering every sub-field of science,...
research
02/07/2023

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

Due to the exponential growth of scientific publications on the Web, the...
