Identifying the Development and Application of Artificial Intelligence in Scientific Text

02/17/2020
by James Dunham, et al.

We describe a strategy for identifying the universe of research publications relevant to the application and development of artificial intelligence. The approach leverages the arXiv corpus of scientific preprints, in which authors choose subject tags for their papers from a set defined by editors. We compose a functional definition of AI relevance by learning these subjects from paper metadata, and then inferring the arXiv-subject labels of papers in larger corpora: Clarivate Web of Science, Digital Science Dimensions, and Microsoft Academic Graph. This yields predictive classification F_1 scores between .75 and .86 for Natural Language Processing (cs.CL), Computer Vision (cs.CV), and Robotics (cs.RO). For a single model that learns these and four other AI-relevant subjects (cs.AI, cs.LG, stat.ML, and cs.MA), we see precision of .83 and recall of .85. We evaluate the out-of-domain performance of our classifiers against other sources of topic information and predictions from alternative methods. We find that a supervised solution can generalize to identify publications that belong to the high-level fields of study represented on arXiv. This offers a method for identifying AI-relevant publications that updates at the pace of research output, without reliance on subject-matter experts for query development or labeling.
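The abstract reports precision and recall for the combined model but F_1 only for the individual subjects. As a minimal sketch, assuming the standard definition of F_1 as the harmonic mean of precision and recall, the implied F_1 of the combined model can be derived from the two reported figures. The function name `f1_score` and the variable `combined_f1` are illustrative and do not come from the paper, and the resulting value is a derivation, not a number the authors state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F_1)."""
    if precision + recall == 0:
        # Avoid division by zero when the classifier has no true positives.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the single model
# covering cs.AI, cs.LG, stat.ML, cs.MA, cs.CL, cs.CV, and cs.RO.
# The F_1 value below is derived here, not reported by the authors.
combined_f1 = f1_score(0.83, 0.85)
print(round(combined_f1, 2))
```

Under that assumption the combined model's implied F_1 is roughly .84, which falls within the .75 to .86 range reported for the three individual subjects.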


