S2ORC: The Semantic Scholar Open Research Corpus

11/07/2019
by   Kyle Lo, et al.

We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains. The corpus consists of 81.1M papers, 380.5M citation edges, and associated paper metadata. We provide structured full text for 8.1M open access papers. All inline citation mentions in the full text are detected and linked to their corresponding bibliography entries, which are in turn linked to the papers they reference, forming contextual citation edges. To our knowledge, this is the largest publicly available contextual citation graph. The full text alone is the largest structured academic text corpus to date. We release S2ORC to facilitate research and development of tools and tasks for the analysis of scientific text.
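
To make the notion of a contextual citation edge concrete, here is a minimal sketch of the linking chain described above: inline mention, then bibliography entry, then referenced paper. Every class, field, and function name below (Paper, BibEntry, CiteSpan, contextual_edges, window) is hypothetical and chosen for illustration only; this is not the released S2ORC schema or the authors' pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BibEntry:
    # Hypothetical bibliography entry; linked_paper_id stays None when the
    # entry could not be resolved to a paper in the corpus.
    bib_id: str
    linked_paper_id: Optional[str] = None


@dataclass
class CiteSpan:
    # Hypothetical inline citation mention, stored as character offsets
    # into the paper's full text plus the bibliography entry it points to.
    start: int
    end: int
    bib_id: str


@dataclass
class Paper:
    paper_id: str
    text: str
    bib_entries: Dict[str, BibEntry] = field(default_factory=dict)
    cite_spans: List[CiteSpan] = field(default_factory=list)


def contextual_edges(paper: Paper, window: int = 40) -> List[Tuple[str, str, str]]:
    """Return (citing_id, cited_id, context) for each resolvable mention."""
    edges = []
    for span in paper.cite_spans:
        entry = paper.bib_entries.get(span.bib_id)
        if entry is None or entry.linked_paper_id is None:
            continue  # mention does not resolve to a paper, so no edge
        lo = max(0, span.start - window)
        hi = min(len(paper.text), span.end + window)
        edges.append((paper.paper_id, entry.linked_paper_id, paper.text[lo:hi]))
    return edges


# Toy example: two mentions, only one of which resolves to a known paper.
text = "Prior work [1] introduced the task, and [2] extended it."
paper = Paper(
    paper_id="P0",
    text=text,
    bib_entries={
        "b1": BibEntry("b1", linked_paper_id="P1"),
        "b2": BibEntry("b2"),  # unresolved entry
    },
    cite_spans=[
        CiteSpan(text.index("[1]"), text.index("[1]") + 3, "b1"),
        CiteSpan(text.index("[2]"), text.index("[2]") + 3, "b2"),
    ],
)
edges = contextual_edges(paper)
```

In this toy setup only the first mention yields an edge, because the second bibliography entry is not linked to any paper. The fixed character window used as context is an arbitrary simplification; a sentence or paragraph boundary would be an equally plausible choice.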
