Improving astroBERT using Semantic Textual Similarity

11/29/2022
by Felix Grezes, et al.

The NASA Astrophysics Data System (ADS) is an essential tool that allows researchers to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we announce the first public release of the astroBERT language model; show how astroBERT improves over existing public language models on astrophysics-specific tasks; and detail how ADS plans to harness the unique structure of scientific papers, namely the citation graph and citation context, to further improve astroBERT.

research
06/17/2015

Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

The workshop "Mining Scientific Papers: Computational Linguistics and Bi...
research
11/07/2019

S2ORC: The Semantic Scholar Open Research Corpus

We introduce S2ORC, a large contextual citation graph of English-languag...
research
04/25/2023

CitePrompt: Using Prompts to Identify Citation Intent in Scientific Papers

Citations in scientific papers not only help us trace the intellectual l...
research
01/24/2023

The Semantic Scholar Open Data Platform

The volume of scientific output is creating an urgent need for automated...
research
11/16/2022

Galactica: A Large Language Model for Science

Information overload is a major obstacle to scientific progress. The exp...
research
02/06/2020

Citation Data of Czech Apex Courts

In this paper, we introduce the citation data of the Czech apex courts (...
research
09/19/2023

Interactive Distillation of Large Single-Topic Corpora of Scientific Papers

Highly specific datasets of scientific literature are important for both...
