SciTweets – A Dataset and Annotation Framework for Detecting Scientific Online Discourse

by Salim Hafid, et al.

Scientific topics, claims and resources are increasingly debated as part of online discourse, with prominent examples including discourse related to COVID-19 or climate change. This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines. For instance, communication studies aim at a deeper understanding of the biases, quality or spreading patterns of scientific information, whereas computational methods have been proposed to extract, classify or verify scientific claims using NLP and IR techniques. However, research across disciplines currently suffers from a lack of both robust definitions of the various forms of science-relatedness and appropriate ground truth data for distinguishing them. In this work, we contribute (a) an annotation framework and corresponding definitions for different forms of scientific relatedness of online discourse in tweets, (b) an expert-annotated dataset of 1261 tweets obtained through our labeling framework, reaching an average Fleiss' kappa κ of 0.63, and (c) a multi-label classifier trained on our data that detects science-relatedness with 89% F1 and is also able to detect distinct forms of scientific knowledge (claims, references). With this work we aim to lay the foundation for developing and evaluating robust methods for analysing science as part of large-scale online discourse.
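For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of how Fleiss' kappa is computed from per-item rating counts. The abstract does not describe the authors' exact procedure, so the function name, the input layout and the toy ratings (4 tweets, 3 annotators, 2 categories) are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per item: fraction of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Agreement expected by chance.
    expected = sum(x * x for x in p)
    if expected == 1.0:
        # All ratings fall in a single category: kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)


# Toy data (illustrative only, not from the dataset):
# columns are [science-related, not science-related].
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))  # prints 0.333
```

An "average" kappa such as the one reported could be obtained by computing this value once per annotation category and taking the mean, although the abstract does not confirm that this is the procedure used.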




SciDTB: Discourse Dependency TreeBank for Scientific Abstracts

Annotation corpus for discourse relations benefits NLP tasks such as mac...

Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning

Claims are a fundamental unit of scientific discourse. The exponential g...

Large-scale, Language-agnostic Discourse Classification of Tweets During COVID-19

Quantifying the characteristics of public attention is an essential prer...

SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours

Media is full of false claims. Even Oxford Dictionaries named "post-trut...

Maintaining scientific discourse during a global pandemic: ESO's first e-conference #H02020

From 22 to 26 June 2020, we hosted ESO's first live e-conference, #H0202...

Bias in Semantic and Discourse Interpretation

In this paper, we show how game-theoretic work on conversation combined ...

Expressing High-Level Scientific Claims with Formal Semantics

The use of semantic technologies is gaining significant traction in scie...
