Towards a corpus for credibility assessment in software practitioner blog articles

06/21/2021
by Ashley Williams, et al.

Blogs are a source of grey literature widely used by software practitioners to disseminate opinion and experience. Analysing such articles can give software engineering research useful insights into the state of practice. However, identifying higher-quality content among the large number of available articles is challenging. Credibility assessment can help identify quality content, but existing corpora for it are lacking. Credibility is typically measured through a series of conceptual criteria, two important ones being 'argumentation' and 'evidence'. We create a corpus labelled for argumentation and evidence to support the credibility research community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss' Kappa). We present a preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). We train four systems (BERT, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a linear SVM. These preliminary results are promising and indicate that the corpus can help future studies detect the credibility of grey literature. Future research will investigate the degree to which the sentence-level annotations can be used to infer the credibility of the overall document.
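
The agreement figure above is a Fleiss' Kappa, which compares how often annotators agree on each item with the agreement expected by chance given the overall label distribution. The sketch below shows how that statistic is computed in plain Python; it is not the authors' code (the abstract does not say which implementation they used), and the toy counts in the usage example are hypothetical, not taken from the corpus.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i (e.g. a sentence) to category j."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_categories = len(counts[0])
    if any(len(row) != n_categories for row in counts):
        raise ValueError("every row must have the same number of categories")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of annotators")

    # Observed agreement per item: fraction of annotator pairs that agree.
    p_items = [
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For example, with three annotators labelling four hypothetical sentences as claim / not-claim, `round(fleiss_kappa([[3, 0], [0, 3], [3, 0], [2, 1]]), 3)` returns `0.625`: observed agreement is 5/6, chance agreement is 5/9, and (5/6 - 5/9) / (1 - 5/9) = 5/8.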

research
10/29/2019

A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking

Automated fact-checking based on machine learning is a promising approac...
research
03/02/2021

Practitioner-generated blog posts as evidence for software engineering research: attitudinal survey and preliminary checklist

Background: Blog posts are frequently used by software practitioners to ...
research
07/17/2021

Overview and Insights from the SciVer Shared Task on Scientific Claim Verification

We present an overview of the SciVer shared task, presented at the 2nd S...
research
07/21/2020

Beyond Accuracy: Assessing Software Documentation Quality

Good software documentation encourages good software engineering, but th...
research
04/07/2020

Automatically Assessing Quality of Online Health Articles

The information ecosystem today is overwhelmed by an unprecedented quant...