A Machine Learning Pipeline for Automatic Extraction of Statistic Reports and Experimental Conditions from Scientific Papers

by Steffen Epp, et al.

A common writing style for statistical results is the set of recommendations of the American Psychological Association, known as APA style. However, in practice, writing styles vary, as reports are not 100% […] are not reported despite being mandatory. In addition, statistics are not reported in isolation but in the context of the experimental conditions investigated and the general topic. We address these challenges by proposing STEREO, a flexible pipeline based on active wrapper induction and unsupervised aspect extraction. We applied our pipeline to the over 100,000 documents in the CORD-19 dataset. It required only 0.25% […] learn statistics extraction rules that cover 95% […] The statistic extraction has nearly 100% […] precision on non-APA writing styles. In total, we were able to extract 113k reported statistics, of which only <1% […] the correct conditions from APA-conform reports (30% […] model for topic extraction achieves a precision of 75% […] in APA style (73% […] foundation for automatic statistic extraction and future developments for scientific paper analysis. In particular, the extraction of non-APA-conform reports is important and enables applications such as giving authors feedback about what is missing and what could be changed.
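To make "statistics extraction rules" concrete, here is a minimal Python sketch of one hand-written rule for APA-style reports such as `t(28) = 2.45, p = .021`. The pattern `APA_STAT`, the `StatReport` type, and the function `extract_apa_stats` are hypothetical illustrations, not STEREO's rules. The abstract describes learning rules through active wrapper induction rather than writing a single fixed regex.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical rule (not STEREO's learned rules): matches APA-style reports
# such as "t(28) = 2.45, p = .021" or "F(2, 57) = 4.10, p < .05".
APA_STAT = re.compile(
    r"\b(?P<test>[tFr])\s*"                                 # test statistic symbol
    r"\((?P<df>\d+(?:\.\d+)?(?:\s*,\s*\d+(?:\.\d+)?)?)\)"   # one or two degrees of freedom
    r"\s*=\s*(?P<value>-?\d*\.?\d+)"                        # statistic value
    r"\s*,\s*p\s*(?P<p_op>[<=>])\s*(?P<p_value>\d*\.\d+)"   # p-value with comparison operator
)


class StatReport(NamedTuple):
    test: str
    df: Tuple[float, ...]
    value: float
    p_operator: str
    p_value: float


def extract_apa_stats(sentence: str) -> List[StatReport]:
    """Return every APA-style statistic report found in a sentence."""
    reports = []
    for m in APA_STAT.finditer(sentence):
        # float() tolerates the surrounding whitespace left by split(",")
        df = tuple(float(d) for d in m.group("df").split(","))
        reports.append(
            StatReport(
                test=m.group("test"),
                df=df,
                value=float(m.group("value")),
                p_operator=m.group("p_op"),
                p_value=float(m.group("p_value")),
            )
        )
    return reports


if __name__ == "__main__":
    print(extract_apa_stats("The effect was significant, t(28) = 2.45, p = .021."))
    print(extract_apa_stats("t = 2.45 (p = 0.02)"))  # non-APA style: no match
```

The second call shows the limitation the abstract points to: a fixed APA pattern silently misses non-APA writing styles, so covering them needs additional or learned rules.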


A Computational Model For Individual Scholars' Writing Style Dynamics

A manuscript's writing style is central in determining its readership, i...

Reporting the Unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes

Official reports of hate crimes in the US are under-reported relative to...

Reducing a Set of Regular Expressions and Analyzing Differences of Domain-specific Statistic Reporting

Due to the large amount of daily scientific publications, it is impossib...

Writing Style Aware Document-level Event Extraction

Event extraction, the technology that aims to automatically get the stru...

Style Transfer and Extraction for the Handwritten Letters Using Deep Learning

How can we learn, transfer and extract handwriting styles using deep neu...

Outfit Generation and Style Extraction via Bidirectional LSTM and Autoencoder

When creating an outfit, style is a criterion in selecting each fashion ...

The Importance of Suppressing Domain Style in Authorship Analysis

The prerequisite of many approaches to authorship analysis is a represen...
