GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

07/13/2020
by Florian Borchert, et al.

The lack of publicly available text corpora is a major obstacle to progress in clinical natural language processing, particularly in non-English-speaking countries. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German-language corpus based on clinical practice guidelines in the field of oncology. The corpus is one of the largest corpora of German medical text to date. It does not contain any patient-related data and can therefore be used without data protection restrictions. Moreover, it is the first corpus for the German language covering diverse conditions in a large medical subfield. In addition to the textual sources, we provide a large variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to compare the use of medical language in GGPONC with that in other medical text corpora.


