MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain

08/16/2021
by Leonhard Hennig, et al.

We present MobIE, a German-language dataset, which is human-annotated with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities, 13.1K of which are linked to a knowledge base. A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types, while the remaining documents are annotated using a weakly-supervised labeling approach implemented with the Snorkel framework. To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE, and thus can be used for joint and multi-task learning of these fundamental information extraction tasks. We make MobIE public at https://github.com/dfki-nlp/mobie.


