CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition

04/27/2022
by Yu-Siou Tang, et al.

We describe the design and use of the CREER dataset, a large corpus annotated with rich English grammatical and semantic attributes. The CREER dataset uses the Stanford CoreNLP Annotator to capture rich language structures from Wikipedia plain text. Because the dataset follows widely used linguistic and semantic annotation conventions, it can be used for most natural language processing tasks and can also be scaled up. This large supervised dataset can serve as a basis for improving the performance of NLP tasks in the future.


research
09/12/2022

CSL: A Large-scale Chinese Scientific Literature Dataset

Scientific literature serves as a high-quality corpus, supporting a lot ...
research
09/15/2023

AlbNER: A Corpus for Named Entity Recognition in Albanian

Scarcity of resources such as annotated text corpora for under-resourced...
research
03/08/2022

A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text

Medical information extraction consists of a group of natural language p...
research
09/10/2021

How May I Help You? Using Neural Text Simplification to Improve Downstream NLP Tasks

The general goal of text simplification (TS) is to reduce text complexit...
research
05/12/2021

Designing Multimodal Datasets for NLP Challenges

In this paper, we argue that the design and development of multimodal da...
research
11/19/2018

The Mafiascum Dataset: A Large Text Corpus for Deception Detection

Detecting deception in natural language has a wide variety of applicatio...
research
08/12/2020

The Annotation Guideline of LST20 Corpus

This report presents the annotation guideline for LST20, a large-scale c...
