
CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition

04/27/2022


by Yu-Siou Tang, et al.


We describe the design and use of the CREER dataset, a large corpus annotated with rich English grammatical and semantic attributes. CREER uses the Stanford CoreNLP Annotator to capture rich language structures from Wikipedia plain text. The dataset follows widely used linguistic and semantic annotation conventions, so it can support most natural language processing tasks and can also be scaled up. This large supervised dataset can serve as a basis for improving the performance of NLP tasks in the future.
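To make the kind of annotation described above more concrete, here is a minimal sketch of reading entity spans and relation triples from a CoreNLP-style record. The sample record is hand-written to imitate the JSON shape that a CoreNLP pipeline with part-of-speech, NER, and OpenIE annotators typically returns; it is not taken from CREER, and whether CREER is distributed in this exact layout is an assumption. The helper names `entity_spans` and `relation_triples` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hand-written sample imitating CoreNLP-style JSON output.
# Illustrative only: not real CREER data, and the layout is an assumption.
SAMPLE: Dict = {
    "sentences": [
        {
            "index": 0,
            "tokens": [
                {"index": 1, "word": "Marie", "pos": "NNP", "ner": "PERSON"},
                {"index": 2, "word": "Curie", "pos": "NNP", "ner": "PERSON"},
                {"index": 3, "word": "was", "pos": "VBD", "ner": "O"},
                {"index": 4, "word": "born", "pos": "VBN", "ner": "O"},
                {"index": 5, "word": "in", "pos": "IN", "ner": "O"},
                {"index": 6, "word": "Warsaw", "pos": "NNP", "ner": "CITY"},
                {"index": 7, "word": ".", "pos": ".", "ner": "O"},
            ],
            "openie": [
                {
                    "subject": "Marie Curie",
                    "relation": "was born in",
                    "object": "Warsaw",
                }
            ],
        }
    ]
}


def entity_spans(tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Group consecutive tokens that share a non-"O" NER tag into (text, tag) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    tag = "O"
    for tok in tokens:
        if tok["ner"] == tag and tag != "O":
            # Same entity continues.
            words.append(tok["word"])
            continue
        if words:
            # The previous entity ended; store it.
            spans.append((" ".join(words), tag))
        if tok["ner"] != "O":
            words, tag = [tok["word"]], tok["ner"]
        else:
            words, tag = [], "O"
    if words:
        spans.append((" ".join(words), tag))
    return spans


def relation_triples(sentence: Dict) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples from the sentence's "openie" list."""
    return [
        (t["subject"], t["relation"], t["object"])
        for t in sentence.get("openie", [])
    ]


if __name__ == "__main__":
    first = SAMPLE["sentences"][0]
    print(entity_spans(first["tokens"]))
    print(relation_triples(first))
```

Entity spans give the supervision needed for entity recognition, and the triples give the supervision needed for relation extraction, which is why both are read from the same sentence record in this sketch.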

Yu-Siou Tang
Chung-Hsien Wu

