DataCLUE: A Benchmark Suite for Data-centric NLP

11/16/2021
by Liang Xu, et al.

Data-centric AI has recently proven more effective, while traditional model-centric AI delivers diminishing returns. It emphasizes improving dataset quality to achieve better model performance. The field has significant potential because of its practicality and is attracting growing attention. However, significant research progress in this field has yet to appear, especially in NLP. We propose DataCLUE, the first data-centric benchmark for NLP. We also provide three simple but effective baselines to foster research in this field (improving Macro-F1 by up to 5.7 percentage points). In addition, we conduct comprehensive experiments with human annotators that show how hard DataCLUE is. We also try an advanced method: forgetting-informed bootstrapping label correction. All resources related to DataCLUE, including datasets, toolkit, leaderboard, and baselines, are available online at https://github.com/CLUEbenchmark/DataCLUE
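The baseline gains above are measured in Macro-F1: the unweighted mean of per-class F1 scores, so a rare class counts as much as a frequent one. The following is a minimal sketch of that metric in plain Python, assuming single-label classification; the function name `macro_f1` is our own illustration and is not part of the DataCLUE toolkit.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0  # no examples: define the score as 0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # true class t was missed
    # Per-class F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778, i.e. (2/3 + 2/3 + 1) / 3
```

In this toy example accuracy is 0.75, while Macro-F1 is 7/9 because each class's F1 is averaged with equal weight regardless of how often the class occurs.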
