Datasets: A Community Library for Natural Language Processing

09/07/2021
by Quentin Lhoest, et al.

The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. It aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets and internet-scale corpora. The library's design incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.
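The idea of a front-end that behaves similarly for small datasets and internet-scale corpora can be illustrated with a minimal, standard-library-only sketch. This is not the library's implementation or API: the class names `InMemoryDataset` and `StreamingDataset`, the helper `count_tokens`, and the toy corpus are all hypothetical, chosen only to show how one piece of user code can run unchanged over an in-memory collection and a lazily generated one.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Example = Dict[str, str]


class InMemoryDataset:
    """Hypothetical small dataset: all examples are held in a list."""

    def __init__(self, examples: List[Example]) -> None:
        self._examples = examples

    def __iter__(self) -> Iterator[Example]:
        return iter(self._examples)


class StreamingDataset:
    """Hypothetical large corpus: examples are produced lazily, one at a time."""

    def __init__(self, source: Callable[[], Iterable[Example]]) -> None:
        self._source = source  # called anew on each pass over the data

    def __iter__(self) -> Iterator[Example]:
        return iter(self._source())


def count_tokens(dataset: Iterable[Example]) -> int:
    """User code: identical for both back-ends."""
    return sum(len(example["text"].split()) for example in dataset)


def generate_corpus() -> Iterator[Example]:
    # Stand-in for reading a corpus too large to hold in memory.
    for i in range(1000):
        yield {"text": f"document number {i}"}


small = InMemoryDataset([{"text": "a tiny dataset"}, {"text": "two examples"}])
large = StreamingDataset(generate_corpus)

small_total = count_tokens(small)  # 3 + 2 whitespace-separated tokens
large_total = count_tokens(large)  # 1000 documents of 3 tokens each
```

In the actual library, the corresponding entry point is `load_dataset`, which accepts a `streaming=True` argument to iterate over a dataset without downloading it in full; the sketch above only mirrors that design choice at a conceptual level.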

