A Data-Centric Framework for Composable NLP Workflows

03/02/2021
by   Zhengzhong Liu, et al.
17

Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization. We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner. The framework introduces a uniform data representation to encode heterogeneous results by a wide range of NLP tasks. It offers a large repository of processors for NLP tasks, visualization, and annotation, which can be easily assembled with full interoperability under the unified representation. The highly extensible framework allows plugging in custom processors from external off-the-shelf NLP and deep learning libraries. The whole framework is delivered through two modularized yet integratable open-source projects, namely Forte1 (for workflow infrastructure and NLP function processors) and Stave2 (for user interaction, visualization, and annotation).

READ FULL TEXT

page 2

page 6

research
12/20/2022

Is GPT-3 a Good Data Annotator?

GPT-3 (Generative Pre-trained Transformer 3) is a large-scale autoregres...
research
07/05/2023

Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks

This study examines the performance of open-source Large Language Models...
research
03/06/2023

OpenICL: An Open-Source Framework for In-context Learning

In recent years, In-context Learning (ICL) has gained increasing attenti...
research
04/16/2020

A Workflow Manager for Complex NLP and Content Curation Pipelines

We present a workflow manager for the flexible creation and customisatio...
research
01/06/2022

HuSpaCy: an industrial-strength Hungarian natural language processing toolkit

Although there are a couple of open-source language processing pipelines...
research
08/14/2023

Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation

Fine-grained, span-level human evaluation has emerged as a reliable and ...
research
09/18/2023

Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Most NLP tasks are modeled as supervised learning and thus require label...

Please sign up or login with your details

Forgot password? Click here to reset