skweak: Weak Supervision Made Easy for NLP

04/19/2021
by   Pierre Lison, et al.

We present skweak, a versatile, Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks. Weak supervision is an emerging machine learning paradigm based on a simple idea: instead of labelling data points by hand, we use labelling functions derived from domain knowledge to automatically obtain annotations for a given dataset. The resulting labels are then aggregated with a generative model that estimates the accuracy (and possible confusions) of each labelling function. The skweak toolkit makes it easy to implement a large spectrum of labelling functions (such as heuristics, gazetteers, neural models or linguistic constraints) on text data, apply them to a corpus, and aggregate their results in a fully unsupervised fashion. skweak is designed specifically to facilitate the use of weak supervision for NLP tasks such as text classification and sequence labelling. We illustrate the use of skweak for NER and sentiment analysis. skweak is released under an open-source license and is available at: https://github.com/NorskRegnesentral/skweak
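
The idea of labelling functions followed by aggregation can be illustrated with a minimal, self-contained sketch. It does not use the skweak API: every name below (the word lists, `lf_positive_lexicon`, `aggregate_majority`, `weak_label`) is hypothetical, and a plain majority vote stands in for the generative model described above, which would additionally estimate how accurate each labelling function is.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labelling function returns a label, or None to abstain.
LabellingFunction = Callable[[str], Optional[str]]

# Hypothetical toy lexicons, standing in for domain knowledge.
POSITIVE_WORDS = {"great", "excellent", "love"}
NEGATIVE_WORDS = {"terrible", "awful", "hate"}


def lf_positive_lexicon(text: str) -> Optional[str]:
    """Vote 'pos' if any positive lexicon word occurs in the text."""
    tokens = set(text.lower().split())
    return "pos" if tokens & POSITIVE_WORDS else None


def lf_negative_lexicon(text: str) -> Optional[str]:
    """Vote 'neg' if any negative lexicon word occurs in the text."""
    tokens = set(text.lower().split())
    return "neg" if tokens & NEGATIVE_WORDS else None


def lf_negated_good(text: str) -> Optional[str]:
    """Heuristic: the phrase 'not good' signals negative sentiment."""
    return "neg" if "not good" in text.lower() else None


def aggregate_majority(votes: List[Optional[str]]) -> Optional[str]:
    """Majority vote over non-abstaining functions; ties yield None.

    Assumption: this is a simple stand-in for a generative
    aggregation model, not the method used by skweak.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def weak_label(texts: List[str],
               functions: List[LabellingFunction]) -> List[Optional[str]]:
    """Apply every labelling function to every text and aggregate."""
    return [aggregate_majority([lf(t) for lf in functions]) for t in texts]


LABELLING_FUNCTIONS = [lf_positive_lexicon, lf_negative_lexicon, lf_negated_good]
```

Calling `weak_label(texts, LABELLING_FUNCTIONS)` on a list of unlabelled strings yields one aggregated label per text, or `None` where all functions abstain or disagree evenly. Replacing the majority vote with a model that weights each function by its estimated reliability is the step that the generative aggregation is meant to provide.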

research
05/29/2023

Alfred: A System for Prompted Weak Supervision

Alfred is the first system for programmatic weak supervision (PWS) that ...
research
11/16/2020

NLPGym – A toolkit for evaluating RL agents on Natural Language Processing Tasks

Reinforcement learning (RL) has recently shown impressive performance in...
research
09/18/2023

Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Most NLP tasks are modeled as supervised learning and thus require label...
research
04/30/2020

Named Entity Recognition without Labelled Data: A Weak Supervision Approach

Named Entity Recognition (NER) performance often degrades rapidly when a...
research
07/16/2021

Pseudo-labelling Enhanced Media Bias Detection

Leveraging unlabelled data through weak or distant supervision is a comp...
research
10/04/2022

Text Characterization Toolkit

In NLP, models are usually evaluated by reporting single-number performa...
research
11/03/2021

OpenPrompt: An Open-source Framework for Prompt-learning

Prompt-learning has become a new paradigm in modern natural language pro...