Dissecting Span Identification Tasks with Performance Prediction

10/06/2020
by Sean Papay, et al.

Span identification (span ID for short) tasks, such as chunking, NER, or code-switching detection, ask models to identify and classify relevant spans in a text. Although these tasks are a staple of NLP and share a common structure, there is little insight into how their properties influence their difficulty, and thus little guidance on which model families work well on span ID tasks, and why. We analyze span ID tasks via performance prediction, estimating how well neural architectures do on different tasks. Our contributions are: (a) we identify key properties of span ID tasks that can inform performance prediction; (b) we carry out a large-scale experiment on English data, building a model to predict performance for unseen span ID tasks that can support architecture choices; (c) we investigate the parameters of the meta model, yielding new insights into how model and task properties interact to affect span ID performance. We find, e.g., that span frequency is especially important for LSTMs, and that CRFs help when spans are infrequent and boundaries non-distinctive.
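To make the notion of a task property concrete, the following is a minimal sketch of how two simple descriptors of a span ID dataset could be computed from BIO-tagged sentences. The function names `bio_to_spans` and `task_properties`, and the exact definitions used here (spans per token, mean span length in tokens), are illustrative assumptions and not necessarily the definitions used in the paper.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Extract labeled spans from one BIO-tagged sentence (hypothetical helper).

    Ill-formed sequences are handled leniently: an I- tag that does not
    continue the current span opens a new span.
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def task_properties(sentences: List[List[str]]) -> Dict[str, float]:
    """Compute illustrative dataset-level properties (assumed definitions)."""
    n_tokens = sum(len(tags) for tags in sentences)
    spans = [s for tags in sentences for s in bio_to_spans(tags)]
    span_tokens = sum(end - start for start, end, _ in spans)
    return {
        # Spans per token: a simple proxy for how frequent spans are.
        "span_frequency": len(spans) / n_tokens if n_tokens else 0.0,
        # Average number of tokens per span.
        "mean_span_length": span_tokens / len(spans) if spans else 0.0,
    }


example = [["B-PER", "I-PER", "O", "B-LOC", "O", "O"]]
props = task_properties(example)
```

Descriptors of this kind, computed per dataset, are the sort of input a performance-prediction meta model could take alongside a description of the architecture.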


