SpanBERT: Improving Pre-training by Representing and Predicting Spans

by Mandar Joshi, et al.
Princeton University
University of Washington

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark (70.8% F1), and even show gains on GLUE.
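The first extension above, masking contiguous random spans, can be sketched in a few lines. The paper samples span lengths from a geometric distribution (skewed toward short spans, clipped at a maximum length) until roughly 15% of tokens are masked. The function below is a minimal illustration under those assumptions; the function name, hyperparameter defaults, and sampling details are illustrative, not the authors' released code.

```python
import random

def mask_contiguous_spans(tokens, mask_ratio=0.15, p=0.2, max_span_len=10,
                          mask_token="[MASK]"):
    """Mask contiguous spans until ~mask_ratio of the tokens are masked.

    Span lengths are drawn from a geometric-style distribution (success
    probability p, so short spans are most likely), clipped at max_span_len,
    mirroring the span-sampling scheme described for SpanBERT.
    """
    tokens = list(tokens)
    budget = max(1, int(round(len(tokens) * mask_ratio)))  # tokens to mask
    masked = set()
    while len(masked) < budget:
        # Geometric sample: keep extending the span while the coin comes up
        # "continue", capped at max_span_len.
        length = 1
        while random.random() > p and length < max_span_len:
            length += 1
        # Never exceed the remaining masking budget.
        length = min(length, budget - len(masked))
        start = random.randrange(0, max(1, len(tokens) - length + 1))
        for i in range(start, start + length):
            masked.add(i)
    out = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)
```

Masking whole spans, rather than independent tokens, forces the model to predict missing content from span boundaries alone, which is what the span boundary objective in (2) then exploits.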



