PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

07/19/2022
by Thang M. Pham, et al.

Since BERT (Devlin et al., 2018), learning contextualized word embeddings has been a de facto standard in NLP. However, progress on learning contextualized phrase embeddings is hindered by the lack of a human-annotated, phrase-in-context benchmark. To fill this gap, we propose PiC, a dataset of ~28K noun phrases accompanied by their contextual Wikipedia pages, together with a suite of three tasks of increasing difficulty for evaluating the quality of phrase embeddings. We find that training on our dataset improves ranking models' accuracy and remarkably pushes Question Answering (QA) models to near-human accuracy, which is 95% Exact Match (EM) on semantic search given a query phrase and a passage. Interestingly, we find evidence that this impressive performance arises because the QA models learn to better capture the common meaning of a phrase regardless of its actual context. That is, on our Phrase Sense Disambiguation (PSD) task, SotA model accuracy drops substantially (60% EM), as the models fail to differentiate the semantics of the same phrase under two different contexts. Further results on our 3-task PiC benchmark reveal that learning contextualized phrase embeddings remains an interesting, open challenge.


