UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

05/28/2021
by Xiaotao Gu, et al.

Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. Because these phrases are infrequent, they significantly hurt the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, rely heavily on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from word sequences that consistently co-occur within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels are deeply rooted in the input domain and context, and thus have unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger on silver labels usually risks overfitting to phrase surface names. Instead, we observe that the contextualized attention maps generated by a transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. We therefore pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
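
To make the silver-label idea concrete, here is a minimal sketch of mining word sequences that repeat within a single document. It is an illustration only, not the authors' implementation: the function name `mine_silver_spans`, the thresholds `min_count` and `max_len`, and the greedy longest-match rule are all assumptions, since the abstract does not specify how "consistently co-occurring" is measured.

```python
from collections import Counter


def mine_silver_spans(tokens, min_count=2, max_len=4):
    """Return (start, end) spans of word sequences repeated in one document.

    Hypothetical helper: thresholds and the greedy longest-match rule are
    assumptions made for illustration, not details taken from the paper.
    """
    # Count every n-gram of length 2..max_len inside this document only.
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(2, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )

    spans = []
    i = 0
    while i < len(tokens):
        longest = min(max_len, len(tokens) - i)
        # Prefer the longest repeated sequence starting at position i.
        for n in range(longest, 1, -1):
            if counts[tuple(tokens[i:i + n])] >= min_count:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no repeated sequence starts here
    return spans


doc = ("we study heat island effect in cities . "
       "the heat island effect grows").split()
spans = mine_silver_spans(doc)
phrases = [" ".join(doc[start:end]) for start, end in spans]
print(spans, phrases)
```

Because counts are taken per document rather than per corpus, a phrase only needs to recur locally to be labeled, which is what lets this style of labeling pick up rare or emerging phrases. In the approach described above, such spans would serve as training targets for a span classifier over attention maps rather than as final output.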


