UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining

02/27/2022
by Jiacheng Li, et al.

High-quality phrase representations are essential for finding topics and related terms in documents (a.k.a. topic mining). Existing phrase representation learning methods either simply combine unigram representations in a context-free manner or rely on extensive annotations to learn context-aware knowledge. In this paper, we propose UCTopic, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTopic is pretrained on a large scale to distinguish whether the contexts of two phrase mentions have the same semantics. The key to pretraining is positive pair construction based on our phrase-oriented assumptions. However, we find that traditional in-batch negatives cause performance decay when finetuning on a dataset with a small number of topics. Hence, we propose cluster-assisted contrastive learning (CCL), which largely reduces noisy negatives by selecting negatives from clusters and thereby further improves phrase representations for topics. UCTopic outperforms the state-of-the-art phrase representation model by 38.2% NMI on average on four entity clustering tasks. Comprehensive evaluation on topic mining shows that UCTopic can extract coherent and diverse topical phrases.
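To make the contrast between in-batch negatives and cluster-selected negatives concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the loss, so an InfoNCE-style loss over cosine similarities is assumed, and the helper names, temperature, toy embeddings, and cluster pseudo-labels are all hypothetical. The only idea taken from the abstract is that negatives are selected using clusters rather than taken from every other item in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor (assumed loss form)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def in_batch_negatives(index, embeddings):
    """Treat every other phrase in the batch as a negative."""
    return [e for j, e in enumerate(embeddings) if j != index]


def cluster_negatives(index, embeddings, cluster_ids):
    """Treat only phrases from a different cluster as negatives."""
    return [
        e for j, e in enumerate(embeddings)
        if cluster_ids[j] != cluster_ids[index]
    ]


# Toy batch (made-up vectors): two phrases on one topic, two on another.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
# Hypothetical pseudo-labels, e.g. produced by a clustering step.
cluster_ids = [0, 0, 1, 1]

anchor = embeddings[0]
positive = [1.0, 0.15]  # stands in for another mention of the same phrase

loss_in_batch = info_nce(anchor, positive, in_batch_negatives(0, embeddings))
loss_cluster = info_nce(
    anchor, positive, cluster_negatives(0, embeddings, cluster_ids)
)

print(f"in-batch negatives: {loss_in_batch:.4f}")
print(f"cluster negatives:  {loss_cluster:.4f}")
```

In this toy batch, the in-batch variant pushes the anchor away from a phrase on the same topic, which is the kind of noisy negative that becomes common when a dataset has few topics. The cluster-based variant drops that phrase from the negative set, so the loss no longer penalizes similarity between same-topic phrases.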
