EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression

08/25/2023
by Alexander Tsvetkov, et al.

We propose an unsupervised method for extracting keywords and keyphrases from text, based on a pre-trained language model (LM) and Shannon's information maximization. Specifically, our method extracts the phrases with the highest conditional entropy under the LM. The resulting set of keyphrases turns out to solve a relevant information-theoretic problem: when provided as side information, it yields the minimal expected binary code length for compressing the text with the LM and an entropy encoder. Alternatively, the resulting set is an approximation, via a causal LM, of the set of phrases that minimizes the entropy of the text when conditioned on it. Empirically, the method provides results comparable to those of the most commonly used methods on various keyphrase extraction benchmarks.
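The core idea of ranking positions by conditional entropy can be illustrated with a minimal sketch. This is not the authors' implementation: a toy bigram model trained on the text itself stands in for the pre-trained LM, single tokens are scored rather than multi-word phrases, and the function names (`train_bigram`, `conditional_entropy`, `top_entropy_positions`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count next-token frequencies per preceding token (toy stand-in for a causal LM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def conditional_entropy(counts, prev):
    """Shannon entropy, in bits, of the next-token distribution given `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 0.0  # unseen context: this toy model has no distribution to score
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def top_entropy_positions(tokens, counts, k):
    """Return (token, entropy) for the k positions the model is least certain about."""
    scored = [
        (conditional_entropy(counts, tokens[i - 1]), i)
        for i in range(1, len(tokens))
    ]
    # Highest entropy first; ties are broken by position in the text.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(tokens[i], h) for h, i in scored[:k]]


# Assumed toy input; a real system would score sub-word tokens under a pre-trained LM.
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = train_bigram(tokens)
for token, entropy in top_entropy_positions(tokens, counts, k=4):
    print(f"{token}\t{entropy:.2f} bits")
```

In this toy setting, the positions following "the" are the hardest to predict, so the content words at those positions are the ones selected. Intuitively, revealing exactly those tokens as side information removes the most uncertainty for a compressor driven by the same model.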

