A Novel Metric for Evaluating Semantics Preservation

10/04/2021
by Letian Peng, et al.

In this paper, we leverage pre-trained language models (PLMs) to precisely evaluate how well the semantics of a sentence are preserved when it is edited. Our metric, Neighbor Distribution Divergence (NDD), measures the disturbance that an edit causes to the distributions a masked language model (MLM) predicts for the neighboring words. NDD can detect subtle changes in semantics that text-similarity metrics easily miss. Exploiting this property, we implement an unsupervised, even training-free, algorithm for extractive sentence compression, and show that it outperforms the previous perplexity-based unsupervised algorithm by a large margin. To further explore interpretability, we also evaluate NDD by pruning on syntactic dependency treebanks and apply it to predicate detection.

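The core idea lends itself to a short sketch. The following is a minimal, hypothetical illustration rather than the authors' implementation: it masks each word shared by the original and edited sentences, queries an off-the-shelf MLM (bert-base-uncased is assumed here) for the predicted distribution at that position in both sentences, and averages the KL divergences. The paper's exact choice of neighboring words and any weighting may differ, and multi-subword words are handled naively.

```python
# Sketch of an NDD-style score (assumptions noted above): for each word kept by
# the edit, mask it in both sentences, compare the MLM's predicted distributions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_distribution(words, position):
    """Log-distribution the MLM predicts for `position` when it is masked."""
    masked = words.copy()
    masked[position] = tokenizer.mask_token
    inputs = tokenizer(" ".join(masked), return_tensors="pt")
    # locate the [MASK] token in the encoded input
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_index]
    return F.log_softmax(logits, dim=-1)

def ndd_score(original, edited):
    """Average KL divergence over words the edit leaves in place (a sketch)."""
    orig_words, edit_words = original.split(), edited.split()
    shared = set(orig_words) & set(edit_words)
    divergences = []
    for word in shared:
        p = masked_distribution(orig_words, orig_words.index(word))  # original sentence
        q = masked_distribution(edit_words, edit_words.index(word))  # edited sentence
        # KL(P || Q), both given as log-probabilities
        divergences.append(F.kl_div(q, p, log_target=True, reduction="sum").item())
    return sum(divergences) / max(len(divergences), 1)

# A larger score indicates the edit disturbed the neighbors' distributions more.
print(ndd_score("the quick brown fox jumps", "the quick brown fox sleeps"))
```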
