Towards Unsupervised Recognition of Semantic Differences in Related Documents

05/22/2023
by Jannis Vamvas, et al.

Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex, cross-lingual document pairs. Our results show that an approach based on word alignment and sentence-level contrastive learning correlates robustly with gold labels. However, all unsupervised approaches still leave a large margin for improvement. Code to reproduce our experiments is available at https://github.com/ZurichNLP/recognizing-semantic-differences
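
To make the token-level framing concrete, the following is a minimal sketch of how an alignment-style difference score could be computed. It is not the authors' implementation: the function name `alignment_difference_scores`, the toy embeddings, and the choice of one minus the best cosine similarity as the score are all assumptions made for illustration. In practice the embeddings would presumably come from a masked language model rather than hand-written vectors.

```python
import numpy as np


def alignment_difference_scores(emb_a, emb_b):
    """Hypothetical scorer: one difference score per token of document A.

    emb_a: array of shape (n_tokens_a, dim), token embeddings of document A.
    emb_b: array of shape (n_tokens_b, dim), token embeddings of document B.
    A token that has a close match in B gets a score near 0; a token with
    no similar counterpart in B gets a higher score.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    # Normalize rows so that the dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = a @ b.T  # shape (n_tokens_a, n_tokens_b)
    # Assumption: a token's score is 1 minus its best match in B.
    return 1.0 - similarity.max(axis=1)


# Toy embeddings (assumed values, not model output).
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # two tokens
doc_b = [[1.0, 0.0]]              # one token, identical to A's first token
scores = alignment_difference_scores(doc_a, doc_b)
```

In this toy case the first token of A has an exact counterpart in B, while the second is orthogonal to everything in B and therefore receives the larger score. Scores of this kind could then be compared against gold labels with a correlation measure, as the abstract describes.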
