Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

05/25/2023
by Daniel Saggau, et al.

Learning high-quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in transformer-based models that produce sentence embeddings with self-contrastive learning, encoding long documents (thousands of words) remains challenging in terms of both efficiency and quality. We therefore train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Furthermore, we complement the baseline method (a siamese neural network) with additional convex neural networks based on functional Bregman divergence, aiming to improve the quality of the output document representations. We show that, overall, the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long-document topic classification tasks from the legal and biomedical domains.
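The two ingredients named above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: random vectors stand in for Longformer document embeddings, and dropout is applied to those output vectors rather than inside the encoder as SimCSE does. The SimCSE-style part is an InfoNCE loss in which two dropout-noised views of the same document form the positive pair and the other documents in the batch act as negatives (temperature 0.05 and dropout rate 0.1, SimCSE's defaults). For the Bregman part, the sketch swaps in a simple max-affine convex generator phi(x) = max_k (a_k . x + b_k) in place of the paper's convex neural networks, and scores pairs by the negated divergence. The names `MaxAffineBregman`, `combined_loss`, and its `weight` parameter are hypothetical, as is the choice to sum the two losses.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; the small epsilon guards against all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(sim: List[List[float]], temperature: float) -> float:
    """Mean InfoNCE loss; sim[i][j] scores anchor i against candidate j.

    Candidate i is the positive for anchor i; every other candidate in the
    batch serves as an in-batch negative, as in unsupervised SimCSE.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(sim)


class MaxAffineBregman:
    """Hypothetical Bregman divergence with generator phi(x) = max_k (a_k . x + b_k).

    phi is convex and the active piece a_{k*(y)} is a subgradient at y, so
    D(x, y) = phi(x) - phi(y) - a_{k*(y)} . (x - y)
            = phi(x) - (a_{k*(y)} . x + b_{k*(y)}) >= 0.
    """

    def __init__(self, dim: int, pieces: int, rng: random.Random) -> None:
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pieces)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(pieces)]

    def _affine(self, x: Sequence[float]) -> List[float]:
        # Value of every affine piece a_k . x + b_k.
        return [sum(ak * xk for ak, xk in zip(a, x)) + b for a, b in zip(self.a, self.b)]

    def divergence(self, x: Sequence[float], y: Sequence[float]) -> float:
        vals_x, vals_y = self._affine(x), self._affine(y)
        k_star = max(range(len(vals_y)), key=vals_y.__getitem__)  # active piece at y
        return max(vals_x) - vals_x[k_star]


def dropout(x: Sequence[float], p: float, rng: random.Random) -> Vector:
    """Inverted dropout: zero each coordinate with probability p, rescale the rest."""
    return [0.0 if rng.random() < p else xi / (1.0 - p) for xi in x]


def combined_loss(batch, bregman, rng, temperature=0.05, p=0.1, weight=1.0):
    """Hypothetical objective: SimCSE-style cosine InfoNCE plus Bregman InfoNCE."""
    # Two dropout-noised views of each embedding act as the positive pair.
    view1 = [dropout(x, p, rng) for x in batch]
    view2 = [dropout(x, p, rng) for x in batch]
    cos_sim = [[cosine(u, v) for v in view2] for u in view1]
    # A smaller divergence means more similar, so -D serves as the score.
    breg_sim = [[-bregman.divergence(u, v) for v in view2] for u in view1]
    return info_nce(cos_sim, temperature) + weight * info_nce(breg_sim, temperature)


# Toy usage: random vectors stand in for Longformer document embeddings.
rng = random.Random(0)
dim = 8
docs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)]
breg = MaxAffineBregman(dim, pieces=5, rng=rng)
loss = combined_loss(docs, breg, rng)
print(f"combined loss: {loss:.4f}")
```

Because a Bregman divergence is generally asymmetric, `breg_sim[i][j]` and `breg_sim[j][i]` can differ, unlike the cosine scores. In a real model the generator's parameters and the encoder would be trained jointly by gradient descent, which this stdlib-only sketch does not attempt.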
