Document Network Embedding: Coping for Missing Content and Missing Links

12/06/2019
by Jean Dupuy, et al.

Searching through networks of documents is an important task. A promising way to improve the performance of information retrieval systems in this context is to leverage dense node and content representations learned with embedding techniques. However, these techniques cannot learn representations for documents that are isolated or whose content is missing. To tackle this issue, and assuming that the topology of the network and the content of the documents are correlated, we propose to estimate the missing node representations from the available content representations, and vice versa. Inspired by recent advances in machine translation, we detail in this paper how to learn a linear transformation from a set of aligned content and node representations. The projection matrix is computed efficiently via singular value decomposition. The usefulness of the proposed method is shown by an improved ability to predict, from projected content representations, the neighborhood of nodes whose links are unobserved, and to retrieve, from projected node representations, similar documents when content is missing.
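To make the idea of a linear map between aligned embedding spaces concrete, here is a minimal sketch using the orthogonal Procrustes solution, a common SVD-based choice in cross-lingual embedding alignment. The abstract only states that the projection matrix is obtained via singular value decomposition; the orthogonality constraint, the function name `fit_orthogonal_map`, the equal dimensionality of both spaces, and the synthetic data are all assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np

def fit_orthogonal_map(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F (orthogonal Procrustes).

    X, Y: aligned (n, d) arrays, e.g. content and node embeddings of the
    same n documents. The orthogonality constraint is an assumption here.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic stand-in data (hypothetical): node embeddings are a rotated,
# slightly noisy copy of the content embeddings.
rng = np.random.default_rng(0)
d = 8
content = rng.normal(size=(200, d))
true_rot, _ = np.linalg.qr(rng.normal(size=(d, d)))
node = content @ true_rot + 0.01 * rng.normal(size=(200, d))

# Fit on documents where both views are available...
W = fit_orthogonal_map(content[:150], node[:150])

# ...then estimate node embeddings for documents whose links are unobserved.
pred_node = content[150:] @ W
rel_err = np.linalg.norm(pred_node - node[150:]) / np.linalg.norm(node[150:])

# Because W is orthogonal, its transpose maps node space back to content space.
pred_content = node[150:] @ W.T
```

With an orthogonal map, a single matrix serves both directions: `W` projects content representations into node space, and `W.T` projects node representations into content space, which matches the two use cases described in the abstract (missing links and missing content).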


