Structure with Semantics: Exploiting Document Relations for Retrieval

01/11/2022
by Natraj Raman, et al.

Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and the query text. Including the structural relationships between documents can benefit retrieval by addressing semantic gaps. However, incorporating these relationships requires tractable mechanisms that balance structure with semantics and take advantage of the prevalent pre-train/fine-tune paradigm. We propose a holistic approach to learning document representations that integrates intra-document content with inter-document relations. Our deep metric learning solution analyzes the complex neighborhood structure of the relationship network to efficiently sample similar and dissimilar document pairs. It also defines a novel quintuplet loss function that simultaneously pulls semantically relevant document pairs closer together and pushes structurally unrelated pairs farther apart in the representation space. In addition, the separation margins between documents are varied flexibly to encode differences in relationship strength. The model is fully fine-tunable and natively supports query projection during inference. We show that it outperforms competing methods on document retrieval tasks across multiple datasets.
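The abstract does not give the quintuplet loss formula, so the following is only a rough illustration of the general idea of graded, variable-margin metric learning, not the authors' method. It assumes a quintuplet is an anchor plus four documents ranked from most to least related (for example a semantic match, a direct neighbor, a distant neighbor, and an unrelated document), uses Euclidean distance, and applies one hinge per adjacent pair with its own margin. The function names, the ranking scheme, and the margin values are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length embeddings (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def graded_margin_loss(
    anchor: Vector,
    ranked: Sequence[Vector],
    margins: Sequence[float],
) -> float:
    """Hypothetical quintuplet-style hinge loss (not the paper's formula).

    anchor:  embedding of the anchor document.
    ranked:  four embeddings ordered from most to least related to the anchor
             (assumed ordering, chosen only for illustration).
    margins: three margins, one per adjacent pair in `ranked`; a larger margin
             demands a wider gap, standing in for a stronger relation difference.
    """
    if len(ranked) != 4 or len(margins) != 3:
        raise ValueError("expected 4 ranked embeddings and 3 margins")
    dists = [euclidean(anchor, x) for x in ranked]
    loss = 0.0
    # Each more-related document should be closer to the anchor than the next
    # less-related one by at least its margin; violations add a hinge penalty.
    for closer, farther, margin in zip(dists, dists[1:], margins):
        loss += max(0.0, closer - farther + margin)
    return loss


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    ranked = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
    print(graded_margin_loss(anchor, ranked, [0.5, 0.5, 0.5]))  # 0.0
    print(graded_margin_loss(anchor, ranked, [1.5, 0.5, 0.5]))  # 0.5
```

Chaining hinges between adjacent ranks keeps the cost linear in the number of ranked documents; an actual implementation might instead compare every pair, use cosine distance, or derive the margins from relation weights in the graph.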

research
11/03/2019

MRNN: A Multi-Resolution Neural Network with Duplex Attention for Document Retrieval in the Context of Question Answering

The primary goal of ad-hoc retrieval (document retrieval in the context ...
research
11/15/2016

SimDoc: Topic Sequence Alignment based Document Similarity Framework

Document similarity is the problem of estimating the degree to which a g...
research
03/15/2022

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

Dense retrieval models, which aim at retrieving the most relevant docume...
research
12/20/2022

Fine-Grained Distillation for Long Document Retrieval

Long document retrieval aims to fetch query-relevant documents from a la...
research
11/27/2018

A Concept-Centered Hypertext Approach to Case-Based Retrieval

The goal of case-based retrieval is to assist physicians in the clinical...
research
02/18/2022

Modelling the semantics of text in complex document layouts using graph transformer networks

Representing structured text from complex documents typically calls for ...
research
05/27/2021

Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval

With the need of fast retrieval speed and small memory footprint, docume...
