Document Similarity for Texts of Varying Lengths via Hidden Topics

03/26/2019
by Hongyu Gong, et al.

Measuring the similarity between texts is an important task in several applications. Existing approaches to measuring document similarity are inadequate for document pairs of non-comparable lengths, such as a long document and its summary, because of the lexical, contextual and abstraction gaps between a long, detail-rich document and a concise summary of abstract information. In this paper, we present a document matching approach that bridges this gap by comparing the texts in a common space of hidden topics. We evaluate the matching algorithm on two matching tasks and find that it consistently outperforms strong baselines by wide margins. We also highlight the benefits of incorporating domain knowledge into text matching.
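As a rough illustration of why a shared space of hidden topics helps with the lexical gap, the sketch below maps a long document and a short summary onto the same topics and compares them there with cosine similarity. It is not the authors' algorithm: the word-to-topic weights in the hypothetical `WORD_TOPICS` table are toy values chosen for illustration, whereas a real system would learn its topics (for example with a topic model or from word embeddings). The function names `topic_vector`, `topic_similarity` and `lexical_similarity` are likewise hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-topic weights (toy values for illustration only).
# A real system would learn these, e.g. from a topic model or word embeddings.
WORD_TOPICS = {
    "stock": {"finance": 1.0},
    "prices": {"finance": 0.8},
    "rose": {"finance": 0.5, "sports": 0.2},
    "shares": {"finance": 1.0},
    "investors": {"finance": 0.9},
    "rallied": {"finance": 0.6, "sports": 0.3},
    "equities": {"finance": 1.0},
    "team": {"sports": 1.0},
    "goal": {"sports": 0.9},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_similarity(a, b):
    """Bag-of-words cosine: only rewards words the two texts share."""
    return cosine(Counter(a.lower().split()), Counter(b.lower().split()))


def topic_vector(text):
    """Project a text into the topic space by summing its words' topic weights."""
    vec = Counter()
    for word in text.lower().split():
        # Words missing from the toy table contribute nothing.
        for topic, weight in WORD_TOPICS.get(word, {}).items():
            vec[topic] += weight
    return vec


def topic_similarity(a, b):
    """Cosine similarity of the two texts in the shared topic space."""
    return cosine(topic_vector(a), topic_vector(b))


long_doc = "investors bought shares as equities rallied across the market"
summary = "stock prices rose"
off_topic = "the team scored a goal"

print(lexical_similarity(long_doc, summary))  # 0.0: no words in common
print(round(topic_similarity(long_doc, summary), 3))
print(round(topic_similarity(long_doc, off_topic), 3))
```

With these toy weights the long document and its summary share no words, so their bag-of-words cosine is 0.0, yet they land close together in topic space, while the unrelated sports sentence does not. How well this works on real text depends entirely on how the topics are obtained.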

Related research
04/23/2019

Wasserstein-Fisher-Rao Document Distance

As a fundamental problem of natural language processing, it is important...
10/08/2022

EDU-level Extractive Summarization with Varying Summary Lengths

Extractive models usually formulate text summarization as extracting top...
03/01/2020

StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization

Traditional preneural approaches to single document summarization relied...
12/06/2022

Document-Level Abstractive Summarization

The task of automatic text summarization produces a concise and fluent t...
07/08/2018

Replicated Siamese LSTM in Ticketing System for Similarity Learning and Retrieval in Asymmetric Texts

The goal of our industrial ticketing system is to retrieve a relevant so...
12/02/2021

KPDrop: An Approach to Improving Absent Keyphrase Generation

Keyphrase generation is the task of generating phrases (keyphrases) that...
05/25/2016

SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

There have been multiple attempts to resolve various inflection matching...
