Document Similarity for Texts of Varying Lengths via Hidden Topics

by Hongyu Gong, et al.
University of Illinois at Urbana-Champaign

Measuring similarity between texts is an important task for several applications. Available approaches to measuring document similarity are inadequate for document pairs of non-comparable lengths, such as a long document and its summary. This is because of the lexical, contextual, and abstraction gaps between a long document rich in detail and its concise, abstract summary. In this paper, we present a document matching approach that bridges these gaps by comparing the texts in a common space of hidden topics. We evaluate the matching algorithm on two matching tasks and find that it consistently and widely outperforms strong baselines. We also highlight the benefits of incorporating domain knowledge into text matching.
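The abstract does not spell out how the hidden topics are computed, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each word is already represented by a vector, takes the "hidden topics" of the long document to be the top principal directions of its word vectors (found here by plain power iteration), and scores a short text by how well its word vectors are reconstructed from those topics. The function names (`hidden_topics`, `relevance`), the uniform topic weighting, and the toy 3-dimensional vectors are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def remove_components(v, basis):
    """Subtract from v its projection onto each orthonormal vector in basis."""
    for h in basis:
        c = dot(v, h)
        v = [a - c * b for a, b in zip(v, h)]
    return v

def hidden_topics(word_vectors, k, iters=200):
    """Assumed stand-in for topic extraction: the top-k orthonormal
    directions of W^T W, found by power iteration with deflation."""
    dim = len(word_vectors[0])
    topics = []
    for t in range(k):
        v = [1.0 / (i + t + 1) for i in range(dim)]  # fixed, deterministic start
        for _ in range(iters):
            v = remove_components(v, topics)
            # Apply W^T W to v: sum over words of (w . v) * w
            nxt = [0.0] * dim
            for w in word_vectors:
                c = dot(w, v)
                nxt = [a + c * b for a, b in zip(nxt, w)]
            n = norm(nxt)
            if n < 1e-12:
                return topics  # fewer than k informative directions
            v = [a / n for a in nxt]
        v = remove_components(v, topics)
        topics.append([a / norm(v) for a in v])
    return topics

def relevance(short_text_vectors, topics):
    """Mean cosine between each word vector and its reconstruction from
    the topics. For orthonormal topics this equals |projection| / |word|."""
    scores = []
    for w in short_text_vectors:
        recon = [0.0] * len(w)
        for h in topics:
            c = dot(w, h)
            recon = [a + c * b for a, b in zip(recon, h)]
        wn = norm(w)
        scores.append(norm(recon) / wn if wn > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (invented): the long document's words vary only in the first two dimensions.
long_doc = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0],
            [0.1, 0.9, 0.0], [0.5, 0.5, 0.0]]
on_topic = [[0.7, 0.3, 0.0], [0.2, 0.8, 0.0]]
off_topic = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]

topics = hidden_topics(long_doc, k=2)
print(relevance(on_topic, topics), relevance(off_topic, topics))
```

In this toy setting the short text whose words lie in the long document's topic subspace scores higher than the one that points elsewhere, even though neither shares an exact word vector with the long document. A real system would use pretrained word embeddings and would likely weight topics by importance rather than uniformly.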


Wasserstein-Fisher-Rao Document Distance

As a fundamental problem of natural language processing, it is important...

EDU-level Extractive Summarization with Varying Summary Lengths

Extractive models usually formulate text summarization as extracting top...

StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization

Traditional preneural approaches to single document summarization relied...

Document-Level Abstractive Summarization

The task of automatic text summarization produces a concise and fluent t...

Replicated Siamese LSTM in Ticketing System for Similarity Learning and Retrieval in Asymmetric Texts

The goal of our industrial ticketing system is to retrieve a relevant so...

An Enhanced MeanSum Method For Generating Hotel Multi-Review Summarizations

Multi-document summarization is the process of taking multiple texts as ...

KPDrop: An Approach to Improving Absent Keyphrase Generation

Keyphrase generation is the task of generating phrases (keyphrases) that...