Matching Long Text Documents via Graph Convolutional Networks

02/21/2018
by Bang Liu, et al.

Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering, or query-document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AI-based news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize the complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching that especially targets long document matching, such as identifying whether two news articles report the same real-world event, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation of a document: vertices represent different concepts, each being one keyword or a group of coherent keywords in the document, and edges represent the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features through Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach on two labeled news article datasets, created at Tencent for its intelligent news products, shows that the proposed graph approach to long document matching significantly outperforms a wide range of state-of-the-art methods.
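To make the graph representation described in the abstract more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The function name `build_concept_graph`, the hand-written keyword sets, the whitespace tokenization, and the edge weighting (a count of sentences that mention both concepts) are all assumptions made for illustration; the abstract does not specify how concepts are discovered or how edges are weighted.

```python
from itertools import combinations


def build_concept_graph(sentences, concepts):
    """Build a toy concept interaction graph (illustrative assumption only).

    sentences: list of sentence strings from one document.
    concepts:  dict mapping a concept name to a set of lowercase keywords.

    Returns (vertices, edges):
      vertices maps each concept name to the indices of sentences that
               mention at least one of its keywords;
      edges    maps an alphabetically sorted concept pair to the number of
               sentences that mention both concepts.
    """
    vertices = {name: [] for name in concepts}
    edges = {}
    for idx, sentence in enumerate(sentences):
        # Naive whitespace tokenization; a real system would use a tokenizer.
        tokens = set(sentence.lower().split())
        hit = sorted(name for name, kws in concepts.items() if tokens & kws)
        for name in hit:
            vertices[name].append(idx)
        # A sentence that mentions two concepts links them in the graph.
        for a, b in combinations(hit, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return vertices, edges


# Hypothetical three-sentence document with hand-picked concepts.
doc = [
    "The storm hit the coast on Monday",
    "Rescue teams reached the coast",
    "Officials praised the rescue teams",
]
doc_concepts = {
    "storm": {"storm"},
    "coast": {"coast"},
    "rescue": {"rescue", "teams"},
}
doc_vertices, doc_edges = build_concept_graph(doc, doc_concepts)
print(doc_vertices)
print(doc_edges)
```

In the approach the abstract describes, a graph like this would be built for a pair of documents, after which vertex representations are learned with a Siamese network and aggregated by graph convolution; that learning step is outside the scope of this sketch.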

