Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching

01/16/2021
by Liang Pang, et al.

Semantic text matching models are widely used in community question answering, information retrieval, and dialogue. However, these models do not handle long-form text matching well. Long-form text usually contains a great deal of noise, and existing semantic text matching models struggle to capture the key matching signals within it. These models are also computationally expensive because they use all of the textual data indiscriminately in the matching process. To address both the effectiveness and the efficiency problems, we propose a novel hierarchical noise filtering model, Match-Ignition. The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter noisy information at both the sentence level and the word level during matching. Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we apply PageRank directly to a sentence similarity graph to filter them. Words, in contrast, rely on their contexts to express concrete meanings, so we propose to learn the filtering process and the matching process jointly, reflecting the contextual dependencies between words. Specifically, a word graph is first built from the attention scores in each self-attention block of the Transformer, and keywords are then selected by applying PageRank to this graph. In this way, noisy words are filtered out layer by layer during matching. Experimental results show that Match-Ignition outperforms both traditional text matching models for short text and recent long-form text matching models. We also conduct a detailed analysis showing that Match-Ignition can efficiently capture important sentences or words, which are helpful for long-form text matching.
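
The two filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the damping factor of 0.85, the fixed iteration count, and the function names `pagerank`, `filter_sentences`, and `select_keywords` are all assumptions chosen for illustration. The word-level step here only ranks tokens from a single attention matrix; it does not reproduce the joint training or layer-by-layer filtering inside the Transformer.

```python
import math
from collections import Counter


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dense, non-negative weight matrix.

    weights[i][j] is the weight of the edge from node i to node j.
    The damping factor and iteration count are illustrative defaults.
    """
    n = len(weights)
    if n == 0:
        return []
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sums[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_scores.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        scores = new_scores
    return scores


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_sentences(sentences, keep):
    """Sentence-level filtering: rank sentences on a similarity graph.

    Assumption: similarity is bag-of-words cosine; the abstract does not
    say which sentence similarity measure is used.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    graph = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(graph)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Return the kept sentences in their original order.
    return [sentences[i] for i in sorted(top)]


def select_keywords(attention, keep):
    """Word-level ranking: treat one attention matrix as a word graph.

    Assumption: attention[i][j] is how strongly token i attends to
    token j, read as an edge from i to j.
    """
    scores = pagerank(attention)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(top[:keep])
```

For example, `filter_sentences(doc_sentences, keep=2)` would drop the sentences that are least connected to the rest of the document, and `select_keywords(attn, keep=k)` would return the positions of the `k` tokens that receive the most attention-weighted rank.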


