Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling

10/11/2022
by   Yuanhang Yang, et al.

Transformer-based models have achieved great success on sentence pair modeling tasks such as answer selection and natural language inference (NLI). These models generally perform cross-attention over input pairs, leading to prohibitive computational cost. Recent studies propose dual-encoder and late-interaction architectures for faster computation. However, the trade-off between the expressiveness of cross-attention and the computational speedup still needs to be better coordinated. To this end, this paper introduces MixEncoder, a novel paradigm for efficient sentence pair modeling. MixEncoder involves a light-weight cross-attention mechanism: it encodes the query only once while modeling the query-candidate interactions in parallel. Extensive experiments on four tasks demonstrate that MixEncoder speeds up sentence pair modeling by over 113x while achieving performance comparable to the more expensive cross-attention models.
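
For intuition, here is a minimal PyTorch sketch of the idea the abstract describes: encode the query once, then let a single light cross-attention layer score many pre-encoded candidates in parallel. The class, parameter names, and pooling choice are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class LightCrossAttention(nn.Module):
    # Sketch only: one light interaction layer on top of pre-computed
    # encodings, instead of full cross-attention inside every encoder layer.
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scorer = nn.Linear(dim, 1)

    def forward(self, query_states: torch.Tensor,
                cand_states: torch.Tensor) -> torch.Tensor:
        # query_states: (1, Lq, dim) -- the query is encoded ONCE and reused
        # cand_states:  (N, Lc, dim) -- N candidates, pre-encoded offline
        n = cand_states.size(0)
        q = query_states.expand(n, -1, -1)       # broadcast; no re-encoding
        mixed, _ = self.attn(cand_states, q, q)  # candidates attend to query
        pooled = mixed.mean(dim=1)               # (N, dim)
        return self.scorer(pooled).squeeze(-1)   # (N,) one score per pair


layer = LightCrossAttention(dim=256)
query = torch.randn(1, 12, 256)   # one encoded query (12 tokens)
cands = torch.randn(50, 32, 256)  # 50 encoded candidates (32 tokens each)
scores = layer(query, cands)      # all 50 pairs scored in one parallel pass
```

The claimed speedup is consistent with exactly this asymmetry: candidate encodings can be pre-computed offline, so the only per-query work is a single encoding pass plus the cheap interaction layer, rather than a full cross-attention forward pass per query-candidate pair.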


