Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification

07/22/2020
by Nicole Peinelt, et al.

Question paraphrase identification is a key task in Community Question Answering (CQA): determining whether an incoming question has already been asked. Many current models use word embeddings to identify duplicate questions, but the success of topic models in feature-engineered systems suggests that topics can be helpful for this task, too. We therefore propose two ways of merging topics with word embeddings (early vs. late fusion) in a new neural architecture for question paraphrase identification. Our system outperforms neural baselines on multiple CQA datasets, and an ablation study highlights the importance of topics, and of early topic-embedding fusion in particular, to our architecture.
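The abstract contrasts two fusion points but does not spell out the architecture, so the sketch below only illustrates the general idea, not the authors' model: a hypothetical PyTorch classifier in which a topic vector is concatenated either to each word embedding before the encoder (early fusion) or to the encoded question representation just before classification (late fusion). The BiLSTM encoder and all module names and dimensions are assumptions for illustration.

import torch
import torch.nn as nn

class TopicFusionClassifier(nn.Module):
    """Illustrative early/late topic-embedding fusion for question pairs (not the paper's model)."""

    def __init__(self, vocab_size, emb_dim=300, topic_dim=80, hidden=128, mode="early"):
        super().__init__()
        self.mode = mode
        self.embed = nn.Embedding(vocab_size, emb_dim)
        enc_in = emb_dim + topic_dim if mode == "early" else emb_dim
        self.encoder = nn.LSTM(enc_in, hidden, batch_first=True, bidirectional=True)
        rep_dim = 2 * hidden + (topic_dim if mode == "late" else 0)
        self.classifier = nn.Linear(2 * rep_dim, 2)  # both question representations feed the classifier

    def encode(self, tokens, word_topics, question_topics):
        x = self.embed(tokens)                              # (B, T, emb_dim)
        if self.mode == "early":
            # early fusion: word-level topic distributions are appended to each embedding
            x = torch.cat([x, word_topics], dim=-1)         # (B, T, emb_dim + topic_dim)
        _, (h, _) = self.encoder(x)
        rep = torch.cat([h[-2], h[-1]], dim=-1)             # final fwd/bwd states -> (B, 2*hidden)
        if self.mode == "late":
            # late fusion: a question-level topic distribution is appended to the encoded vector
            rep = torch.cat([rep, question_topics], dim=-1)
        return rep

    def forward(self, q1, q1_word_topics, q1_topics, q2, q2_word_topics, q2_topics):
        r1 = self.encode(q1, q1_word_topics, q1_topics)
        r2 = self.encode(q2, q2_word_topics, q2_topics)
        return self.classifier(torch.cat([r1, r2], dim=-1))  # logits: paraphrase / not paraphrase

In this sketch the only difference between the two modes is where the topic vector enters the computation; the ablation finding that early fusion matters most would correspond to letting the encoder see topic information at the token level rather than only at the final representation.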


