Optimal Transport-based Alignment of Learned Character Representations for String Similarity

07/23/2019
by   Derek Tam, et al.

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE or one of its variants outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B^3 F1 over the previous state-of-the-art approach.
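
To make the alignment step more concrete, the following is a minimal sketch of Sinkhorn iteration for entropy-regularized optimal transport between the characters of two strings. It is an illustration only, not the STANCE implementation: the abstract does not specify the cost function, marginals, or hyperparameters, so the 0/1 character-mismatch cost (`toy_cost`), the uniform marginals, and the values of `reg` and `n_iters` are all assumptions made here. In the model described above, the cost would presumably be derived from learned character encodings, and the resulting alignment would be scored by a convolutional neural network rather than inspected directly.

```python
import math


def toy_cost(s, t):
    # Hypothetical stand-in for a learned cost: 0 if characters match, 1 otherwise.
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]


def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Return an approximate transport plan between uniform marginals.

    cost    -- n x m matrix (list of lists) of pairwise costs
    reg     -- entropic regularization strength (assumed value)
    n_iters -- number of Sinkhorn scaling iterations (assumed value)
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass over characters of the first string
    b = [1.0 / m] * m  # uniform mass over characters of the second string

    # Gibbs kernel K = exp(-C / reg).
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn_plan(toy_cost("stance", "stanse"))
```

Each entry `plan[i][j]` can be read as a soft alignment weight between character `i` of the first string and character `j` of the second. Smaller values of `reg` give sharper alignments but typically need more iterations to converge.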


research
03/11/2022

Semi-constraint Optimal Transport for Entity Alignment with Dangling Cases

Entity alignment (EA) merges knowledge graphs (KGs) by identifying the e...
research
10/08/2021

Contrastive String Representation Learning using Synthetic Data

String representation Learning (SRL) is an important task in the field o...
research
09/05/2022

Conflict-Aware Pseudo Labeling via Optimal Transport for Entity Alignment

Entity alignment aims to discover unique equivalent entity pairs with th...
research
05/28/2015

Query by String word spotting based on character bi-gram indexing

In this paper we propose a segmentation-free query by string word spotti...
research
11/04/2020

Neural text normalization leveraging similarities of strings and sounds

We propose neural models that can normalize text by considering the simi...
research
05/27/2020

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Selecting input features of top relevance has become a popular method fo...
research
01/13/2021

Toward Data Cleaning with a Target Accuracy: A Case Study for Value Normalization

Many applications need to clean data with a target accuracy. As far as w...
