A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

07/04/2012
by Andrew McCallum, et al.

The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because, as conditionally trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
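To make the idea of scoring a string pair by summing over all unobserved edit sequences more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a single edit state per class, only three edit operations (copy/substitute, insert, delete), and hand-set weights. The names `edit_weight`, `log_alignment_score`, `match_probability`, `MATCH_WEIGHTS` and `MISMATCH_WEIGHTS` are hypothetical. A forward (dynamic-programming) pass computes the log of the sum, over every edit sequence turning `x` into `y`, of the exponentiated total operation score. The pair is scored once with "match" weights and once with "mismatch" weights, and normalizing the two gives a conditional match probability. Arbitrary features of the input strings and learning the weights from positive and negative pairs are not shown.

```python
import math

NEG_INF = float("-inf")

# Hypothetical, hand-set weights for illustration only; in a trained model
# these would be learned from positive (match) and negative (mismatch) pairs.
MATCH_WEIGHTS = {"copy": 1.0, "substitute": -2.0, "insert": -2.0, "delete": -2.0}
MISMATCH_WEIGHTS = {"copy": 0.0, "substitute": 0.0, "insert": -1.0, "delete": -1.0}


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    if m == NEG_INF:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def edit_weight(op, a, b, weights):
    """Hypothetical feature function: score of one edit operation.

    op is 'sub' (align a with b), 'ins' (insert b) or 'del' (delete a).
    """
    if op == "sub":
        return weights["copy"] if a == b else weights["substitute"]
    if op == "ins":
        return weights["insert"]
    return weights["delete"]


def log_alignment_score(x, y, weights):
    """Forward algorithm: log of the sum over all edit sequences x -> y
    of exp(sum of edit-operation scores along the sequence)."""
    n, m = len(x), len(y)
    alpha = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 0.0  # empty prefix pair: one empty edit sequence, score 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0 and j > 0:  # copy or substitute x[i-1] -> y[j-1]
                terms.append(alpha[i - 1][j - 1]
                             + edit_weight("sub", x[i - 1], y[j - 1], weights))
            if i > 0:  # delete x[i-1]
                terms.append(alpha[i - 1][j]
                             + edit_weight("del", x[i - 1], None, weights))
            if j > 0:  # insert y[j-1]
                terms.append(alpha[i][j - 1]
                             + edit_weight("ins", None, y[j - 1], weights))
            alpha[i][j] = logsumexp(terms)
    return alpha[n][m]


def match_probability(x, y, match_w, mismatch_w):
    """P(match | x, y): normalize the match and mismatch alignment scores."""
    s_match = log_alignment_score(x, y, match_w)
    s_mismatch = log_alignment_score(x, y, mismatch_w)
    return 1.0 / (1.0 + math.exp(s_mismatch - s_match))


p_same = match_probability("kitten", "kitten", MATCH_WEIGHTS, MISMATCH_WEIGHTS)
p_diff = match_probability("kitten", "sitting", MATCH_WEIGHTS, MISMATCH_WEIGHTS)
print(p_same, p_diff)
```

With these illustrative weights, the identical pair comes out above 0.5 and the differing pair below it. Working in log space with `logsumexp` keeps the sum over exponentially many edit sequences numerically stable, and because no single alignment is chosen, the training data need not specify the edit sequence.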



research
08/18/2022

Algorithm to derive shortest edit script using Levenshtein distance algorithm

String similarity, longest common subsequence and shortest edit scripts ...
research
05/07/2019

Kendall Tau Sequence Distance: Extending Kendall Tau from Ranks to Sequences

An edit distance is a measure of the minimum cost sequence of edit opera...
research
02/09/2023

Locally consistent decomposition of strings with applications to edit distance sketching

In this paper we provide a new locally consistent decomposition of strin...
research
11/10/2018

Efficiently Approximating Edit Distance Between Pseudorandom Strings

We present an algorithm for approximating the edit distance ed(x, y) bet...
research
11/25/2019

Efficient Global String Kernel with Random Features: Beyond Counting Substructures

Analysis of large-scale sequential data has been one of the most crucial...
research
12/04/2019

Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

Strings are a natural representation of biological data such as DNA, RNA...
research
04/16/2021

Neural String Edit Distance

We propose the neural string edit distance model for string-pair classif...
