Fuzzy Segmentations of a String

01/31/2022
by Armen Kostanyan, et al.

This article discusses a particular case of the data clustering problem: finding groups of adjacent text segments of appropriate length that match a fuzzy pattern, where the pattern is represented as a sequence of fuzzy properties. To solve this problem, we propose a heuristic algorithm that finds a sufficiently large number of solutions. The key idea of the algorithm is to use a prefix structure to track the process of mapping text segments to fuzzy properties. An important special case of the text segmentation problem is the fuzzy string matching problem, in which adjacent text segments have unit length and, accordingly, the fuzzy pattern is a sequence of fuzzy properties of text characters. We prove that in this case the heuristic segmentation algorithm finds all text segments that match the fuzzy pattern. Finally, we consider the problem of finding a best segmentation of the entire text based on a fuzzy pattern, which is solved by dynamic programming.

Keywords: fuzzy clustering, fuzzy string matching, approximate string matching
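To make the unit-length special case concrete, the following is a minimal sketch of fuzzy string matching by naive scanning. It is not the prefix-structure algorithm of the article, whose details are not given in the abstract. The sketch assumes that a fuzzy property is a function mapping a character to a membership degree in [0, 1], that the degree of a match is the minimum of the per-character degrees, and that a window matches when this degree reaches a threshold. The names `fuzzy_match_positions`, `vowelish`, and `digitish` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a fuzzy property of a character is a membership function
# returning a degree in [0, 1].
FuzzyCharProperty = Callable[[str], float]


def fuzzy_match_positions(
    text: str,
    pattern: Sequence[FuzzyCharProperty],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (start, degree) for every window of len(pattern) characters
    whose match degree is at least `threshold`.

    Assumption: the match degree of a window is the minimum of the
    per-character membership degrees (min-aggregation).
    """
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        degree = min(
            (prop(text[start + k]) for k, prop in enumerate(pattern)),
            default=1.0,  # an empty pattern matches everywhere
        )
        if degree >= threshold:
            hits.append((start, degree))
    return hits


# Hypothetical fuzzy properties used only for the example.
def vowelish(ch: str) -> float:
    """Clear vowels get degree 1.0, 'y' gets 0.5, everything else 0.0."""
    if ch in "aeiou":
        return 1.0
    return 0.5 if ch == "y" else 0.0


def digitish(ch: str) -> float:
    """Digits get degree 1.0, digit look-alikes 'o' and 'l' get 0.5."""
    if ch.isdigit():
        return 1.0
    return 0.5 if ch in "ol" else 0.0


if __name__ == "__main__":
    # Windows of "a1yo": "a1" (degree 1.0), "1y" (0.0), "yo" (0.5)
    print(fuzzy_match_positions("a1yo", [vowelish, digitish], 0.5))
```

The scan takes time proportional to the text length times the pattern length. Raising the threshold keeps only the stronger matches; with the assumed properties above, a threshold of 0.6 leaves only the window starting at position 0.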


