Joint String Complexity for Markov Sources: Small Data Matters

05/23/2018
by Philippe Jacquet, et al.

String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we introduce the joint string complexity as the cardinality of the set of words that are common to both strings. String complexity has a number of applications, from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze the joint string complexity when both strings are generated by Markov sources. We prove that the joint string complexity grows linearly (in terms of the string lengths) when both sources are statistically indistinguishable, and sublinearly when the sources are not statistically the same. Precise analysis of the joint string complexity turns out to be quite challenging, requiring subtle singularity analysis and the saddle point method over infinitely many saddle points, which leads to novel oscillatory phenomena with single and double periodicities. To overcome these challenges, we apply powerful analytic techniques such as multivariate generating functions, multivariate depoissonization and Mellin transform, spectral matrix analysis, and complex asymptotic methods.
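
As a small illustration of the two definitions above, the following Python sketch counts factors by brute force. It is not taken from the paper: the function names are hypothetical, and it assumes that only non-empty words are counted. The enumeration is quadratic in the string length, so it is meant for short strings only; suffix-tree or suffix-automaton constructions would be the usual choice for long inputs.

```python
def factors(s: str) -> set[str]:
    """Return the set of all distinct non-empty substrings (factors) of s."""
    # Assumption: the empty word is not counted as a factor.
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def string_complexity(s: str) -> int:
    """Number of distinct factors of a single string."""
    return len(factors(s))


def joint_string_complexity(x: str, y: str) -> int:
    """Number of distinct factors common to both strings."""
    return len(factors(x) & factors(y))


if __name__ == "__main__":
    # "aba" has the factors a, b, ab, ba, aba.
    print(string_complexity("aba"))
    # "aba" and "bab" share the factors a, b, ab, ba.
    print(joint_string_complexity("aba", "bab"))
```

For example, under these assumptions `string_complexity("aba")` is 5 and `joint_string_complexity("aba", "bab")` is 4, since `aba` and `bab` themselves are the only factors not shared.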


