Sample-based distance-approximation for subsequence-freeness

05/02/2023
by Omer Cohen Sidon, et al.

In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) w = w_1 … w_k, a sequence (text) T = t_1 … t_n is said to contain w if there exist indices 1 ≤ i_1 < … < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. Ron and Rosin (ACM TOCT 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is Θ(k/ϵ). Denoting by Δ(T,w,p) the distance of T to w-freeness under a distribution p: [n] → [0,1], we are interested in obtaining an estimate Δ̂ such that |Δ̂ - Δ(T,w,p)| ≤ δ with probability at least 2/3, for a given distance parameter δ. Our main result is an algorithm whose sample complexity is Õ(k^2/δ^2). We first present an algorithm that works when the underlying distribution p is uniform, and then show how it can be modified to work for any (unknown) distribution p. We also show that a quadratic dependence on 1/δ is necessary.
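To make the definitions concrete, here is a minimal Python sketch. It is not the paper's sample-based algorithm, which the abstract does not describe. The function `contains_subsequence` follows the containment definition above directly. The function `brute_force_distance` is a hypothetical exponential-time reference for tiny inputs, and it rests on two assumptions not stated in the abstract: that Δ(T,w,p) is the minimum total p-weight of positions that must be modified to make T w-free, and that a modified position can be set to a symbol that does not occur in w, so that modifying a position is equivalent to erasing it.

```python
from itertools import combinations


def contains_subsequence(text, word):
    """Return True if `word` occurs in `text` as a (not necessarily contiguous) subsequence."""
    it = iter(text)
    # `symbol in it` advances the shared iterator until a match is found,
    # so successive symbols of `word` are matched at increasing indices.
    return all(symbol in it for symbol in word)


def brute_force_distance(text, word, p):
    """Exhaustive reference value for the distance to `word`-freeness.

    Assumptions (not taken from the abstract): the distance is the minimum
    total p-weight of modified positions, and modifying a position is
    equivalent to erasing it. `word` is assumed to be non-empty.
    Runs in time exponential in len(text); intended for tiny inputs only.
    """
    n = len(text)
    best = float("inf")
    for size in range(n + 1):
        for erased in combinations(range(n), size):
            kept = [text[i] for i in range(n) if i not in erased]
            if not contains_subsequence(kept, word):
                best = min(best, sum(p[i] for i in erased))
    return best


if __name__ == "__main__":
    print(contains_subsequence("abcde", "ace"))
    print(brute_force_distance("abab", "ab", [0.25] * 4))
```

A sample-based estimator in the sense of the abstract would see only positions drawn according to p rather than the whole text; the exhaustive routine is useful only as a ground truth against which such an estimate could be compared on small examples.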


research
02/13/2018

Distribution-free Junta Testing

We study the problem of testing whether an unknown n-variable Boolean fu...
research
01/01/2019

Almost Optimal Distribution-free Junta Testing

We consider the problem of testing whether an unknown n-variable Boolean...
research
09/19/2018

Exploring the Impact of Password Dataset Distribution on Guessing

Leaks from password datasets are a regular occurrence. An organization m...
research
08/17/2023

Distribution-Free Proofs of Proximity

Motivated by the fact that input distributions are often unknown in adva...
research
08/30/2023

Support Testing in the Huge Object Model

The Huge Object model is a distribution testing model in which we are gi...
research
06/27/2012

On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network

Bayesian Networks (BNs) are useful tools giving a natural and compact re...
research
10/31/2018

Testing Halfspaces over Rotation-Invariant Distributions

We present an algorithm for testing halfspaces over arbitrary, unknown r...
