# Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

The problem of "string reconstruction from substrings" is a mathematical model of sequencing by hybridization, which plays an important role in DNA sequencing. In this problem, we are given a blackbox oracle holding an unknown string X and are required to reconstruct X through "substring queries" Q(S). A query Q(S) presents a string S to the oracle, which answers Yes if X includes S as a substring and No otherwise. Our goal is to minimize the number of queries needed for the reconstruction by choosing a good sequence of query strings S. In this paper, we deal only with binary strings X whose length n is given in advance. In 1995, Skiena and Sundaram first studied this problem and obtained an algorithm whose query complexity is n+O(√n). The information-theoretic lower bound is n, and they posed the obvious open question of whether the O(√n) additive term can be removed. No progress had been made until now. This paper gives two partially positive answers to this open question. One is a randomized algorithm whose query complexity is n+O(1) with high probability, and the other is an average-case algorithm whose query complexity is n+O(1) on average. The lower bound of n still holds in both cases, and hence both algorithms are optimal up to an additive constant.
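To make the query model concrete, the following is a minimal Python sketch of a substring oracle together with a naive baseline reconstruction. It is not the algorithm of this paper, nor that of Skiena and Sundaram. The names `SubstringOracle` and `reconstruct_naive` are hypothetical, chosen only for illustration. The baseline extends a known substring to the right until neither extension succeeds, at which point it must be a suffix of X, and then extends it to the left one bit at a time. It can spend on the order of 2n queries in the worst case, which is the kind of overhead the n+O(√n) and n+O(1) bounds improve on.

```python
class SubstringOracle:
    """Blackbox holding a hidden binary string X; answers substring queries."""

    def __init__(self, hidden):
        self._hidden = hidden
        self.queries = 0  # number of queries asked so far

    def query(self, s):
        """Return True (Yes) if s is a substring of X, else False (No)."""
        self.queries += 1
        return s in self._hidden


def reconstruct_naive(oracle, n):
    """Naive baseline: reconstruct a binary string of known length n."""
    cur = ""
    # Phase 1: extend to the right while some one-bit extension is a substring.
    while len(cur) < n:
        if oracle.query(cur + "0"):
            cur += "0"
        elif oracle.query(cur + "1"):
            cur += "1"
        else:
            # Neither extension occurs, so every occurrence of cur ends at
            # the end of X: cur is a suffix of X.
            break
    # Phase 2: cur is a suffix, so exactly one left extension is a substring.
    # One query per bit suffices: a No for "0" implies the bit is "1".
    while len(cur) < n:
        cur = ("0" if oracle.query("0" + cur) else "1") + cur
    return cur


if __name__ == "__main__":
    oracle = SubstringOracle("0110100")
    print(reconstruct_naive(oracle, 7), oracle.queries)
```

The oracle counts its own queries so that different strategies can be compared against the lower bound of n.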
