Data-Driven Bee Identification for DNA Strands

05/08/2023
by Shubhransh Singhvi, et al.

We study a data-driven approach to the bee-identification problem for DNA strands. The bee-identification problem, introduced by Tandon et al. (2019), requires one to identify M bees, each tagged by a unique barcode, via a set of M noisy measurements. Later, Chrisnata et al. (2022) extended the model to the case where one observes N noisy measurements of each bee, and applied the model to address the unordered nature of DNA storage systems. In such systems, a unique address is typically prepended to each DNA data block to form a DNA strand, but the address may be corrupted. While clustering is usually used to identify the address of a DNA strand, this requires ℳ^2 data comparisons (where ℳ is the number of reads). In contrast, the approach of Chrisnata et al. (2022) avoids data comparisons completely. In this work, we study an intermediate, data-driven approach to this identification task. For the binary erasure channel, we first show that we can almost surely correctly identify all DNA strands under certain mild assumptions. Then we propose a data-driven pruning procedure and demonstrate that on average the procedure uses only a fraction of ℳ^2 data comparisons. Specifically, for ℳ = 2^n and erasure probability p, the expected number of data comparisons performed by the procedure is κℳ^2, where ((1+2p-p^2)/2)^n ≤ κ ≤ ((1+p)/2)^n.
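To make the role of erasures concrete, here is a minimal sketch of a naive compatibility filter over the binary erasure channel. It is not the pruning procedure of the paper, only an illustration under stated assumptions: addresses are all n-bit strings, an erased bit is written with the hypothetical symbol `?`, and the names `compatible`, `candidates`, and `expected_surviving_fraction` are invented for this example. A read with k erased positions is consistent with exactly 2^k addresses, so with independent erasures of probability p the expected fraction of addresses that survive this filter works out to ((1+p)/2)^n.

```python
from itertools import product
from math import comb

ERASURE = "?"  # hypothetical symbol for an erased bit (an assumption of this sketch)


def compatible(read: str, address: str) -> bool:
    """Return True if `address` agrees with `read` on every non-erased position."""
    return len(read) == len(address) and all(
        r == ERASURE or r == a for r, a in zip(read, address)
    )


def candidates(read: str) -> list[str]:
    """List every n-bit address that is consistent with the erased read."""
    n = len(read)
    all_addresses = ("".join(bits) for bits in product("01", repeat=n))
    return [a for a in all_addresses if compatible(read, a)]


def expected_surviving_fraction(n: int, p: float) -> float:
    """Exact expectation of 2^k / 2^n, where k ~ Binomial(n, p) counts erasures."""
    total = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * 2**k for k in range(n + 1)
    )
    return total / 2**n


if __name__ == "__main__":
    # One erased position out of three leaves two consistent addresses.
    print(candidates("1?0"))
    # The binomial sum agrees with the closed form ((1 + p) / 2) ** n.
    print(expected_surviving_fraction(8, 0.3), ((1 + 0.3) / 2) ** 8)
```

The enumeration in `candidates` is exponential in n and is only meant for small illustrative cases.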

