Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding

04/07/2023
by Jaeho Jeong, et al.

Ever since deoxyribonucleic acid (DNA) was first considered as a next-generation data-storage medium, considerable research effort has gone into using error-correcting codes (ECCs) to correct errors that occur during the synthesis, storage, and sequencing processes. Previous works on recovering data from an erroneous sequenced DNA pool have used hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and the robustness of the DNA storage system, we propose a new iterative soft decoding algorithm in which soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method that may be suitable for error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency in the performance evaluation. The proposed soft decoding algorithm gives 2.3 reduction compared to the state-of-the-art decoding method, and it is shown to handle erroneous sequenced oligo reads with insertion and deletion errors.
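
To make the idea of Q-score-based soft information concrete, the following is a minimal sketch and not the LLR formula proposed in the paper, which the abstract does not state. It assumes the standard Phred relation P_err = 10^(-Q/10), FASTQ quality characters in Phred+33 encoding, a hypothetical two-bit mapping (A=00, C=01, G=10, T=11), and a substitution-only channel in which a miscalled base is equally likely to be any of the other three. All function names are illustrative.

```python
import math

# Assumed two-bit mapping for illustration only.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def fastq_quality_to_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer Q-scores."""
    return [ord(ch) - offset for ch in quality]


def phred_to_error_prob(q):
    """Standard Phred relation: probability that the base call is wrong."""
    return 10 ** (-q / 10)


def base_bit_llrs(base, q):
    """Per-bit LLRs, log(P(bit=0) / P(bit=1)), for one called base.

    Assumption: the called base is correct with probability 1 - p_err and
    each of the three other bases is the true one with probability p_err / 3.
    """
    p_err = phred_to_error_prob(q)
    probs = {b: (1 - p_err) if b == base else p_err / 3 for b in BASE_TO_BITS}
    llrs = []
    for i in range(2):
        p0 = sum(p for b, p in probs.items() if BASE_TO_BITS[b][i] == 0)
        p1 = sum(p for b, p in probs.items() if BASE_TO_BITS[b][i] == 1)
        llrs.append(math.log(p0 / p1))
    return llrs


def read_llrs(sequence, quality):
    """Concatenate the per-bit LLRs of every base in one read."""
    llrs = []
    for base, q in zip(sequence, fastq_quality_to_scores(quality)):
        llrs.extend(base_bit_llrs(base, q))
    return llrs
```

Under these assumptions, a higher Q-score yields an LLR of larger magnitude, so a soft decoder would place more trust in that base than in a low-quality one. A majority-vote hard decoder discards this distinction.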

research
02/15/2023

Indel Error Correction Codes for DNA Digital Data Storage and Retrieval

A procedure for storage and retrieval of Digital information in DNA stri...
research
11/11/2020

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

Due to its high data density and longevity, DNA is considered a promisin...
research
11/09/2018

Representation-Oblivious Error Correction by Natural Redundancy

Storage systems have a strong need for substantially improving their err...
research
01/09/2020

Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

We propose coding techniques that limit the length of homopolymers runs,...
research
08/03/2022

Low-redundancy codes for correcting multiple short-duplication and edit errors

Due to its higher data density, longevity, energy efficiency, and ease o...
research
03/05/2021

Iterative DNA Coding Scheme With GC Balance and Run-Length Constraints Using a Greedy Algorithm

In this paper, we propose a novel iterative encoding algorithm for DNA s...
research
06/24/2019

Survey of Information Encoding Techniques for DNA

Key to DNA storage is encoding the information to a sequence of nucleoti...
