Error-Correcting Codes for Nanopore Sequencing
Nanopore sequencers are superior to other sequencing technologies for DNA storage in several respects and have attracted considerable attention in recent times. Their high error rates, however, demand thorough research on practical and efficient coding schemes that enable accurate recovery of the stored data. To this end, we consider a simplified model of a nanopore sequencer, inspired by Mao et al., that incorporates intersymbol interference and measurement noise. In essence, our channel model slides a window of length ℓ over an input sequence: at each time step it outputs the L_1-weight of the ℓ enclosed bits and then shifts by δ positions. The resulting (ℓ+1)-ary vector, termed the read vector, may additionally be corrupted by t substitution errors. Using graph-theoretic techniques, we deduce that for δ=1, at least log log n bits of redundancy are required to correct a single (t=1) substitution. Finally, for ℓ ≥ 3, we exploit some inherent characteristics of read vectors to arrive at an error-correcting code that is optimal up to an additive constant for this setting.
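The channel model described above can be illustrated with a minimal Python sketch. The function names `read_vector` and `substitute` are hypothetical, and the boundary handling is an assumption: only windows lying fully inside the input are read, which the abstract does not specify.

```python
from typing import List, Sequence


def read_vector(x: Sequence[int], ell: int, delta: int = 1) -> List[int]:
    """Return the weights of the length-ell windows of x, taken every delta positions.

    Assumption (not stated in the abstract): only windows that lie fully
    inside x are read, so no padding is applied at the boundaries.
    """
    if ell < 1 or delta < 1:
        raise ValueError("ell and delta must be positive")
    if any(bit not in (0, 1) for bit in x):
        raise ValueError("x must be a binary sequence")
    # For binary input, the L_1-weight of a window is the number of ones in it.
    return [sum(x[i:i + ell]) for i in range(0, len(x) - ell + 1, delta)]


def substitute(read: Sequence[int], position: int, new_value: int, ell: int) -> List[int]:
    """Return a copy of the read vector with one symbol substituted (the t=1 case)."""
    if not 0 <= new_value <= ell:
        raise ValueError("read symbols lie in {0, ..., ell}")
    corrupted = list(read)
    corrupted[position] = new_value
    return corrupted


x = [1, 1, 0, 1, 0, 0]
clean = read_vector(x, ell=3, delta=1)      # [2, 2, 1, 1]
coarse = read_vector(x, ell=3, delta=2)     # [2, 1]
noisy = substitute(clean, position=1, new_value=0, ell=3)  # [2, 0, 1, 1]
```

In this example with δ=1, consecutive windows share ℓ-1 bits, so adjacent entries of an error-free read vector differ by at most 1. The corrupted vector `[2, 0, 1, 1]` breaks that pattern, which gives a flavour of the kind of structure in read vectors that an error-correcting code could exploit.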