Survey of Information Encoding Techniques for DNA
Key to DNA storage is encoding information into a sequence of nucleotides before it is synthesised for storage. The definition of such an encoding, or mapping, must adhere to multiple design restrictions. First, not all possible sequences of nucleotides can be synthesised. Homopolymers, i.e., runs of the same nucleotide, that are longer than two, for example, cannot be synthesised without potential errors. Similarly, the G-C content of the resulting sequences should be higher than 50%. Second, given that synthesis is expensive, the encoding must map as many bits as possible to each nucleotide. Third, synthesis (as well as sequencing) is error-prone, leading to substitutions, deletions and insertions. An encoding must therefore be designed to be resilient to errors, through error-correcting codes or replication. Fourth, for the purpose of computation and selective retrieval, encodings should produce substantially different sequences across all data, even for very similar data. In the following we discuss the history and evolution of encodings.
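To make the tension between the first two restrictions concrete, the following minimal Python sketch contrasts a naive mapping of two bits per nucleotide with a rotation-style code that never repeats a nucleotide. Both mappings, and all function names (`encode_2bit`, `encode_rotating`, `longest_homopolymer`, `gc_content`), are assumptions chosen for illustration and are not taken from any specific scheme discussed in the survey.

```python
from itertools import groupby

# Hypothetical direct mapping: 2 bits per nucleotide (maximum density).
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_2bit(bits: str) -> str:
    """Map each pair of bits to one nucleotide, ignoring all constraints."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode_rotating(trits, start="A"):
    """Illustrative rotation-style code: each base-3 digit picks one of the
    three nucleotides that differ from the previous one, so no nucleotide
    is ever repeated. `start` is an assumed virtual predecessor."""
    prev = start
    out = []
    for t in trits:
        choices = [nt for nt in "ACGT" if nt != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical nucleotides in seq."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C."""
    if not seq:
        return 0.0
    return sum(nt in "GC" for nt in seq) / len(seq)


naive = encode_2bit("00000011")          # "AAAT": contains a run of three A's
rotated = encode_rotating([0, 0, 0, 0])  # "CACA": no repeated nucleotide
print(naive, longest_homopolymer(naive), gc_content(naive))
print(rotated, longest_homopolymer(rotated), gc_content(rotated))
```

The direct mapping stores 2 bits per nucleotide but can emit arbitrarily long homopolymers, whereas the rotating variant rules them out at the cost of density (at most log2(3), roughly 1.58 bits per nucleotide). Neither sketch controls G-C content or adds error correction; those restrictions would need additional mechanisms.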