Skip to main content
Fig. 1 | BMC Biotechnology

Fig. 1

From: A highly parallel strategy for storage of digital information in living cells

Fig. 1

A diagram of the encoding and decoding process. The input data is first wrapped in a tar archive, to ensure a uniform input format as well as combining multiple files into a single contiguous data stream with a well-established method. The digital information is encoded using a pre-generated codebook, producing one long single sequence of DNA. This sequence is split into overlapping packets, each up to 200 bp long, which are then synthesized as a complex pool of oligonucleotides. These can be cloned into plasmids and transformed into cells, where they can be maintained reliably for a very long time. To recover the information, the population of cells (or alternatively plasmids or lyophilized oligonucleotides) can be sequenced with NextGen sequencing technology, and de novo assembly of the resulting reads is performed. During assembly, some errors can be corrected by simply considering the consensus of the contig, whereas systematic errors (such as those arising during synthesis) can be corrected in silico using the error correcting code. Finally, the codebook is used to decode the resulting contig and recover the digital files

Back to article page