1. Coding for DNA storage
   Shubham Chandak, Kedar Tatwawadi
   EE 388 course project
2. Outline
   DNA storage model
   Capacity computation
   Two coding strategies
   Experimental results
   Conclusion and future work
3. DNA storage
   DNA as a storage medium
      High density: 215 petabytes/gram [1]
      High durability [2]
   Synthesis (writing) and sequencing (reading) are error-prone and expensive
   Error correction is needed for reliable data recovery
   [1] Erlich, Y., & Zielinski, D. (2017). DNA Fountain enables a robust and efficient storage architecture. Science, 355(6328), 950-954.
   [2] Grass, R. N., Heckel, R., Puddu, M., Paunescu, D., & Stark, W. J. (2015). Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition, 54(8), 2552-2555.
4. Storage model I
   Binary data -> Encoding -> Pool of distinct short DNA sequences (~150 nucleotides)
   Pool -> Reading -> “Reads” sampled with replacement + substitution errors
   “Reads” -> Decode -> Binary data
5. Storage model II
   For this talk:
      Ignore DNA symbols and constraints; work with binary sequences
      Assume that the index of each sequence is transmitted without error
7. Storage model II
   Information bits -> Encoding -> sequences with L = 256 bits each -> “reads” sampled with replacement -> BSC -> noisy reads
   (Diagram: sequences indexed 1, 2, 3, ...; example read indices 4, 2, 7, ..., 2)
   Goal: analyze and achieve the optimal tradeoff between
      Error-correction bits used per information bit
      Coverage: bits read per information bit
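The read process on this slide can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function name `simulate_reads`, the pool size of 100 sequences, the 250 reads, and the flip probability 0.005 are all illustrative assumptions, since the deck's own parameter values did not survive in this transcript.

```python
import random

def simulate_reads(sequences, num_reads, flip_prob, rng):
    """Sample reads with replacement and pass every bit through a BSC."""
    reads = []
    for _ in range(num_reads):
        index = rng.randrange(len(sequences))  # uniform sampling with replacement
        noisy = [bit ^ (rng.random() < flip_prob) for bit in sequences[index]]
        reads.append((index, noisy))  # index is kept error-free, as the model assumes
    return reads

rng = random.Random(0)
seq_len = 256  # bits per sequence, as on the slide
sequences = [[rng.randrange(2) for _ in range(seq_len)] for _ in range(100)]  # assumed pool size
reads = simulate_reads(sequences, num_reads=250, flip_prob=0.005, rng=rng)  # assumed values

unread = len(sequences) - len({index for index, _ in reads})
print(f"{len(reads)} reads, {unread} sequences never read")
```

Sequences that are never sampled act as erasures, and sequences sampled several times give several noisy copies of the same bits; both effects drive the capacity analysis that follows.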
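9. Capacity: error-free reads
   Poisson model for the number of times each sequence is read
   Erasure channel over bit strings, with erasure probability equal to the probability that a sequence is never read
   Rate is bounded by one minus this erasure probability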
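As a worked check of the error-free case, the sketch below uses assumed notation, because the slide's symbols were lost: `alpha` is the number of error-correction bits per information bit and `c` is the coverage. Under the Poisson model each sequence is read on average c/(1+alpha) times, so it is erased with probability exp(-c/(1+alpha)). Setting the code rate 1/(1+alpha) equal to the erasure-channel capacity 1 - exp(-c/(1+alpha)) and solving for c gives c = (1+alpha) ln((1+alpha)/alpha).

```python
import math

def optimal_coverage_error_free(alpha):
    """Coverage c solving 1/(1+alpha) = 1 - exp(-c/(1+alpha)).

    alpha (assumed symbol): error-correction bits per information bit.
    """
    return (1 + alpha) * math.log((1 + alpha) / alpha)

for alpha in (0.1, 0.2, 0.3, 0.5):
    print(alpha, round(optimal_coverage_error_free(alpha), 2))
# prints 2.64, 2.15, 1.91, 1.65 for the four values of alpha
```

These values agree with the "optimal coverage" column of the first strategy I results table in this deck.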
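11. Capacity: BSC
   Treat each bit as a memoryless channel to get a lower bound on capacity
   Channel output for each bit: the number of times the bit was read and the number of times it was read as 0
   This is a BMS (binary memoryless symmetric) channel, whose capacity can be computed
   For each level of error-correction bits per information bit, numerically find the coverage at which this capacity equals the code rate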
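The numerical search described on this slide might look like the sketch below. Everything in it is an assumption layered on the slide's outline: the symbols `alpha` (error-correction bits per information bit), `eps` (BSC flip probability) and `lam` (mean reads per sequence), the Poisson read model carried over from the error-free case, the truncation at 60 reads, and the illustrative value eps = 0.005.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def info_given_k_reads(k, eps):
    """Mutual information (bits) between a uniform bit and its zero-count over k reads."""
    total = 0.0
    for j in range(k + 1):  # j = number of times the bit was read as 0
        p0 = math.comb(k, j) * (1 - eps) ** j * eps ** (k - j)  # true bit is 0
        p1 = math.comb(k, j) * eps ** j * (1 - eps) ** (k - j)  # true bit is 1
        mix = 0.5 * (p0 + p1)
        for p in (p0, p1):
            if p > 0:
                total += 0.5 * p * math.log2(p / mix)
    return total

def capacity(lam, eps, k_max=60):
    """Per-bit capacity when the number of reads is Poisson(lam), truncated at k_max."""
    return sum(poisson_pmf(k, lam) * info_given_k_reads(k, eps) for k in range(k_max + 1))

def optimal_coverage(alpha, eps):
    """Bisection for the coverage at which capacity equals the rate 1/(1+alpha)."""
    target = 1 / (1 + alpha)
    lo, hi = 1e-6, 50.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if capacity(mid / (1 + alpha), eps) < target:
            lo = mid
        else:
            hi = mid
    return hi

for alpha in (0.1, 0.2, 0.3, 0.5):
    print(alpha, round(optimal_coverage(alpha, eps=0.005), 2))  # eps is an assumed value
```

With eps = 0 the function reduces to the erasure-channel case, which gives a quick sanity check on the implementation.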
12. Capacity plot
13. Coding strategy I: RaptorQ + BCH
   RaptorQ [1]
      Rateless erasure code
      For K source packets: 99.9999% probability of recovery given K+2 packets
   BCH [2]: good minimum distance properties
   Encoding: nL bits -> Segment -> RaptorQ -> BCH
   Decoding: Consensus -> BCH decoding -> Raptor decoding
   [1] https://tools.ietf.org/html/rfc6330
   [2] Bose, R. C., & Ray-Chaudhuri, D. K. (1960). On a class of error correcting binary group codes. Information and Control, 3(1), 68-79.
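The consensus step at the start of decoding can be illustrated as a per-position majority vote over all reads that share an index. This is a sketch under assumptions: the slide does not say how consensus is computed or how ties are broken, and the function name and toy reads are hypothetical. The BCH and RaptorQ stages are left out because their library interfaces are not shown in the deck.

```python
from collections import defaultdict

def consensus(reads):
    """Majority-vote each bit position over all reads that share an index."""
    by_index = defaultdict(list)
    for index, bits in reads:
        by_index[index].append(bits)
    result = {}
    for index, group in by_index.items():
        # A position decodes to 1 when strictly more than half of the reads say 1;
        # ties fall to 0, an arbitrary choice made for this sketch.
        result[index] = [int(2 * sum(column) > len(group)) for column in zip(*group)]
    return result

reads = [(0, [1, 0, 1, 1]), (0, [1, 0, 0, 1]), (0, [1, 1, 1, 1]), (2, [0, 0, 1, 0])]
print(consensus(reads))  # {0: [1, 0, 1, 1], 2: [0, 0, 1, 0]}
```

Index 1 never appears in the toy reads, so it is missing from the result; in the scheme on the slide such missing packets are what the rateless erasure code recovers, while BCH handles residual bit errors left after the vote.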
15. Coding strategy I: results

   Error-correction bits per information bit   Optimal coverage   Achieved coverage*
   0.1                                         2.64               2.89
   0.2                                         2.15               2.35
   0.3                                         1.91               2.06
   0.5                                         1.65               1.75

   Error-correction bits per information bit   Optimal coverage   Achieved coverage*
   0.1                                         2.79               6.70
   0.2                                         2.27               4.30
   0.3                                         2.01               3.30
   0.5                                         1.73               2.50

   * 100 successes out of 100 random trials
   https://pypi.org/project/libraptorq/
   https://github.com/jkent/python-bchlib
16. Limitations of small block length codes
   Plot generated using code at https://github.com/yp-mit/spectre
17. Coding strategy II: LDPC
   Regular LDPC code
   Encoding: nL bits -> LDPC -> nL bits plus parity -> Segment
   Decoding input: an LLR for each bit, computed from the number of times the bit was read and the number of times it was read as 0
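The LLR formula itself did not survive in this transcript, but for independent reads through a BSC it follows directly from the two counts the slide names. If a bit is read as 0 a total of z times and as 1 a total of o times, the likelihood ratio P(reads | bit = 0) / P(reads | bit = 1) equals ((1 - eps) / eps)^(z - o). The sketch below assumes the symbol `eps` for the flip probability and the sign convention that positive values favor 0; both are assumptions, not taken from the deck.

```python
import math

def llr(times_read, times_read_as_zero, eps):
    """Log-likelihood ratio log P(reads | bit=0) / P(reads | bit=1) for i.i.d. BSC(eps) reads."""
    zeros = times_read_as_zero
    ones = times_read - times_read_as_zero
    return (zeros - ones) * math.log((1 - eps) / eps)

print(llr(0, 0, 0.005))  # 0.0: a bit that was never read carries no information
print(llr(2, 1, 0.005))  # 0.0: one vote each way cancels out
print(llr(3, 3, 0.005) > llr(1, 1, 0.005))  # True: more agreeing reads, more confidence
```

Unread bits enter the belief propagation decoder with LLR 0, so erasures and substitution errors are handled by the same code rather than by two separate stages.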
18. Coding strategy II: results I

   LDPC (l,k)   Error-correction bits per information bit   Optimal coverage   DE threshold coverage   Achieved coverage* (LDPC)   Achieved coverage* (Strategy I)
   (3,33)       0.1                                         2.79               3.02                    3.25                        6.70
   (3,18)       0.2                                         2.27               2.50                    2.70                        4.30
   (3,13)       0.3                                         2.01               2.26                    2.45                        3.30
   (3,9)        0.5                                         1.73               2.00                    2.10                        2.50

   * 100 successes out of 100 random trials
   LDPC: 100 iterations of BP (belief propagation)
   DE (density evolution) performed with particle filter, N = 100,000, 200 iterations
   http://radfordneal.github.io/LDPC-codes/
   http://pretty-good-codes.org/index.html
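The link between the (l,k) column and the redundancy column can be checked with a short calculation. The sketch below assumes the usual reading of a regular (l,k) LDPC code, with variable-node degree l and check-node degree k, which gives design rate 1 - l/k and therefore l/(k - l) parity bits per information bit. The overhead ratios are simple arithmetic on the table's own numbers; the helper name `redundancy` is hypothetical.

```python
def redundancy(l, k):
    """Parity bits per information bit for a regular (l,k) LDPC code with design rate 1 - l/k."""
    return l / (k - l)

rows = [  # (l, k), optimal, DE threshold, achieved (LDPC), achieved (Strategy I)
    ((3, 33), 2.79, 3.02, 3.25, 6.70),
    ((3, 18), 2.27, 2.50, 2.70, 4.30),
    ((3, 13), 2.01, 2.26, 2.45, 3.30),
    ((3, 9), 1.73, 2.00, 2.10, 2.50),
]

for (l, k), optimal, de_threshold, ldpc, strategy1 in rows:
    print(
        f"({l},{k}): redundancy {redundancy(l, k):.1f}, "
        f"LDPC at {ldpc / optimal:.2f}x optimal, "
        f"Strategy I at {strategy1 / optimal:.2f}x optimal"
    )
```

The computed redundancies come out to 0.1, 0.2, 0.3 and 0.5, matching the table's second column.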
19. Coding strategy II: results II
21. Conclusion and future work
   Analyzed DNA storage problem for simplified model
   Implemented schemes to achieve close-to-optimum performance
   Future work:
      Adding index to segments
         Protect index with BCH code
      Converting binary data to DNA symbols {A, C, G, T}
         Constraint: runs of 3 or more not allowed, e.g., AAA
         Interaction between error correction and constraint coding
      Exploiting non-IID noise in reads
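To make the run-length constraint concrete, the sketch below maps bit pairs to nucleotides and tests whether the result contains a run of three or more identical symbols. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) and both function names are assumptions chosen for illustration; the deck lists this conversion as future work and does not specify a scheme.

```python
import re

BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def bits_to_dna(bits):
    """Naive 2-bits-per-nucleotide mapping with no constraint handling."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return "".join(BASES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2))

def violates_run_constraint(dna, max_run=2):
    """True when some symbol repeats more than max_run times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, dna) is not None

print(bits_to_dna([0, 0, 0, 1, 1, 0, 1, 1]))                           # ACGT
print(violates_run_constraint(bits_to_dna([0, 0, 0, 0, 0, 0, 1, 1])))  # True (AAAT)
print(violates_run_constraint("AACCGGTT"))                             # False
```

The naive mapping violates the constraint whenever the data contains six or more aligned identical bits, which is why a constrained code, and its interaction with error correction, appears on the future-work list.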
22. Thank You!