Coding for DNA storage Shubham Chandak, Kedar Tatwawadi


Uploaded On 2023-07-09



Presentation Transcript

1. Coding for DNA storage
Shubham Chandak, Kedar Tatwawadi
EE 388 course project

2. Outline
- DNA storage model
- Capacity computation
- Two coding strategies
- Experimental results
- Conclusion and future work

3. DNA storage
DNA as a storage medium:
- High density: 215 petabytes/gram [1]
- High durability [2]
- Synthesis (writing) and sequencing (reading) are error-prone and expensive
- Error correction is needed for reliable data recovery

[1] Erlich, Y., & Zielinski, D. (2017). DNA Fountain enables a robust and efficient storage architecture. Science, 355(6328), 950-954.
[2] Grass, R. N., Heckel, R., Puddu, M., Paunescu, D., & Stark, W. J. (2015). Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition, 54(8), 2552-2555.

4. Storage model I
- Encoding: binary data → pool of distinct short DNA sequences (~150 nucleotides)
- Reading: "reads" sampled from the pool with replacement, plus substitution errors
- Decoding: reads → binary data

5. Storage model II
For this talk:
- Ignore DNA symbols and constraints; work with binary sequences
- Assume that the index of each sequence is transmitted without error

6. Storage model II
- Encoding: $nL$ information bits → $(1+\alpha)n$ sequences with $L = 256$ bits each
- Reading: $cn$ "reads" sampled with replacement, each passed through BSC($\epsilon$)
- Output: noisy reads, each carrying the index of its source sequence (for example 4, 2, 7, ..., 2)

7. Storage model II (continued)
Goal: analyze and achieve the optimal tradeoff between
- $\alpha$: error correction bits used per information bit
- $c$: coverage, i.e. bits read per information bit
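The read process in this model can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the names `simulate_reads`, `alpha`, `c`, and `eps` are chosen here (the symbols on the slides were lost in extraction), the stored sequences are random placeholders rather than real codewords, and each read is assumed to carry its sequence index without error, as the model states.

```python
import math
import random


def simulate_reads(sequences, num_reads, eps, rng):
    """Sample reads with replacement and pass each bit through a BSC(eps).

    Returns a list of (index, noisy_bits); the index is assumed error-free.
    """
    reads = []
    for _ in range(num_reads):
        idx = rng.randrange(len(sequences))  # uniform sampling with replacement
        noisy = [bit ^ (rng.random() < eps) for bit in sequences[idx]]
        reads.append((idx, noisy))
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    n, L = 1000, 256            # n information blocks of L bits each
    alpha, c, eps = 0.5, 2.0, 0.01  # illustrative values, not from the slides
    num_sequences = round((1 + alpha) * n)
    # Placeholder "codewords": random bits stand in for encoded data.
    sequences = [[rng.getrandbits(1) for _ in range(L)] for _ in range(num_sequences)]
    reads = simulate_reads(sequences, round(c * n), eps, rng)
    unseen = num_sequences - len({idx for idx, _ in reads})
    print("fraction of sequences never read:", unseen / num_sequences)
    print("Poisson approximation exp(-c/(1+alpha)):", math.exp(-c / (1 + alpha)))
```

The fraction of sequences that are never read is what turns sampling with replacement into an erasure channel, which is the starting point of the capacity computation.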

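8. Capacity: error-free reads
- Let $\lambda = \frac{c}{1+\alpha}$ be the average number of reads per sequence
- Poisson model: each sequence is read Poisson($\lambda$) times
- Erasure channel over $L$-bit strings with erasure probability $e^{-\lambda}$

9. Capacity: error-free reads (continued)
- Rate $= \frac{1}{1+\alpha} \le 1 - e^{-\lambda}$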
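Solving the rate condition with equality gives a closed form for the smallest coverage at a given redundancy. With $\alpha$ the error correction bits per information bit and $c$ the coverage (symbol names assumed here, since the slide symbols did not survive extraction), the Poisson erasure model gives $\frac{1}{1+\alpha} = 1 - e^{-c/(1+\alpha)}$, hence $c = (1+\alpha)\ln(1 + 1/\alpha)$. A short check, with a hypothetical function name:

```python
import math


def optimal_coverage_error_free(alpha):
    """Smallest coverage c satisfying 1/(1+alpha) <= 1 - exp(-c/(1+alpha))."""
    return (1 + alpha) * math.log(1 + 1 / alpha)


for alpha in (0.1, 0.2, 0.3, 0.5):
    print(f"alpha = {alpha}: optimal coverage = {optimal_coverage_error_free(alpha):.2f}")
# alpha = 0.1: optimal coverage = 2.64
# alpha = 0.2: optimal coverage = 2.15
# alpha = 0.3: optimal coverage = 1.91
# alpha = 0.5: optimal coverage = 1.65
```

These values agree with the optimal coverage column reported for coding strategy I (2.64, 2.15, 1.91, 1.65).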

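10. Capacity: BSC($\epsilon$)
- Treat each bit as going through a memoryless channel to get a lower bound on capacity (this ignores that all bits of a sequence share the same read count)
- $k$ = number of times the bit was read
- $k_0$ = number of times the bit was read as 0

11. Capacity: BSC($\epsilon$) (continued)
- The resulting channel, from the stored bit to $(k, k_0)$, is a BMS channel with capacity $C_{\mathrm{BMS}}(\lambda, \epsilon)$, where $\lambda = \frac{c}{1+\alpha}$
- For each $\alpha$, numerically find $c$ so that $\frac{1}{1+\alpha} = C_{\mathrm{BMS}}\left(\frac{c}{1+\alpha}, \epsilon\right)$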
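The numerical search can be sketched as follows. This is one possible implementation, not necessarily the authors': it assumes a Poisson number of reads per bit with mean `lam = c / (1 + alpha)`, independent flips with probability `eps` on each read, a uniform input bit (which achieves capacity for a symmetric channel), and bisection on `c`. The function names, the truncation `kmax`, and the example value of `eps` are assumptions; the noise level used on the slides is not recoverable from the transcript.

```python
import math


def _xlog2(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def info_given_k(k, eps):
    """I(X; K0 | K = k) for a uniform bit X read k times through BSC(eps)."""
    total = 0.0
    for k0 in range(k + 1):
        comb = math.comb(k, k0)
        p0 = comb * (1 - eps) ** k0 * eps ** (k - k0)   # P(K0 = k0 | X = 0)
        p1 = comb * eps ** k0 * (1 - eps) ** (k - k0)   # P(K0 = k0 | X = 1)
        mix = 0.5 * (p0 + p1)
        total += 0.5 * _xlog2(p0) + 0.5 * _xlog2(p1) - _xlog2(mix)
    return total


def capacity(lam, eps, kmax=80):
    """Capacity in bits of the per-bit channel with Poisson(lam) reads."""
    pmf = math.exp(-lam)  # P(K = 0)
    cap = 0.0
    for k in range(kmax + 1):
        cap += pmf * info_given_k(k, eps)
        pmf *= lam / (k + 1)  # Poisson recurrence: P(K = k+1) from P(K = k)
    return cap


def optimal_coverage(alpha, eps, hi=20.0, tol=1e-6):
    """Smallest c with 1/(1+alpha) <= capacity(c/(1+alpha), eps), by bisection."""
    target = 1 / (1 + alpha)
    if capacity(hi / (1 + alpha), eps) < target:
        raise ValueError("target rate not reachable within the search range")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if capacity(mid / (1 + alpha), eps) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for alpha in (0.1, 0.2, 0.3, 0.5):
        print(alpha, round(optimal_coverage(alpha, eps=0.01), 2))
```

With `eps = 0` the capacity reduces to $1 - e^{-\lambda}$, so the search reproduces the error-free bound, which is a useful sanity check.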

12. Capacity plot

13. Coding strategy I: RaptorQ + BCH
- RaptorQ [1]: rateless erasure code; for K source packets, 99.9999% probability of recovery given K+2 packets
- BCH [2]: good minimum distance properties
- Encoding: $nL$ bits → Segment → RaptorQ → BCH
- Decoding: Consensus → BCH decoding → Raptor decoding

[1] https://tools.ietf.org/html/rfc6330
[2] Bose, R. C., & Ray-Chaudhuri, D. K. (1960). On a class of error correcting binary group codes. Information and Control, 3(1), 68-79.
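The first decoding stage, consensus, can be illustrated with a bitwise majority vote over all reads that share an index. The slides do not say how consensus was computed, so majority voting (with ties resolved to 0) is an assumption, as is the function name. Sequences that were never read are simply absent from the result and would be treated as erasures by the outer RaptorQ code; the BCH and RaptorQ stages themselves are not shown.

```python
from collections import defaultdict


def consensus(reads, length):
    """Bitwise majority vote per sequence index.

    reads: iterable of (index, bits) pairs, each bits list having `length` entries.
    Returns {index: consensus_bits}; indices with no reads are omitted.
    """
    ones = defaultdict(lambda: [0] * length)  # count of 1s per bit position
    totals = defaultdict(int)                 # number of reads per index
    for idx, bits in reads:
        totals[idx] += 1
        counts = ones[idx]
        for j, bit in enumerate(bits):
            counts[j] += bit
    return {
        idx: [1 if 2 * count > totals[idx] else 0 for count in ones[idx]]
        for idx in totals
    }


reads = [(0, [1, 0, 1]), (0, [1, 1, 1]), (0, [0, 0, 1]), (2, [0, 1, 0])]
print(consensus(reads, 3))  # {0: [1, 0, 1], 2: [0, 1, 0]}
```

Residual errors left by the vote are what the inner BCH code is there to correct.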

14. Coding strategy I: results

Error-free reads:

| $\alpha$ | Optimal coverage | Achieved coverage* |
| 0.1 | 2.64 | 2.89 |
| 0.2 | 2.15 | 2.35 |
| 0.3 | 1.91 | 2.06 |
| 0.5 | 1.65 | 1.75 |

* 100 successes out of 100 random trials
Libraries: https://pypi.org/project/libraptorq/ and https://github.com/jkent/python-bchlib

15. Coding strategy I: results (continued)

Reads through BSC($\epsilon$):

| $\alpha$ | Optimal coverage | Achieved coverage* |
| 0.1 | 2.79 | 6.70 |
| 0.2 | 2.27 | 4.30 |
| 0.3 | 2.01 | 3.30 |
| 0.5 | 1.73 | 2.50 |

16. Limitations of small block length codes
Plot generated using code at https://github.com/yp-mit/spectre

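17. Coding strategy II: LDPC
- LLR $= (2k_0 - k)\log\frac{1-\epsilon}{\epsilon}$
- $k$ = number of times the bit was read
- $k_0$ = number of times the bit was read as 0
- Regular LDPC
- Encoding: $nL$ bits → LDPC → $n(1+\alpha)L$ bits → Segment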
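The per-bit log-likelihood ratio fed to the LDPC decoder follows from treating the reads of a bit as independent BSC outputs. With `k` reads of which `k0` show 0, the ratio of likelihoods under "bit = 0" and "bit = 1" is $\left(\frac{1-\epsilon}{\epsilon}\right)^{k_0}\left(\frac{\epsilon}{1-\epsilon}\right)^{k-k_0}$. A minimal sketch, with the function name and the sign convention (positive favours 0) being assumptions:

```python
import math


def bit_llr(k, k0, eps):
    """log P(reads | bit = 0) / P(reads | bit = 1) for k reads, k0 of them 0."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    if not 0 <= k0 <= k:
        raise ValueError("k0 must satisfy 0 <= k0 <= k")
    return (2 * k0 - k) * math.log((1 - eps) / eps)


print(bit_llr(0, 0, 0.1))  # 0.0: a bit that was never read carries no information
print(bit_llr(3, 3, 0.1))  # positive: three reads agree on 0
print(bit_llr(3, 1, 0.1))  # negative: the majority of reads say 1
```

A bit that was never read gets LLR 0, so erased sequences and noisy reads are handled by the same belief propagation decoder without a separate consensus step.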

18. Coding strategy II: results I

| LDPC (l,k) | $\alpha$ | Optimal coverage | DE threshold coverage | Achieved coverage* (LDPC) | Achieved coverage* (Strategy I) |
| (3,33) | 0.1 | 2.79 | 3.02 | 3.25 | 6.70 |
| (3,18) | 0.2 | 2.27 | 2.50 | 2.70 | 4.30 |
| (3,13) | 0.3 | 2.01 | 2.26 | 2.45 | 3.30 |
| (3,9) | 0.5 | 1.73 | 2.00 | 2.10 | 2.50 |

* 100 successes out of 100 random trials
- LDPC: 100 iterations of BP
- DE performed with particle filter, N = 100,000, 200 iterations
- http://radfordneal.github.io/LDPC-codes/
- http://pretty-good-codes.org/index.html
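The (l,k) pairs in the table line up with the redundancy values through the design rate of a regular LDPC code, $1 - l/k$, combined with rate $= \frac{1}{1+\alpha}$. The check below uses exact fractions; the function name is hypothetical and the design rate is used as a stand-in for the true code rate, which can be slightly higher.

```python
from fractions import Fraction


def alpha_from_regular_ldpc(l, k):
    """Redundancy alpha implied by the design rate 1 - l/k of an (l,k)-regular LDPC code."""
    rate = 1 - Fraction(l, k)
    return 1 / rate - 1


for l, k in [(3, 33), (3, 18), (3, 13), (3, 9)]:
    print((l, k), float(alpha_from_regular_ldpc(l, k)))
# (3, 33) 0.1
# (3, 18) 0.2
# (3, 13) 0.3
# (3, 9) 0.5
```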

19. Coding strategy II: results II

20. Conclusion and future work
- Analyzed DNA storage problem for simplified model
- Implemented schemes to achieve close-to-optimum performance

21. Conclusion and future work (continued)
- Adding index to segments
  - Protect index with BCH code
- Converting binary data to DNA symbols {A, C, G, T}
  - Constraint: runs of 3 or more not allowed, e.g., AAA
  - Interaction between error correction and constraint coding
- Exploiting non-IID noise in reads
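The run-length constraint mentioned above is easy to state in code. The sketch below checks a DNA string for runs of 3 or more identical symbols and shows why a naive 2-bits-per-nucleotide mapping is not enough on its own. The mapping table and both function names are hypothetical illustrations, not the authors' scheme.

```python
from itertools import groupby

# Hypothetical fixed mapping of bit pairs to nucleotides, for illustration only.
PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits):
    """Map a bit string of even length to nucleotides, 2 bits per symbol."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def has_forbidden_run(seq, max_run=2):
    """True if any symbol repeats more than max_run times in a row."""
    return any(sum(1 for _ in group) > max_run for _, group in groupby(seq))


print(has_forbidden_run("AACCGGTT"))               # False: no run longer than 2
print(has_forbidden_run("ACGAAAT"))                # True: contains AAA
print(has_forbidden_run(bits_to_dna("00000011")))  # True: maps to AAAT
```

Because unconstrained data can produce forbidden runs under a fixed mapping, a constrained code is needed, and its interaction with the error-correcting code is the open question listed above.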

22. Thank You!
