Predicting the Computational Time for Bionano Genomics Optical Map Assembly

Uploaded On 2024-01-03



Presentation Transcript

Predicting the Computational Time for Bionano Genomics Optical Map Assembly

Pat Lynch, PhD
Bionano Genomics, San Diego, California, United States of America

Abstract

The Bionano Genomics Saphyr® Genome Imaging System combines NanoChannel arrays with optical mapping to image extremely long, high molecular weight DNA that has been linearized. This allows for the detection of structural variants in the genome that are from 500 base pairs up to megabases in length, exceeding the detection range of any other technology. As part of the complete workflow, molecules imaged by the Saphyr instrument are computationally assembled into genome maps by the Bionano Solve pipeline. This process is computationally intensive. To better predict the time needed to complete a genome assembly, we used JMP to build a model of assembly time. Input parameters initially included all available data quality metrics, then were reduced to a smaller set of variables shown to have a statistical effect. An initial data set was used to establish the model. The model was then applied to subsequent data sets to predict the resources needed for genome assembly. This poster presents the methods used to determine the prediction model, then assesses the efficacy of the model.

Background

Generating high-quality finished genomes, replete with accurate identification of structural variation and high completion (minimal gaps), remains challenging using short-read sequencing technologies alone. The Saphyr® Genome Imaging System provides direct visualization of long DNA molecules in their native state, bypassing the statistical inference needed to align paired-end reads with an uncertain insert size distribution. These long labeled molecules are de novo assembled into physical maps spanning the entire diploid genome. The resulting maps provide the ability to correctly position and orient sequence contigs into chromosome-scale scaffolds and to detect a large range of homozygous and heterozygous structural variation with very high efficiency.

Methods

To determine the effect of input parameters on the number of computational hours needed to complete the genome assembly, we used the Fit Model function in JMP. 16 data sets were used to generate the model, and 7 input parameters were used in the initial screening.

After the initial screening, the two parameters with the highest P-values (Effective Coverage and FN Rate) were removed and the analysis was rerun with the remaining 5 parameters.

After the second screening, the two remaining parameters with the highest P-values were removed. The analysis was run a third time, and the three remaining parameters (DNA (Gbp), FP Rate, and Map Rate) all had P-values less than 0.1.

Results

Prediction expression:

Computational hours = -4480.14 + 6.58 * (DNA (Gbp)) + 36.39 * (Map Rate) + 258 * (FP Rate)

Using this prediction expression, the predicted number of computational hours was determined for 47 data sets. Those data sets were subsequently run through the Bionano Solve pipeline and the actual number of computational hours was determined. The residuals were plotted on the poster (figure not included in this transcript).

Nearly half (22/47) of the predictions are within 20% of the actual values. There is a bias in which the predicted value is, on average, higher than the actual value.

Conclusions

The Bionano Solve pipeline takes digitized images of long DNA molecules and de novo assembles them into genome maps. The pipeline requires a large amount of computational resources to create the genome maps, and the level of resources varies with the quality and quantity of the input data. This analysis shows that it is possible to create a model of how many computational hours an assembly will take and to use that model in a predictive fashion.

References

Cao, H., et al. Rapid detection of structural variation in a human genome using NanoChannel-based genome mapping technology. GigaScience (2014); 3(1):34.

Hastie, A.R., et al. Rapid genome mapping in NanoChannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS ONE (2013); 8(2):e55864.

For Research Use Only. Not for use in diagnostic procedures.
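
As a rough illustration, the prediction expression above and the two checks described in the results (the share of predictions within 20% of the actual value, and the average bias) can be sketched in Python. This is not the authors' JMP workflow. The function names are hypothetical, the inputs are assumed to be in the same units used when the model was fitted (the poster does not state them), and the sign convention of predicted minus actual is an assumption made here so that a positive value means the model over-predicts.

```python
from typing import Sequence


def predict_compute_hours(dna_gbp: float, map_rate: float, fp_rate: float) -> float:
    """Evaluate the poster's linear prediction expression.

    Coefficients are copied from the poster; input units are assumed to
    match those used in the original fit, which the poster does not state.
    """
    return -4480.14 + 6.58 * dna_gbp + 36.39 * map_rate + 258.0 * fp_rate


def fraction_within(predicted: Sequence[float],
                    actual: Sequence[float],
                    tolerance: float = 0.20) -> float:
    """Share of predictions whose absolute error is within `tolerance`
    (as a fraction) of the corresponding actual value."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


def mean_overprediction(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Average of (predicted - actual). Assumed convention: a positive
    result means the model predicts more hours than were actually used."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)
```

Because the intercept is large and negative, the expression can return negative hours for inputs well outside the range of the data used to fit it, so it is best treated as valid only for data sets similar to the original 16. For reference, 22 of 47 predictions within 20% corresponds to a `fraction_within` value of about 0.47.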