Slide1
Challenges and Solutions for Managing the Complexities of a Genomic Core Facility
GNomEx
Slide2
Tony Di Sera
Passionate about Software, fascinated by Molecular Biology.
Over 20 years in the software field
Slide3
University of Utah and The Huntsman Cancer Institute
Slide4
Our Job is…
To deliver clean, beautiful data to the Researcher as quickly as possible…
Slide5
GNomEx at a Glance
LIMS
Order Tracking
Workflow
Email Notification
Results Delivery
Data Repository
Analysis Project Center
Configurable Annotations
Private to Public Visibility
Submit Experiment
Workflow
Results Delivery
Automated Billing
Analysis
Visualization
Slide6
GNomEx Overview
Data Flow
Experiments
Analysis
Visualization
Slide7
GNomEx Experiments
Slide8
Experiments cont…
Slide9
GNomEx Analysis
Slide10
Visualizing your Data
Slide11
Visualizing your Data
Slide12
Three Challenges
Slide13
Challenge #1
Complicated Data
BIG Data
Sensitive Data
Slide14
Big Data
If you don’t have slack in the system, your throughput drops to a crawl.
Slide15
Big Data
Slide16
Slide17
If you store your Data In-house…
Hire a talented, fearless, focused Sys Admin
xkcd
Slide18
Transferring BIG Data
- FDT by Caltech
Pool of directly mapped buffers
Data Transfer Socket
Connection & Control Management
Pool of directly mapped buffers
Restore Multiple Files Concurrently
Independent Threads per Device
Slide19
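The FDT slide names three ideas: a pool of reusable buffers, several files moving at once, and independent worker threads. The sketch below is only a loose Python analogy of those ideas applied to local file copies. It is not FDT, which is a Java tool with its own network protocol, and the names `make_buffer_pool`, `copy_file`, `copy_many`, and the 1 MiB buffer size are hypothetical choices made for illustration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

BUFFER_SIZE = 1024 * 1024  # assumed buffer size (1 MiB), chosen for illustration


def make_buffer_pool(count, size=BUFFER_SIZE):
    """Pre-allocate a fixed pool of reusable buffers."""
    pool = queue.Queue()
    for _ in range(count):
        pool.put(bytearray(size))
    return pool


def copy_file(src, dst, pool):
    """Copy one file, borrowing a buffer from the shared pool."""
    buf = pool.get()  # blocks until a buffer is free
    try:
        view = memoryview(buf)
        copied = 0
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                n = fin.readinto(buf)  # fill the borrowed buffer in place
                if not n:
                    break
                fout.write(view[:n])
                copied += n
        return copied
    finally:
        pool.put(buf)  # hand the buffer back for the next file


def copy_many(pairs, workers=4):
    """Copy several (src, dst) pairs concurrently, one worker thread per file."""
    pool = make_buffer_pool(workers)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(copy_file, s, d, pool) for s, d in pairs]
        return [f.result() for f in futures]  # bytes copied, in input order
```

Reusing a fixed set of buffers keeps memory use bounded no matter how many files are queued, which is the property the buffer-pool boxes on the slide appear to describe.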
Big Data, Big Processing
Slide20
Illumina
Data Pipeline
GNomEx
Barcode Tags
Experiment Info
Run Info
Experiment Folders
Images
Slide21
Sequencing Analysis
Slide22
Automated Analysis Pipeline
# run novoalign with default parameters
#e david.nix@hci.utah.edu
#a A1325
@align -g hg19 -i *.txt.gz
#map, recalibrate and call SNP/INDEL w/ GATK
@snpindel -g hg19 -i A*.txt.gz
#map, recalibrate, call SNP/INDEL, annotate
@annot -g hg19 -i control_A*.gz case_B*.gz -vaast -annovar
Simplifies running analyses on cluster
Fully versioned
Customizable
Slide23
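The command file on the Automated Analysis Pipeline slide can be read as a small line-oriented format. The sketch below shows one way such a file might be parsed. It rests on assumptions inferred from the slide alone: `#e` carries a notification email, `#a` an analysis ID, other `#` lines are comments, and each `@` line starts a pipeline command. The names `Job`, `CommandFile`, and `parse_command_file` are hypothetical and are not taken from GNomEx.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Job:
    command: str        # e.g. "align"
    args: list          # remaining tokens, e.g. ["-g", "hg19", ...]
    comment: str = ""   # nearest preceding plain comment, if any


@dataclass
class CommandFile:
    email: str = ""
    analysis_id: str = ""
    jobs: list = field(default_factory=list)


def parse_command_file(text):
    """Parse the slide's command format under the assumptions stated above."""
    result = CommandFile()
    pending_comment = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#e "):        # assumed: notification email
            result.email = line[3:].strip()
        elif line.startswith("#a "):      # assumed: analysis ID
            result.analysis_id = line[3:].strip()
        elif line.startswith("#"):        # plain comment
            pending_comment = line[1:].strip()
        elif line.startswith("@"):        # pipeline command
            tokens = shlex.split(line[1:])
            if tokens:
                result.jobs.append(Job(tokens[0], tokens[1:], pending_comment))
            pending_comment = ""
    return result
```

Wildcards such as `*.txt.gz` are kept as literal tokens here. Expanding them against the experiment folder would be a separate step.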
Complicated Data
The Data Model
The File System
Slide24
Sensitive Data
Slide25
Who can Access the Data?
Collaborators
Visibility
Owner
Lab Members
Institution
Public
Slide26
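The visibility levels on the "Who can Access the Data?" slide suggest a simple access check. The sketch below is one possible reading, not GNomEx's actual rules. It assumes the levels widen in the order Owner, Lab Members, Institution, Public, and that collaborators are granted access individually regardless of level. The names `Visibility`, `User`, `Experiment`, and `can_access` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Visibility(IntEnum):
    # Assumed ordering: each level is visible to a wider audience than the last.
    OWNER = 1
    LAB_MEMBERS = 2
    INSTITUTION = 3
    PUBLIC = 4


@dataclass
class User:
    name: str
    lab: str = ""
    institution: str = ""


@dataclass
class Experiment:
    owner: str
    lab: str
    institution: str
    visibility: Visibility
    collaborators: set = field(default_factory=set)


def can_access(user, exp):
    """Return True if `user` (None for an anonymous visitor) may see `exp`."""
    if exp.visibility == Visibility.PUBLIC:
        return True
    if user is None:
        return False
    # Owners and explicitly named collaborators always have access.
    if user.name == exp.owner or user.name in exp.collaborators:
        return True
    if (exp.visibility >= Visibility.INSTITUTION
            and user.institution and user.institution == exp.institution):
        return True
    if (exp.visibility >= Visibility.LAB_MEMBERS
            and user.lab and user.lab == exp.lab):
        return True
    return False
```

Modeling the levels as an ordered enum lets a single comparison express "this level or wider", which keeps the check short as the audience grows from private to public.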
Three Challenges
Slide27
Challenge #2
The Demand
More Researchers
More Experiments
More Samples per Lane
Push for Faster Results
Slower Response Times
Slide28
It is a shame
To ANNOY the user … in the first 20 seconds
Slide29
Addressing the Bottlenecks
Slide30
How many servers are we talking about?
GNomEx
Image Processing
Analysis
Tomcat
FDT
Database Server
File Server
Data Pipeline
Analysis
Fast Disk
High Performance Clusters
Slow Disk
The Repository
Fast Disk
Fast Disk
Fast Disk
Slide31
Biggest Bottleneck is…
Getting the features implemented and bugs fixed in GNomEx.
Slide32
Three Challenges
Slide33
Different Users, Different Perspectives
3 Core Facilities
Bioinformatics
Researchers at your Institution
Outside Researchers
Accounting
Slide34
Three Kinds of Users
Submit Experiment
Workflow
Results Delivery
Automated Billing
Analysis
Visualization
Submit
Annotate
Preapprove
Authorize
Register
Track
Record
Data Pipeline
Review
Split
Invoice
Analysis Pipeline
Upload
Annotate
Organize
Link
Organize
Browse
Browse
Download
Pay
Researcher
Core
Bioinformatics
Download
Slide35
We Don’t Always Speak the Same Language
JDK
SQL
P-Value
FDR
Cluster Nodes
Hibernate
Eclipse
Ant
Case/Control
NICs
NFS
REFS
Image
Copy
Cluster density
Molarity
Adapters
5’ vs 3’
CpG Islands
Optical Error
Linux Kernel
Interface
Inheritance
Spike in
Slide36
But We Share the Same Goal
Deliver clean, beautiful data to the Researcher as quickly as possible…
Slide37
Agile Development
Reducing Risk by shortening the Delivery Window
Slide38
Agile Manifesto
Value…                          More Than…
Individuals and Interactions    Processes and Tools
Working Software                Comprehensive Documentation
Customer Collaboration          Contract Negotiation
Responding to Change            Following a Plan
Slide39
Iteration
Incrementing
Iterating
Slide40
Our Scrum Board
Slide41
In Summary
Housing Big Data requires $ and expertise
System performance is multi-faceted
Work towards Shared Understanding.
Build a team and process that embrace change.
Slide42
Plans
Slide43
Special Thanks
Slide44
Parting Thoughts
Privileged to work in this field
Working with bright, interesting, fun, and nice people
In an area exploding with new advancements
That will ultimately lead to important scientific discoveries
http://www.sourceforge.net/projects/gnomex
http://hci-scrum.hci.utah.edu/gnomexdoc
tony.disera@hci.utah.edu