Challenges and Solutions

Uploaded On 2018-03-15


Presentation Transcript

Slide1

Challenges and Solutions for Managing the Complexities of a Genomic Core Facility

GNomEx

Slide2

Tony Di Sera
Passionate about Software, fascinated by Molecular Biology.
Over 20 years in the software field

Slide3

University of Utah and The Huntsman Cancer Institute

Slide4

Our Job is…
To deliver clean, beautiful data to the Researcher as quickly as possible…

Slide5

GNomEx at a Glance

LIMS
Order Tracking
Workflow
Email Notification
Results Delivery

Data Repository
Analysis Project Center
Configurable Annotations
Private to Public Visibility

Submit Experiment
Workflow
Results Delivery
Automated Billing
Analysis
Visualization

Slide6

GNomEx Overview

Data Flow
Experiments
Analysis
Visualization

Slide7

GNomEx Experiments

Slide8

Experiments cont…

Slide9

GNomEx Analysis

Slide10

Visualizing your Data

Slide11

Visualizing your Data

Slide12

Three Challenges

Slide13

Challenge #1

Complicated Data
BIG Data
Sensitive Data

Slide14

Big Data

If you don’t have slack in the system, your throughput drops to a crawl.

Slide15

Big Data

Slide16

Slide17

If you store your Data In-house…
Hire a talented, fearless, focused Sys Admin

xkcd

Slide18

Transferring BIG Data

- FDT by CalTech

Pool of directly mapped buffers
Data Transfer Socket
Connection & Control Management
Pool of directly mapped buffers
Restore Multiple Files Concurrently
Independent Threads per Device

Slide19

Big Data, Big Processing

Slide20

[Diagram labels: Illumina, Data Pipeline, GNomEx, Barcode Tags, Experiment Info, Run Info, Experiment Folders, Images]

Slide21

Sequencing Analysis

Slide22

Automated Analysis Pipeline

# run novoalign with default parameters
#e david.nix@hci.utah.edu
#a A1325
@align -g hg19 -i *.txt.gz

# map, recalibrate and call SNP/INDEL w/ GATK
@snpindel -g hg19 -i A*.txt.gz

# map, recalibrate, call SNP/INDEL, annotate
@annot -g hg19 -i control_A*.gz case_B*.gz -vaast -annovar
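To make the shape of the command file on this slide concrete, here is a minimal Python sketch that reads such a file into a small structure. It is not the pipeline's actual parser. The reading of `#e` as a notification email and `#a` as an analysis identifier is an assumption, since the slide does not define them, and the names `PipelineJob` and `parse_command_file` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PipelineJob:
    """Hypothetical container for one parsed command file."""
    email: Optional[str] = None          # assumed meaning of the "#e" line
    analysis_id: Optional[str] = None    # assumed meaning of the "#a" line
    commands: List[Tuple[str, List[str]]] = field(default_factory=list)


def parse_command_file(text: str) -> PipelineJob:
    """Split a command file into directives, comments, and @commands."""
    job = PipelineJob()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#e "):
            job.email = line[3:].strip()
        elif line.startswith("#a "):
            job.analysis_id = line[3:].strip()
        elif line.startswith("#"):
            continue  # any other "#" line is treated as a plain comment
        elif line.startswith("@"):
            parts = line[1:].split()
            if parts:
                # First token is the command name, the rest are its arguments.
                job.commands.append((parts[0], parts[1:]))
    return job


SAMPLE = """\
# run novoalign with default parameters
#e david.nix@hci.utah.edu
#a A1325
@align -g hg19 -i *.txt.gz
"""

job = parse_command_file(SAMPLE)
for name, args in job.commands:
    print(name, args)
```

The sketch leaves file globs such as `*.txt.gz` as literal strings; expanding them and submitting work to the cluster is outside what the slide describes.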

Simplifies running analyses on cluster
Fully versioned
Customizable

Slide23

Complicated Data

The Data Model
The File System

Slide24

Sensitive Data

Slide25

Who can Access the Data?

Collaborators

Visibility
Owner
Lab Members
Institution
Public

Slide26

Three Challenges

Slide27

Challenge #2

The Demand
More Researchers
More Experiments
More Samples per Lane
Push for Faster Results
Slower Response Times

Slide28

It is a shame
To ANNOY the user … in the first 20 seconds

Slide29

Addressing the Bottlenecks

Slide30

How many servers are we talking about?

[Diagram labels: GNomEx, Image Processing, Analysis, Tomcat, FDT, Database Server, File Server, Data Pipeline, High Performance Clusters, Fast Disk (×4), Slow Disk, The Repository]

Slide31

Biggest Bottleneck is…
Getting the features implemented and bugs fixed in GNomEx.

Slide32

Three Challenges

Slide33

Different Users, Different Perspectives

3 Core Facilities
Bioinformatics
Researchers at your Institution
Outside Researchers
Accounting

Slide34

Three Kinds of Users

[Diagram. Stages: Submit Experiment, Workflow, Results Delivery, Automated Billing, Analysis, Visualization. Actions: Submit, Annotate, Preapprove, Authorize, Register, Track, Record, Data Pipeline, Review, Split, Invoice, Analysis Pipeline, Upload, Annotate, Organize, Link, Organize, Browse, Browse, Download, Pay, Download. Users: Researcher, Core, Bioinformatics.]

Slide35

We Don’t Always Speak the Same Language

JDK
SQL
P-Value
FDR
Cluster Nodes
Hibernate
Eclipse
Ant
Case/Control
NICs
NFS
REFS
Image
Copy
Cluster density
Molarity
Adapters
5’ vs 3’
CpG Islands
Optical Error
Linux Kernel
Interface
Inheritance
Spike in

Slide36

But We Share the Same Goal

Deliver clean, beautiful data to the Researcher as quickly as possible…

Slide37

Agile Development
Reducing Risk by shortening the Delivery Window

Slide38

Agile Manifesto

Value…                          More Than…
Individuals and Interactions    Processes and Tools
Working Software                Comprehensive Documentation
Customer Collaboration          Contract Negotiation
Responding to Change            Following a Plan

Slide39

Iteration

Incrementing
Iterating

Slide40

Our Scrum Board

Slide41

In Summary

Housing Big Data requires $ and expertise.
System performance is multi-faceted.
Work towards Shared Understanding.
Build a team and process that embrace change.

Slide42

Plans

Slide43

Special Thanks

Slide44

Parting Thoughts

Privileged to work in this field
Working with bright, interesting, fun, and nice people
In an area exploding with new advancements
That will ultimately lead to important scientific discoveries

http://www.sourceforge.net/projects/gnomex
http://hci-scrum.hci.utah.edu/gnomexdoc

tony.disera@hci.utah.edu