The St. Jude Children’s Research Hospital/Washington University Pediatric Cancer Genome

Uploaded On 2022-08-01

Presentation Transcript

Slide1

The St. Jude Children’s Research Hospital/Washington University Pediatric Cancer Genome Project: A CIO’s Perspective

Clayton W. Naeve, Ph.D.

Endowed Chair in Bioinformatics

SVP & CIO

St. Jude Children’s Research Hospital

Slide2

The Data Deluge

St. Jude Data: The First 50 Years

48 Years (800 TB)

2 1/2 Years (1000 TB)

PCGP Data: 917 TB, 148 million files

Slide3

The PCGP Project

St. Jude/WashU Pediatric Cancer Genome Project

Launched Feb. 2010

St. Jude/WashU collaboration

WGS on 600 patients (leukemia, brain tumors, solid tumors)

Matched germline and tumor samples

1200 genomes (~90 billion bp/genome) in 36 months

~2 Petabytes of data
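The project-scale figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the reference genome size (~3.1 billion bp) and the decimal conversion 1 PB = 1000 TB are assumptions that do not appear on the slide, and `project_scale` is a hypothetical helper name.

```python
# Figures quoted on the slide.
PATIENTS = 600
GENOMES_PER_PATIENT = 2      # matched germline + tumor
BP_PER_GENOME = 90e9         # ~90 billion bp sequenced per genome
TOTAL_DATA_TB = 2000         # ~2 PB, assuming 1 PB = 1000 TB
MONTHS = 36

# Assumption (not on the slide): human reference genome of ~3.1 billion bp.
REFERENCE_BP = 3.1e9


def project_scale():
    """Derive rough per-genome and per-month figures from the slide's numbers."""
    genomes = PATIENTS * GENOMES_PER_PATIENT
    return {
        "genomes": genomes,
        "total_bp": genomes * BP_PER_GENOME,
        # Sequenced bases divided by reference length gives approximate fold coverage.
        "approx_fold_coverage": BP_PER_GENOME / REFERENCE_BP,
        "tb_per_genome": TOTAL_DATA_TB / genomes,
        "genomes_per_month": genomes / MONTHS,
    }


if __name__ == "__main__":
    for key, value in project_scale().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions, 600 matched pairs give the 1200 genomes quoted, and ~90 billion bp per genome corresponds to roughly 30-fold coverage.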

Slide4

PCGP Challenges

Moving data

Data workflow

Data analysis

Computational horsepower

Data storage

Data sharing

Challenges to Information Sciences

Slide5

Moving Data

Multi-Terabyte data transit across networks is not trivial

DNA sequence raw data reads, contig assembly, alignment to reference, variants, etc. shipped to SJCRH as binary BAM files: ~100 GB

24 hrs to infinity to send via commodity internet

Internet2 connectivity (10 Gbps via MRC) to transfer files from WashU to SJCRH

Evaluated 5 different fast data transfer algorithms; selected FDT (developed at Caltech to transfer LHC data at CERN)

Developed a pipeline to facilitate transfer

Today: ~5 hour transit time/file
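To put the transfer figures in perspective, the following sketch computes the ideal line-rate time for a file and the average throughput implied by an observed transit time. It assumes decimal units (1 GB = 8 gigabits) and ignores protocol overhead, disk speed, and link sharing. The function names are hypothetical, and the slide does not state that the ~5 hour figure refers to a ~100 GB file, so pairing the two is an assumption.

```python
def transfer_seconds(size_gb: float, rate_gbps: float) -> float:
    """Ideal time in seconds to move size_gb gigabytes over a rate_gbps link."""
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    return size_gb * 8 / rate_gbps


def effective_gbps(size_gb: float, hours: float) -> float:
    """Average throughput (Gbps) implied by moving size_gb gigabytes in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return size_gb * 8 / (hours * 3600)


if __name__ == "__main__":
    # A ~100 GB BAM file at the full 10 Gbps line rate (theoretical best case).
    print(f"ideal at 10 Gbps: {transfer_seconds(100, 10):.0f} s")
    # Throughput implied if such a file takes 24 hours or 5 hours end to end.
    print(f"24 h transit: {effective_gbps(100, 24) * 1000:.1f} Mbps")
    print(f"5 h transit:  {effective_gbps(100, 5) * 1000:.1f} Mbps")
```

The gap between the ideal figure and an observed transit time is a reminder that end-to-end transfer speed depends on much more than the nominal link rate.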

Slide6

Moving Data

Slide7

Moving Data

Slide8

Data Workflow

Began work on PCGP 9 months prior to launch

Developed a LIMS system for Validation Lab

Developed a PCGP SharePoint site to facilitate collaboration internally

Developed a bioinformatics workflow engine: PALLAS
Security management
Data provenance management
Intermediate and final result tracking
Flexible workflow design
Rapid new analytical algorithms/tools configuration
Web-based LSF job submission and monitoring
Support a range of protocols to connect to other web application systems, databases, file systems, etc.
Integrated with applications such as SRM, Genome Browser, etc.
Data integration with tissue sample, clinical, and research data

Vision: parse each algorithm to the appropriate computing environment
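The stated vision of sending each algorithm to the appropriate computing environment can be illustrated with a small routing sketch. This is not PALLAS code: the `Job` fields, the memory threshold, and the queue names are all hypothetical, and only standard LSF `bsub` options (`-J`, `-q`, `-n`, `-o`) are used to build the submission command.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str
    command: str
    cores: int = 1
    memory_gb: int = 4
    needs_gpu: bool = False


# Hypothetical threshold and queue names; not taken from the slides.
LARGE_MEMORY_THRESHOLD_GB = 256
QUEUES = {"gpu": "gpu", "large_memory": "largemem", "cluster": "normal"}


def choose_environment(job: Job) -> str:
    """Pick a computing environment from the job's declared requirements."""
    if job.needs_gpu:
        return "gpu"
    if job.memory_gb >= LARGE_MEMORY_THRESHOLD_GB:
        return "large_memory"
    return "cluster"


def bsub_command(job: Job) -> List[str]:
    """Build (but do not run) an LSF submission command for the job."""
    queue = QUEUES[choose_environment(job)]
    return [
        "bsub",
        "-J", job.name,                # job name
        "-q", queue,                   # target queue
        "-n", str(job.cores),          # number of job slots
        "-o", f"{job.name}.%J.out",    # output file; %J expands to the job ID
        job.command,
    ]


if __name__ == "__main__":
    assembly = Job("contig_assembly", "./assemble.sh sample1", cores=32, memory_gb=512)
    print(choose_environment(assembly), " ".join(bsub_command(assembly)))
```

Keeping the routing decision in one small function makes it easy to change the rules as new hardware is added, without touching the code that submits jobs.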

Slide9

Data Analyses

Jinghui Zhang and CompBio Team

BAM Quality Assurance:
Tumor Purity Algorithm (SJCRH)
Not Disease/Genomic Swap (SNP checks)
Xenograft Filter (Remove Contaminating Mouse Reads)
Gene Exon and Genome Coverage algorithms (Gang Wu)

BAM file work:
BAM file extraction and visualization
Samtools and C++/BioPerl APIs
Bambino
IGV

Single Nucleotide Variation:
Freebayes
In-house PCGP

Copy Number Variation:
Stan’s Copy Number Algorithm
Regression Tree Algorithm

Structural Variation:
One End Anchored Inference: CREST
ViralTopology

Fusion Detection:
In-house (Michael Rusch)

RNAseq:
RNAseq mysql/Cufflinks

ChipSeq:
ChiPseq mysql/in house (John Obenauer)

viralScan in-house (McGoldrick)

Integration:
GFF intersect
Gff2fasta
gffBuilders
Cancer warehouse

Visualization:
Circos maker
BED GFF Tracks maker
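One of the quality checks listed above is a SNP-based test for sample swaps. The sketch below shows the general idea under stated assumptions and is not the PCGP implementation: genotypes are assumed to be already called and stored as plain dictionaries keyed by (chromosome, position), and the 0.8 concordance threshold is a hypothetical value chosen for illustration.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]          # (chromosome, position)
Genotypes = Dict[Site, str]     # e.g. {("1", 100): "AG"}


def genotype_concordance(germline: Genotypes, tumor: Genotypes) -> float:
    """Fraction of shared SNP sites where both samples have the same genotype."""
    shared = germline.keys() & tumor.keys()
    if not shared:
        raise ValueError("no shared SNP sites to compare")
    matches = sum(1 for site in shared if germline[site] == tumor[site])
    return matches / len(shared)


def looks_like_swap(germline: Genotypes, tumor: Genotypes,
                    min_concordance: float = 0.8) -> bool:
    """Flag a pair whose concordance falls below a (hypothetical) threshold."""
    return genotype_concordance(germline, tumor) < min_concordance


if __name__ == "__main__":
    germline = {("1", 100): "AG", ("1", 200): "CC", ("2", 50): "TT", ("3", 10): "AG"}
    same_patient = dict(germline)
    other_patient = {("1", 100): "AA", ("1", 200): "CT", ("2", 50): "TT", ("3", 10): "GG"}
    print(looks_like_swap(germline, same_patient))    # False
    print(looks_like_swap(germline, other_patient))   # True
```

In practice a tumor will not match its germline at every site, since tumors can lose heterozygosity, so any real threshold would need to be tuned on known-good pairs.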

Slide10

Computational Horsepower (HPCF)

IBM BladeCenter (810 cores/3TB RAM)

IBM iDataplex (1,008 cores/4TB RAM) – April 2010

SGI Altix UV1000 (640 cores/5TB RAM/60TB storage using Lustre v2.2) – December 2011

IBM SoNAS (780 TB) – March 2011

Data Transfer Node (10 Gbps I2 connection) – April 2011

Internal Data Transfer Node (10 Gbps x2) – June 2011

QDR Infiniband (40 Gbps for all HPC equipment) – January 2012

Software (Platform LSF, Intel Parallel Studio)

Total: 2,366 cores, 13TB RAM (estimated 11.6 Tflops)

2010: 365,000 cpu hours

2011: 712,000 cpu hours

Slide11

Data Storage

IBM SoNAS (780 TB) – March 2011

Scales to 21PB; 1 billion files/filesystem; 7,200 drives

Current total on campus: 3.8 Petabytes (3,800,000 GB)

PCGP uses 917 TB (+500TB on tape), 148 million data files

IBM TSM systems for backup/archive (Tiered)

240 SAS (15k) drives

480 SAS-NL (7.2k) drives

Current 7,900 tape capacity, up to 1.6TB/tape; 12.6+ PB total

734 TB usable under one file system

High speed/low latency backend interconnect (QDR InfiniBand 20Gb per port and 100ns latency)
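Several of the storage figures on this slide follow from simple arithmetic, which the sketch below reproduces. It assumes decimal units (1 PB = 1000 TB, 1 TB = 1,000,000 MB); `storage_summary` is a hypothetical helper, and the mean file size is only a rough average derived from the two PCGP totals.

```python
# Figures quoted on the slide.
TAPES = 7_900
TB_PER_TAPE = 1.6
PCGP_TB = 917
PCGP_FILES = 148_000_000
CAMPUS_PB = 3.8
SAS_DRIVES = 240
SAS_NL_DRIVES = 480


def storage_summary():
    """Derive tape capacity, unit conversions, and mean PCGP file size."""
    return {
        # 7,900 tapes at up to 1.6 TB each, expressed in PB.
        "tape_capacity_pb": TAPES * TB_PER_TAPE / 1000,
        # 3.8 PB expressed in gigabytes.
        "campus_gb": CAMPUS_PB * 1_000_000,
        # 917 TB spread over 148 million files, in MB per file.
        "mean_pcgp_file_mb": PCGP_TB * 1_000_000 / PCGP_FILES,
        "total_disk_drives": SAS_DRIVES + SAS_NL_DRIVES,
    }


if __name__ == "__main__":
    for key, value in storage_summary().items():
        print(f"{key}: {value:,.2f}")
```

The tape calculation gives about 12.64 PB, consistent with the "12.6+ PB total" quoted on the slide.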

Slide12

Progress

>356 Patients/712 Complete Genomes

Gene sequencing project identifies potential drug targets in common childhood brain tumor

Nature

June 20, 2012

Researchers studying the genetic roots of the most common malignant childhood brain tumor have discovered missteps in three of the four subtypes of the cancer that involve genes already targeted for drug development. The most significant gene alterations are linked to subtypes of medulloblastoma that currently have the best and worst prognosis. They were among 41 genes associated for the first time to medulloblastoma by the St. Jude Children's Research Hospital – Washington University Pediatric Cancer Genome Project.

World's largest release of comprehensive human cancer genome data helps researchers everywhere speed discoveries

Nature Genetics

May 29, 2012

To speed progress against cancer and other diseases, the St. Jude Children's Research Hospital – Washington University Pediatric Cancer Genome Project today announced the largest-ever release of comprehensive human cancer genome data for free access by the global scientific community. The amount of information released more than doubles the volume of high-coverage, whole genome data currently available from all human genome sources combined. This information is valuable not just to cancer researchers, but also to scientists studying almost any disease.

Genome sequencing initiative links altered gene to age-related neuroblastoma risk

Journal of the American Medical Association

March 13, 2012

St. Jude Children’s Research Hospital – Washington University Pediatric Cancer Genome Project and Memorial Sloan-Kettering Cancer Center discover the first gene alteration associated with patient age and neuroblastoma outcome. Researchers have identified the first gene mutation associated with a chronic and often fatal form of neuroblastoma that typically strikes adolescents and young adults. The finding provides the first clue about the genetic basis of the long-recognized but poorly understood link between treatment outcome and age at diagnosis.

Cancer sequencing initiative discovers mutations tied to aggressive childhood brain tumors

Nature Genetics

January 29, 2012

Findings from the St. Jude Children's Research Hospital – Washington University Pediatric Cancer Genome Project (PCGP) offer important insight into a poorly understood tumor that kills more than 90 percent of patients within two years. The tumor, diffuse intrinsic pontine glioma (DIPG), is found almost exclusively in children and accounts for 10 to 15 percent of pediatric tumors of the brain and central nervous system.

Cancer sequencing project identifies potential approaches to combat aggressive leukemia

Nature

January 11, 2012

Researchers with the St. Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project (PCGP) have discovered that a subtype of leukemia characterized by a poor prognosis is fueled by mutations in pathways distinctly different from a seemingly similar leukemia associated with a much better outcome. The work provides the first details of the genetic alterations fueling a subtype of acute lymphoblastic leukemia (ALL) known as early T-cell precursor ALL (ETP-ALL). The results suggest ETP-ALL has more in common with acute myeloid leukemia (AML) than with other subtypes of ALL.

Gene identified as a new target for treatment of aggressive childhood eye tumor

Nature

January 11, 2012

New findings from the St. Jude Children's Research Hospital – Washington University Pediatric Cancer Genome Project (PCGP) have helped identify the mechanism that makes the childhood eye tumor retinoblastoma so aggressive. The discovery explains why the tumor develops so rapidly while other cancers can take years or even decades to form. The finding also led investigators to a new treatment target and possible therapy for the rare childhood tumor of the retina, the light-sensing tissue at the back of the eye.

Slide13

Data Sharing

http://www.pediatriccancergenomeproject.org

http://explore.pediatriccancergenomeproject.org

Slide14

Data Sharing

Data Integration is critical: platform data (expression, WGS, methylation, etc.) and processed data (“genomics” data with phenotype data (clinical care, clinical research))

Slide15

Total=>150 FTEs with “research informatics” skills

[Organization chart; labels listed in the order extracted]

Key: Staff
19 Academic Departments
2 PhD
2 Support
Information Sciences
PCGP 5 PhD
1 Dev.
8-10 Faculty
50-60 Support Staff
10 PhD
Bioinformatics
2 developers
Enterprise Informatics
Clinical Informatics
127 FTEs
81 FTEs
Research Informatics
56 FTEs
Offshore Developers
15 FTEs
HPC
Shared Resources
Computational Biology

Slide16

$ummary

Project total cost: $65M (11 Illuminas @ WashU and 4 @ SJCRH, sequencing costs, staffing, IT, etc.)

New “IT” staff @ SJCRH: 10 FTEs in CompBiol, 0 FTEs in IS

Capital IT investment: ~$7.2 M at SJCRH, $9M at WashU

IT is ~25% of overall project costs (doesn’t include costs of other participating SJ FTEs)
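The "~25%" figure can be approximated from the numbers on this slide. The sketch below is one possible reading, not the presenter's stated calculation: it assumes the IT share is the two capital IT investments divided by the total project cost, and `it_capital_share` is a hypothetical helper name.

```python
# Figures quoted on the slide, in millions of US dollars.
PROJECT_TOTAL_M = 65.0
IT_CAPITAL_SJCRH_M = 7.2
IT_CAPITAL_WASHU_M = 9.0


def it_capital_share():
    """Return (combined IT capital in $M, its fraction of total project cost)."""
    it_total = IT_CAPITAL_SJCRH_M + IT_CAPITAL_WASHU_M
    return it_total, it_total / PROJECT_TOTAL_M


if __name__ == "__main__":
    it_total, share = it_capital_share()
    print(f"IT capital: ${it_total:.1f}M ({share:.1%} of ${PROJECT_TOTAL_M:.0f}M)")
```

Under that assumption the combined capital investment is $16.2M, just under a quarter of the $65M total.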

Slide17

Information Sciences PCGP Team

Key: Staff

Ashish Pagare
David Zhao
Dan Alford
Stephen Espy
Kiran Chand Bobba
Scott Malone
Dr. Antonio Ferreira
Bill Pappas
James McMurry
Dr. Jianmin Wang
Dr. John Obenauer
Jared Becksfort
Pankaj Gupta
Dr. Suraj Mukatira
Simon Hagstrom
Sundeep Shakya
Asmita Vaidya
Swetha Mandava
Bhagavathy Krishna
Manohar Gorthi
Sandhya Rani Kolli
Sivaram Chintalapudi
Roshan Shrestha
Irina McGuire
PJ Stevens
Thanh Le
John Penrod
Pat Eddy
Dr. Dan McGoldrick

Slide18

Questions?

Slide19

Data Workflow

[Diagram labels: PALLAS; analyses: Contig assembly, SV, CNV, INDELS, SNV, CIRCOS; computing environments: cluster, GPU, large memory]