Uploaded On 2017-10-07


Slide1

Discovering Combinatorial Biomarkers

Vipin Kumar

kumar@cs.umn.edu

http://www.cs.umn.edu/~kumar

Department of Computer Science and Engineering

ICCABS, Feb 2012

 Slide2

High-throughput technologies

DNA

Methylation

Proteins

Gene Expression & non-coding RNA

SNP

Structural Variation

Metabolites

Clinical Data, e.g. brain imaging

Adapted from E. Schadt

Data mining offers a potential solution for the analysis of these large-scale datasets:

Novel associations between genotypes and phenotypes

Biomarker discovery for complex diseases

Personalized medicine: automated analysis of patient history for customized treatment

Slide3

Biomarker Discovery and its Impact

Biomarkers:

Genes: BRCA1 (breast cancer)

Protein variants: IVS5-13insC (type 2 diabetes)

Pathways/networks: P53 (cancers)

Clinical Impact:

Diagnosis

Prognosis

Treatment

fMRI: schizophrenia vs. controls (Lim et al.)

Miki et al. 1994

Chiefari et al. 2011

Oren et al. 2010

Slide4

SNP as an illustration

NHGRI GWA Catalog: www.genome.gov/GWAStudies

Published Genome-wide Associations through 06/2010: 1,904 published GWA at p ≤ 5×10^-8 for 165 traits

Slide5

SNP as an illustration

NHGRI GWA Catalog: www.genome.gov/GWAStudies

Published Genome-wide Associations through 06/2011: 1,449 published GWA at p ≤ 5×10^-8 for 237 traits

50% increase in one year

Slide6

Challenge: Limitations of Single-locus Association Test

High coverage but low odds ratio (1.2)

High odds ratio (15.9) but low coverage (7%)

No significant associations

Many other studies

Slide7

An Example where Single-locus Test Led to No Significant Associations

Given a SNP data set of Myeloma patients, find SNPs that are associated with short vs. long survival.

3404 SNPs selected from various regions of the chromosome

70 cases (patients who survived less than 1 year)

73 controls (patients who survived longer than 3 years)

Top ranked SNP: -log10 P-value = 3.8; Odds Ratio = 3.7

Van Ness et al. 2008

Myeloma SNP data has signal, hence the need to discover combinations of SNPs

Myeloma Survival Data

Slide8
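A minimal sketch of a single-locus association test of the kind summarized for the Myeloma data (odds ratio and p-value for one SNP) is given here. The carrier counts are hypothetical and chosen only for illustration; only the group sizes (70 cases, 73 controls) come from the slides. The one-sided Fisher exact test and the function names are assumptions, since the slides do not state which test was used.

```python
from math import comb, log10


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: cases carrying the variant      b: cases not carrying it
    c: controls carrying the variant   d: controls not carrying it
    """
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers among the cases) under the hypergeometric null."""
    n_cases, n_carriers, n_total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_cases)


# Hypothetical counts: 30 of 70 cases and 12 of 73 controls carry the variant.
a, b, c, d = 30, 40, 12, 61
p = fisher_one_sided(a, b, c, d)
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
print(f"-log10(p)  = {-log10(p):.2f}")
```

The same 2x2 computation is repeated once per SNP, so each SNP is judged in isolation and any interaction between SNPs goes unseen.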

Single-locus Tests Ignore Genetic Interaction

Non-additive effect: “Genetic Interaction”

Extensively observed in model organisms, e.g. yeast, C. elegans, fly.

Ripke et al. 2011

Costanzo et al. 2010

Scholl et al. 2009

Ruzankina et al. 2009

Kamath, 2003

Slide9

The focus of this talk: Higher-order Combinatorial Biomarker

A synthetic pattern

Complex biological system

Complex human diseases

Higher-order genetic buffering

Disease

Control

Triple mutations only exist in disease subjects

Slide10

Discovering High-order Combinatorial Biomarkers

Challenge I: Computational Efficiency

Given n features, there are 2^n candidates!

How to effectively handle the combinatorial search space?

Brute-force search, e.g. MDR, can only handle 10~100 SNPs. [Rita et al. 2001]

The Apriori framework for efficient search of exponential space [Agrawal et al. 1994]

Millions of users, thousands of items

Support-based pruning: when an itemset is disqualified, prune all its supersets

Slide11
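The support-based pruning of the Apriori framework [Agrawal et al. 1994] can be sketched as follows. This is a minimal, unoptimized illustration on hypothetical toy transactions, not the discriminative-pattern algorithm of Fang et al.; the function name `apriori` and the data are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]  # size-1 candidates
    k = 1
    while level:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Build size-(k+1) candidates; keep one only if all its size-k subsets survived.
        k += 1
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            u = a | b
            if len(u) == k and all(
                frozenset(s) in survivors for s in combinations(u, k - 1)
            ):
                candidates.add(u)
        level = list(candidates)
    return frequent


# Hypothetical toy data: item "D" is rare, so no superset of {"D"} is ever counted.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "D"}]
result = apriori(toy, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because support can only shrink as items are added, an itemset that fails the threshold disqualifies all of its supersets, so most of the 2^n candidates are never counted.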

Discovering High-order Combinatorial Biomarkers

Challenge I: Computational Efficiency

Traditional Apriori-based pattern mining techniques: designed for sparse data

Unique challenges of genomic datasets:

High density: a SNP dataset has a density of 33.33% (three binary columns per SNP, one for each of the three genotypes)

High dimensionality: makes the search more challenging

Disease heterogeneity: each combination is supported by a small fraction of subjects

A novel anti-monotonic objective function designed for mining low-support discriminative patterns from dense and high-dimensional data [Fang et al. TKDE 2010]

Slide12
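The 33.33% density quoted for SNP data follows from the three-binary-columns-per-SNP encoding: each subject has exactly one of the three genotypes per SNP, so exactly one of every three columns is set. A minimal sketch, assuming genotypes are coded 0, 1, 2 (a hypothetical coding chosen for illustration) and there are no missing calls:

```python
def binarize_genotypes(genotypes):
    """Expand each genotype code in {0, 1, 2} into three binary columns.

    genotypes: list of rows (subjects), each a list of genotype codes (one per SNP).
    """
    return [
        [1 if g == code else 0 for g in row for code in (0, 1, 2)]
        for row in genotypes
    ]


def density(binary_rows):
    """Fraction of entries equal to 1 in a binary matrix."""
    total = sum(len(row) for row in binary_rows)
    return sum(sum(row) for row in binary_rows) / total


# Hypothetical data: 2 subjects, 3 SNPs.
genotype_rows = [[0, 1, 2], [2, 2, 0]]
binary_rows = binarize_genotypes(genotype_rows)
print(binary_rows[0])
print(density(binary_rows))
```

A market-basket matrix is typically far sparser than this, which is why techniques tuned for sparse data struggle on the encoded SNP matrix.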

Discovering High-order Combinatorial Biomarkers

Challenge II: Statistical Power

Computational challenges can be addressed by:

Better algorithm design, e.g. Apriori-based

High-performance computing

Statistical challenges call for additional efforts:

Limited sample size

Huge number of hypothesis tests

Many combinations are trivial extensions of their subsets

Targeting patterns with better association than their subsets reduces the number of hypothesis tests

Figure panels: Myeloma Survival Data; Kidney Rejection Data; Lung Cancer Data (subsets having higher association vs. subsets having lower association)

[Fang, Haznadar, Wang, Yu, Steinbach, Church, Oetting, Van Ness, Kumar, PLoS ONE, 2012]

Slide13
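To give a sense of how the number of hypothesis tests grows with pattern size, the following sketch counts the size-k combinations of the 3404 SNPs in the Myeloma data and the significance level each count would demand. The Bonferroni correction and alpha = 0.05 are assumptions used for illustration; the slides do not say which multiple-testing correction was applied.

```python
from math import comb, log10


def n_candidates(n_features, order):
    """Number of distinct feature combinations of the given size."""
    return comb(n_features, order)


def bonferroni_neg_log10(n_tests, alpha=0.05):
    """-log10 of the per-test p-value threshold under a Bonferroni correction."""
    return -log10(alpha / n_tests)


N_SNPS = 3404  # from the Myeloma data set described in the slides
for k in range(1, 6):
    m = n_candidates(N_SNPS, k)
    print(f"size {k}: {m} combinations, required -log10(p) > {bonferroni_neg_log10(m):.2f}")
```

Under these assumptions the single-locus threshold already exceeds the -log10 P-value of 3.8 reported for the top-ranked Myeloma SNP, and the bar rises quickly with pattern size.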

High-order Combinatorial Biomarkers: an example

Lung Cancer Data

All heavy smokers

Figure: Patients vs. Control for Best size-1, Best size-2, Best size-3, Best size-4, and Size-5 patterns; Jump

The five genes are functionally related

www.ingenuity.com

Data from Church et al. 2010

[Fang, Haznadar, Wang, Yu, Steinbach, Church, Oetting, Van Ness and Kumar, PLoS ONE, 2012]

[Fang, Pandey, Wang, Gupta, Steinbach and Kumar, IEEE TKDE, 2012]

Slide14

Insights on High-order Functional Interactions

Patterns with positive Jump are functionally more coherent

Figure (lung cancer dataset): Lung cancer vs. Control for Best size-1, Best size-2, Best size-3, Best size-4, and Size-5 patterns; Jump

Figure panels: Kidney Rejection Data; Lung Cancer Data; Combined

[Fang, Haznadar, Wang, Yu, Steinbach, Church, Oetting, Van Ness and Kumar, PLoS ONE, 2012]

Slide15

High-order Combinations Discovered from Different Types of Data

SNP: acute kidney rejection (Rejection vs. No-rejection). Data from Oetting et al. 2008

mRNA: Breast Cancer (Survived (5-year) vs. Control). Data from Vijver et al. 2002

Metabolites: COPD (AE COPD vs. Stable COPD). Data from Wendt et al. 2010

The proposed framework is general enough to handle different types of data

[Fang, Haznadar, Wang, Yu, Steinbach, Church, Oetting, Van Ness and Kumar, PLoS ONE, 2012]

[Fang, Pandey, Wang, Gupta, Steinbach and Kumar, IEEE TKDE, 2012]

Slide16

Biomarker Discovery using Error-tolerant Patterns

Figure: binary data matrices (entries 0/1) illustrating a pattern fragmented by noise

True patterns are fragmented due to noise and variability

Possible solution: Error-tolerant patterns

These patterns differ in the way errors/noise in the data are tolerated

[Yang et al. 2001]; [Pei et al. 2001]; [Seppanen et al. 2004]; [Liu et al. 2006]; [Cheng et al. 2006]; [Gupta et al., KDD 2008]; [Poernomo et al. 2009]

See Gupta et al., KDD 2008 for a survey

Slide17
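One simple way to tolerate errors is to let a row support a pattern even if a small fraction of the pattern's items are missing from it. The sketch below applies that row-wise rule to a hypothetical binary matrix; it is only one of several possible definitions and is not claimed to match any of the cited formulations. The function names and the `max_missing_fraction` parameter are assumptions made for the example.

```python
def exact_support(rows, itemset):
    """Number of rows that contain every item (column index) of the itemset."""
    return sum(1 for row in rows if all(row[j] == 1 for j in itemset))


def tolerant_support(rows, itemset, max_missing_fraction):
    """Number of rows missing at most the allowed fraction of the itemset's items."""
    allowed = int(max_missing_fraction * len(itemset))
    return sum(
        1 for row in rows if sum(1 for j in itemset if row[j] == 0) <= allowed
    )


# Hypothetical data: the first four rows share one pattern over columns 0-3,
# but each row has a single entry flipped to 0 by noise.
rows = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
pattern = (0, 1, 2, 3)
print(exact_support(rows, pattern))
print(tolerant_support(rows, pattern, max_missing_fraction=0.25))
```

With exact matching the four noisy rows contribute nothing to the pattern's support; allowing one missing item in four recovers them while the two background rows stay excluded.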

Error-tolerant Patterns vs. Traditional Association Patterns

Four breast cancer gene-expression data sets are used for experiments: GSE7390 + GSE6532 + GSE3494 + GSE1456

158 cases: patients with metastasis within 5 years of follow-up

433 controls: patients with no metastasis within 8 years of follow-up

Discriminative error-tolerant and traditional association patterns (case/control) are discovered and evaluated by enrichment analysis using MSigDB gene sets

A greater fraction of error-tolerant patterns enrich at least one gene set (higher precision)

A greater fraction of gene sets are enriched by at least one error-tolerant pattern (higher recall)

Figure legend: Error-tolerant patterns; Traditional patterns

Gupta et al. BICoB 2010; Gupta et al. BMC Bioinformatics 2011

Slide18

Differential Coexpression Patterns

Differential Expression (DE)

Traditional analysis targets changes of expression level

[Eisen et al. 1999], [Golub et al., 1999], [Pan 2002], [Cui and Churchill, 2003] etc.

Differential Coexpression (DC)

Changes of the coherence of gene expression

[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al. 2009] etc.

Combinatorial Search

Genetic Heterogeneity calls for subspace analysis

Slide19
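Differential coexpression can be illustrated with a simple correlation-based score: the average pairwise Pearson correlation of a gene set in one class minus the same average in the other class. This is a hedged sketch on hypothetical expression values; the function names are assumptions, and the score is not the subspace measure of [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010].

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_coexpression(expr, genes):
    """Average pairwise correlation among the given genes (expr: gene -> values)."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)


def differential_coexpression(expr_a, expr_b, genes):
    """Coherence of the gene set in class A minus its coherence in class B."""
    return mean_coexpression(expr_a, genes) - mean_coexpression(expr_b, genes)


# Hypothetical values: every gene has the same mean in both classes,
# but g2 moves against g1 and g3 in the cases.
controls = {"g1": [1, 2, 3, 4], "g2": [2, 3, 4, 5], "g3": [3, 4, 5, 6]}
cases = {"g1": [1, 2, 3, 4], "g2": [5, 4, 3, 2], "g3": [3, 4, 5, 6]}
score = differential_coexpression(controls, cases, ["g1", "g2", "g3"])
print(round(score, 3))
```

In this toy example a per-gene differential expression test sees no change in mean level, while the coherence among the three genes differs sharply between the two classes.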

Subspace Differential Coexpression Analysis

Three lung cancer datasets [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]

Figure labels: ≈ 60%; ≈ 10%

Enriched with the TNF-α/NF-κB signaling pathway (6/10 overlap with the pathway, corrected p value: 1.4×10^-3)

Suggests that the dysregulation of the TNF-α/NF-κB pathway may be related to lung cancer

Selected for highlight talk, RECOMB SB 2010

Best Network Model award, Sage Congress, 2010

[Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Slide20

Combinatorial Biomarkers: Summary

Higher-order combinations

Important for understanding complex human diseases

A novel framework

Improved computational efficiency

Enhanced statistical power

Naturally handles disease heterogeneity

Error-tolerance

Different types of differentiation: coexpression

General enough to handle different types of data

SNP

Gene expression

Metabolomic data

Brain imaging data (e.g. fMRI)

Slide21

References

G. Fang, R. Kuang, G. Pandey, M. Steinbach, C.L. Myers, and V. Kumar. Subspace differential coexpression analysis: problem definition and a general approach. Pacific Symposium on Biocomputing, 15:145-156, 2010.

G. Fang, G. Pandey, W. Wang, M. Gupta, M. Steinbach, and V. Kumar. Mining low-support discriminative patterns from dense and high-dimensional data. IEEE TKDE, 24(2):279-294, 2012.

G. Fang, M. Haznadar, W. Wang, H. Yu, M. Steinbach, T. Church, W. Oetting, B. Van Ness, and V. Kumar. High-order SNP Combinations Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional Interactions. PLoS ONE, in press, 2012.

R. Gupta, N. Rao, and V. Kumar. Discovery of error-tolerant biclusters from noisy gene expression data. BMC Bioinformatics, 12(S12):S1, 2011.

R. Gupta, S. Agrawal, N. Rao, Z. Tian, R. Kuang, and V. Kumar. Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining. Proc. of the International Conference on Bioinformatics and Computational Biology (BICoB), 2010.

G. Atluri, R. Gupta, G. Fang, G. Pandey, M. Steinbach, and V. Kumar. Association Analysis Techniques for Bioinformatics Problems. Proceedings of the 1st International Conference on Bioinformatics and Computational Biology (BICoB), pp 1-13, 2009.

S. Landman, V. Kumar, M. Steinbach, and H. Yu. Identification of Co-occurring Insertions in Cancer Genomes Using Association Analysis. International Journal of Data Mining and Bioinformatics, in press, 2012.

M. Steinbach, H. Yu, G. Fang, and V. Kumar. Using constraints to generate and explore higher order discriminative patterns. Advances in Knowledge Discovery and Data Mining, pages 338-350, 2011.

S. Dey, G. Atluri, M. Steinbach, A. MacDonald, K. Lim, and V. Kumar. A pattern mining based integrative framework for biomarker discovery. Tech report, Department of Computer Science, University of Minnesota, (002), 2012.

G. Pandey, C. Myers, and V. Kumar. Incorporating functional inter-relationships into protein function prediction algorithms. BMC Bioinformatics, 10(1):142, 2009.

G. Pandey, B. Zhang, A.N. Chang, C.L. Myers, J. Zhu, V. Kumar, and E.E. Schadt. An integrative multi-network and multi-classifier approach to predict genetic interactions. PLoS Computational Biology, 6(9):e1000928, 2010 (Cited as one of the major computational biology breakthroughs of 2010 by a Nature Biotechnology feature article).

J. Bellay, G. Atluri, T.L. Sing, K. Toufighi, M. Costanzo, P.S.M. Ribeiro, G. Pandey, J. Baller, B. VanderSluis, M. Michaut, et al. Putting genetic interactions in context through a global modular decomposition. Genome Research, 21(8):1375-1387, 2011.

Slide22

Acknowledgement

Kumar Lab, Data Mining: Gang Fang, Wen Wang, Vanja Paunic, Yi Yang, Benjamin Oatley, Xiaoye Liu, Sanjoy Dey, Gowtham Atluri, Gaurav Pandey, Michael Steinbach

Myers Lab, FuncGenomics: Jeremy Bellay, Chad Myers

Kuang Lab, Compbio: TaeHyun Hwang, Rui Kuang

Wendt Lab, Lung Disease: Chris Wendt

Masonic Cancer Center: Tim Church, Bill Oetting

McDonald Lab, Behavior: Angus McDonald

Van Ness Lab, Myeloma: Brian Van Ness

Lim Lab, Brain Imaging: Kelvin Lim

Mayo Clinic-IBM-UMR fellowship, Walter Barnes Lang fellowship, NSF: #IIS0916439, UMII seed grant, BICB seed grant. Computations enabled by the Minnesota Supercomputing Institute. BioMedical Genomics Center at University of Minnesota, International Myeloma Foundation. Etiology and Early Marker Study program of the Prostate Lung Colorectal and Ovarian Cancer Screening Trial.

Slide23

Thanks!
