Presentation Transcript

Slide 1

Approximate Dynamic Programming and aCGH denoising

Charalampos (Babis) E. Tsourakakis ctsourak@math.cmu.edu

Machine Learning Seminar

10th January ‘11

Slide 2

Joint work

Richard Peng, SCS, CMU
Gary L. Miller, SCS, CMU
Russell Schwartz, SCS & BioScience, CMU
David Tolliver, SCS, CMU
Maria Tsiarli, CNUP, CNBC, UPitt
and Stanley Shackney, Oncologist

Slide 3

Outline

Motivation
Related Work
Our contributions
Halfspaces and DP
Multiscale Monge optimization
Experimental Results
Conclusions

Slide 4

Dynamic Programming

Richard Bellman

"Lazy" recursion!
Overlapping subproblems
Optimal substructure

Why Dynamic Programming?

The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. … What title, what name, could I choose? … Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to.

Slide 5

*Few* applications…

DNA sequence alignment
"Pretty" printing
Histogram construction in DB systems
HMMs
and many more…

Slide 6

… and few books

Slide 7

Motivation

Tumorigenesis is strongly associated with abnormalities in DNA copy numbers.
Goal: find the true DNA copy number per probe by denoising the measurements.
Array-based comparative genomic hybridization (aCGH).

Slide 8

Typical Assumptions

Near-by probes tend to have the same DNA copy number.
Treat the data as a 1d time series.
Fit piecewise constant segments.

(Figure: log T/R measurements plotted along the genome; for humans, R = 2.)

Slide 9

Problem Formulation

Input: noisy sequence (p_1, …, p_n).
Output: (F_1, …, F_n) minimizing the objective shown on the slide (a reconstruction follows below).

Digression: the constant C is determined by training on data with ground truth.
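The objective was an image in the source. Based on the piecewise-constant segment model and the per-segment constant C described on these slides, it is presumably of the form (a reconstruction, not a verbatim copy of the slide):

$$\min_{F_1,\dots,F_n} \; \sum_{i=1}^{n} (p_i - F_i)^2 \;+\; C \cdot \#\{\, 1 \le i < n : F_i \ne F_{i+1} \,\},$$

i.e., a least-squares fit by a piecewise-constant sequence F plus a penalty of C per segment boundary.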

Slide 10

Outline

Motivation
Related Work
Our contributions
Halfspaces and DP
Multiscale Monge optimization
Experimental Results
Conclusions

Slide 11

Related Work

Don Knuth: optimal BSTs in O(n^2) time.
Frances Yao: if the weight function of the recurrence satisfies the quadrangle inequality (Monge), then we can turn the naïve O(n^3) algorithm into an O(n^2) one.
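The recurrence and the condition enabling the speedup were images; in the standard Knuth-Yao setting (a reconstruction, not necessarily the slide's exact notation) the recurrence is

$$c(i,j) = w(i,j) + \min_{i < k \le j} \big( c(i,k-1) + c(k,j) \big),$$

and when w satisfies the quadrangle inequality of the next slide (together with monotonicity on nested intervals), the optimal split point K obeys K(i, j-1) \le K(i, j) \le K(i+1, j), so each minimization can be restricted to that range and the total work drops from O(n^3) to O(n^2).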

Slide 12

Related Work

Gaspard Monge

Quadrangle inequality
Inverse quadrangle inequality
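Both inequalities appeared as formulas on the slide; in standard notation (an assumption about the slide's content), for all i \le i' \le j \le j':

Quadrangle (Monge) inequality: $$w(i,j) + w(i',j') \le w(i,j') + w(i',j).$$

Inverse quadrangle inequality: the same statement with \ge in place of \le.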

Slide 13

Related Work

Eppstein, Galil, Giancarlo; Larmore, Schieber: faster algorithms for Monge dynamic programs (the running-time bounds appeared as formulas on the slide).

Slide 14

Related Work

SMAWK algorithm: finds all row minima of an N×N totally monotone matrix in O(N) time!
Bein, Golin, Larmore, and Zhang showed that the Knuth-Yao technique is implied by the SMAWK algorithm.
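For illustration, a minimal Python sketch of the SMAWK row-minima routine (our code, not taken from the slides); lookup(r, c) returns the matrix entry, so the matrix never has to be stored explicitly:

```python
def smawk(rows, cols, lookup):
    """Return {row: column of its minimum} for a totally monotone matrix
    given implicitly by lookup(row, col)."""
    if not rows:
        return {}
    # REDUCE: discard columns that cannot hold any row minimum.
    surviving = []
    for c in cols:
        while surviving:
            r = rows[len(surviving) - 1]
            if lookup(r, surviving[-1]) <= lookup(r, c):
                break
            surviving.pop()
        if len(surviving) < len(rows):
            surviving.append(c)
    # Recurse on the odd-indexed rows with the reduced column set.
    minima = smawk(rows[1::2], surviving, lookup)
    # INTERPOLATE: an even-indexed row's minimum lies between the minima of
    # its odd neighbours, so one left-to-right scan per row suffices.
    index_of = {c: k for k, c in enumerate(surviving)}
    for i in range(0, len(rows), 2):
        row = rows[i]
        lo = index_of[minima[rows[i - 1]]] if i > 0 else 0
        hi = index_of[minima[rows[i + 1]]] if i + 1 < len(rows) else len(surviving) - 1
        best = surviving[lo]
        for c in surviving[lo:hi + 1]:
            if lookup(row, c) < lookup(row, best):
                best = c
        minima[row] = best
    return minima
```

For an explicit matrix A, smawk(list(range(len(A))), list(range(len(A[0]))), lambda r, c: A[r][c]) returns the row minima; applied to the implicit matrices of Monge dynamic programs, this is the routine behind the speedups above.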

Slide 15

Related Work

HMMs
Bayesian HMMs
Kalman filters
Wavelet decompositions
Quantile regression
EM and edge filtering
Lasso
Circular Binary Segmentation (CBS)
Likelihood-based methods using a Gaussian profile for the data (CGHSEG)
…

Slide 16

Outline

Motivation
Related Work
Our contributions
Halfspaces and DP
Multiscale Monge optimization
Experimental Results
Conclusions

Slide 17

Our contributions

Technique 1: using halfspace queries, we get a fast, high-quality approximation algorithm with ε additive error, running in O(n^{4/3+δ} log(U/ε)) time.
Technique 2: carefully break the original problem into a "small" number of Monge optimization problems; this approximates the optimal answer within a factor of (1+ε), in O(n log n / ε) time.

Slide 18

Analysis of our Recurrence

Recurrence for our optimization problem, and an equivalent formulation (both appeared as formulas on the slide; each holds for all i > 0). A candidate reconstruction follows below.
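The two formulas were images in the source; based on the objective reconstructed earlier (squared error plus C per segment, an assumption rather than a verbatim copy of the slide), the recurrence is presumably

$$DP_0 = 0, \qquad DP_i = \min_{0 \le j < i} \Big( DP_j + C + \sum_{k=j+1}^{i} \big( p_k - \mu_{j+1,i} \big)^2 \Big), \qquad \mu_{j+1,i} = \frac{1}{i-j} \sum_{k=j+1}^{i} p_k ,$$

with the equivalent prefix-sum formulation

$$DP_i = \min_{0 \le j < i} \Big( DP_j + C + (Q_i - Q_j) - \frac{(S_i - S_j)^2}{i-j} \Big), \qquad S_m = \sum_{k \le m} p_k, \quad Q_m = \sum_{k \le m} p_k^2 .$$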

Slide 19

Analysis of our Recurrence

Let w(j, i) denote the cost of covering probes j+1, …, i with a single segment, including the penalty C (its exact definition appeared as a formula on the slide).
Claim: this immediately implies an O(n^2) algorithm for this problem.
A term in this weight (highlighted on the slide) kills the Monge property!
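To make the O(n^2) claim concrete, here is a minimal Python sketch of the quadratic dynamic program, assuming the reconstructed objective above (squared error per segment plus a penalty C per segment); the function name and details are ours, not the paper's:

```python
import numpy as np

def exact_dp(p, C):
    """O(n^2) DP for the piecewise-constant fit: squared error + C per segment.
    Assumes the objective reconstructed above."""
    n = len(p)
    S = np.concatenate(([0.0], np.cumsum(p)))              # prefix sums of p
    Q = np.concatenate(([0.0], np.cumsum(np.square(p))))   # prefix sums of p^2
    DP = np.full(n + 1, np.inf)
    DP[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for i in range(1, n + 1):
        for j in range(i):
            # squared error of fitting the mean to p[j+1..i], in O(1) via prefix sums
            sse = (Q[i] - Q[j]) - (S[i] - S[j]) ** 2 / (i - j)
            cand = DP[j] + C + sse
            if cand < DP[i]:
                DP[i], back[i] = cand, j
    # recover the segment boundaries (1-indexed, inclusive)
    segments, i = [], n
    while i > 0:
        segments.append((back[i] + 1, i))
        i = back[i]
    return DP[n], segments[::-1]
```

Each of the n states scans all O(n) predecessors, and the prefix sums make each candidate segment cost O(1), hence the O(n^2) total.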

Slide 20

Why it’s not Monge?

Basically, because we cannot restrict the search for the optimum breakpoint to a limited range. E.g., for C = 1:
the sequence (0,2,0,2,…,0,2): fit a segment per point;
the sequence (0,2,0,2,…,0,2,1): fit one segment for all points.

Slide 21

Halfspaces and DP

Do binary searches to approximate DP_i for every i = 1, …, n (the approximate values and the threshold set, which involves the constant C, were defined by formulas on the slide). We do enough iterations to reach the desired accuracy: O(log n · log(U/ε)) iterations suffice. By induction we can show that this yields an additive ε-error approximation. A sketch of the outer binary search follows below.
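A minimal sketch of the outer binary search, with the halfspace emptiness test abstracted as a monotone predicate feasible (an illustration of the idea only, not the paper's implementation):

```python
def binary_search_value(lo, hi, eps, feasible):
    """Find, within additive eps, the smallest v in [lo, hi] with feasible(v) True,
    for a monotone predicate.  In the halfspace-based DP, "is DP_i <= v?" would be
    answered by an emptiness query over points built from the indices j < i."""
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi   # converges in O(log((hi - lo) / eps)) iterations
```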

Slide 22

Halfspaces and DP

For fixed i, each binary-search query (with the guessed value treated as a constant) reduces to a geometric query over points built from the indices j < i; the derivation appeared as formulas on the slide, with '~' marking the approximate DP values.

Slide 23

Dynamic Halfspace Reporting

Agarwal, Eppstein, Matousek: dynamic data structures for halfspace emptiness queries.
Given a set of points S in R^4, the halfspace range reporting problem can be solved dynamically; the query time, the space and preprocessing time, and the update time bounds appeared as formulas on the slide.

Slide 24

Halfspaces and DP

Hence the algorithm iterates through the indices i = 1, …, n, and maintains the Eppstein et al. data structure containing one point for every j < i.
It performs binary search on the value, which reduces to emptiness queries.
It provides an answer within ε additive error of the optimal one.
Running time: O(n^{4/3+δ} log(U/ε)).

Slide 25

Multimonge Decomposition and DP

By simple algebra we can write our weight function w(j, i) as w'(j, i)/(i - j) + C, where the weight function w' (defined by a formula on the slide; a candidate reconstruction follows below) is Monge!
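The definition of w' was an image; given the prefix-sum form of w reconstructed earlier, the natural candidate (an assumption chosen so that w'(j,i)/(i-j) + C = w(j,i), not a verbatim copy of the slide) is

$$w'(j,i) = (i-j)\,(Q_i - Q_j) - (S_i - S_j)^2 .$$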

Key Idea: approximate i - j by a constant! But how?

Slide 26

Multimonge Decomposition and DP

For each i, we break the choices of j into intervals [l_k, r_k] such that i - l_k and i - r_k differ by a factor of at most 1 + ε.
O(log n / ε) such intervals suffice to get a (1 + ε)-approximation (a sketch of one such interval construction follows below).
However, we need to make sure that when we solve a specific subproblem, the optimum lies in the desired interval. How?
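A minimal Python sketch of one way to build such intervals from a geometric progression of distances (an illustration of the idea; names and details are ours):

```python
def geometric_intervals(i, eps):
    """Split the breakpoints j in [0, i-1] into intervals [l, r] such that the
    distances i - l and i - r differ by a factor of at most (1 + eps)."""
    intervals = []
    d = 1                                    # current distance i - j
    while d <= i:
        d_hi = min(i, int((1 + eps) * d))    # largest distance in this interval
        intervals.append((i - d_hi, i - d))  # j ranges over [i - d_hi, i - d]
        d = d_hi + 1
    return intervals
```

For fixed ε the number of intervals grows as O(log i / ε), matching the count stated above.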

Slide 27

Multimonge Decomposition and DP

On each subproblem we use a modified weight (shown as a formula on the slide) in which M is a sufficiently large positive constant.
Running time: O(n log n / ε), using an O(n)-time algorithm (Larmore, Schieber) per subproblem.
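The modified per-subproblem weight was an image; a standard way to keep the optimum of subproblem k inside [l_k, r_k], and presumably the role of M here, is to charge M to every choice outside the interval, e.g.

$$\tilde w_k(j,i) = \begin{cases} w'(j,i)/c_k + C, & j \in [l_k, r_k], \\ M, & \text{otherwise}, \end{cases}$$

where c_k is a fixed representative of i - j on the interval (so the division by i - j is approximated by a constant, as in the key idea above); with M sufficiently large, no breakpoint outside [l_k, r_k] is ever selected.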

Slide 28

Outline

Motivation
Related Work
Our contributions
Halfspaces and DP
Multiscale Monge optimization
Experimental Results
Conclusions

Slide 29

Experimental Setup

All methods implemented in MATLAB: CGHseg (Picard et al.), CBS (MATLAB Bioinformatics Toolbox).
Train on data with ground truth to "learn" C.
Datasets: "hard" synthetic data (Lai et al.), Coriell cell lines, breast cancer cell lines.

Slide 30

Synthetic Data

(Figure: segmentations by CGHTRIMMER, CBS, and CGHSEG on the synthetic data; one panel is annotated "Misses a small aberrant region!". Running times: 0.007 sec, 60 sec, and 1.23 sec, respectively.)

Slide 31

Coriell Cell Lines

CGHTRIMMER: 5.78 sec; 2 mistakes (1 FP, 1 FN), both also made by the competitors.
CBS: 47.7 min; 8 mistakes (7 FP, 1 FN).
CGHSEG: 8.15 min; 8 mistakes (1 FP, 3 FN).

Slide 32

Breast Cancer (BT474)

(Figure: segmentations by CGHTRIMMER, CGHSEG, and CBS; the labels NEK7 and KIF14 mark highlighted genes.)

Note: In the paper we have an extensive biological analysis with respect to the findings of the three methods.

Slide 33

Outline

Motivation
Related Work
Our contributions
Halfspaces and DP
Multiscale Monge optimization
Experimental Results
Conclusions

Slide 34

Summary

New, simple formulation of denoising aCGH data, with numerous other applications (e.g., histogram construction).
Two new techniques for approximate DP: halfspace queries and multiscale Monge decomposition.
Validation of our model using synthetic and real data.

Slide 35

Problems

Other problems where our techniques are directly or almost directly applicable?
O(n^2) is unlikely to be tight. E.g., if two points p_i and p_j satisfy a certain condition (shown as a formula on the slide), then they belong in different segments. Can we find a faster exact algorithm?

Slide 36

Thanks a lot!