Uploaded On 2017-01-24



Optimizing pooling strategies for the massive next-generation sequencing of viral samples

Pavel Skums¹
Joint work with Olga Glebova², Alex Zelikovsky², Ion Mandoiu³ and Yury Khudyakov¹

¹Centers for Disease Control and Prevention, Atlanta, GA
²Georgia State University, Atlanta, GA
³University of Connecticut, Storrs, CT

Outline

1. Massive NGS of viral samples

2. Optimal pooling design problem

3. Algorithm and results

NGS in epidemiology

Molecular surveillance

Prediction of epidemic progress

Transmission networks

Epidemiological parameters

Vaccination strategies

NGS in epidemiology

Large-scale molecular surveillance requires sequencing unprecedentedly large sets of viral samples.

NGS of tens of thousands of samples is highly cost- and labor-intensive.

Example: sequencing 100K samples using a 454 senior system with 50 MIDs, doing one sequencing run per day:

Cost: 5000 × 100 000 / 50 = 10 000 000 dollars

Time: 100 000 / 50 = 2000 days ≈ 5.5 years
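The cost and time estimates above are plain arithmetic. The sketch below reproduces them, reading the per-run cost (5,000) and the one-run-per-day rate off the slide's formula; `sequencing_budget` is a hypothetical helper, not part of any tool.

```python
import math

def sequencing_budget(n_samples, mids_per_run, cost_per_run, runs_per_day=1):
    """Runs, total cost and days when each run multiplexes `mids_per_run` samples."""
    runs = math.ceil(n_samples / mids_per_run)  # one MID per sample within a run
    cost = runs * cost_per_run
    days = runs / runs_per_day
    return runs, cost, days

runs, cost, days = sequencing_budget(100_000, 50, 5_000)
print(runs, cost, days, round(days / 365, 1))  # prints: 2000 10000000 2000.0 5.5
```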

Optimal Pooling Design Problem

Goal: a framework for identifying viral sequences from a large number of samples using the smallest possible number of NGS runs.

Idea: for $n$ samples, generate $m$ pools (i.e. mixtures of samples) with $m \ll n$, in such a way that every sample is uniquely identified by the pools to which it belongs.

Optimal Pooling Design Problem

Example. 8 samples: 1, 2, 3, 4, 5, 6, 7, 8

Sequencing each sample separately: 8 runs

4 pools: $M_1 = \{1,2,3,4\}$, $M_2 = \{5,6,7,8\}$, $M_3 = \{1,2,5,6\}$, $M_4 = \{1,3,5,7\}$

Sequencing pools: 4 runs
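A quick way to check that a set of pools identifies every sample is to compute each sample's signature (the set of pools it belongs to) and require the signatures to be non-empty and pairwise distinct. A minimal sketch for the four pools above; `signature` and `identifies_all` are illustrative names.

```python
# The four pools from the 8-sample example.
POOLS = {
    "M1": {1, 2, 3, 4},
    "M2": {5, 6, 7, 8},
    "M3": {1, 2, 5, 6},
    "M4": {1, 3, 5, 7},
}

def signature(sample, pools):
    """Names of the pools that contain `sample`."""
    return frozenset(name for name, members in pools.items() if sample in members)

def identifies_all(samples, pools):
    """True if every sample is in some pool and no two samples share a signature."""
    sigs = [signature(s, pools) for s in samples]
    return all(sigs) and len(set(sigs)) == len(sigs)

print(identifies_all(range(1, 9), POOLS))  # True
without_m2 = {name: m for name, m in POOLS.items() if name != "M2"}
print(identifies_all(range(1, 9), without_m2))  # False: sample 8 is in no pool
```

Dropping $M_2$ breaks the design only because sample 8 would lie in no pool; the remaining signatures stay distinct.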

Optimal Pooling Design Problem

4 pools: $M_1 = \{1,2,3,4\}$, $M_2 = \{5,6,7,8\}$, $M_3 = \{1,2,5,6\}$, $M_4 = \{1,3,5,7\}$

$\{1\} = M_1 \cap M_3 \cap M_4$

$\{2\} = (M_1 \cap M_3) \setminus M_4$

…

$\{8\} = (M_2 \setminus M_3) \setminus M_4$
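The set expressions above generalize: a sample is recovered by intersecting the pools that contain it and subtracting every other pool. This is how a sequence observed in a given combination of pools could be traced back to its sample. The sketch assumes the observed pool combination is known exactly (no missing or spurious detections); `decode` is a hypothetical helper.

```python
POOLS = {
    "M1": {1, 2, 3, 4},
    "M2": {5, 6, 7, 8},
    "M3": {1, 2, 5, 6},
    "M4": {1, 3, 5, 7},
}

def decode(observed, pools):
    """Samples lying in every observed pool and in none of the other pools."""
    observed = set(observed)
    if not observed:
        return set()
    result = set.intersection(*(pools[name] for name in observed))  # new set, pools untouched
    for name, members in pools.items():
        if name not in observed:
            result -= members
    return result

print(decode({"M1", "M3", "M4"}, POOLS))  # {1}
print(decode({"M1", "M3"}, POOLS))        # {2}
print(decode({"M2"}, POOLS))              # {8}
```

A combination that matches no sample, such as $\{M_3, M_4\}$, decodes to the empty set.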

Optimal Pooling Design Problem

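Problem 1 (Optimal Pooling Design Problem).
Given: a set of samples $S = \{S_1,\dots,S_n\}$.
Find: a set of pools $P = \{P_1,\dots,P_m\}$, $P_k \subseteq S$ for $k = 1,\dots,m$, such that
1) $P_1 \cup \dots \cup P_m = S$;
2) for every pair of distinct samples $S_i, S_j \in S$ there exists $P_k \in P$ such that $|P_k \cap \{S_i, S_j\}| = 1$ ($P_k$ separates $S_i$ and $S_j$);
3) $m$ is minimal.

Theorem 1. There exists a solution of Problem 1 with $m = \log_2 n + 1$ (for $n$ a power of 2; in general, the minimum is $m = \lceil \log_2(n+1) \rceil$).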
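One way to see the bound in Theorem 1 is a binary-code construction: number the samples $1,\dots,n$ and put sample $i$ into pool $b$ whenever bit $b$ of $i$ is 1. Every label is non-zero, so condition 1 holds, and distinct labels differ in some bit, so condition 2 holds. This is offered as an illustration, not as the authors' proof, and it ignores pool-size and other practical constraints; the function names are hypothetical.

```python
from itertools import combinations

def binary_pools(n):
    """Pool b holds the samples (numbered 1..n) whose binary label has bit b set."""
    m = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    return [{i for i in range(1, n + 1) if (i >> b) & 1} for b in range(m)]

def is_solution(n, pools):
    """Check conditions 1 and 2 of Problem 1 for samples 1..n."""
    covered = set().union(*pools) == set(range(1, n + 1))
    separated = all(any((i in P) != (j in P) for P in pools)
                    for i, j in combinations(range(1, n + 1), 2))
    return covered and separated

for n in (7, 8):
    pools = binary_pools(n)
    print(n, len(pools), is_solution(n, pools))  # prints 7 3 True, then 8 4 True
```

For $n = 8$ it also uses four pools, though different ones from the 8-sample example; for $n = 7$ three pools suffice.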

Optimal Pooling Design Problem

Additional conditions for the problem

| Condition | Reasons |
|---|---|
| Each pool contains at most $k$ samples: $\lvert P_j \rvert \le k$ for $j = 1,\dots,m$ | The number of reads that can be obtained with each NGS technology is bounded; if a large number of samples are mixed in one pool, some of them may be lost due to PCR bias |
| Each sample belongs to at least $l$ pools: $\lvert \{j : S_i \in P_j\} \rvert \ge l$ for $i = 1,\dots,n$ | To ensure sufficient coverage of the sequences of each sample |
| Some pairs of samples should not be put into the same pool | Some samples may intersect (if they belong to the same transmission cluster) |
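The first two conditions are simple to check for a candidate design. A minimal sketch using the pools of the 8-sample example; `check_conditions` is an illustrative name, and `k`, `l` follow the table.

```python
def check_conditions(samples, pools, k, l):
    """Return (sizes_ok, coverage_ok) for the pool-size and multiplicity conditions."""
    sizes_ok = all(len(P) <= k for P in pools)
    coverage_ok = all(sum(s in P for P in pools) >= l for s in samples)
    return sizes_ok, coverage_ok

pools = [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 2, 5, 6}, {1, 3, 5, 7}]
print(check_conditions(range(1, 9), pools, k=4, l=1))  # (True, True)
print(check_conditions(range(1, 9), pools, k=4, l=2))  # (True, False): samples 4 and 8 are in one pool each
```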

Optimal Pooling Design Problem

| Condition | Reasons |
|---|---|
| Each pool $P_j$ should be a clique of the graph $G(S)$ | Some samples may intersect (if they belong to the same transmission cluster) |

$G(S) = (V, E)$ is the graph with $V = S$ and $S_iS_j \in E$ if and only if there is confidence that samples $S_i$ and $S_j$ do not intersect.
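$G(S)$ can be built from a list of sample pairs that might intersect (for example, pairs suspected to share a transmission cluster); how that list is obtained is outside the scope of this sketch, and the pair used below is hypothetical. A pool is admissible when it is a clique of $G(S)$.

```python
def compatibility_graph(samples, may_intersect):
    """G(S) as adjacency sets: samples are adjacent unless listed as possibly intersecting."""
    risky = {frozenset(pair) for pair in may_intersect}
    return {s: {t for t in samples if t != s and frozenset((s, t)) not in risky}
            for s in samples}

def is_clique(pool, G):
    """True if every two samples of the pool are adjacent in G."""
    return all(u in G[v] for u in pool for v in pool if u != v)

# Hypothetical input: samples 1 and 2 may belong to the same transmission cluster.
G = compatibility_graph(range(1, 9), may_intersect=[(1, 2)])
print(is_clique({1, 3, 5, 7}, G))  # True
print(is_clique({1, 2, 3, 4}, G))  # False
```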

Optimal Pooling Design Problem

Problem 2 (Minimum Clique Test Set Problem).
Given: a graph $G = G(S)$ and natural numbers $k, l$.
Find: a set of cliques $P = \{P_1,\dots,P_m\}$ of $G$ such that
1) $|P_i| \le k$ for every $i = 1,\dots,m$;
2) every vertex $v \in V(G)$ belongs to at least $l$ cliques from $P$;
3) for every pair of distinct vertices $u, v \in V(G)$ there exists $P_i \in P$ such that $|P_i \cap \{u, v\}| = 1$;
4) $m$ is minimal.
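Exact minimization is only practical for tiny instances. The brute-force sketch below enumerates all cliques of size at most $k$ and tries families of increasing size $m$ (allowing the same clique to appear more than once) until conditions 1-3 hold, which makes condition 4 concrete. It is a plain exhaustive search for illustration, not the authors' method; the function names are hypothetical, and $k, l \ge 1$ is assumed.

```python
from itertools import combinations, combinations_with_replacement

def cliques_up_to(G, k):
    """All non-empty cliques of G with at most k vertices (exhaustive enumeration)."""
    found = []
    def grow(clique, candidates):
        for i, v in enumerate(candidates):
            bigger = clique + [v]
            found.append(frozenset(bigger))
            if len(bigger) < k:
                # keep only later candidates adjacent to v, so every extension stays a clique
                grow(bigger, [u for u in candidates[i + 1:] if u in G[v]])
    grow([], sorted(G))
    return found

def feasible(pools, vertices, l):
    """Conditions 2 and 3 of Problem 2 (condition 1 holds by construction)."""
    covered = all(sum(v in P for P in pools) >= l for v in vertices)
    separated = all(any((u in P) != (v in P) for P in pools)
                    for u, v in combinations(vertices, 2))
    return covered and separated

def min_clique_test_set(G, k, l):
    """Smallest family of cliques satisfying conditions 1-3 (tiny graphs only)."""
    vertices = sorted(G)
    cliques = cliques_up_to(G, k)
    # l copies of every singleton always form a feasible family, so m <= l * |V|
    for m in range(1, l * len(vertices) + 1):
        for pools in combinations_with_replacement(cliques, m):
            if feasible(pools, vertices, l):
                return list(pools)
    return None
```

For the complete graph $K_4$ with $k = 4$ and $l = 1$ it returns three pools, matching Theorem 1; for three pairwise non-adjacent vertices only singletons are cliques, so three pools are needed.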

Optimal Pooling Design Problem

Minimum Test Set Problem (Garey, Johnson)

Given: a collection $Q = \{Q_1,\dots,Q_n\}$ of subsets of a finite set $S$.
Find: a subcollection $P = \{P_1,\dots,P_m\} \subseteq Q$ such that
1) for every pair of distinct $s_i, s_j \in S$ there exists $P_r \in P$ such that $|P_r \cap \{s_i, s_j\}| = 1$;
2) $m$ is minimal.
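Garey and Johnson list Minimum Test Set among the NP-complete problems, so heuristics are the practical route for large instances. A common one, shown here purely as an illustration and not as anything the slides propose, treats every pair of elements as an item to be covered and applies greedy set cover: repeatedly pick the subset that separates the most still-unseparated pairs. `greedy_test_set` is a hypothetical name.

```python
from itertools import combinations

def greedy_test_set(S, Q):
    """Greedy set-cover style heuristic for Minimum Test Set."""
    pending = {frozenset(pair) for pair in combinations(S, 2)}
    chosen = []
    while pending:
        # subset that separates the largest number of still-unseparated pairs
        best = max(Q, key=lambda q: sum(len(q & pair) == 1 for pair in pending))
        gained = {pair for pair in pending if len(best & pair) == 1}
        if not gained:
            return None  # no subset in Q separates the remaining pairs
        chosen.append(best)
        pending -= gained
    return chosen

Q = [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 2, 5, 6}, {1, 3, 5, 7}]
result = greedy_test_set(range(1, 9), Q)
print(result)
```

With the four pools of the 8-sample example as $Q$, it picks $M_1$, $M_3$ and $M_4$; sample 8 is then identified by lying in none of them, which Minimum Test Set allows but the coverage condition of Problem 1 does not.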

Problem reformulations


Only some pairs of vertices should be separated: introduce a graph $H$ with $V(H) = V(G)$, where $uv \in E(H)$ if and only if $u$ and $v$ should be separated.


Minimum Clique Test Set Problem
Given: graphs $G$ and $H$ on the same set of vertices $V$, natural numbers $k, l$.
Find: a set of cliques $P = \{P_1,\dots,P_m\}$ of $G$ such that
1) $|P_i| \le k$ for every $i = 1,\dots,m$;
2) every vertex $v \in V$ belongs to at least $l$ cliques from $P$;
3) for every $uv \in E(H)$ there exists $P_i \in P$ such that $|P_i \cap \{u, v\}| = 1$;
4) $m$ is minimal.
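With $H$ in place, condition 3 only has to be checked on the edges of $H$. The test `(u in P) != (v in P)` is the same as $|P_i \cap \{u, v\}| = 1$. The helper name and the two-edge $H$ below are hypothetical.

```python
def unseparated(pools, h_edges):
    """Edges uv of H for which no pool contains exactly one of u and v."""
    return [(u, v) for u, v in h_edges
            if not any((u in P) != (v in P) for P in pools)]

pools = [{1, 3}, {2, 4}]
print(unseparated(pools, [(1, 2), (3, 4)]))  # []
print(unseparated(pools, [(1, 2), (1, 3)]))  # [(1, 3)]
```

Here two pools cover all four samples and separate both required pairs, whereas separating all six pairs of four samples needs three pools by Theorem 1.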



Replace each vertex $v \in V(G)$ with $l$ pairwise non-adjacent copies.
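The slide does not spell out how the copies connect to the rest of the graph. The sketch below assumes that copies of adjacent vertices stay adjacent, so the only effect is that two copies of the same vertex can never share a clique; covering all $l$ copies then forces every original vertex into at least $l$ pools once the pools are projected back. How $H$ is transformed is not stated on the slide and is not modeled here; `replicate` and `project` are hypothetical names.

```python
def replicate(G, l):
    """Replace each vertex v of G by copies (v, 0), ..., (v, l-1).
    Assumption: copies of adjacent vertices are adjacent; copies of one vertex are not."""
    return {(v, i): {(u, j) for u in G[v] for j in range(l)}
            for v in G for i in range(l)}

def project(pools):
    """Map cliques of the replicated graph back to pools of original vertices."""
    return [{v for v, _ in P} for P in pools]

G = {1: {2}, 2: {1}}  # a single edge 1-2
G2 = replicate(G, 2)
print(sorted(G2[(1, 0)]))    # [(2, 0), (2, 1)]
print((1, 1) in G2[(1, 0)])  # False: copies of one vertex never share a clique
# Covering all four copies with cliques of G2 puts each original vertex into two pools:
print(project([{(1, 0), (2, 0)}, {(1, 1), (2, 1)}]))  # [{1, 2}, {1, 2}]
```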


Minimum Clique Test Set Problem
Given: graphs $G$ and $H$ on the same set of vertices $V$, natural number $k$.
Find: a set of cliques $P = \{P_1,\dots,P_m\}$ of $G$ such that
1) $|P_i| \le k$ for every $i = 1,\dots,m$;
2) $P_1 \cup \dots \cup P_m = V(G)$;
3) for every $uv \in E(H)$ there exists $P_i \in P$ such that $|P_i \cap \{u, v\}| = 1$;
4) $m$ is minimal.


For every $u \in V$, add a vertex $x_u$ and an edge $ux_u \in E(H)$.

[Figure: vertices 1, 2, 3, each joined in $H$ to a new vertex $x_1$, $x_2$, $x_3$.]
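The gadget edges turn coverage into separation. The sketch assumes, since the slide does not say otherwise, that $x_u$ is added to $H$ only and is never placed in a pool; then the edge $ux_u$ is separated exactly when $u$ lies in some pool. The names are illustrative, and `("x", u)` is a hypothetical encoding of $x_u$.

```python
def add_coverage_gadgets(vertices, h_edges):
    """Add a new vertex ('x', u) and an edge u-('x', u) to H for every vertex u."""
    return list(h_edges) + [(u, ("x", u)) for u in vertices]

def unseparated(pools, h_edges):
    """Edges of H with no pool containing exactly one endpoint."""
    return [(u, v) for u, v in h_edges if not any((u in P) != (v in P) for P in pools)]

# ('x', u) is not a vertex of G, so it never appears in a pool (assumption);
# hence the edge u-('x', u) is separated exactly when u lies in some pool.
H2 = add_coverage_gadgets([1, 2, 3], h_edges=[(1, 2)])
print(unseparated([{1}, {2}], H2))  # [(3, ('x', 3))]: vertex 3 is not covered
```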


Minimum Clique Test Set Problem
Given: graphs $G$ and $H$ on the same set of vertices, natural number $k$.
Find: a set of cliques $P = \{P_1,\dots,P_m\}$ of $G$ such that
1) $|P_i| \le k$ for every $i = 1,\dots,m$;
2) for every $uv \in E(H)$ there exists $P_i \in P$ such that $|P_i \cap \{u, v\}| = 1$;
3) $m$ is minimal.


Heuristic algorithm

Input: graphs G and H on the same set of vertices V, natural number k

P := ∅
WHILE ⋃_{C ∈ P} C ≠ V OR E(H) ≠ ∅
    find a maximum cut (A, B) in H (using local search);
    for every a ∈ A put w(a) = # of neighbors of a from B in H;
    for every b ∈ B put w(b) = # of neighbors of b from A in H;
    find a maximum-weight clique C1 with |C1| ≤ k in the subgraph G[A] with weights w;
    find a maximum-weight clique C2 with |C2| ≤ k in the subgraph G[B] with weights w;
    C := argmax {w(C1), w(C2)};
    P := P ∪ {C};
    E(H) := E(H) \ {uv : u ∈ C, v ∈ V \ C}
ENDWHILE
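A runnable sketch of the loop above, under several assumptions not stated on the slide: the maximum cut is approximated by vertex-flip local search from a random start; the weighted clique step uses exhaustive search, so the sketch only suits small graphs; $l = 1$; and coverage is enforced through the $x_u$ gadget edges from the reformulation, so the loop can stop once $E(H)$ is empty. All function names are hypothetical, and the authors' implementation may differ.

```python
import random

def local_search_max_cut(vertices, adj, rng):
    """Local search for a large cut of H: flip a vertex while that strictly increases the cut."""
    side = {v: rng.random() < 0.5 for v in vertices}
    improved = True
    while improved:
        improved = False
        for v in vertices:
            same = sum(1 for u in adj[v] if side[u] == side[v])
            if 2 * same > len(adj[v]):  # more neighbors on v's side than across
                side[v] = not side[v]
                improved = True
    A = {v for v in vertices if side[v]}
    return A, set(vertices) - A

def max_weight_clique(candidates, g_adj, w, k):
    """Exhaustive search for a clique of size <= k with maximum total weight."""
    best, best_w = set(), 0
    def extend(clique, weight, rest):
        nonlocal best, best_w
        if weight > best_w:
            best, best_w = set(clique), weight
        if len(clique) == k:
            return
        for i, v in enumerate(rest):
            extend(clique + [v], weight + w[v], [u for u in rest[i + 1:] if u in g_adj[v]])
    extend([], 0, sorted(candidates))
    return best, best_w

def pooling_heuristic(g_adj, h_edges, k, seed=0):
    """Greedy loop from the slide; returns a list of pools (cliques of G)."""
    assert k >= 1
    rng = random.Random(seed)
    V = sorted(g_adj)
    # Coverage gadgets: ("x", u) lives only in H, so it is never pooled.
    edges = {frozenset(e) for e in h_edges} | {frozenset((u, ("x", u))) for u in V}
    pools = []
    while edges:  # with the gadgets, an empty E(H) also means every vertex is covered
        h_vertices = sorted({v for e in edges for v in e}, key=repr)
        adj = {v: set() for v in h_vertices}
        for e in edges:
            a, b = tuple(e)
            adj[a].add(b)
            adj[b].add(a)
        A, B = local_search_max_cut(h_vertices, adj, rng)
        # w(v) = number of H-neighbors of v on the other side of the cut
        w = {v: len(adj[v] & (B if v in A else A)) if v in adj else 0 for v in V}
        C1, w1 = max_weight_clique([v for v in V if v in A], g_adj, w, k)
        C2, w2 = max_weight_clique([v for v in V if v in B], g_adj, w, k)
        C = C1 if w1 >= w2 else C2
        pools.append(C)
        # E(H) := E(H) \ {uv : u in C, v not in C}
        edges = {e for e in edges if len(e & C) != 1}
    return pools

n = 8
G = {v: {u for u in range(n) if u != v} for v in range(n)}  # complete graph: all samples unrelated
H = [(u, v) for u in range(n) for v in range(u + 1, n)]     # separate every pair
pools = pooling_heuristic(G, H, k=n)
print(len(pools), pools)
```

The loop always makes progress: at a local optimum of the cut, every vertex still incident to an edge of $H$ has at least one neighbor across the cut, so the chosen clique has positive weight and removes at least one edge of $H$. The output is a valid pooling design, but not necessarily a minimum one.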

Algorithm results

1. All samples are unrelated (i.e. $G$ is a complete graph)

| # samples | # generated pools |
|---|---|
| 4 | 3 |
| 8 | 4 |
| 16 | 5 |
| 32 | 6 |
| 64 | 7 |
| 128 | 8 |
| 256 | 9 |
| 512 | 10 |
| 1024 | 11 |
| 2048 | 12 |
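For a complete graph $G$, Theorem 1 gives the minimum number of pools. The short check below compares each table entry with $\lceil \log_2(n+1) \rceil$, computed with Python's `int.bit_length`; it shows that every reported count equals $\log_2 n + 1$, the value in Theorem 1. The name `optimum` is illustrative.

```python
# Table values: number of samples -> number of generated pools.
table = {4: 3, 8: 4, 16: 5, 32: 6, 64: 7, 128: 8, 256: 9, 512: 10, 1024: 11, 2048: 12}

def optimum(n):
    """Minimum number of pools for n covered, pairwise-separated samples: ceil(log2(n + 1))."""
    return n.bit_length()

print(all(optimum(n) == m for n, m in table.items()))  # True
```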

Algorithm results

2. Some samples are related ($G$ is a random graph with edge probability $p$)

Thank you!