Understanding and Managing Cascades on Large Graphs
Uploaded On 2019-03-20

B. Aditya Prakash (Carnegie Mellon University; Virginia Tech), Christos Faloutsos (Carnegie Mellon University)
Graph Analytics wkshp. From the VLDB’12 tutorial.


Presentation Transcript

Slide1

Understanding and Managing Cascades on Large Graphs

B. Aditya Prakash, Carnegie Mellon University, Virginia Tech
Christos Faloutsos, Carnegie Mellon University

Graph Analytics wkshp

B. A. Prakash; C. Faloutsos

Slide2

From: VLDB’12 tutorial
http://vldb.org/pvldb/vol5/p2024_badityaprakash_vldb2012.pdf

Slide3

Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]

Slide4

Dynamical processes over networks are also everywhere!

Slide5

Why do we care?

Social collaboration
Information Diffusion
Viral Marketing
Epidemiology and Public Health
Cyber Security
Human mobility
Games and Virtual Worlds
Ecology
........

Slide6

Why do we care? (1: Epidemiology)

Dynamical processes over networks: diseases over contact networks

CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

Slide7

Why do we care? (1: Epidemiology)

Dynamical processes over networks

[Figure: drug-resistant bacteria (like XDR-TB) moving between a hospital and another hospital]

Slide8

Why do we care? (1: Epidemiology)

Dynamical processes over networks

[US-MEDICARE NETWORK 2005]
Each circle is a hospital
~3000 hospitals
More than 30,000 patients transferred

Problem: Given k units of disinfectant, whom to immunize?

Slide9

Why do we care? (1: Epidemiology)

[US-MEDICARE NETWORK 2005]

CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!

Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

Slide10

Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users

Slide11

Why do we care? (2: Online Diffusion)

Dynamical processes over networks: social media marketing

[Figure: a celebrity tells followers "Buy Versace™!"]

Slide12

Why do we care? (3: To change the world?)

Dynamical processes over networks: social networks and collaborative action

Slide13

High Impact – Multiple Settings

Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?

epidemic out-breaks
products/viruses
transmit s/w patches

Slide14

Research Theme

DATA: Large real-world networks & processes
ANALYSIS: Understanding
POLICY/ACTION: Managing

Slide15

Research Theme – Public Health

DATA: Modeling # patient transfers
ANALYSIS: Will an epidemic happen?
POLICY/ACTION: How to control out-breaks?

Slide16

Research Theme – Social Media

DATA: Modeling Tweets spreading
ANALYSIS: # cascades in future?
POLICY/ACTION: How to market better?

Slide17

In this tutorial

ANALYSIS: Understanding

Given propagation models:
Q1: Will an epidemic happen?

Slide18

In this tutorial

POLICY/ACTION: Managing

Q2: How to immunize and control out-breaks better?

Slide19

In this tutorial

DATA: Large real-world networks & processes

Q3: How do #hashtags spread?

Slide20

Outline

Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion

Slide21

Part 1: Theory

Q1: What is the epidemic threshold?
Q2: How do viruses compete?

Slide22

A fundamental question

Strong virus: epidemic?

Slide23

Example (static graph)

Weak virus: epidemic?

Slide24

Problem Statement

Find a condition under which the virus will die out exponentially quickly, regardless of the initial infection condition.

[Plot: # infected vs. time, with one curve above the threshold (epidemic) and one below it (extinction)]

Separate the regimes?

Slide25

Threshold (static version)

Problem Statement
Given: Graph G, and virus specs (attack prob. etc.)
Find: A condition for virus extinction/invasion

Slide26

Threshold: Why important?

Accelerating simulations
Forecasting (‘What-if’ scenarios)
Design of contagion and/or topology
A great handle to manipulate the spreading
Immunization
Maximize collaboration
…..

Slide27

Part 1: Theory

Q1: What is the epidemic threshold?
  Background
  Result and Intuition (Static Graphs)
  Proof Ideas (Static Graphs)
  Bonus: Dynamic Graphs
Q2: How do viruses compete?

Slide28

“SIR” model: life immunity (mumps)

Each node in the graph is in one of three states:
Susceptible (i.e. healthy)
Infected
Removed (i.e. can’t get infected again)

An infected node attacks each susceptible neighbor with prob. β and is removed with prob. δ.

[Figure: example spread at t = 1, t = 2, t = 3]
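The SIR rules above can be sketched as a small simulation. This is only an illustration under stated assumptions: time is discrete, the contact graph is undirected and given as an adjacency list, and β and δ are per-step probabilities. The function name `simulate_sir` and the ring graph are hypothetical choices, not taken from the tutorial.

```python
import random

def simulate_sir(adj, seeds, beta, delta, steps, rng):
    """Discrete-time SIR on a graph given as {node: [neighbors]} (assumed format)."""
    state = {v: "S" for v in adj}
    for v in seeds:
        state[v] = "I"
    history = []
    for _ in range(steps):
        infected = [v for v in adj if state[v] == "I"]
        newly_infected = set()
        # Each infected node attacks every susceptible neighbor with prob. beta.
        for v in infected:
            for w in adj[v]:
                if state[w] == "S" and rng.random() < beta:
                    newly_infected.add(w)
        # Each node that was infected at the start of the step is removed with prob. delta.
        for v in infected:
            if rng.random() < delta:
                state[v] = "R"
        for w in newly_infected:
            state[w] = "I"
        history.append({s: sum(1 for v in adj if state[v] == s) for s in "SIR"})
    return history

# Hypothetical example: a ring of 20 nodes with one initially infected node.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
history = simulate_sir(ring, seeds=[0], beta=0.5, delta=0.3, steps=30, rng=random.Random(0))
```

Each entry of `history` holds the number of susceptible, infected and removed nodes after one step; plotting the "I" counts over time gives an infection profile of the kind shown later in the deck.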

Slide29

Terminology: continued

Other virus propagation models (“VPM”):
SIS: susceptible-infected-susceptible, flu-like
SIRS: temporary immunity, like pertussis
SEIR: mumps-like, with virus incubation (E = Exposed)
…

Underlying contact-network – ‘who-can-infect-whom’

Slide30

Related Work

R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991.
A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010.
F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969.
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4), 2008.
D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005.
Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-mat/0305549 v2, Aug. 6 2003.
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000.
H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984.
J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991.
J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993.
R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001.
………

All are about either:
Structured topologies (cliques, block-diagonals, hierarchies, random)
Specific virus propagation models
Static graphs

Slide31

Part 1: Theory

Q1: What is the epidemic threshold?
  Background
  Result and Intuition (Static Graphs)
  Proof Ideas (Static Graphs)
  Bonus: Dynamic Graphs
Q2: How do viruses compete?

Slide32

What should the answer look like?

Answer should depend on:
Graph
Virus Propagation Model (VPM)

But how??
Graph – average degree? max. degree? diameter?
VPM – which parameters?
How to combine – linear? quadratic? exponential?
…..

Slide33

Static Graphs: Our Main Result

Informally, for
any arbitrary topology (adjacency matrix A) and
any virus propagation model (VPM) in standard literature,
the epidemic threshold depends only
on λ, the first eigenvalue of A, and
on some constant C_VPM, determined by the virus propagation model.

No epidemic if λ · C_VPM < 1

In Prakash+ ICDM 2011 (Selected among best papers).
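A minimal sketch of how the threshold test could be evaluated numerically, assuming an undirected graph (symmetric adjacency matrix) and an SIS-like model whose constant is β/δ, matching the strength λβ/δ quoted later in the tutorial. The helper names and the 4-cycle example are illustrative assumptions.

```python
import numpy as np

def largest_eigenvalue(A):
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    return float(np.linalg.eigvalsh(A)[-1])

def effective_strength(A, beta, delta):
    # Assumed SIS-like constant beta/delta; other models use a different constant.
    return largest_eigenvalue(A) * beta / delta

# Hypothetical example: a 4-cycle, whose largest eigenvalue is 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

s_weak = effective_strength(A, beta=0.1, delta=0.5)    # 2 * 0.1 / 0.5 = 0.4
s_strong = effective_strength(A, beta=0.4, delta=0.5)  # 2 * 0.4 / 0.5 = 1.6
```

Under the stated result, the first virus (strength below 1) would die out on this graph, while the second (strength above 1) could cause an epidemic.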

Slide34

Our thresholds for some models

s = effective strength
s < 1: below threshold

Models | Effective Strength (s) | Threshold (tipping point)
SIS, SIR, SIRS, SEIR | s = λ · (β/δ) | s = 1
SIV, SEIV | s = λ · … | s = 1
(H.I.V.) | s = λ · … | s = 1

Slide35

Our result: Intuition for λ

“Official” definition:
Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].
Doesn’t give much intuition!

“Un-official” intuition:
λ ~ # paths in the graph

$A^k \approx \lambda^k \, u \, u^{\top}$

$A^k(i, j)$ = # of paths $i \to j$ of length $k$
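The path-counting intuition can be checked numerically. The sketch below assumes "paths" means walks (nodes may repeat), which is what powers of the adjacency matrix count. The small example graph (a triangle 0-1-2 with a pendant node 3 attached to node 2) is a hypothetical choice.

```python
import numpy as np

# Hypothetical example graph: triangle 0-1-2 plus node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=np.int64)

def count_walks(A, k):
    # Entry (i, j) of A^k is the number of walks of length k from i to j.
    return np.linalg.matrix_power(A, k)

walks_0_to_3 = int(count_walks(A, 3)[0, 3])  # only 0 -> 1 -> 2 -> 3

# The total number of walks grows by a factor of about lambda per extra step.
lam = float(np.linalg.eigvalsh(A)[-1])
k = 40
growth = count_walks(A, k + 1).sum() / count_walks(A, k).sum()
```

Here `growth` approaches `lam` as k increases, which is the sense in which λ measures how quickly the number of paths (and so the opportunities for spreading) multiplies.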

Slide36

Largest eigenvalue (λ) of three example graphs with N nodes:

General N: λ ≈ 2; λ ≈ √N; λ = N-1
N = 1000: λ ≈ 2; λ = 31.67; λ = 999

Better connectivity → higher λ
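The figure itself is lost in this transcript. The sketch below assumes the three example graphs are a chain, a star and a clique, which is a guess based on the quoted values, and computes their largest eigenvalues for N = 1000.

```python
import numpy as np

def largest_eigenvalue(A):
    return float(np.linalg.eigvalsh(A)[-1])

def chain(n):
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def star(n):
    # Node 0 is the hub, all other nodes are leaves.
    A = np.zeros((n, n))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    return A

def clique(n):
    return np.ones((n, n)) - np.eye(n)

N = 1000
lam_chain, lam_star, lam_clique = (largest_eigenvalue(g(N)) for g in (chain, star, clique))
```

Under this assumption the chain gives a value just below 2, the star gives sqrt(N - 1) (about 31.6, close to the 31.67 quoted on the slide), and the clique gives N - 1 = 999.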

Slide37

Examples: Simulations – SIR (mumps)

PORTLAND graph: 31 million links, 6 million nodes

[Plots: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]

Slide38

Examples: Simulations – SIRS (pertussis)

PORTLAND graph: 31 million links, 6 million nodes

[Plots: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]

Slide39

Part 1: Theory

Q1: What is the epidemic threshold?
  Background
  Result and Intuition (Static Graphs)
  Proof Ideas (Static Graphs)
  Bonus: Dynamic Graphs
Q2: How do viruses compete?

Slide40

Proof Sketch

λ · C_VPM < 1, where λ is graph-based and the constant C_VPM is model-based

General VPM structure
Topology and stability

Slide41

Models and more models

Model | Used for
SIR | Mumps
SIS | Flu
SIRS | Pertussis
SEIR | Chicken-pox
……..
SICR | Tuberculosis
MSIR | Measles
SIV | Sensor Stability
H.I.V.
……….

Slide42

Ingredient 1: Our generalized model

[Figure: generalized model with the states Susceptible, Infected and Vigilant, connected by endogenous and exogenous transitions]

Slide43

Special case

[Figure: a special case over the states Susceptible, Infected and Vigilant]

Slide44

Special case: H.I.V.

Multiple Infectious, Vigilant states

[Figure labels: “Terminal”, “Non-terminal”]

Slide45

Ingredient 2: NLDS + Stability

View as a NLDS: discrete-time non-linear dynamical system (NLDS)

Probability vector: specifies the state of the system at time t
Size mN x 1: blocks S, I, V, each of size N (number of nodes in the graph)

Slide46

Ingredient 2: NLDS + Stability

View as a NLDS: discrete-time non-linear dynamical system (NLDS)

Non-linear function: explicitly gives the evolution of the system
Size mN x 1

Slide47

Ingredient 2: NLDS + Stability

View as a NLDS: discrete-time non-linear dynamical system (NLDS)

The threshold corresponds to the stability of the NLDS

Slide48

Special case: SIR

State vector of size 3N x 1 (blocks I, R, S), evolved by the NLDS

Key quantity in the update: the probability that node i is not attacked by any of its infectious neighbors
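The slide's equations did not survive extraction. The sketch below shows one common way to write such an update, under the assumption that attacks from different neighbors are treated as independent. The symbol `zeta` and the exact update rules are assumptions for illustration, not necessarily the authors' formulation.

```python
import numpy as np

def sir_nlds_step(A, p_s, p_i, p_r, beta, delta):
    """One step of an assumed SIR dynamical system over per-node state probabilities."""
    # zeta[i]: probability that node i is not attacked by any infectious neighbor,
    # assuming independent attacks: prod_j (1 - beta * A[i, j] * p_i[j]).
    zeta = np.prod(1.0 - beta * A * p_i[np.newaxis, :], axis=1)
    new_s = p_s * zeta
    new_i = (1.0 - delta) * p_i + p_s * (1.0 - zeta)
    new_r = p_r + delta * p_i
    return new_s, new_i, new_r

# Hypothetical example: a triangle where node 0 starts infected.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
s1, i1, r1 = sir_nlds_step(A,
                           p_s=np.array([0.0, 1.0, 1.0]),
                           p_i=np.array([1.0, 0.0, 0.0]),
                           p_r=np.zeros(3),
                           beta=0.5, delta=0.2)

# With no infected node, the state does not change.
s0, i0, r0 = sir_nlds_step(A, np.ones(3), np.zeros(3), np.zeros(3), beta=0.5, delta=0.2)
```

The three probabilities of each node keep summing to 1 after a step, and the all-susceptible state maps to itself, which is the fixed point whose stability is examined next.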

Slide49

Fixed Point

(1, 1, …, 0, 0, …, 0, 0, …): the state when no node is infected

Q: Is it stable?

Slide50

Stability for SIR

Stable under threshold
Unstable above threshold

Slide51

λ · C_VPM < 1, where λ is graph-based and the constant C_VPM is model-based

General VPM structure
Topology and stability

See paper for full proof

Slide52

Part 1: Theory

Q1: What is the epidemic threshold?
  Background
  Result and Intuition (Static Graphs)
  Proof Ideas (Static Graphs)
  Bonus: Dynamic Graphs
Q2: How do viruses compete?

Slide53

Dynamic Graphs: Epidemic?

Alternating behaviors: DAY (e.g., work)

[Figure: the day-time contact graph and its 8 x 8 adjacency matrix]

Slide54

Dynamic Graphs: Epidemic?

Alternating behaviors: NIGHT (e.g., home)

[Figure: the night-time contact graph and its 8 x 8 adjacency matrix]

Slide55

Model Description

SIS model
recovery rate δ
infection rate β

Set of T arbitrary graphs: day (N x N), night (N x N), weekend, …..

[Figure: healthy node X with neighbors N1, N2, N3; infected neighbors attack with prob. β, and an infected node recovers with prob. δ]

Slide56

Our result: Dynamic Graphs Threshold

Informally, NO epidemic if eig(S) < 1

Single number! eig(S) is the largest eigenvalue of the system matrix S.

[Equation: definition of S]

In Prakash+, ECML-PKDD 2010
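The definition of S is missing from this transcript. The sketch below assumes S is the product, over the T alternating graphs, of the per-graph matrices (1 - δ)I + βA_t, which is how I recall the cited paper defining it. Treat that form, and all names below, as assumptions.

```python
import numpy as np

def adjacency(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def system_matrix(graphs, beta, delta):
    # Assumed form: S = prod_t ((1 - delta) * I + beta * A_t).
    n = graphs[0].shape[0]
    S = np.eye(n)
    for A in graphs:
        S = ((1.0 - delta) * np.eye(n) + beta * A) @ S
    return S

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def no_epidemic(graphs, beta, delta):
    return spectral_radius(system_matrix(graphs, beta, delta)) < 1.0

# Hypothetical example: four people with different contacts by day and by night.
day = adjacency(4, [(0, 1), (2, 3)])
night = adjacency(4, [(1, 2), (3, 0)])
graphs = [day, night]
```

With a weak virus (β = 0.1, δ = 0.5) the largest eigenvalue is 0.36, so no epidemic is predicted; with a strong one (β = 0.9, δ = 0.1) it is 3.24, above the threshold.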

Slide57

Infection-profile

[Plots for Synthetic and MIT Reality Mining: log(fraction infected) vs. time, with curves BELOW, AT and ABOVE the threshold]

Slide58

“Take-off” plots

[Plots for Synthetic and MIT Reality: footprint (# infected @ “steady state”, log scale), with regions marked NO EPIDEMIC and EPIDEMIC separated by our threshold]

Slide59

Part 1: Theory

Q1: What is the epidemic threshold?
Q2: What happens when viruses compete?
  Mutually-exclusive viruses
  Interacting viruses

Slide60

Competing Contagions

iPhone v Android
Blu-ray v HD-DVD
Biological: common flu/avian flu, pneumococcal inf. etc.

[Figure labels: Attack v Retreat]

Slide61

A simple model

Modified flu-like
Mutual Immunity (“pick one of the two”)
Susceptible-Infected1-Infected2-Susceptible

[Figure: Virus 1 and Virus 2]

Slide62

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

[Plot: number of infections over time; green: virus 1, red: virus 2]

Footprint @ Steady State of each virus = ?

Slide63

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

[Plot: number of infections over time; green: virus 1, red: virus 2]

Is the ratio of the two footprints @ Steady State equal to the ratio of the two strengths? To its square?

Slide64

Answer: Winner-Takes-All

ASSUME: Virus 1 is stronger than Virus 2

[Plot: number of infections over time; green: virus 1, red: virus 2]

Slide65

Our Result: Winner-Takes-All

Given our model, and any graph, the weaker virus always dies out completely.

The stronger survives only if it is above threshold.
Virus 1 is stronger than Virus 2 if: strength(Virus 1) > strength(Virus 2)
Strength(Virus) = λ β / δ → same as before!

In Prakash+ WWW 2012

Slide66

Real Examples

Reddit v Digg
Blu-Ray v HD-DVD

[Google Search Trends data]

Slide67

Part 1: Theory

Q1: What is the epidemic threshold?
Q2: What happens when viruses compete?
  Mutually-exclusive viruses
  Interacting viruses

Slide68

A simple model: SI1|2S

Modified flu-like (SIS)
Susceptible-Infected1 or 2-Susceptible

Interaction Factor ε
Full Mutual Immunity: ε = 0
Partial Mutual Immunity (competition): 0 < ε < 1
Cooperation: ε > 1

[Figure: Virus 1 & Virus 2]

Slide69

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

ε = 0: Winner takes all
ε = 1: Co-exist independently
ε = 2: Viruses cooperate

What about for 0 < ε < 1?
Is there a point at which both viruses can co-exist?

Slide70

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2

Slide71

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2

Slide72

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2

Slide73

Our Result: Viruses can Co-exist

Given our model and a fully connected graph, there exists an ε_critical such that for ε ≥ ε_critical, there is a fixed point where both viruses survive.

The stronger survives only if it is above threshold.
Virus 1 is stronger than Virus 2 if: strength(Virus 1) > strength(Virus 2)
Strength(Virus) σ = N β / δ

In Beutel+ KDD 2012

Slide74

Real Examples

Hulu v Blockbuster

[Google Search Trends data]

Slide75

Real Examples

Chrome v Firefox

[Google Search Trends data]

Slide76

Outline

Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion

Slide77

Part 2: Algorithms

Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide78

Full Static Immunization

Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).

[Figure: example graph with k = 2; which nodes?]

Slide79

Part 2: Algorithms

Q3: Whom to immunize?
  Full Immunization (Static Graphs)
  Full Immunization (Dynamic Graphs)
  Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide80

Challenges

Given a graph A, budget k,
Q1 (Metric): How to measure the ‘shield-value’ for a set of nodes (S)?
Q2 (Algorithm): How to find a set of k nodes with highest ‘shield-value’?

Slide81

Proposed vulnerability measure λ

λ is the epidemic threshold

Increasing λ → increasing vulnerability

[Figure: example graphs labeled “Safe”, “Vulnerable”, “Deadly”]

Slide82

A1: “Eigen-Drop”: an ideal shield value

Eigen-Drop(S): Δλ = λ - λ_s

[Figure: the original graph and the same graph without nodes {2, 6}]

Slide83

(Q2) - Direct Algorithm too expensive!

Immunize k nodes which maximize Δλ: S = argmax Δλ
Combinatorial!
Complexity: one computation of λ for each of the (n choose k) candidate sets

Example: 1,000 nodes, with 10,000 edges
It takes 0.01 seconds to compute λ
It takes 2,615 years to find 5-best nodes!
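The direct approach and its cost can be sketched as follows. The eigen-drop is computed by removing the chosen rows and columns and recomputing λ. The function names and the small star graph are hypothetical, and the cost estimate simply multiplies the number of 5-node subsets of 1,000 nodes by the quoted 0.01 seconds.

```python
import itertools
import math
import numpy as np

def largest_eig(A):
    return float(np.linalg.eigvalsh(A)[-1])

def eigen_drop(A, removed):
    # lambda of the original graph minus lambda after removing the given nodes.
    removed_set = set(removed)
    keep = [i for i in range(A.shape[0]) if i not in removed_set]
    return largest_eig(A) - largest_eig(A[np.ix_(keep, keep)])

def brute_force_best(A, k):
    # Tries every k-subset: feasible only for tiny graphs.
    return max(itertools.combinations(range(A.shape[0]), k),
               key=lambda S: eigen_drop(A, S))

# Hypothetical example: a star with hub 0 and five leaves.
n = 6
A = np.zeros((n, n))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
best = brute_force_best(A, 1)

# Cost of the slide's example: C(1000, 5) subsets at 0.01 s each, in years.
years = math.comb(1000, 5) * 0.01 / (365.25 * 24 * 3600)
```

On the star the best single node is the hub, as expected, and the estimate comes out at roughly 2,600 years, in line with the figure on the slide.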

Slide84

A2: Our Solution

Part 1: Shield Value
Carefully approximate Eigen-drop (Δλ)
Matrix perturbation theory

Part 2: Algorithm
Greedily pick best node at each step
Near-optimal due to submodularity

NetShield (linear complexity)
O(nk² + m), n = # nodes; m = # edges

In Tong, Prakash+ ICDM 2010

Slide85

Our Solution: Part 1

Approximate Eigen-drop (Δλ): Δλ ≈ SV(S)
Result using matrix perturbation theory

u(i) == ‘eigenscore’ ~~ pagerank(i), where A u = λ u

Slide86

P1: node importance
P2: set diversity

[Figure: the original graph, the nodes selected by P1 alone, and the nodes selected by P1+P2]

Slide87

Our Solution: Part 2: NetShield

We prove that: SV(S) is sub-modular (& monotone non-decreasing)

NetShield: Greedily add best node at each step

Corollary: Greedy algorithm works
1. NetShield is near-optimal (w.r.t. max SV(S))
2. NetShield is O(nk² + m)

Footnote: near-optimal means SV(S_NetShield) >= (1-1/e) SV(S_Opt)
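A minimal sketch of the greedy idea. The shield-value formula is missing from this transcript, so the code assumes the form I recall from the cited paper: SV(S) is the sum over i in S of 2λu(i)², minus the sum over pairs i, j in S of A(i, j)u(i)u(j). The first term rewards important nodes and the second penalizes picking neighbors. This plain version re-evaluates SV for every candidate and does not reproduce the O(nk² + m) bookkeeping of NetShield itself.

```python
import numpy as np

def shield_value(A, lam, u, S):
    # Assumed form of SV(S): importance term minus a penalty for adjacent picks.
    S = list(S)
    uS = u[S]
    return float(2.0 * lam * np.sum(uS ** 2) - uS @ A[np.ix_(S, S)] @ uS)

def greedy_shield(A, k):
    vals, vecs = np.linalg.eigh(A)
    # Leading eigenpair; abs() fixes the sign, assuming a connected graph.
    lam, u = vals[-1], np.abs(vecs[:, -1])
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(A.shape[0]) if i not in chosen]
        best = max(candidates, key=lambda i: shield_value(A, lam, u, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical example: two hubs (0 and 4), three leaves each, joined by an edge.
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (0, 4)]
A = np.zeros((8, 8))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
picked = greedy_shield(A, 2)
```

On this graph the two hubs are selected, since the leaves contribute little to SV and the penalty for the hub-to-hub edge is smaller than the gain.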

Slide88

Experiment: Immunization quality

[Plot: log(fraction of infected nodes) vs. time; lower is better. Methods compared: NetShield, Degree, PageRank, Eigs (=HITS), Acquaintance, Betweenness (shortest path)]

Slide89

Part 2: Algorithms

Q3: Whom to immunize?
  Full Immunization (Static Graphs)
  Full Immunization (Dynamic Graphs)
  Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide90

Full Dynamic Immunization

Given: Set of T arbitrary graphs: day (N x N), night (N x N), weekend, …..
Find: k ‘best’ nodes to immunize (remove)

In Prakash+ ECML-PKDD 2010

Slide91

Full Dynamic Immunization

Our solution
Recall theorem
Simple: reduce the largest eigenvalue of the system matrix (the matrix product over day, night, …)
Goal: max eigendrop Δ
No competing policy for comparison
We propose and evaluate many policies

Slide92

Performance of Policies

[Plot: MIT Reality Mining; lower is better]

Slide93

Part 2: Algorithms

Q3: Whom to immunize?
  Full Immunization (Static Graphs)
  Full Immunization (Dynamic Graphs)
  Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide94

Fractional Immunization of Networks

B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
Under Submission

Slide95

Full Static Immunization

Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).

[Figure: example graph with k = 2; which nodes?]

Slide96

Fractional Asymmetric Immunization

Fractional Effect [f(x) = …]
Asymmetric Effect

# antidotes = 3

Slide97

Fractional Asymmetric Immunization

Fractional Effect [f(x) = …]
Asymmetric Effect

# antidotes = 3

Slide98

Fractional Asymmetric Immunization

Fractional Effect [f(x) = …]
Asymmetric Effect

# antidotes = 3

Slide99

Fractional Asymmetric Immunization

[Figure: drug-resistant bacteria (like XDR-TB) moving between a hospital and another hospital]

Slide100

Fractional Asymmetric Immunization

[Figure: drug-resistant bacteria (like XDR-TB) moving between a hospital and another hospital; label: = f]

Slide101

Fractional Asymmetric Immunization

[Figure: a hospital and another hospital]

Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved?

Slide102

Our Algorithm “SMART-ALLOC”

[US-MEDICARE NETWORK 2005]
Each circle is a hospital, ~3000 hospitals
More than 30,000 patients transferred

CURRENT PRACTICE vs. SMART-ALLOC: ~6x fewer!

Slide103

Running Time

Wall-clock time (lower is better)
Simulations: > 1 week
SMART-ALLOC: 14 secs

> 30,000x speed-up!

Slide104

Experiments

[Plots for PENN-NETWORK and SECOND-LIFE; lower is better. Labels: K = 200, K = 2000, ~5 x, ~2.5 x]

Slide105

Part 2: Algorithms

Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide106

Outbreak detection

Problems of finding sources of contamination in water networks and finding “hot” stories on blogs are isomorphic.

Minimize time to detection, population affected
Maximize probability of detection.
Minimize sensor placement cost.

[Figure: blogs, posts and links forming an information cascade]

Slide107

J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. “Cost-effective Outbreak Detection in Networks”. KDD 2007

Slide108

CELF: Main idea

Given a graph G(V,E), a budget of B sensors, and data on how contaminations spread over the network: for each contamination i we know the time T(i, u) when it contaminated node u

Minimize time to detect outbreak

CELF algorithm uses submodularity and lazy evaluation
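The lazy-evaluation idea can be sketched with a priority queue. Under submodularity a node's marginal gain can only shrink as more sensors are placed, so a stale gain is an upper bound and most candidates never need re-evaluation. The toy objective below (number of contamination scenarios detected) and all names are assumptions for illustration. The cost-aware part of CELF and its time-to-detection objective are not shown.

```python
import heapq

def lazy_greedy(candidates, gain_fn, budget):
    """Greedy selection with lazy re-evaluation; gain_fn is assumed submodular."""
    selected = []
    # Heap entries: (-gain, candidate, number of picks when the gain was computed).
    heap = [(-gain_fn(selected, c), c, 0) for c in candidates]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(selected) < budget:
        neg_gain, c, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Gain is up to date and tops all (upper-bound) stale gains: pick it.
            selected.append(c)
        else:
            # Stale gain: recompute and push back.
            heapq.heappush(heap, (-gain_fn(selected, c), c, len(selected)))
            evaluations += 1
    return selected, evaluations

# Hypothetical data: which contamination scenarios a sensor at each node detects.
detects = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1}}

def coverage_gain(selected, c):
    covered = set().union(*(detects[s] for s in selected))
    return len(detects[c] - covered)

sensors, evaluations = lazy_greedy(list(detects), coverage_gain, budget=2)
```

Here the lazy version selects sensors "a" and "c" with 6 gain evaluations, whereas a plain greedy pass would need 4 + 3 = 7; the saving grows with the number of candidates.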

Slide109

Blogs: Comparison to heuristics

[Plot: benefit (higher = better)]

Slide110

“Best 10 blogs to read”

k | PA score | Blog | NP | IL | OLO | OLA
1 | 0.1283 | http://instapundit.com | 4593 | 4636 | 1890 | 5255
2 | 0.1822 | http://donsurber.blogspot.com | 1534 | 1206 | 679 | 3495
3 | 0.2224 | http://sciencepolitics.blogspot.com | 924 | 576 | 888 | 2701
4 | 0.2592 | http://www.watcherofweasels.com | 261 | 941 | 1733 | 3630
5 | 0.2923 | http://michellemalkin.com | 1839 | 12642 | 1179 | 6323
6 | 0.3152 | http://blogometer.nationaljournal.com | 189 | 2313 | 3669 | 9272
7 | 0.3353 | http://themodulator.org | 475 | 717 | 1844 | 4944
8 | 0.3508 | http://www.bloggersblog.com | 895 | 247 | 1244 | 10201
9 | 0.3654 | http://www.boingboing.net | 5776 | 6337 | 1024 | 6183
10 | 0.3778 | http://atrios.blogspot.com | 4682 | 3205 | 795 | 3102

NP - number of posts, IL - in-links, OLO - blog out links, OLA - all out links

Slide111

Part 2: Algorithms

Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?

Slide112

B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos. ‘Detecting Culprits in Epidemics: Who and How many?’ Under Submission

Slide113

Problem definition

2-d grid
‘+’ -> infected
Who started it?

Slide114

Problem definition

2-d grid
‘+’ -> infected
Who started it?

Slide115

Culprits: Exoneration

Slide116

Who are the culprits

Two-part solution
use MDL for number of seeds
for a given number: exoneration = centrality + penalty

Our method uses smallest eigenvector of Laplacian submatrix

Running time = linear! (in edges and nodes)
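A rough sketch of the eigenvector step for the single-seed case only. It assumes the Laplacian L = D - A is built on the full graph, restricted to the infected nodes, and that the most likely seed is the node with the largest entry in the eigenvector of the smallest eigenvalue. That reading of the slide, the dense eigensolver (which is not linear time), and the function name are all assumptions. The MDL step for choosing the number of seeds is omitted.

```python
import numpy as np

def most_likely_seed(A, infected):
    # Laplacian of the full graph, restricted to rows/columns of infected nodes.
    L = np.diag(A.sum(axis=1)) - A
    L_inf = L[np.ix_(infected, infected)]
    vals, vecs = np.linalg.eigh(L_inf)       # eigenvalues in ascending order
    smallest_vec = np.abs(vecs[:, 0])        # eigenvector of the smallest eigenvalue
    return infected[int(np.argmax(smallest_vec))]

# Hypothetical example: a path of 9 nodes where nodes 2..6 are infected.
n = 9
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
seed = most_likely_seed(A, infected=[2, 3, 4, 5, 6])
```

On this path the estimate is node 4, the middle of the infected stretch, which matches the intuition that the seed sits deep inside the infected region rather than at its boundary.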

Slide117

Culprits: Results

Slide118

Outline

Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion

Slide119

Part 3: Empirical Studies

Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?

Slide120

Cascading Behavior in Large Blog Graphs

How does information propagate over the blogosphere?

[Figure: blogs, posts and links forming an information cascade]

J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.

Slide121

Cascades on the Blogosphere

Blogosphere: blogs + posts
Blog network: links among blogs
Post network: links among posts

A cascade is the graph induced by a time-ordered propagation of information (edges)

[Figure: example with blogs B1–B4 and posts a–e, shown as blogosphere, blog network, post network and cascades]

Slide122

Blog data

45,000 blogs participating in cascades
All their posts for 3 months (Aug-Sept ‘05)
2.4 million posts
~5 million links (245,404 inside the dataset)

[Plot: number of posts vs. time [1 day]]

Slide123

Popularity over time

Post popularity drops-off – exponentially?

[Plot: # in links vs. lag (days after post), counting links made at time t + lag to a post made at time t]

Slide124

Popularity over time

Post popularity drops-off – exponentially?
POWER LAW!
Exponent?

[Plot: # in links (log) vs. days after post (log)]

Slide125

Popularity over time

Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6
close to -1.5: Barabasi’s stack model
and like the zero-crossings of a random walk

[Plot: # in links (log) vs. days after post (log), with slope -1.6]
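An exponent like this is read off as the slope on a log-log plot. The sketch below fits that slope by least squares on synthetic data generated with exponent -1.6. The data, the function name and the choice of a simple least-squares fit (other estimators are often preferred for real data) are assumptions.

```python
import numpy as np

def fit_power_law_exponent(x, y):
    # Slope of log(y) against log(x); requires strictly positive x and y.
    slope, _intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(slope)

# Synthetic in-link counts that decay as lag^(-1.6) over 30 days.
lags = np.arange(1, 31, dtype=float)
in_links = 500.0 * lags ** -1.6
exponent = fit_power_law_exponent(lags, in_links)
```

On noise-free data the fit recovers the exponent -1.6; an exponential drop-off would instead bend downward on the same log-log axes.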

Slide126

-1.5 slope

J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

Slide127

Topological patterns: Cascades

Procedure for gathering cascades:
Find all initiators (nodes with out-degree 0)
Follow in-links
Produces directed acyclic graph
Count cascade shapes (use our multi-level graph isomorphism testing algorithm)

[Figure: example cascade extraction over posts a–e]
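The first two steps of the procedure can be sketched as a traversal over the post network. The input format (a dict from each post to the posts it links to) and the example posts are assumptions, and the shape-counting step with graph isomorphism testing is not shown.

```python
from collections import defaultdict, deque

def gather_cascades(out_links):
    """out_links: {post: [posts it links to]}; every post is assumed to be a key."""
    in_links = defaultdict(list)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)
    cascades = {}
    for post, targets in out_links.items():
        if targets:
            continue  # not an initiator: it links to an earlier post
        # Follow in-links breadth-first to collect every post in the cascade.
        members, queue = {post}, deque([post])
        while queue:
            current = queue.popleft()
            for citing in in_links[current]:
                if citing not in members:
                    members.add(citing)
                    queue.append(citing)
        cascades[post] = members
    return cascades

# Hypothetical post network: b and c link to a, d links to b, e stands alone.
out_links = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": []}
cascades = gather_cascades(out_links)
```

This yields one cascade rooted at "a" containing a, b, c and d, and a trivial cascade containing only "e".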

Slide128

Topological Observations

How do we measure how information flows through the network?

Common cascade shapes extracted using algorithms in [Leskovec, Singh, Kleinberg; PAKDD 2006].

Slide129

Topological Observations

What graph properties do cascades exhibit?

Cascade size distributions also follow power law.

Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution: $p(n) \propto n^{-2}$

[Plot: count vs. cascade size (# of nodes), a = -2]

Slide130

Topological Observations

What graph properties do cascades exhibit?

Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5).

[Plots: count vs. size of star (# nodes), a = -3.1; count vs. size of chain (# nodes), a = -8.5]

Slide131

Blogs and structure

Cascades take on different shapes (sorted by frequency):

How can we use cascades to identify communities?

Slide132

PCA on cascade types

Perform PCA on sparse matrix.

Use log(count+1)

Project onto 2 PC…
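The three steps above can be sketched on a toy matrix. This is an assumption-laden illustration: the counts are made up, the helper name `top_components` is hypothetical, and power iteration with deflation on a dense covariance matrix is swapped in for whatever PCA routine was actually used. A real ~44,000 x ~9,000 sparse matrix would call for a sparse solver.

```python
import math

def top_components(rows, k=2, iters=500):
    """Top-k principal directions of mean-centered rows via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Dense d x d covariance matrix (acceptable only for a toy example)
    C = [[sum(x[a] * x[b] for x in X) / n for b in range(d)] for a in range(d)]
    comps = []
    for c in range(k):
        v = [1.0 / (j + 1 + c) for j in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            for u in comps:  # deflation: project out components already found
                dot = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - dot * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:
                break
            v = [wi / norm for wi in w]
        comps.append(v)
    return X, comps

# Made-up counts: 4 blogs (rows) x 4 cascade types (columns)
counts = [
    [120, 3, 0, 1],
    [95, 5, 1, 0],
    [4, 60, 22, 0],
    [2, 45, 30, 1],
]
features = [[math.log(c + 1) for c in row] for row in counts]  # log(count+1)
X, comps = top_components(features, k=2)
# Project each centered blog vector onto the 2 principal components
coords = [[sum(x[j] * comp[j] for j in range(len(x))) for comp in comps] for x in X]
```

In this toy data the first two blogs and the last two blogs fall on opposite sides of the first component, which is the kind of separation the clustering slides rely on.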

[Figure: sparse matrix of ~44,000 blogs by ~9,000 cascade types with log(count+1) entries; example blogs: boingboing, slashdot]

Slide133

PCA on cascade types

Observation: Content of blogs and cascade behavior are often related.

Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).

M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.

Slide135

Part 3: Empirical Studies

Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?

Slide136

Rise and fall patterns in social media

Meme (# of mentions in blogs): short phrases sourced from U.S. politics in 2008

“you can put lipstick on a pig”

“yes we can”

Slide137

Rise and fall patterns in social media

Can we find a unifying model that includes these patterns?

Four classes on YouTube [Crane et al. ’08]

Six classes on Meme [Yang et al. ’11]

Slide138

Rise and fall patterns in social media

Answer: YES!

We can represent all patterns by a single model.

In Matsubara+ SIGKDD 2012

Slide139

Main idea - SpikeM

1. Un-informed bloggers (uninformed about the rumor)

2. External shock at time n_b (e.g., breaking news)

3. Infection (word-of-mouth)

Infectiveness of a blog post at age n:

β: strength of infection (quality of news)

Decay function (how infective a blog posting is): power law

[Figure panels: Time n=0; Time n=n_b; Time n=n_b+1]
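The three ingredients above (un-informed bloggers, an external shock at time n_b, and word-of-mouth infection with power-law decay) can be combined in a small simulation. This is a simplified sketch and not the published SpikeM equation, which did not survive in this transcript. It omits periodicity and background noise. The decay exponent -1.5 is an assumption (the slide says only "power law"), and the function name, the parameter names other than n_b and β, and all numeric values are hypothetical.

```python
def spike_sketch(N, n_b, S_b, beta, T, decay=-1.5):
    """Simplified rise-and-fall simulation (no periodicity, no noise).

    N: total bloggers; n_b: time of the external shock; S_b: shock size;
    beta: strength of infection; T: number of time ticks.
    Returns dB, where dB[n] is the number of newly informed bloggers at tick n.
    """
    dB = [0.0] * (T + 1)
    U = float(N)                       # un-informed bloggers
    for n in range(n_b, T):
        force = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            age = n + 1 - t            # age of the posts made at tick t
            # power-law decay of infectiveness with post age (assumed exponent)
            force += (dB[t] + shock) * beta * age ** decay
        new = min(U, U * force)        # cannot inform more bloggers than remain
        dB[n + 1] = new
        U -= new
    return dB

# Hypothetical parameter values, chosen only to produce a visible spike
dB = spike_sketch(N=1000, n_b=5, S_b=10.0, beta=0.001, T=100)
peak_tick = dB.index(max(dB))
```

Nothing happens before the shock at n_b. Activity then rises as word-of-mouth compounds and falls once the pool of un-informed bloggers is depleted.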

Slide140

SpikeM - with periodicity

Full equation of SpikeM

Periodicity

Bloggers change their activity over time (e.g., daily, weekly, yearly)

[Plot: activity vs. time n; peak activity at 12pm, low activity at 3am]

Slide141

Details

Analysis: exponential rise and power-law fall

Rise part:
SI -> exponential
SpikeM -> exponential

[Plots: linear-log and log-log scales]

Slide142

Details

Analysis: exponential rise and power-law fall

Fall part:
SI -> exponential
SpikeM -> power law

[Plots: linear-log and log-log scales]

Slide143

Tail-part forecasts

SpikeM can capture the tail part

Slide144

“What-if” forecasting

e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date

?

[Plot annotations: (1) First spike; (2) Release date; (3) Two weeks before release]

Slide145

“What-if” forecasting

SpikeM can forecast not only the tail part, but also the rise part!

SpikeM can forecast upcoming spikes

[Plot annotations: (1) First spike; (2) Release date; (3) Two weeks before release]

Slide146

Part 3: Empirical Studies

Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?

Slide147

Tweet Diffusion: Problem Definition

Given:
Action log of people tweeting a #hashtag
A network of users

Find:
How external influence varies with #hashtags

Slide148

Tweet Diffusion: Data

Yahoo! Twitter firehose
More than 750 million tweets (> 10 terabytes)
Test-bed of > 6000 machines
Hadoop+PIG system ver 0.20.204.0
Took top 500 hashtags (by volume) in Feb 2011
Network of users: connecting user X to user Y if X directed at least 3 @-messages to Y (or RT-ed a tweet)
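The network-construction rule above can be sketched in a few lines. This is an illustration and not the Hadoop+PIG pipeline used in the study. It assumes the parenthetical means X retweeted a tweet written by Y, that a single retweet suffices, and that the input is a small in-memory list. The function name and the user names are hypothetical.

```python
from collections import Counter

def build_user_network(mentions, retweets, min_mentions=3):
    """mentions: iterable of (sender, target) pairs, one per @-message.
    retweets: iterable of (retweeter, original_author) pairs.
    Returns a set of directed edges (X, Y)."""
    mention_counts = Counter(tuple(m) for m in mentions)
    # Keep X -> Y only when X sent Y at least `min_mentions` @-messages
    edges = {pair for pair, c in mention_counts.items() if c >= min_mentions}
    # Assumption: one retweet of Y's tweet by X is enough to connect X -> Y
    edges.update(tuple(r) for r in retweets)
    return edges

# Made-up example log
mentions = [("ann", "bob")] * 3 + [("bob", "cat")] * 2
retweets = [("cat", "ann")]
network = build_user_network(mentions, retweets)
```

Here `bob -> cat` is dropped because only two @-messages were sent, while `cat -> ann` is kept because of the retweet.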

Slide149

Tweet Diffusion

Propagation = Influence + External

Developed a model
takes the previous observations into account
with parameters representing external influence

Learn from previous data
EM-style alternating-minimization algorithm

Group tags according to learnt params

Slide150

Results: External Influence vs Time

[Plot: “External Effects” vs. time, with hashtags grouped by learnt parameters]

Hashtag groups shown:
#nowwatching, #nowplaying, #epictweets
#purpleglasses, #brits, #famouslies
#oscar, #25jan
#openfollow, #ihatequotes, #tweetmyjobs

Annotations: Bursty, external events; “Word-of-mouth”, not trending; Long-running tags; “Word-of-mouth”

Can also use for Forecasting, Anomaly Detection!

Slide151

Outline

Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion

Slide152

Conclusions

Epidemic Threshold
It’s the Eigenvalue

Fast Immunization
Max. drop in eigenvalue, linear-time near-optimal algorithm

Bursts: SpikeM model
Exponential growth, power-law decay

Slide153

Propagation on Networks

[Diagram: fields surrounding the topic: ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science, Physics]

Slide154

Publications

Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) - In WWW 2012, Lyon

Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)

Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue

Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) - In IEEE NETWORKING 2011, Valencia, Spain

Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) - In IEEE INFOCOM NetSciCom Workshop, 2011.

On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) - In IEEE ICDM 2010, Sydney, Australia

Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) - In ECML-PKDD 2010, Barcelona, Spain

MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C.

Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) - In VLDB 2010, Singapore

EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) - In PAKDD 2010, Hyderabad, India

BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) - In ACM SIGKDD 2009, Paris, France.

Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) - In IEEE ICDM Large Data Workshop 2009, Miami

FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) - In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.

Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) - In IEEE ICDE 2007, Istanbul, Turkey.

Slide155

Acknowledgements

Collaborators

Christos Faloutsos

Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei

Slide156

Acknowledgements

Funding

Slide157

Analysis

Policy/Action

Data

Dynamical Processes on Large Networks

B. Aditya Prakash

Christos Faloutsos
