B Aditya Prakash Carnegie Mellon University Virginia Tech Christos Faloutsos Carnegie Mellon University Graph Analytics wkshp B A Prakash C Faloutsos From VLDB12 tutorial ID: 758341
Slide1
Understanding and Managing Cascades on Large Graphs
B. Aditya Prakash, Carnegie Mellon University, Virginia Tech
Christos Faloutsos, Carnegie Mellon University
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide2
From: VLDB’12 tutorial, http://vldb.org/pvldb/vol5/p2024_badityaprakash_vldb2012.pdf
Graph Analytics wkshpB. A. Prakash; C. Faloutsos
2B. Aditya PrakashSlide3
Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
Graph Analytics wkshp
3
B. A. Prakash; C. FaloutsosSlide4
4
Dynamical Processes over networks are also everywhere!
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide5
Why do we care?
Social collaboration
Information Diffusion
Viral Marketing
Epidemiology and Public Health
Cyber Security
Human mobility
Games and Virtual Worlds
Ecology
........
B. A. Prakash; C. Faloutsos
5
Graph Analytics wkshpSlide6
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
[AJPH 2007]
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
Diseases over contact networks
6
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide7
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
Hospital
Another Hospital
Drug-resistant Bacteria (like XDR-TB)
7
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide8
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
Each circle is a hospital
~3000 hospitals
More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide9
Why do we care? (1: Epidemiology)
CURRENT PRACTICE
OUR METHOD
~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired infections took 99K+ lives and cost $5B+ (both per year)
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide10
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue
[WSJ 2010]
~100m active users
> 50m users
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide11
Why do we care? (2: Online Diffusion)
Dynamical Processes over networks
Celebrity
Buy Versace™!
Followers
11
Social Media Marketing
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide12
Why do we care? (3: To change the world?)
Dynamical Processes over networks
Social networks and Collaborative Action
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide13
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic out-breaks
products/viruses
transmit s/w patches
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide14
Research Theme
DATA
Large real-world networks & processes
14
ANALYSIS
Understanding
POLICY/ ACTION
Managing
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide15
Research Theme –
Public Health
DATA
Modeling # patient transfers
ANALYSIS
Will an epidemic happen?
POLICY/ ACTION
How to control out-breaks?
Graph Analytics wkshp
15
B. A. Prakash; C. FaloutsosSlide16
Research Theme –
Social Media
DATA
Modeling Tweets spreading
POLICY/ ACTION
How to market better?
ANALYSIS
# cascades in future?
Graph Analytics wkshp
16
B. A. Prakash; C. FaloutsosSlide17
In this tutorial
ANALYSIS
Understanding
Given propagation models:
Q1: Will an epidemic happen?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide18
In this tutorial
Q2: How to immunize and control out-breaks better?
POLICY/ ACTION
Managing
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide19
In this tutorial
DATA
Large real-world networks & processes
Q3: How do #hashtags spread?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide20
Outline
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide21
Part 1: Theory
Q1: What is the epidemic threshold?
Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide22
A fundamental question
Strong Virus
Epidemic?
22
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide23
example (static graph)
Weak Virus
Epidemic?
23
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide24
Problem Statement
Find a condition under which the virus will die out exponentially quickly, regardless of the initial infection condition.
[Figure: # Infected vs. time, with one curve above the threshold (epidemic) and one below (extinction)]
Separate the regimes?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide25
Threshold (static version)
Problem Statement
Given: Graph G, and Virus specs (attack prob. etc.)
Find: A condition for virus extinction/invasion
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide26
Threshold: Why important?
Accelerating simulations
Forecasting (‘What-if’ scenarios)
Design of contagion and/or topology
A great handle to manipulate the spreading
Immunization
Maximize collaboration
…..
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide27
Part 1: Theory
Q1: What is the epidemic threshold?
Background
Result and Intuition (Static Graphs)
Proof Ideas (Static Graphs)
Bonus: Dynamic Graphs
Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide28
“SIR” model: life immunity (mumps)
Each node in the graph is in one of three states:
Susceptible (i.e. healthy)
Infected
Removed (i.e. can’t get infected again)
[Figure: SIR transitions on a small graph at t = 1, 2, 3, with infection prob. β and removal prob. δ]
Background
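The three-state description above can be turned into a small simulation. The following is a minimal sketch, assuming a synchronous discrete-time update in which every infected node attacks each susceptible neighbor with prob. β and is then removed with prob. δ; the function name `simulate_sir`, the ring-shaped example graph and the parameter values are illustrative choices, not taken from the slides.

```python
import random

def simulate_sir(neighbors, seeds, beta, delta, steps, rng):
    """Synchronous discrete-time SIR on an undirected contact graph.

    neighbors: dict mapping each node to a list of its neighbors.
    Returns a list of (num_S, num_I, num_R) tuples, one per time step.
    """
    state = {v: "S" for v in neighbors}
    for v in seeds:
        state[v] = "I"
    history = []
    for _ in range(steps):
        nxt = dict(state)
        for v, s in state.items():
            if s != "I":
                continue
            # Each infected node attacks every susceptible neighbor with prob. beta.
            for w in neighbors[v]:
                if state[w] == "S" and rng.random() < beta:
                    nxt[w] = "I"
            # The infected node is removed (immune for life) with prob. delta.
            if rng.random() < delta:
                nxt[v] = "R"
        state = nxt
        history.append(tuple(sum(1 for s in state.values() if s == c) for c in "SIR"))
    return history

# Hypothetical example: a ring of 10 nodes with one initial infection.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
history = simulate_sir(ring, seeds=[0], beta=0.5, delta=0.3, steps=20,
                       rng=random.Random(0))
```

Each entry of `history` is a snapshot of the S, I and R counts; the third count never decreases because removed nodes cannot be infected again.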
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide29
Terminology: continued
Other virus propagation models (“VPM”):
SIS: susceptible-infected-susceptible, flu-like
SIRS: temporary immunity, like pertussis
SEIR: mumps-like, with virus incubation (E = Exposed)
….………….
Underlying contact-network – ‘who-can-infect-whom’
Background
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide30
Related Work
R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991.
A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010.
F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969.
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4), 2008.
D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005.
Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-mat/0305549 v2, Aug. 6 2003.
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000.
H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984.
J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991.
J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993.
R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001.
………
………
………
All are about either:
Structured topologies (cliques, block-diagonals, hierarchies, random)
Specific virus propagation models
Static graphs
30
Background
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide31
Part 1: Theory
Q1: What is the epidemic threshold?
Background
Result and Intuition (Static Graphs)
Proof Ideas (Static Graphs)
Bonus: Dynamic Graphs
Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide32
What should the answer look like?
Answer should depend on:
Graph
Virus Propagation Model (VPM)
But how??
Graph – average degree? max. degree? diameter?
VPM – which parameters?
How to combine – linear? quadratic? exponential?
…..
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide33
Static Graphs: Our Main Result
Informally,
For any arbitrary topology (adjacency matrix A) and any virus propagation model (VPM) in the standard literature, the epidemic threshold depends only on λ, the first eigenvalue of A, and on some constant C_VPM, determined by the virus propagation model.
No epidemic if λ * C_VPM < 1
In Prakash+ ICDM 2011 (selected among best papers).
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide34
Our thresholds for some models
s = effective strength
s < 1 : below threshold
Models | Effective Strength (s) | Threshold (tipping point)
SIS, SIR, SIRS, SEIR | s = λ · β/δ | s = 1
SIV, SEIV | s = λ · … | s = 1
… (H.I.V.) | s = λ · … | s = 1
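For the SIS/SIR family the threshold test reduces to one eigenvalue computation. The sketch below assumes an undirected graph (symmetric adjacency matrix) and the strength s = λ β / δ quoted later in the deck; the helper names `effective_strength` and `below_threshold` and the 5-clique example are hypothetical.

```python
import numpy as np

def effective_strength(A, beta, delta):
    """s = lambda * beta / delta for a symmetric adjacency matrix A."""
    lam = float(np.linalg.eigvalsh(A)[-1])  # eigvalsh returns eigenvalues in ascending order
    return lam * beta / delta

def below_threshold(A, beta, delta):
    """True when s < 1, i.e. the virus is expected to die out."""
    return effective_strength(A, beta, delta) < 1.0

# Hypothetical example: a 5-clique, whose largest eigenvalue is 4.
clique = np.ones((5, 5)) - np.eye(5)
weak = effective_strength(clique, beta=0.1, delta=0.5)
strong = effective_strength(clique, beta=0.2, delta=0.5)
```

With these numbers the weak virus has s of about 0.8 (below threshold) and the strong one about 1.6 (above threshold), even though only β changed.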
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide35
Our result: Intuition for λ
“Official” definition: Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].
Doesn’t give much intuition!
“Un-official” intuition: λ ~ # paths in the graph
A^k ≈ λ^k · u u^T
A^k(i, j) = # of paths i → j of length k
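Both statements are easy to check numerically. This is a small sketch on a hypothetical 4-node graph (a triangle with one pendant node); it counts length-k walks by brute force, compares them with the entries of A^k, and measures how well λ^k u u^T approximates A^k. The "paths" here may revisit nodes, so they are walks in the strict graph-theoretic sense.

```python
import numpy as np

def count_walks(A, i, j, k):
    """Count length-k walks from i to j by explicit enumeration."""
    if k == 0:
        return 1 if i == j else 0
    return sum(count_walks(A, nb, j, k - 1) for nb in range(len(A)) if A[i][nb])

# Hypothetical example graph: triangle 0-1-2 plus pendant node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
A3 = np.linalg.matrix_power(A, 3)

# Leading eigenpair of the symmetric adjacency matrix.
vals, vecs = np.linalg.eigh(A.astype(float))
lam, u = vals[-1], vecs[:, -1]

# Relative error of the rank-one approximation lam^k * u u^T for k = 8.
k = 8
Ak = np.linalg.matrix_power(A, k).astype(float)
rel_err = np.linalg.norm(Ak - lam**k * np.outer(u, u)) / np.linalg.norm(Ak)
```

The relative error shrinks as k grows because the contribution of every other eigenvalue decays relative to λ^k.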
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide36
[Figure: three example graphs on N nodes with increasing connectivity and their largest eigenvalue (λ): λ ≈ 2; λ = N; λ = N-1. For N = 1000: λ ≈ 2; λ = 31.67; λ = 999]
better connectivity → higher λ
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide37
Examples: Simulations – SIR (mumps)
PORTLAND graph: 31 million links, 6 million nodes
[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide38
Examples: Simulations – SIRS (pertussis)
PORTLAND graph: 31 million links, 6 million nodes
[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide39
Part 1: Theory
Q1: What is the epidemic threshold?
Background
Result and Intuition (Static Graphs)
Proof Ideas (Static Graphs)
Bonus: Dynamic Graphs
Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide40
Proof Sketch
λ * C_VPM < 1, where λ is graph-based and the constant C_VPM is model-based
General VPM structure
Topology and stability
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide41
Models and more models
Model | Used for
SIR | Mumps
SIS | Flu
SIRS | Pertussis
SEIR | Chicken-pox
……..
SICR | Tuberculosis
MSIR | Measles
SIV | Sensor Stability
… | H.I.V.
……….
41
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide42
Ingredient 1: Our generalized model
[Figure: state diagram with states Susceptible, Infected and Vigilant, connected by endogenous and exogenous transitions]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide43
Special case
Susceptible
Infected
Vigilant
43
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide44
Special case: H.I.V.
Multiple Infectious, Vigilant states
44
“Terminal”
“Non-terminal”Graph Analytics wkshpB. A. Prakash; C. FaloutsosSlide45
Ingredient 2: NLDS + Stability
View as a NLDS: discrete time non-linear dynamical system (NLDS)
Probability vector: specifies the state of the system at time t
Details
[Figure: probability vector of size mN x 1, made of blocks S, I, V, each of size N (number of nodes in the graph)]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide46
Ingredient 2: NLDS + Stability
View as a NLDS: discrete time non-linear dynamical system (NLDS)
Non-linear function: explicitly gives the evolution of the system
Details
[Figure: non-linear function applied to the probability vector of size mN x 1]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide47
Ingredient 2: NLDS + Stability
View as a NLDS: discrete time non-linear dynamical system (NLDS)
Threshold → Stability of NLDS
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide48
Special case: SIR
[Figure: NLDS update of the state vector (size 3N x 1, blocks S, I, R); one term is the probability that node i is not attacked by any of its infectious neighbors]
Details
48
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide49
Fixed Point
[1 1 . 0 0 . 0 0 .]: state when no node is infected
Q: Is it stable?
Details
49
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide50
Stability for SIR
Stable under threshold
Unstable above threshold
50
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide51
Proof Sketch
λ * C_VPM < 1, where λ is graph-based and the constant C_VPM is model-based
General VPM structure
Topology and stability
See paper for full proof
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide52
Part 1: Theory
Q1: What is the epidemic threshold?
Background
Result and Intuition (Static Graphs)
Proof Ideas (Static Graphs)
Bonus: Dynamic Graphs
Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide53
Dynamic Graphs: Epidemic?
Alternating behaviors
DAY (e.g., work)
[Figure: contact graph and its 8 x 8 adjacency matrix]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide54
Dynamic Graphs: Epidemic?
Alternating behaviors
NIGHT (e.g., home)
[Figure: contact graph and its 8 x 8 adjacency matrix]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide55
Model Description
SIS model
recovery rate δ
infection rate β
Set of T arbitrary graphs: day (N x N), night (N x N), weekend…..
[Figure: healthy and infected nodes X, N1, N2, N3, with infection prob. β and recovery prob. δ]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide56
Our result: Dynamic Graphs Threshold
Informally, NO epidemic if eig(S) < 1
Single number! Largest eigenvalue of the system matrix S
In Prakash+, ECML-PKDD 2010
[Equation: definition of the system matrix S]
Details
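The definition of S did not survive on this slide, so the sketch below rests on an assumption: it takes S to be the product, over the T alternating graphs, of the per-graph matrices (1 - δ) I + β A_t, which is how the cited ECML-PKDD 2010 paper is usually summarized. The function names and the day/night example are hypothetical.

```python
import numpy as np

def system_matrix(adjacency_list, beta, delta):
    """Assumed form: S = prod_t ((1 - delta) * I + beta * A_t)."""
    n = adjacency_list[0].shape[0]
    S = np.eye(n)
    for A in adjacency_list:
        S = S @ ((1.0 - delta) * np.eye(n) + beta * A)
    return S

def dynamic_strength(adjacency_list, beta, delta):
    """Largest eigenvalue magnitude of S; below 1 suggests no epidemic."""
    S = system_matrix(adjacency_list, beta, delta)
    return float(np.max(np.abs(np.linalg.eigvals(S))))

# Hypothetical example: a triangle during the day, no contacts at night.
day = np.ones((3, 3)) - np.eye(3)
night = np.zeros((3, 3))
s_day_night = dynamic_strength([day, night], beta=0.1, delta=0.5)
s_day_day = dynamic_strength([day, day], beta=0.1, delta=0.5)
```

Under this assumed form, replacing the night graph with an empty one lowers the single number that decides the regime, which matches the "alternating behaviors" intuition of the preceding slides.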
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide57
Infection-profile
[Figure: log(fraction infected) vs. time on the Synthetic and MIT Reality Mining datasets, with curves BELOW, AT and ABOVE the threshold]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide58
“Take-off” plots
[Figure: footprint (# infected @ “steady state”, log scale) on the Synthetic and MIT Reality datasets; our threshold separates NO EPIDEMIC from EPIDEMIC]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide59
Part 1: Theory
Q1: What is the epidemic threshold?
Q2: What happens when viruses compete?
Mutually-exclusive viruses
Interacting viruses
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide60
Competing Contagions
iPhone v Android
Blu-ray v HD-DVD
Biological: common flu/avian flu, pneumococcal inf. etc.
Attack v Retreat
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide61
A simple model
Modified flu-like
Mutual Immunity (“pick one of the two”)
Susceptible-Infected1-Infected2-Susceptible
Virus 1
Virus 2
Details
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide62
Question: What happens in the end?
ASSUME: Virus 1 is stronger than Virus 2
[Figure: number of infections over time (green: virus 1, red: virus 2); footprint @ steady state = ?]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide63
Question: What happens in the end?
ASSUME: Virus 1 is stronger than Virus 2
[Figure: number of infections over time (green: virus 1, red: virus 2); how do the footprints @ steady state relate to the strengths of the two viruses?]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide64
Answer: Winner-Takes-All
ASSUME: Virus 1 is stronger than Virus 2
[Figure: number of infections over time; green: virus 1, red: virus 2]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide65
Our Result: Winner-Takes-All
In Prakash+ WWW 2012
Given our model, and any graph, the weaker virus always dies out completely.
The stronger survives only if it is above threshold.
Virus 1 is stronger than Virus 2 if: strength(Virus 1) > strength(Virus 2)
Strength(Virus) = λ β / δ (same as before!)
Details
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide66
Real Examples
Reddit v Digg
Blu-Ray v HD-DVD
[Google Search Trends data]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide67
Part 1: Theory
Q1: What is the epidemic threshold?
Q2: What happens when viruses compete?
Mutually-exclusive viruses
Interacting viruses
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide68
A simple model: SI1|2S
Modified flu-like (SIS)
Susceptible-Infected1 or 2-Susceptible
Interaction Factor ε
Full Mutual Immunity: ε = 0
Partial Mutual Immunity (competition): 0 < ε < 1
Cooperation: ε > 1
Virus 1
Virus 2
&
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide69
Question: What happens in the end?
ASSUME: Virus 1 is stronger than Virus 2
ε = 0: Winner takes all
ε = 1: Co-exist independently
ε = 2: Viruses cooperate
What about for 0 < ε < 1?
Is there a point at which both viruses can co-exist?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide70
Answer: Yes!
There is a phase transition
70ASSUME: Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide71
Answer: Yes! There is a phase transition
71ASSUME:
Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide72
Answer: Yes! There is a phase transition
72ASSUME:
Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide73
Our Result: Viruses can Co-exist
Given our model and a fully connected graph, there exists an ε_critical such that for ε ≥ ε_critical, there is a fixed point where both viruses survive.
The stronger survives only if it is above threshold.
Virus 1 is stronger than Virus 2 if: strength(Virus 1) > strength(Virus 2)
Strength(Virus) σ = N β / δ
Details
In Beutel+ KDD 2012
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide74
Real Examples
74
Hulu
v Blockbuster
[Google Search Trends data]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide75
Real Examples
75
Chrome
v Firefox
[Google Search Trends data]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide76
Outline
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide77
Part 2: Algorithms
Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide78
Full Static Immunization
Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).
[Figure: example graph with k = 2; candidate nodes marked ‘?’]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide79
Part 2: Algorithms
Q3: Whom to immunize?
Full Immunization (Static Graphs)
Full Immunization (Dynamic Graphs)
Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshpB. A. Prakash; C. FaloutsosSlide80
Challenges
Given a graph A, budget k,
Q1 (Metric): How to measure the ‘shield-value’ for a set of nodes (S)?
Q2 (Algorithm): How to find a set of k nodes with highest ‘shield-value’?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide81
Proposed vulnerability measure λ
λ is the epidemic threshold
Increasing λ → increasing vulnerability
[Figure: three graphs ordered by increasing λ: “Safe”, “Vulnerable”, “Deadly”]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide82
A1: “Eigen-Drop”: an ideal shield value
[Figure: original graph vs. the same graph without nodes {2, 6}]
Eigen-Drop(S): Δλ = λ - λ_s
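Eigen-Drop can be computed directly for small graphs by deleting the chosen rows and columns. The following sketch assumes an undirected graph and a set S that leaves at least one node; the function names and the 5-node star used as an example are illustrative and not the graph drawn on the slide.

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])

def eigen_drop(A, S):
    """Delta-lambda = lambda(A) - lambda(A with the nodes in S removed)."""
    removed = set(S)
    keep = [i for i in range(A.shape[0]) if i not in removed]
    return largest_eigenvalue(A) - largest_eigenvalue(A[np.ix_(keep, keep)])

# Hypothetical example: a star with hub 0 and four leaves (lambda = 2).
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1
drop_hub = eigen_drop(star, [0])    # removing the hub disconnects everything
drop_leaf = eigen_drop(star, [1])   # removing a leaf barely changes lambda
```

Removing the hub drops λ from 2 to 0, while removing a leaf only drops it from 2 to √3, which is the kind of contrast the shield value is meant to capture.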
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide83
(Q2) - Direct Algorithm too expensive!
Immunize k nodes which maximize Δλ
S = argmax Δλ
Combinatorial!
Complexity:
Example: 1,000 nodes, with 10,000 edges
It takes 0.01 seconds to compute λ
It takes 2,615 years to find 5-best nodes!
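The figure quoted in the example can be reproduced with a few lines of arithmetic. This sketch assumes the direct algorithm evaluates λ once for every subset of k nodes and that a year has 365 days; `brute_force_cost` is a hypothetical helper name.

```python
import math

def brute_force_cost(n, k, seconds_per_eigenvalue):
    """Number of k-subsets of n nodes and the years needed to score them all."""
    subsets = math.comb(n, k)
    seconds = subsets * seconds_per_eigenvalue
    years = seconds / (365 * 24 * 3600)
    return subsets, years

subsets, years = brute_force_cost(n=1000, k=5, seconds_per_eigenvalue=0.01)
```

There are about 8.25 trillion candidate sets, so even at 0.01 seconds each the search takes on the order of 2,600 years, in line with the slide.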
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide84
A2: Our Solution
Part 1: Shield Value
Carefully approximate Eigen-drop (Δλ)
Matrix perturbation theory
Part 2: Algorithm
Greedily pick best node at each step
Near-optimal due to submodularity
NetShield (linear complexity): O(nk² + m), n = # nodes; m = # edges
In Tong, Prakash+ ICDM 2010
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide85
Our Solution: Part 1
Approximate Eigen-drop (Δλ)
Δλ ≈ SV(S) = [Equation: shield value of the set S]
Result using Matrix perturbation theory
u(i) == ‘eigenscore’ ~~ pagerank(i)
A u = λ . u
Details
85
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide86
P1: node importance
P2: set diversity
[Figure: original graph; nodes selected by P1 alone; nodes selected by P1+P2]
Details
86
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide87
Our Solution: Part 2: NetShield
We prove that: SV(S) is sub-modular (& monotone non-decreasing)
NetShield: Greedily add best node at each step
Corollary: Greedy algorithm works
1. NetShield is near-optimal (w.r.t. max SV(S))
2. NetShield is O(nk² + m)
Footnote: near-optimal means SV(S_NetShield) >= (1-1/e) SV(S_Opt)
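The greedy step can be sketched as follows. Two things here are assumptions rather than content of the slides: the shield-value formula SV(S) = Σ_{i∈S} 2λ u(i)² − Σ_{i,j∈S} A(i,j) u(i) u(j), which is how the cited ICDM 2010 paper is usually stated, and the naive re-evaluation of SV for every candidate, which is simpler but slower than the O(nk² + m) NetShield implementation. The function names and the star example are hypothetical.

```python
import numpy as np

def shield_value(A, lam, u, S):
    """Assumed form: sum_i 2*lam*u(i)^2 - sum_{i,j} A(i,j)*u(i)*u(j), over i, j in S."""
    idx = list(S)
    u_s = u[idx]
    return float(2.0 * lam * np.sum(u_s ** 2) - u_s @ A[np.ix_(idx, idx)] @ u_s)

def greedy_shield(A, k):
    """Greedily add the node that maximizes the shield value of the chosen set."""
    vals, vecs = np.linalg.eigh(A)           # A is assumed symmetric and connected
    lam, u = vals[-1], np.abs(vecs[:, -1])   # leading eigenpair ("eigenscores")
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(A.shape[0]) if i not in chosen]
        best = max(candidates, key=lambda i: shield_value(A, lam, u, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical example: a star with hub 0 and four leaves.
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1
picked = greedy_shield(star, 2)
```

The first term rewards nodes with high eigenscore (node importance) and the second term penalizes picking nodes that are adjacent to each other (set diversity), so the hub is chosen first.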
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide88
Experiment: Immunization quality
[Figure: log(fraction of infected nodes) vs. time for NetShield, Degree, PageRank, Eigs (=HITS), Acquaintance and Betweenness (shortest path); lower is better]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide89
Part 2: Algorithms
Q3: Whom to immunize?
Full Immunization (Static Graphs)
Full Immunization (Dynamic Graphs)
Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshpB. A. Prakash; C. FaloutsosSlide90
Full Dynamic Immunization
Given: Set of T arbitrary graphs: day (N x N), night (N x N), weekend…..
Find: k ‘best’ nodes to immunize (remove)
In Prakash+ ECML-PKDD 2010
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide91
Full Dynamic Immunization
Our solution
Recall theorem
Simple: reduce eig(S), where S is the matrix product over the day and night graphs
Goal: max eigendrop Δ
No competing policy for comparison
We propose and evaluate many policies
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide92
Performance of Policies
[Figure: performance of the immunization policies on MIT Reality Mining; lower is better]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide93
Part 2: Algorithms
Q3: Whom to immunize?
Full Immunization (Static Graphs)
Full Immunization (Dynamic Graphs)
Fractional Immunization
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshpB. A. Prakash; C. FaloutsosSlide94
Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
Under Submission
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide95
Full Static Immunization
Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).
[Figure: example graph with k = 2; candidate nodes marked ‘?’]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide96
Fractional Asymmetric Immunization
Fractional Effect [ f(x) = ]
Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide97
Fractional Asymmetric Immunization
Fractional Effect [ f(x) = ]
Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide98
Fractional Asymmetric Immunization
Fractional Effect [ f(x) = ]
Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide99
Fractional Asymmetric Immunization
Hospital
Another Hospital
99
Drug-resistant Bacteria (like XDR-TB)
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide100
Fractional Asymmetric Immunization
Hospital
Another Hospital
Drug-resistant Bacteria (like XDR-TB)
100
=
f
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide101
Fractional Asymmetric Immunization
Hospital
Another Hospital
Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide102
Our Algorithm “SMART-ALLOC”
[US-MEDICARE NETWORK 2005]
Each circle is a hospital, ~3000 hospitals
More than 30,000 patients transferred
[Figure: CURRENT PRACTICE vs. SMART-ALLOC; ~6x fewer!]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide103
Running Time
[Figure: wall-clock time, lower is better. Simulations: > 1 week; SMART-ALLOC: 14 secs]
> 30,000x speed-up!
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide104
Experiments
[Figure: results on PENN-NETWORK and SECOND-LIFE, with K = 200 and K = 2000; improvements of ~5 x and ~2.5 x; lower is better]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide105
Part 2: Algorithms
Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide106
Outbreak detection
Problems of finding sources of contamination in water networks and finding “hot” stories on blogs are isomorphic.
Minimize time to detection, population affected
Maximize probability of detection.
Minimize sensor placement cost.
[Figure: blogs, posts and links forming an information cascade]
Graph Analytics wkshp
106
B. A. Prakash; C. FaloutsosSlide107
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. “Cost-effective Outbreak Detection in Networks”. KDD 2007
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide108
CELF: Main idea
Given a graph G(V,E) and a budget of B sensors
and data on how contaminations spread over the network: for each contamination i we know the time T(i, u) when it contaminated node u
Minimize time to detect outbreak
CELF algorithm uses submodularity and lazy evaluation
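The lazy-evaluation idea can be illustrated with a generic lazy greedy loop. This is a sketch under simplifying assumptions: the objective is plain coverage (how many contaminations a sensor set detects) instead of detection time, all sensors cost the same, and the names `lazy_greedy`, `coverage_gain` and the toy data are hypothetical. It relies on submodularity: a marginal gain computed earlier is an upper bound on the current one, so a stale entry only needs re-evaluation when it reaches the top of the heap.

```python
import heapq

def lazy_greedy(candidates, gain, budget):
    """Select up to `budget` items, re-evaluating marginal gains only when needed."""
    selected = []
    # Heap entries: (-marginal_gain, candidate, size of `selected` when gain was computed).
    heap = [(-gain(selected, c), c, 0) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, c, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            selected.append(c)            # gain is up to date, so c is the true best
        else:
            heapq.heappush(heap, (-gain(selected, c), c, len(selected)))
    return selected

def coverage_gain(detects):
    """Marginal gain = number of newly detected contaminations."""
    def gain(selected, c):
        covered = set().union(*(detects[s] for s in selected))
        return len(detects[c] - covered)
    return gain

# Hypothetical data: which contaminations each candidate sensor node would detect.
detects = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
sensors = lazy_greedy(list(detects), coverage_gain(detects), budget=2)
```

On this toy input only one stale gain has to be recomputed before the second sensor is chosen, instead of recomputing the gains of all remaining candidates.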
B. A. Prakash; C. FaloutsosSlide109
Blogs: Comparison to heuristics
[Figure: benefit of each placement method; higher = better]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide110
“Best 10 blogs to read”
k | PA score | Blog | NP | IL | OLO | OLA
1 | 0.1283 | http://instapundit.com | 4593 | 4636 | 1890 | 5255
2 | 0.1822 | http://donsurber.blogspot.com | 1534 | 1206 | 679 | 3495
3 | 0.2224 | http://sciencepolitics.blogspot.com | 924 | 576 | 888 | 2701
4 | 0.2592 | http://www.watcherofweasels.com | 261 | 941 | 1733 | 3630
5 | 0.2923 | http://michellemalkin.com | 1839 | 12642 | 1179 | 6323
6 | 0.3152 | http://blogometer.nationaljournal.com | 189 | 2313 | 3669 | 9272
7 | 0.3353 | http://themodulator.org | 475 | 717 | 1844 | 4944
8 | 0.3508 | http://www.bloggersblog.com | 895 | 247 | 1244 | 10201
9 | 0.3654 | http://www.boingboing.net | 5776 | 6337 | 1024 | 6183
10 | 0.3778 | http://atrios.blogspot.com | 4682 | 3205 | 795 | 3102
NP - number of posts, IL - in-links, OLO - blog out links, OLA - all out links
Graph Analytics wkshp
110
B. A. Prakash; C. FaloutsosSlide111
Part 2: Algorithms
Q3: Whom to immunize?
Q4: How to detect outbreaks?
Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide112
B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos. ‘Detecting Culprits in Epidemics: Who and How many?’ Under Submission
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide113
Problem definition
2-d grid
‘+’ -> infected
Who started it?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide114
Problem definition
2-d grid
‘+’ -> infected
Who started it?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide115
Culprits: Exoneration115
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide116
Who are the culprits
Two-part solution:
use MDL for number of seeds
for a given number: exoneration = centrality + penalty
our method uses smallest eigenvector of Laplacian submatrix
Running time = linear! (in edges and nodes)
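One way to read the eigenvector bullet is sketched below for a single seed. The assumptions are mine: the Laplacian is built on the full graph, the submatrix keeps only the rows and columns of infected nodes, and the most likely seed is the infected node with the largest entry in the eigenvector of the smallest eigenvalue. The MDL step and the exoneration of further seeds are not covered, and the dense eigendecomposition used here is not the linear-time method the slide refers to.

```python
import numpy as np

def likely_seed(A, infected):
    """Score infected nodes with the smallest eigenvector of the Laplacian submatrix."""
    infected = list(infected)
    L = np.diag(A.sum(axis=1)) - A            # Laplacian of the full graph
    L_sub = L[np.ix_(infected, infected)]     # keep only infected rows and columns
    vals, vecs = np.linalg.eigh(L_sub)        # eigenvalues in ascending order
    score = np.abs(vecs[:, 0])                # eigenvector of the smallest eigenvalue
    return infected[int(np.argmax(score))]

# Hypothetical example: a path 0-1-2-3-4-5-6 where nodes 1..5 are infected.
n = 7
path = np.zeros((n, n))
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1
seed = likely_seed(path, infected=[1, 2, 3, 4, 5])
```

On the path example the score peaks at the middle of the infected stretch, which matches the intuition that the culprit sits deep inside the infected region rather than at its boundary with healthy nodes.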
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide117
Culprits: Results117
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide118
Outline
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide119
Part 3: Empirical Studies
Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide120
Cascading Behavior in Large Blog Graphs
How does information propagate over the blogosphere?
[Figure: blogs, posts and links forming an information cascade]
J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
Graph Analytics wkshp
120Slide121
Cascades on the Blogosphere
Blogosphere: blogs + posts
Blog network: links among blogs
Post network: links among posts
Cascade is the graph induced by a time-ordered propagation of information (edges)
[Figure: blogs B1 to B4 with posts a to e, shown as the blogosphere, the blog network (weighted links among blogs), the post network, and the resulting cascades]
Graph Analytics wkshpSlide122
Blog data
45,000 blogs participating in cascades
All their posts for 3 months (Aug-Sept ‘05)
2.4 million posts
~5 million links (245,404 inside the dataset)
[Figure: number of posts over time, in bins of 1 day]
Graph Analytics wkshpSlide123
Popularity over time
Post popularity drops-off – exponentially?
[Figure: # in links vs. lag (days after post), counting links at time t + lag to a post made at time t]
Graph Analytics wkshp
123
B. A. Prakash; C. FaloutsosSlide124
Popularity over time
Post popularity drops-off – exponentially?
POWER LAW!
Exponent?
[Figure: # in links (log) vs. days after post (log)]
Graph Analytics wkshp
124
B. A. Prakash; C. FaloutsosSlide125
Popularity over time
Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6
close to -1.5: Barabasi’s stack model
and like the zero-crossings of a random walk
[Figure: # in links (log) vs. days after post (log), with slope -1.6]
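An exponent like the -1.6 above is the slope of a straight line in log-log scale. The sketch below fits that slope with an ordinary least-squares line on the logarithms; the data are synthetic (an exact power law with exponent -1.6), not the blog dataset, and `fit_power_law_exponent` is a hypothetical helper name. Least squares on log-log data is a quick check rather than a rigorous estimator for noisy counts.

```python
import numpy as np

def fit_power_law_exponent(x, y):
    """Slope of the least-squares line through (log x, log y)."""
    slope, _intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(slope)

# Synthetic example: in-links decaying as lag^(-1.6) over 60 days.
lags = np.arange(1, 61, dtype=float)
in_links = 1000.0 * lags ** -1.6
exponent = fit_power_law_exponent(lags, in_links)
```

An exponential drop-off would instead look straight in linear-log scale, which is how the two hypotheses on the slide can be told apart visually.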
Graph Analytics wkshp
125
B. A. Prakash; C. FaloutsosSlide126
B. A. Prakash; C. Faloutsos
-1.5 slope
J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
Graph Analytics wkshp
126Slide127
Topological patterns: Cascades
Procedure for gathering cascades:
Find all initiators (nodes with out-degree 0)
Follow in-links
Produces directed acyclic graph
Count cascade shapes (use our multi-level graph isomorphism testing algorithm)
[Figure: post network on posts a to e and the cascades extracted from it]
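The first two steps of the procedure can be sketched as a breadth-first traversal. The sketch assumes the post network is given as (citing post, cited post) pairs, so an initiator is a post that cites nothing; the function name `gather_cascades` and the toy links are hypothetical, and the shape-counting step with graph isomorphism testing is left out.

```python
from collections import defaultdict, deque

def gather_cascades(links):
    """Map each initiator to the sorted list of (citing, cited) edges in its cascade."""
    in_links = defaultdict(list)    # cited post -> posts that link to it
    out_degree = defaultdict(int)   # post -> number of posts it links to
    posts = set()
    for citing, cited in links:
        in_links[cited].append(citing)
        out_degree[citing] += 1
        posts.update((citing, cited))

    cascades = {}
    for initiator in sorted(p for p in posts if out_degree.get(p, 0) == 0):
        seen, edges, queue = {initiator}, [], deque([initiator])
        while queue:
            post = queue.popleft()
            for citing in in_links.get(post, []):   # follow in-links
                edges.append((citing, post))
                if citing not in seen:
                    seen.add(citing)
                    queue.append(citing)
        cascades[initiator] = sorted(edges)
    return cascades

# Hypothetical post network: b and c cite a, d cites b, e cites x.
links = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "x")]
cascades = gather_cascades(links)
```

Because links point backwards in time, the edges collected for each initiator form a directed acyclic graph, as the slide notes.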
Graph Analytics wkshpSlide128
Topological Observations
How do we measure how information flows through the network?
Common cascade shapes extracted using algorithms in
[Leskovec, Singh, Kleinberg; PAKDD 2006].
Graph Analytics wkshp128B. A. Prakash; C. FaloutsosSlide129
B. A. Prakash; C. Faloutsos
Topological Observations
What graph properties do cascades exhibit?
Cascade size distributions also follow power law.
Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution: p(n) ∝ n^-2
[Figure: count vs. cascade size (# of nodes), slope a = -2]
129Slide130
Topological Observations
What graph properties do cascades exhibit?
Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5).
[Figure: count vs. size of star (# nodes), a = -3.1; count vs. size of chain (# nodes), a = -8.5]
Graph Analytics wkshp
130
B. A. Prakash; C. FaloutsosSlide131
Blogs and structure
Cascades take on different shapes (sorted by frequency):
How can we use cascades
to identify communities?
Graph Analytics wkshp131B. A. Prakash; C. FaloutsosSlide132
PCA on cascade types
Perform PCA on sparse matrix.
Use log(count+1)
Project onto 2 PC…
[Figure: matrix of ~44,000 blogs (rows such as boingboing, slashdot) by ~9,000 cascade types]
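The three steps above can be sketched with a plain SVD. The sketch uses a small dense matrix of made-up counts in place of the ~44,000 x ~9,000 sparse matrix, so a real run would need a sparse or truncated solver; `project_blogs` is a hypothetical helper name.

```python
import numpy as np

def project_blogs(counts, n_components=2):
    """Apply log(count + 1), center the columns, and project onto the top PCs."""
    X = np.log1p(np.asarray(counts, dtype=float))
    X = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _U, _s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Made-up counts: rows are blogs, columns are cascade types.
counts = np.array([[120, 3, 0, 1],
                   [95, 5, 1, 0],
                   [2, 40, 33, 0],
                   [1, 52, 28, 2],
                   [0, 0, 1, 60]])
coords = project_blogs(counts)
```

Each row of `coords` is a 2-d point for one blog; blogs that take part in similar cascade types land close together, which is what makes clusters visible in a scatter plot.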
Graph Analytics wkshp
132
B. A. Prakash; C. FaloutsosSlide133
PCA on cascade types
Observation: Content of blogs and cascade behavior are often related.
Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).
M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.
Graph Analytics wkshp
133
B. A. Prakash; C. FaloutsosSlide134
PCA on cascade types
Observation: Content of blogs and cascade behavior are often related.
Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).
M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.
Graph Analytics wkshp
134
B. A. Prakash; C. FaloutsosSlide135
Part 3: Empirical Studies
Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide136
Rise and fall patterns in social media
Meme (# of mentions in blogs): short phrases sourced from U.S. politics in 2008
“you can put lipstick on a pig”
“yes we can”
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide137
Rise and fall patterns in social media
Can we find a unifying model, which includes these patterns?
four classes on YouTube [Crane et al. ’08]
six classes on Meme [Yang et al. ’11]
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide138
Rise and fall patterns in social media
Answer: YES!
We can represent all patterns by a single model
In Matsubara+ SIGKDD 2012
Graph Analytics wkshp
B. A. Prakash; C. FaloutsosSlide139
Main idea - SpikeM
1. Un-informed bloggers (uninformed about rumor)
2. External shock at time n_b (e.g., breaking news)
3. Infection (word-of-mouth)
Infectiveness of a blog-post at age n:
β: strength of infection (quality of news)
Decay function (how infective a blog posting is): power law
[Figure: state of the bloggers at time n=0, n=n_b, and n=n_b+1]
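The three ingredients above (uninformed bloggers, an external shock at n_b, and word-of-mouth infection with a power-law decay) can be turned into a small simulation. The sketch below is one reading of the slide, not the authors' code: the update rule, the decay exponent of -1.5, and the names `N`, `S_b`, `eps`, and `simulate_spikem_sketch` are assumptions that do not appear on the slide.

```python
import numpy as np

def simulate_spikem_sketch(N, beta, n_b, S_b, eps, T):
    """Toy rise-and-fall simulation.

    N    : total number of bloggers (all uninformed at n = 0)
    beta : strength of infection
    n_b  : time tick of the external shock
    S_b  : size of the external shock
    eps  : background noise added at every tick
    T    : number of time ticks to simulate
    """
    U = np.zeros(T + 1)       # uninformed bloggers at each tick
    dB = np.zeros(T + 1)      # newly informed bloggers at each tick
    U[0] = N
    for n in range(T):
        exposure = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            age = n + 1 - t                      # age of the posts made at t
            # assumed decay: infectiveness = beta * age^(-1.5)
            exposure += (dB[t] + shock) * beta * age ** -1.5
        new = min(U[n] * exposure + eps, U[n])   # cannot exceed who is left
        dB[n + 1] = new
        U[n + 1] = U[n] - new
    return dB, U

# Hypothetical parameter values, chosen only for illustration
delta_b, uninformed = simulate_spikem_sketch(
    N=1000.0, beta=0.0008, n_b=5, S_b=10.0, eps=0.0, T=100)
```

With `eps=0.0`, nothing happens before the shock; activity starts at tick n_b+1 and then fades as the pool of uninformed bloggers shrinks and older posts lose infectiveness.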
Slide140
SpikeM - with periodicity
Full equation of SpikeM
Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)
[Figure: activity vs. time n, with peak activity at 12pm and low activity at 3am]
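A periodic activity factor of this kind can be sketched as a sinusoid that scales activity between a floor and 1. The exact form below and the parameter names `P_a` (amplitude), `P_p` (period), and `P_s` (phase shift) are assumptions; the slide shows the full equation only as an image. A single sinusoid places its low point half a period after its peak, so this sketch aligns only the 12pm peak and does not reproduce the 3am low marked in the figure.

```python
import numpy as np

def periodic_activity(n, P_a, P_p, P_s):
    """Multiplicative activity factor in [1 - P_a, 1] with period P_p.

    n   : time tick (scalar or array)
    P_a : strength of the periodicity, between 0 and 1
    P_p : period in ticks (e.g., 24 for hourly data with a daily cycle)
    P_s : phase shift in ticks
    """
    return 1.0 - 0.5 * P_a * (np.sin(2.0 * np.pi / P_p * (n + P_s)) + 1.0)

# Hourly ticks, daily cycle, phase chosen so the peak falls at 12pm
hours = np.arange(48)
activity = periodic_activity(hours, P_a=0.4, P_p=24, P_s=6)
```

Multiplying the number of newly informed bloggers at each tick by this factor is one way to add daily or weekly rhythm to a rise-and-fall model.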
Slide141
Details
Analysis – exponential rise and power-law fall
[Figure: linear-log and log-log plots]
Rise-part
SI -> exponential
SpikeM -> exponential
Slide142
Details
Analysis – exponential rise and power-law fall
[Figure: linear-log and log-log plots]
Fall-part
SI -> exponential
SpikeM -> power law
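The reason for showing both linear-log and log-log axes is that an exponential is a straight line on the former and a power law is a straight line on the latter. A minimal sketch of that check, using synthetic tails rather than real data (the function names, the exponent -1.5, and the rate 0.05 are illustrative assumptions):

```python
import numpy as np

def line_fit_r2(x, y):
    """Least-squares line fit; returns (slope, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, r2

def classify_decay(n, y):
    """Label a positive, decaying series by which axes make it straighter."""
    _, r2_exponential = line_fit_r2(n, np.log(y))           # linear-log axes
    _, r2_power_law = line_fit_r2(np.log(n), np.log(y))     # log-log axes
    return "power law" if r2_power_law > r2_exponential else "exponential"

n = np.arange(1, 201, dtype=float)
power_law_tail = n ** -1.5             # synthetic power-law fall
exponential_tail = np.exp(-0.05 * n)   # synthetic exponential fall

labels = (classify_decay(n, power_law_tail),
          classify_decay(n, exponential_tail))
```

On real activity counts one would first drop zero values and restrict the fit to the part after the peak.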
Slide143
Tail-part forecasts
SpikeM can capture tail part
Slide144
“What-if” forecasting
e.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before each release date, can we forecast the upcoming spikes?
[Figure: (1) First spike, (2) Release date, (3) Two weeks before release]
Slide145
“What-if” forecasting
SpikeM can forecast not only tail-part, but also rise-part!
SpikeM can forecast upcoming spikes
[Figure: (1) First spike, (2) Release date, (3) Two weeks before release]
Slide146
Part 3: Empirical Studies
Q6: What do cascades look like?
Q7: How does activity evolve over time?
Q8: How does external influence act?
Slide147
Tweet Diffusion: Problem Definition
Given:
Action log of people tweeting a #hashtag
A network of users
Find:
How external influence varies with #hashtags
Slide148
Tweet Diffusion: Data
Yahoo! Twitter firehose
More than 750 million tweets (> 10 terabytes)
Test-bed of > 6000 machines
Hadoop + PIG system, ver 0.20.204.0
Took top 500 hashtags (by volume) in Feb 2011
Network of users: connect user X to user Y if X directed at least 3 @-messages to Y (or RT-ed a tweet)
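The edge rule for the user network can be sketched as a single pass over an interaction log. The record layout `(source, target, kind)`, the kind labels, and the function name are hypothetical, and the retweet clause is read here as "X retweeted a tweet by Y", which the slide does not spell out.

```python
from collections import Counter

def build_user_network(interactions, min_mentions=3):
    """Return directed edges (X, Y) from an interaction log.

    interactions : iterable of (source, target, kind) tuples, where kind is
                   "mention" for an @-message or "retweet" for an RT
    min_mentions : minimum number of @-messages from X to Y for an edge
    """
    mention_counts = Counter()
    edges = set()
    for source, target, kind in interactions:
        if kind == "retweet":
            edges.add((source, target))          # one retweet is enough
        elif kind == "mention":
            mention_counts[(source, target)] += 1
    edges.update(pair for pair, count in mention_counts.items()
                 if count >= min_mentions)
    return edges

# Hypothetical toy log
toy_log = ([("a", "b", "mention")] * 3
           + [("a", "c", "mention")] * 2
           + [("d", "a", "retweet")])
toy_edges = build_user_network(toy_log)
```

Here `a -> b` qualifies through three @-messages, `d -> a` through a retweet, and `a -> c` is dropped because two @-messages fall below the threshold.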
Slide149
Tweet Diffusion
Propagation = Influence + External
Developed a model that
takes the previous observations into account
has parameters representing external influence
Learn from previous data
EM-style alternating minimization algorithm
Group tags according to learnt params
Slide150
Results: External Influence vs Time
[Figure: “External Effects” vs. time for four groups of hashtags]
#nowwatching, #nowplaying, #epictweets
#purpleglasses, #brits, #famouslies
#oscar, #25jan
#openfollow, #ihatequotes, #tweetmyjobs
Plot annotations: Bursty, external events; “Word-of-mouth”, not trending; Long-running tags; “Word-of-mouth”
Can also use for Forecasting, Anomaly Detection!
Slide151
Outline
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Slide152
Conclusions
Epidemic Threshold
It’s the Eigenvalue
Fast Immunization
Max. drop in eigenvalue, linear-time near-optimal algorithm
Bursts: SpikeM model
Exponential growth, Power-law decay
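As a reminder of what "it's the eigenvalue" means in practice, the sketch below computes the largest eigenvalue of an undirected graph's adjacency matrix and combines it with model constants. It assumes an SIS-style condition in which the outbreak dies out when λ1 · β / δ < 1; the symbols β (infection rate) and δ (recovery rate) and the function names are assumptions not stated in this section.

```python
import numpy as np

def largest_eigenvalue(adjacency):
    """Largest eigenvalue of a symmetric (undirected) adjacency matrix."""
    return float(np.linalg.eigvalsh(adjacency)[-1])   # eigenvalues ascend

def effective_strength(adjacency, beta, delta):
    """Assumed SIS-style quantity: below 1 the epidemic is expected to die out."""
    return largest_eigenvalue(adjacency) * beta / delta

# Complete graph on 4 nodes: largest eigenvalue is 3
k4 = np.ones((4, 4)) - np.eye(4)
strength = effective_strength(k4, beta=0.1, delta=0.5)
below_threshold = strength < 1.0
```

Immunization by "maximum drop in eigenvalue" then amounts to choosing the nodes whose removal lowers `largest_eigenvalue` the most; the dense solver here is only suitable for small graphs.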
Slide153
Propagation on Networks
[Figure: related fields: ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science, Physics]
Slide154
Publications
Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon
Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) – In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)
Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) – In ICML 2011, Bellevue
Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain
Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.
On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia
Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain
MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) – In SIGKDD 2010, Washington D.C.
Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) – In VLDB 2010, Singapore
EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India
BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France.
Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami
FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.
Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey.
Slide155
Acknowledgements
Collaborators
Christos Faloutsos
Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei
Slide156
Acknowledgements
Funding
Slide157
Dynamical Processes on Large Networks
Analysis, Policy/Action, Data
B. Aditya Prakash
Christos Faloutsos