Slide1

Mining Large Graphs: Spectral Methods, Tensors and Influence propagation

Christos Faloutsos, CMU

Slide2

Thanks

Alex Smola

Jia Yu (Tim) Pan

Google, June 2013

C. Faloutsos (CMU)

Slide3

Roadmap

Graph problems:

G1: Fraud detection – BP

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

C1: spikeM model

Conclusions

Slide4

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [WWW’07]

Slide5

E-bay Fraud detection

Slide6

E-bay Fraud detection

Slide7

E-bay Fraud detection - NetProbe

Slide8

E-bay Fraud detection - NetProbe

Compatibility matrix (heterophily), details: rows and columns are the node states F, A and H. Entries that survive extraction, in reading order: row F: 99%; row A: 99%; row H: 49%, 49%.

Slide9

Background 1: Belief Propagation Equations

[Pearl ‘82][Yedidia+ ‘02] … [Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]

[Equations: only the belief term b_i(x_i) survives extraction]

Slide10

Popular press

And less desirable attention:

E-mail from ‘Belgium police’ (‘copy of your code?’)

Slide11

Roadmap

Graph problems:

G1: Fraud detection – BP

Ebay

Symantec

Unification

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Conclusions

Slide12

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

Polo Chau, Machine Learning Dept

Carey Nachenberg, Vice President & Fellow

Jeffrey Wilhelm, Principal Software Engineer

Adam Wright, Software Engineer

Prof. Christos Faloutsos, Computer Science Dept

PATENT PENDING

SDM 2011, Mesa, Arizona

Slide13

Polonium: The Data

60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program

50+ million machines

900+ million executable files

Constructed a machine-file bipartite graph (0.2 TB+)

1 billion nodes (machines and files)

37 billion edges

Slide14

Polonium: Key Ideas

Use “guilt-by-association” (i.e., homophily)

E.g., files that appear on machines with many bad files are more likely to be bad

Scalability: handles 37 billion-edge graph

Slide15

Polonium: One-Interaction Results

84.9% True Positive Rate at 1% False Positive Rate

True Positive Rate: % of malware correctly identified

False Positive Rate: % of non-malware wrongly labeled as malware

[Figure: True Positive Rate vs. False Positive Rate, with ‘Ideal’ marked]
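The two rates defined on this slide can be computed from a labeled sample as in the minimal sketch below. The function name `rates` and the toy labels are illustrative assumptions; they are not Polonium code or Polonium data.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels: 1 = malware, 0 = non-malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware that was missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign left alone
    tpr = tp / (tp + fn)  # % of malware correctly identified
    fpr = fp / (fp + tn)  # % of non-malware wrongly labeled as malware
    return tpr, fpr


# Made-up sample: 4 malware files followed by 6 benign files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates(y_true, y_pred)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Sweeping the decision threshold of a detector and plotting the resulting (FPR, TPR) pairs gives the kind of curve the slide's figure refers to; the ideal corner is TPR = 1 at FPR = 0.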

Slide16

Roadmap

Graph problems:

G1: Fraud detection – BP

Ebay

Symantec

Unification

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Conclusions

Slide17

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos

ECML PKDD, 5-9 September 2011, Athens, Greece

Slide18

Problem Definition: GBA techniques

Given: graph; & few labeled nodes

Find: labels of rest (assuming network effects)

[Figure: nodes with unknown labels, marked ‘?’]

Slide19

Homophily and Heterophily

[Figure: Step 1 and Step 2, shown for homophily and for heterophily]

All methods handle homophily

NOT all methods handle heterophily

BUT proposed method does!

Slide20

Are they related?

RWR (Random Walk with Restarts): Google’s PageRank (‘if my friends are important, I’m important, too’)

SSL (Semi-supervised learning): minimize the differences among neighbors

BP (Belief propagation): send messages to neighbors, on what you believe about them

Slide21

Are they related?

YES!

Slide22

Background 1: Belief Propagation Equations

[Pearl ‘82][Yedidia+ ‘02] … [Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]
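The slide's equations did not survive extraction, so the sketch below uses the standard sum-product (loopy BP) updates rather than the slide's exact notation: each node multiplies its prior by the messages from its other neighbors, pushes the result through a compatibility matrix, and sends it on. The function name, the 2-state heterophily matrix and the 3-node chain are assumptions chosen for illustration.

```python
import numpy as np


def belief_propagation(edges, priors, psi, n_iters=20):
    """Sum-product BP on an undirected graph.

    edges : list of (i, j) node pairs
    priors: (n, k) float array of node priors
    psi   : (k, k) compatibility matrix, psi[a, b] = affinity of states a and b
    """
    n, k = priors.shape
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msgs[(i, j)] is the message from i to j: a distribution over j's states.
    msgs = {(i, j): np.full(k, 1.0 / k) for i in nbrs for j in nbrs[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            prod = priors[i].copy()
            for u in nbrs[i]:
                if u != j:                 # exclude the message that came from j
                    prod *= msgs[(u, i)]
            m = psi.T @ prod               # m[x_j] = sum_{x_i} prod[x_i] * psi[x_i, x_j]
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = priors.copy()
    for i in range(n):
        for u in nbrs[i]:
            beliefs[i] *= msgs[(u, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


# Heterophily: neighbors tend to be in opposite states.
psi = np.array([[0.1, 0.9],
                [0.9, 0.1]])
priors = np.array([[0.9, 0.1],   # node 0: strong prior for state 0
                   [0.5, 0.5],   # node 1: unknown
                   [0.5, 0.5]])  # node 2: unknown
beliefs = belief_propagation([(0, 1), (1, 2)], priors, psi)
print(beliefs.round(3))
```

On this chain the labels alternate: node 1 leans toward state 1 and node 2 back toward state 0, which is the behavior a heterophily compatibility matrix is meant to encode.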

Slide23

Correspondence of Methods

Each method solves a linear system of the form Matrix × Unknown = Known:

RWR: $[I - c A D^{-1}] \times x = (1-c)\, y$

SSL: $[I + \alpha (D - A)] \times x = y$

FaBP: $[I + a D - c' A] \times b_h = \phi_h$

Unknown: final labels/beliefs. Known: prior labels/beliefs. A: adjacency matrix. D: diagonal degree matrix.

Example: A = [0 1 0; 1 0 1; 0 1 0], D = diag(d1, d2, d3)
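A minimal sketch of the three linear systems on the slide's 3-node example, solved with NumPy. The prior vector `y`, the constants `c`, `alpha` and `h`, and the expressions for `a` and `c_prime` in terms of `h` are assumptions made for illustration; the slide itself gives only the matrix forms. FaBP normally works with priors centered around zero, which this sketch does not do.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix of the slide's example
D = np.diag(A.sum(axis=1))               # degree matrix diag(d1, d2, d3)
I = np.eye(3)
y = np.array([1.0, 0.0, 0.0])            # assumed prior: only the first node is labeled

c = 0.85                                 # assumed RWR continuation probability
alpha = 0.5                              # assumed SSL smoothing weight
h = 0.1                                  # assumed FaBP homophily factor
a = 4 * h**2 / (1 - 4 * h**2)            # assumed mapping from h to a
c_prime = 2 * h / (1 - 4 * h**2)         # assumed mapping from h to c'

M_rwr = I - c * A @ np.linalg.inv(D)
M_ssl = I + alpha * (D - A)
M_fabp = I + a * D - c_prime * A

x_rwr = np.linalg.solve(M_rwr, (1 - c) * y)
x_ssl = np.linalg.solve(M_ssl, y)
b_fabp = np.linalg.solve(M_fabp, y)

print("RWR :", x_rwr.round(4))
print("SSL :", x_ssl.round(4))
print("FaBP:", b_fabp.round(4))
```

All three reduce to one sparse linear solve, which is why results for one method (such as convergence conditions) can carry over to the others.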

Slide24

Correspondence of Methods

We know when it converges!

Slide25

Results: Scalability

FaBP is linear in the number of edges.

[Figure: runtime (min) vs. # of edges (Kronecker graphs)]

Slide26

Results: Parallelism

FaBP: ~2x faster & wins/ties on accuracy.

[Figure axes: runtime (min), % accuracy]

Slide27

Conclusions for BP

‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects.

FaBP: fast & accurate (and -> convergence conditions)

Slide28

Roadmap

Graph problems:

G1: Fraud detection – BP

Ebay

Symantec

Unification

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Conclusions

Slide29

EigenSpokes

B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

Slide30

EigenSpokes

Eigenvectors of adjacency matrix

equivalent to singular vectors (symmetric, undirected graph)

Slide31

EigenSpokes

[Figure (details): matrix with both dimensions labeled N]

Slide32

EigenSpokes

Slide33

EigenSpokes

Slide34

EigenSpokes

Slide35

EigenSpokes

EE plot: scatter plot of scores of u1 vs u2

One would expect:

Many points @ origin

A few scattered ~randomly

[Figure axes: u1 (1st principal component), u2 (2nd principal component)]

Slide36

EigenSpokes

EE plot: scatter plot of scores of u1 vs u2

[Figure: u1 vs u2, with a 90° angle marked]

Slide37

EigenSpokes - pervasiveness

Present in mobile social graph, across time and space

Patent citation graph

Slide38

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

Slide39

EigenSpokes - explanation

Slide40

EigenSpokes - explanation

Slide41

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

So what?

Extract nodes with high scores

high connectivity

Good “communities”

[Figure: spy plot of top 20 nodes]
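The explanation above can be reproduced on a toy graph. The sketch below is an idealized case, assuming two cliques that are fully disconnected from each other and from a sparse background path; real graphs only approximate this, so their spokes are blurrier. The graph sizes and the 0.1 score threshold are arbitrary choices for illustration.

```python
import numpy as np


def clique(n):
    """Adjacency matrix of a complete graph on n nodes (no self-loops)."""
    return np.ones((n, n)) - np.eye(n)


n = 15
A = np.zeros((n, n))
A[0:6, 0:6] = clique(6)          # community 1: nodes 0..5
A[6:10, 6:10] = clique(4)        # community 2: nodes 6..9
for i in range(10, 14):          # sparse background: path 10-11-12-13-14
    A[i, i + 1] = A[i + 1, i] = 1.0

# A is symmetric, so eigh applies; eigenvalues come back in ascending order.
vals, vecs = np.linalg.eigh(A)
u1, u2 = vecs[:, -1], vecs[:, -2]   # eigenvectors of the two largest eigenvalues

# EE-plot coordinates: each node is the point (u1[i], u2[i]).
for node in range(n):
    print(node, round(abs(u1[node]), 3), round(abs(u2[node]), 3))

# "Extract nodes with high scores": each spoke is one community.
spoke1 = np.flatnonzero(np.abs(u1) > 0.1)
spoke2 = np.flatnonzero(np.abs(u2) > 0.1)
print("spoke along u1:", spoke1)
print("spoke along u2:", spoke2)
```

Every node has a nonzero score on at most one of the two eigenvectors, so the points fall on the axes of the EE plot (the 90° spokes), and the background nodes sit at the origin.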

Slide42

Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

‘cut-and-paste’ bibliography!

Slide43

(maybe, botnets?)

Victim IPs?

Botnet members?

Exploring it with Dr. Eric Mao (III-Taiwan)

Slide44

Roadmap

Graph problems:

G1: Fraud detection – BP

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Conclusions

Slide45

GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries

U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos

KDD’12

Slide46

Background: Tensors

Tensors (=multi-dimensional arrays) are everywhere

Hyperlinks & anchor text [Kolda+,05]

[Figure: tensor with modes URL 1, URL 2 and Anchor Text (Java, C++, C#); the nonzero entries shown are 1]

Slide47

Background: Tensors

Tensors (=multi-dimensional arrays) are everywhere

Sensor stream (time, location, type)

Predicates (subject, verb, object) in knowledge base

“Barack Obama is president of U.S.”

“Eric Clapton plays guitar”

NELL (Never Ending Language Learner) data: modes of size 26M, 26M and 48M; nonzeros = 144M
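A back-of-the-envelope sketch of why such a tensor has to be stored sparsely, using the NELL sizes quoted on the slide. The 8-byte widths for indices and values, and the split of the two example facts into (subject, verb, object), are assumptions for illustration.

```python
dims = (26_000_000, 26_000_000, 48_000_000)   # mode sizes from the slide
nnz = 144_000_000                             # nonzeros from the slide

cells = dims[0] * dims[1] * dims[2]           # entries in a dense array
density = nnz / cells                         # fraction of entries that are nonzero
dense_bytes = cells * 8                       # assuming one 64-bit value per cell
coo_bytes = nnz * (3 * 8 + 8)                 # assuming 3 indices + 1 value, 8 bytes each

print(f"cells       = {cells:.3e}")
print(f"density     = {density:.3e}")
print(f"dense size  = {dense_bytes:.3e} bytes")
print(f"sparse size = {coo_bytes / 1e9:.2f} GB")

# Coordinate (COO) storage keeps only the observed facts.
triples = {
    ("Barack Obama", "is", "president of U.S."): 1.0,
    ("Eric Clapton", "plays", "guitar"): 1.0,
}
for (subject, verb, obj), value in triples.items():
    print(subject, "|", verb, "|", obj, "->", value)
```

Under these assumptions the coordinate list needs a few gigabytes, while a dense array of the same shape is far beyond any machine's memory.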

Slide48

Background: Tensors

Tensors (=multi-dimensional arrays) are everywhere

Sensor stream (time, location, type)

Predicates (subject, verb, object) in knowledge base

Anomaly detection in computer networks (IP-source, IP-destination, time-stamp)

Slide49

Problem Definition

How to decompose a billion-scale tensor?

Corresponds to SVD in 2D case
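The analogy with SVD can be made concrete with a small sketch. It assumes the decomposition in question has the CP/PARAFAC form (a sum of rank-1 three-way outer products); the slide itself only says that the decomposition corresponds to SVD in the 2D case. The sketch builds the model from given factors and does not fit anything; all sizes and factor names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2D case: SVD writes a matrix as a weighted sum of rank-1 outer products.
M = rng.random((4, 3))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = sum(s[r] * np.outer(U[:, r], Vt[r]) for r in range(len(s)))

# 3D analogue: a sum of R three-way outer products a_r (x) b_r (x) c_r.
R = 2
A = rng.random((4, R))   # mode-1 factors (e.g., subjects)
B = rng.random((3, R))   # mode-2 factors (e.g., verbs)
C = rng.random((5, R))   # mode-3 factors (e.g., objects)
X = np.einsum("ir,jr,kr->ijk", A, B, C)

print("matrix rebuilt from SVD terms:", np.allclose(M, M_rebuilt))
print("tensor shape:", X.shape)
print("rank of mode-1 unfolding:", np.linalg.matrix_rank(X.reshape(4, -1)))
```

Each column triple (A[:, r], B[:, r], C[:, r]) plays the role of one singular-vector pair: it groups the rows, columns and fibers that tend to co-occur, which is how a component can be read as a concept.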

Slide50

Problem Definition

How to decompose a billion-scale tensor?

Corresponds to SVD in 2D case

[Figure: components labeled ‘Politicians’ and ‘Artists’]

Slide51

Problem Definition

Q1: Dominant concepts/topics?

Q2: Find synonyms to a given noun phrase?

(and how to scale up: |data| > RAM)

NELL (Never Ending Language Learner) data: modes of size 26M, 26M and 48M; nonzeros = 144M

Slide52

Experiments

GigaTensor solves 100x larger problem

Number of nonzeros = I / 50

[Figure: tensor with modes (I), (J), (K); GigaTensor vs. Tensor Toolbox, which runs out of memory; 100x gap marked]

Slide53

A1: Concept Discovery

Concept Discovery in Knowledge Base

Slide54

A1: Concept Discovery

Slide55

A2: Synonym Discovery

Slide56

Roadmap

Graph problems:

G1: Fraud detection – BP

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Conclusions

Slide57

Rise and Fall Patterns of Information Diffusion: Model and Implications

Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)

KDD’12, Beijing, China

Slide58

Rise and fall patterns in social media

Meme (# of mentions in blogs)

short phrases sourced from U.S. politics in 2008

“you can put lipstick on a pig”

“yes we can”

Slide59

Rise and fall patterns in social media

four classes on YouTube [Crane et al. ’08]

six classes on Meme [Yang et al. ’11]

Slide60

Rise and fall patterns in social media

Can we find a unifying model, which includes these patterns?

Slide61

Rise and fall patterns in social media

Answer: YES!

We can represent all patterns by a single model

Slide62

Main idea - SpikeM

1. Un-informed bloggers (uninformed about rumor)

2. External shock at time n_b (e.g., breaking news)

3. Infection (word-of-mouth)

Infectiveness of a blog-post at age n: combines β, the strength of infection (quality of news), with a decay function

[Figure: snapshots at time n=0, n=n_b and n=n_b+1]

Slide63

Main idea - SpikeM

Slide64

-1.5 slope

J. G. Oliveira & A.-L. Barabási, Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

[Figure: Prob(RT > x) (log) vs. response time (log); slope -1.5]

Slide65

SpikeM - with periodicity

Full equation of SpikeM (details)

Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)

[Figure: activity vs. time n, with a peak at noon and a dip at 3am]
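The slide's full equation did not survive extraction, so the simulation below is a sketch assembled from the ingredients named on the SpikeM slides: a pool of un-informed bloggers, an external shock at time n_b, infection with strength β, a power-law decay with exponent -1.5, and a periodic activity factor. The exact functional forms, the parameter names (`S_b`, `eps`, `P_a`, `P_p`, `P_s`) and the example values are assumptions, not the authors' code.

```python
import math


def spikem(N, beta, n_b, S_b, n_steps, eps=0.0, P_a=0.0, P_p=24, P_s=0):
    """Return dB, where dB[n] is the number of bloggers newly informed at time n.

    N    : total number of bloggers (all un-informed at n = 0)
    beta : strength of infection
    n_b  : time of the external shock, S_b its size
    eps  : background noise term
    P_a, P_p, P_s : assumed strength, period and phase shift of the periodicity
    """
    U = float(N)                       # un-informed bloggers still available
    dB = [0.0] * n_steps
    for n in range(n_steps - 1):
        # Influence of the shock and of earlier posts, decayed by their age.
        influence = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            age = n + 1 - t
            influence += (dB[t] + shock) * beta * age ** -1.5
        # Periodic activity factor in [1 - P_a, 1]; equals 1 when P_a = 0.
        p = 1.0 - 0.5 * P_a * (math.sin(2 * math.pi * (n + 1 + P_s) / P_p) + 1.0)
        new = p * (U * influence + eps)
        new = max(0.0, min(new, U))    # cannot inform more bloggers than remain
        dB[n + 1] = new
        U -= new
    return dB


dB = spikem(N=1000, beta=0.001, n_b=5, S_b=10, n_steps=60)
peak = max(range(len(dB)), key=lambda i: dB[i])
print("peak at n =", peak, "with", round(dB[peak], 1), "new bloggers")
print("total informed:", round(sum(dB), 1))
```

Setting `P_a` above zero damps activity at regular intervals, which is one way to obtain the daily peaks and dips the slide describes.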

Slide66

Details

Analysis – exponential rise and power-law fall

Rise-part (lin-log and log-log plots)

SI -> exponential

SpikeM -> exponential

Slide67

Details

Analysis – exponential rise and power-law fall

Fall-part (lin-log and log-log plots)

SI -> exponential

SpikeM -> power law
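The reason for showing both lin-log and log-log plots can be checked numerically: an exponential decay is a straight line in lin-log axes, while a power-law decay is a straight line in log-log axes, with the exponent as its slope. The sketch below uses synthetic curves; the decay rate 0.05 and the time range are arbitrary assumptions, and the -1.5 exponent is the slope quoted earlier in the talk.

```python
import numpy as np

n = np.arange(1, 201, dtype=float)     # time steps after the peak
power_law = n ** -1.5                  # power-law fall
exponential = np.exp(-0.05 * n)        # exponential fall with an assumed rate

# Log-log axes: the power law is a line whose slope is the exponent.
slope_pl, _ = np.polyfit(np.log(n), np.log(power_law), 1)

# Lin-log axes: the exponential is a line whose slope is minus the rate.
slope_exp, _ = np.polyfit(n, np.log(exponential), 1)

print("log-log slope of the power law  :", round(slope_pl, 3))
print("lin-log slope of the exponential:", round(slope_exp, 3))
```

Fitting a line in each coordinate system to the fall part of a measured spike is a quick way to tell which of the two behaviors the data follows.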

Slide68

Tail-part forecasts

SpikeM can capture tail part

Slide69

“What-if” forecasting

e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date

[Figure: (1) first spike, (2) release date, (3) two weeks before release; the upcoming spikes are marked ‘?’]

Slide70

“What-if” forecasting

SpikeM can forecast upcoming spikes

Slide71

Conclusions for spikes

Exp rise; PL decay

‘spikeM’ captures all patterns, with a few parameters

And can do extrapolation

And forecasting

Slide72

Roadmap

Graph problems:

G1: Fraud detection – BP

G2: Botnet detection – spectral

G3: Beyond graphs: tensors and “NELL”

Influence propagation and spike modeling

Future research

Conclusions

Slide73

Challenge #1: Time evolving networks / tensors

Periodicities? Burstiness?

What is ‘typical’ behavior of a node, over time

Heterogeneous graphs (= nodes w/ attributes)

…

Slide74

Challenge #2: ‘Connectome’ – brain wiring

Which neurons get activated by ‘bee’

How wiring evolves

Modeling epilepsy

N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

Slide75

Thanks

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Slide76

Project info: PEGASUS

www.cs.cmu.edu/~pegasus

Results on large graphs: with Pegasus + hadoop + M45

Apache license

Code, papers, manual, video

Prof. U Kang, Prof. Polo Chau

Slide77

Cast

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

Koutra, Danai

Beutel, Alex

Papalexakis, Vagelis

Slide78

References

Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006)

Slide79

References

Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

Slide80

References

Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12, pp. 6-14, Beijing, China, August 2012

Slide81

References

Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

Slide82

Overall Conclusions

G1: fraud detection

BP: powerful method

FaBP: faster; equally accurate; known convergence

G2: botnets -> Eigenspokes

G3: Subject-Verb-Object -> Tensors/GigaTensor

Spikes: ‘spikeM’ (exp rise; PL drop)