You Are How You Click: Clickstream Analysis for Sybil Detection

Uploaded On 2022-06-15

Gang Wang, Tristan Konolige, Christo Wilson, Xiao Wang, Haitao Zheng, and Ben Y. Zhao (UC Santa Barbara, Northeastern University, Renren Inc.)




Presentation Transcript

Slide1

You Are How You Click: Clickstream Analysis for Sybil Detection

Gang Wang, Tristan Konolige, Christo Wilson†, Xiao Wang‡, Haitao Zheng, and Ben Y. Zhao
UC Santa Barbara, †Northeastern University, ‡Renren Inc.

Slide2

Sybils in Online Social Networks

Sybil (sɪbəl): fake identities controlled by attackers
  Friendship is a precursor to other malicious activities
  Does not include benign fakes (secondary accounts)
Large Sybil populations*
  14.3 Million Sybils (August, 2012)
  20 Million Sybils (April, 2013)
20 Mutual Friends

* Numbers from CNN 2012, NYT 2013

Slide3

Sybil Attack: a Serious Threat

Social spam
  Advertisement, malware, phishing
Steal user information
Sybil-based political lobbying efforts

Malicious URL
Taliban uses sexy Facebook profiles to lure troops into giving away military secrets

Slide4

Sybil Defense: Cat-and-Mouse Game

Social Networks
  Stop automated account creation
    CAPTCHA
  Detect suspicious profiles
    Spam features, URL blacklists
    User report
  Detect Sybil communities
    [SIGCOMM’06], [Oakland’08], [NDSS’09], [NSDI’12]

Attackers
  Crowdsourcing CAPTCHA solving [USENIX’10]
  Realistic profile generation [WWW’12]
    Complete bio info, profile pic

Slide5

Graph-based Sybil Detectors

A key assumption
  Sybils have difficulty “friending” normal users
  Sybils form tight-knit communities
Is This True?
Measuring Sybils in Renren social network [IMC’11]
  Ground-truth: 560K Sybils collected over 3 years
  Most Sybils befriend real users, integrate into real-user communities
  Most Sybils don’t befriend other Sybils
Sybils don’t need to form communities!

[Figure: social graph with nodes labeled Sybil and Real]

Slide6

Sybil Detection Without Graphs

Sybil detection with static profiles analysis [NDSS’13]
  Leverage human intuition to detect fake profiles (crowdsourcing)
  Successful user study shows it scales well with high accuracy
Profile-based detection has limitations
  Some profiles are easy to mimic (e.g. CEO profile)
  Information can be found online
A new direction: look at what users do!
  How users browse/click social network pages
  Build user behavior models using clickstreams

Slide7

Clickstreams and User Behaviors

Clickstream: a list of server-side user-generated events
  E.g. profile load, link follow, photo browse, friend invite
Intuition: Sybil users act differently from normal users
  Goal-oriented: concentrate on specific actions
  Time-limited: fast event generation (small inter-arrival time)

UserID | Event Generated | Timestamp
345678 | Send Friend Request_23908 | 1303022295242
214567 | Visit Profile_12344 | 1300784205886

Analyze ground-truth clickstreams for Sybil detection
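As a minimal sketch of the idea on this slide, the snippet below groups log rows into per-user clickstreams and computes inter-arrival times. The `ClickEvent` type and both helper functions are hypothetical names, the timestamps are assumed to be milliseconds since the Unix epoch, and the third log row is invented so that one user has more than one event.

```python
from collections import defaultdict
from typing import NamedTuple


class ClickEvent(NamedTuple):
    user_id: int
    event: str          # e.g. "Visit Profile_12344"
    timestamp_ms: int   # assumption: milliseconds since the Unix epoch


def group_by_user(events):
    """Return {user_id: [ClickEvent, ...]} with each list sorted by time."""
    streams = defaultdict(list)
    for ev in events:
        streams[ev.user_id].append(ev)
    for stream in streams.values():
        stream.sort(key=lambda ev: ev.timestamp_ms)
    return dict(streams)


def inter_arrival_seconds(stream):
    """Gaps between consecutive events of one time-sorted clickstream."""
    return [
        (later.timestamp_ms - earlier.timestamp_ms) / 1000.0
        for earlier, later in zip(stream, stream[1:])
    ]


# The first two rows come from the slide; the third is invented for illustration.
events = [
    ClickEvent(345678, "Send Friend Request_23908", 1303022295242),
    ClickEvent(214567, "Visit Profile_12344", 1300784205886),
    ClickEvent(345678, "Send Friend Request_23909", 1303022297242),
]
streams = group_by_user(events)
gaps = inter_arrival_seconds(streams[345678])  # one gap of 2.0 seconds
```

Small gaps repeated over many events of the same type would match the "goal-oriented, time-limited" intuition stated above.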

Slide8

Outline

Motivation
Clickstream Similarity Graph
  Ground-truth Dataset
  Modeling User Clickstreams
  Generating Behavioral Clusters
Real-time Sybil Detection

Slide9

Ground-truth Dataset

Renren Social Network
  A large online social network in China (280M+ users)
  Chinese Facebook
Ground-truth
  Ground-truth provided by Renren’s security team
  16K users, clickstreams over two months in 2011, 6.8M clicks

Dataset | Users | Sessions | Clicks | Date (2011)
Sybil | 9,994 | 113,595 | 1,008,031 | Feb.28-Apr.30
Normal | 5,998 | 467,179 | 5,856,941 | Mar.31-Apr.30

*Our study is IRB approved.
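The table's totals and a few derived ratios can be checked with a short calculation. This is only arithmetic on the numbers shown on the slide; the derived ratios are not figures reported by the authors.

```python
# Figures copied from the dataset table on the slide.
dataset = {
    "Sybil": {"users": 9_994, "sessions": 113_595, "clicks": 1_008_031},
    "Normal": {"users": 5_998, "sessions": 467_179, "clicks": 5_856_941},
}

total_users = sum(row["users"] for row in dataset.values())
total_clicks = sum(row["clicks"] for row in dataset.values())

# Derived ratios (not reported on the slide).
clicks_per_session = {
    name: row["clicks"] / row["sessions"] for name, row in dataset.items()
}
sessions_per_user = {
    name: row["sessions"] / row["users"] for name, row in dataset.items()
}

print(total_users, total_clicks)
print(clicks_per_session)
print(sessions_per_user)
```

The totals (15,992 users and 6,864,972 clicks) agree with the rounded "16K users" and "6.8M clicks" above. Note that the two groups were observed over different date ranges, so per-user ratios are not directly comparable.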

Slide10

Basic Analysis: Click Transitions

Normal users use many social network features
Sybils focus on a few actions (e.g. friend invite, browse profiles)

[Figure: Sybil clickstream transition diagram. States: Initial, Friend Invite, Photo, Browse Profiles, Final. Annotated groups: Spammers, Crawlers. Edges are labeled with transition percentages.]
[Figure: Normal clickstream transition diagram. States: Initial, Photo, Share, Blog, Notification, Browse Profiles, Final. Edges are labeled with transition percentages.]

Sybils and normal users have very different click patterns!
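A transition diagram of this kind can be estimated from session logs by counting consecutive click pairs. The sketch below is one straightforward way to do that, not necessarily the authors' procedure; the function name and the sample sessions are invented for illustration.

```python
from collections import Counter, defaultdict

INITIAL, FINAL = "Initial", "Final"


def transition_probabilities(sessions):
    """Estimate P(next state | current state) from a list of sessions.

    Each session is a list of click categories. Virtual Initial and Final
    states are added around it, mirroring the states in the diagrams.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        path = [INITIAL, *session, FINAL]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(outgoing.values()) for nxt, n in outgoing.items()}
        for state, outgoing in counts.items()
    }


# Invented sessions, only to exercise the function.
sessions = [
    ["Friend Invite", "Friend Invite", "Friend Invite"],
    ["Browse Profiles", "Browse Profiles"],
    ["Friend Invite", "Browse Profiles"],
    ["Photo"],
]
probs = transition_probabilities(sessions)
print(probs["Initial"])        # Friend Invite 0.5, Browse Profiles 0.25, Photo 0.25
print(probs["Friend Invite"])  # Friend Invite 0.5, Final 0.25, Browse Profiles 0.25
```

A state with a large self-loop probability, such as repeated friend invites, is the kind of concentrated behavior the slide attributes to Sybils.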

Slide11

Identifying Sybils From Normal Users

Goal: quantify the differences in user behaviors
  Measure the similarity between user clickstreams
Approach: map user’s clickstreams to a similarity graph
  Clickstreams are nodes
  Edge-weights indicate the similarity of two clickstreams
Clusters in the similarity graph capture user behaviors
  Each cluster represents a certain type of click/behavior pattern
Hypothesis: Sybils and normal users fall into different clusters

Slide12

[Figure: system overview with two phases, Model Training and Detection. Labels: Clickstream Log (Legit, Sybils), Similarity Graph, Behavior Clusters, Labeled Clusters (Good Clusters, Sybil Cluster), Unknown User Clickstream (?).]

Slide13

Capturing User Clickstreams

Click Sequence Model: order of click events
  e.g. ABCDA …
Time-based Model: sequence of inter-arrival time
  e.g. {t1, t2, t3, …}
Complete Model: sequence of click events with time
  e.g. A(t1)B(t2)C(t3)D(t4)A …

[Figure: timelines of click events (A to E) for User1 and User2]
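The three models on this slide can be sketched as three views of the same timestamped click list. The function names and the sample timestamps are hypothetical; the choice to pair each click with the gap that follows it is an assumption read off the notation A(t1)B(t2)….

```python
def click_sequence_model(events):
    """Order of click events only, e.g. ['A', 'B', 'C']."""
    return [name for name, _ in events]


def time_based_model(events):
    """Inter-arrival times between consecutive clicks: [t1, t2, ...]."""
    times = [t for _, t in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def complete_model(events):
    """Click events paired with the gap that follows each one.

    The last click has no following gap, so it is paired with None.
    """
    names = click_sequence_model(events)
    gaps = time_based_model(events)
    return list(zip(names, gaps + [None]))


# Invented clickstream: (event type, timestamp in seconds), sorted by time.
events = [("A", 0), ("B", 2), ("C", 3), ("D", 10), ("A", 11)]

print("".join(click_sequence_model(events)))  # ABCDA
print(time_based_model(events))               # [2, 1, 7, 1]
print(complete_model(events))
```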

Slide14

Clickstream Similarity Functions

Similarity of sequences
  Common subsequence
    S1 = AAB, S2 = AAC
    ngram1 = {A, B, AA, AB, AAB}
    ngram2 = {A, C, AA, AC, AAC}
  Common subsequence with counts
    S1 = AAB, S2 = AAC
    ngram1 = {A(2), B(1), AA(1), AB(1), AAB(1)}
    ngram2 = {A(2), C(1), AA(1), AC(1), AAC(1)}
    Euclidean Distance
    V1 = (2,1,0,1,0,1,1,0)
    V2 = (2,0,1,1,1,0,0,1)
Adding “time” to the sequence
  Bucketize inter-arrival time, encode time into the sequence
  Apply the same sequence similarity function
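The counted n-gram example above can be reproduced in a few lines. This sketch uses raw counts exactly as shown on the slide; any normalization the authors may apply is not shown here. The function names are hypothetical, and the vector ordering (A, B, C, AA, AC, AB, AAB, AAC) is inferred from the V1 and V2 values on the slide.

```python
import math
from collections import Counter


def ngram_counts(sequence, max_n):
    """Count every contiguous subsequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(sequence) - n + 1):
            counts[sequence[i:i + n]] += 1
    return counts


def euclidean_distance(counts_a, counts_b):
    """Euclidean distance between two count vectors over the union of n-grams."""
    keys = set(counts_a) | set(counts_b)
    return math.sqrt(sum((counts_a[k] - counts_b[k]) ** 2 for k in keys))


c1 = ngram_counts("AAB", 3)
c2 = ngram_counts("AAC", 3)

# Ordering inferred from the slide's V1 and V2.
order = ["A", "B", "C", "AA", "AC", "AB", "AAB", "AAC"]
v1 = [c1[k] for k in order]  # [2, 1, 0, 1, 0, 1, 1, 0]
v2 = [c2[k] for k in order]  # [2, 0, 1, 1, 1, 0, 0, 1]
distance = euclidean_distance(c1, c2)  # sqrt(6), about 2.449
```

The two vectors differ by one in six positions, which is where the distance of sqrt(6) comes from. To add time, one option consistent with the slide is to append a bucket label to each click symbol before counting n-grams.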

Slide15

Clickstream Clustering

Similarity graph (fully-connected)
  Nodes: user’s clickstreams
  Edges: weighted by the similarity score of two users’ clickstreams
Clustering similar clickstreams together
  Minimum edge weight cut
  Graph partitioning using METIS
Perform clustering on ground-truth data
  Complete model produces very accurate behavior clusters
  3% false negatives (Sybils in normal clusters) and 1% false positives (normal users in Sybil clusters)
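The two error definitions on this slide can be made concrete with a small calculation. The sketch below does not perform the METIS partitioning; it assumes a cluster assignment already exists and labels each cluster by majority vote, which is an assumption rather than the authors' stated rule. All names and the toy data are invented.

```python
from collections import Counter


def label_clusters(clusters, truth):
    """Label each cluster 'sybil' or 'normal' by majority vote of its members."""
    labels = {}
    for cluster_id, members in clusters.items():
        votes = Counter(truth[user] for user in members)
        labels[cluster_id] = votes.most_common(1)[0][0]
    return labels


def error_rates(clusters, truth):
    """Return (false_negative_rate, false_positive_rate)."""
    labels = label_clusters(clusters, truth)
    missed_sybils = 0    # Sybils that ended up in normal clusters
    flagged_normals = 0  # normal users that ended up in Sybil clusters
    for cluster_id, members in clusters.items():
        for user in members:
            if truth[user] == "sybil" and labels[cluster_id] == "normal":
                missed_sybils += 1
            elif truth[user] == "normal" and labels[cluster_id] == "sybil":
                flagged_normals += 1
    n_sybil = sum(1 for label in truth.values() if label == "sybil")
    n_normal = sum(1 for label in truth.values() if label == "normal")
    return missed_sybils / n_sybil, flagged_normals / n_normal


# Toy ground truth and a toy partition into two clusters.
truth = {
    "s1": "sybil", "s2": "sybil", "s3": "sybil", "s4": "sybil",
    "n1": "normal", "n2": "normal", "n3": "normal", "n4": "normal", "n5": "normal",
}
clusters = {0: ["s1", "s2", "s3", "n1"], 1: ["n2", "n3", "n4", "n5", "s4"]}

fn_rate, fp_rate = error_rates(clusters, truth)  # 0.25 and 0.2
```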

Slide16

Outline

Motivation
Clickstream Similarity Graph
Real-time Sybil Detection
  Sybil Detection Using Similarity Graph
  Unsupervised Approach

Slide17

Detection in a Nutshell

Sybil detection methodology
  Assign the unclassified clickstream to the “nearest” cluster
  If the nearest cluster is a Sybil cluster, then the user is a Sybil
Assigning clickstreams to clusters
  K nearest neighbor (KNN)
  Nearest cluster (NC)
  Nearest cluster with center (NCC): fastest, scalable

[Figure: new clickstreams (?) matched against the clustered similarity graph with Normal and Sybil clusters]
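A minimal sketch of the nearest-cluster-with-center (NCC) rule: each cluster is summarized by one center, and a new clickstream takes the label of the closest center. Using the component-wise mean as the center and Euclidean distance over feature vectors are assumptions made for illustration; the slide does not say how the center is chosen. All names and data are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_centers(clusters):
    """clusters: {id: (label, [vector, ...])} -> {id: (label, center)}."""
    return {
        cluster_id: (label, mean_vector(vectors))
        for cluster_id, (label, vectors) in clusters.items()
    }


def classify_ncc(vector, centers):
    """Return the label of the cluster whose center is closest to `vector`."""
    label, _ = min(centers.values(), key=lambda item: euclidean(vector, item[1]))
    return label


# Toy two-dimensional feature vectors, invented for illustration.
clusters = {
    "c1": ("sybil", [[9, 0], [10, 1]]),
    "c2": ("normal", [[2, 3], [3, 4], [1, 5]]),
}
centers = build_centers(clusters)

print(classify_ncc([8, 1], centers))  # sybil
print(classify_ncc([2, 5], centers))  # normal
```

Each classification compares the new clickstream against one center per cluster instead of against every training clickstream, which is consistent with NCC being described as the fastest option.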

Slide18

Detection Evaluation

Split 12K clickstreams into training and testing datasets
  Train initial clusters with 3K Sybil + 3K normal users
  Classify remaining 6K testing clickstreams

[Figure: error rates for K-nearest neighbor, Nearest Cluster, and Nearest Cluster (center)]

NCC (fastest) is as good as the others
< 0.7% false positive rate

Slide19

(Semi) Unsupervised Approach

What if we don’t have a big ground-truth dataset?
  Need a method to label clusters
Use a (small) set of known-good users to color clusters
  Adding known users to existing clusters
  Clusters that contain good users are “good” clusters
400 random good users are enough to color all behavior clusters
  For unknown dataset, add good users until diminishing returns
  Still achieve high detection accuracy (1% fp, 4% fn)

[Figure: Known Good Users placed into clusters, yielding Good Clusters and a Sybil Cluster]
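The coloring rule can be sketched as follows, assuming the known-good users have already been assigned to clusters. Treating every uncolored cluster as a Sybil cluster is an assumption drawn from the figure labels, and the function names and user IDs are invented.

```python
def color_clusters(clusters, known_good):
    """Mark a cluster 'good' if it contains any known-good user, else 'sybil'."""
    known_good = set(known_good)
    return {
        cluster_id: "good" if known_good & set(members) else "sybil"
        for cluster_id, members in clusters.items()
    }


def flagged_users(clusters, colors):
    """Users that fall into clusters left uncolored by known-good users."""
    return [
        user
        for cluster_id, members in clusters.items()
        if colors[cluster_id] == "sybil"
        for user in members
    ]


# Toy clusters and seed set, invented for illustration.
clusters = {0: ["u1", "u2", "u3"], 1: ["u4", "u5"], 2: ["u6", "u7", "u8"]}
known_good = ["u2", "u7"]

colors = color_clusters(clusters, known_good)
suspects = flagged_users(clusters, colors)
print(colors)    # {0: 'good', 1: 'sybil', 2: 'good'}
print(suspects)  # ['u4', 'u5']
```

Adding more known-good users can only turn clusters from 'sybil' to 'good', which fits the advice to keep adding them until returns diminish.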

Slide20

Real-world Experiments

Deploy system prototypes onto social networks
  Shipped our prototype code to Renren and LinkedIn
  All user data remained on-site

Scanned 40K ground-truth users’ clickstreams
  Flagged 200 previously unknown Sybils

Scanned 1M users’ clickstreams
  Flagged 22K suspicious users
  Identified a new attack: “Image” Spammers
    Embed spam content in images
    Easy to evade text/URL based detectors

Slide21

Evasion and Challenges

In order to evade our system, Sybils may …
  Slow down their click speed
  Generate “normal” actions as cover traffic
Force Sybils to mimic normal users = Win
Practical challenges
  How to update behavior clusters over time (incrementally)?
  How to integrate with other existing detection techniques? (e.g. profile, content based detectors)

Slide22

Thank You!

Questions?
