Slide1
You Are How You Click: Clickstream Analysis for Sybil Detection
Gang Wang, Tristan Konolige, Christo Wilson†, Xiao Wang‡, Haitao Zheng, and Ben Y. Zhao
UC Santa Barbara, †Northeastern University, ‡Renren Inc.
Slide2
Sybils in Online Social Networks
Sybil (sɪbəl): fake identities controlled by attackers
  Friendship is a precursor to other malicious activities
  Does not include benign fakes (secondary accounts)
Large Sybil populations*
  14.3 million Sybils (August 2012)
  20 million Sybils (April 2013)
*Numbers from CNN 2012, NYT 2013
Slide3
Sybil Attack: a Serious Threat
Social spam
  Advertisement, malware, phishing
Steal user information
Sybil-based political lobbying efforts
[Figure: malicious URL example and news headline "Taliban uses sexy Facebook profiles to lure troops into giving away military secrets"]
Slide4
Sybil Defense: Cat-and-Mouse Game
Social networks
  Stop automated account creation
    CAPTCHA
  Detect suspicious profiles
    Spam features, URL blacklists
    User report
  Detect Sybil communities
    [SIGCOMM'06], [Oakland'08], [NDSS'09], [NSDI'12]
Attackers
  Crowdsourcing CAPTCHA solving [USENIX'10]
  Realistic profile generation [WWW'12]
    Complete bio info, profile pic
Slide5
Graph-based Sybil Detectors
A key assumption: is this true?
  Sybils have difficulty "friending" normal users
  Sybils form tight-knit communities
Measuring Sybils in the Renren social network [IMC'11]
  Ground truth: 560K Sybils collected over 3 years
  Most Sybils befriend real users and integrate into real-user communities
  Most Sybils don't befriend other Sybils
Sybils don't need to form communities!
[Figure: social graph with Sybil and real nodes]
Slide6
Sybil Detection Without Graphs
Sybil detection with static profile analysis [NDSS'13]
  Leverage human intuition to detect fake profiles (crowdsourcing)
  A successful user study shows it scales well with high accuracy
Profile-based detection has limitations
  Some profiles are easy to mimic (e.g., a CEO profile)
  Information can be found online
A new direction: look at what users do!
  How users browse/click social network pages
  Build user behavior models using clickstreams
Slide7
Clickstreams and User Behaviors
Clickstream: a list of server-side user-generated events
  E.g., profile load, link follow, photo browse, friend invite

  UserID   Event Generated             Timestamp
  345678   Send Friend Request_23908   1303022295242
  214567   Visit Profile_12344         1300784205886
  …        …                           …

Intuition: Sybil users act differently from normal users
  Goal-oriented: concentrate on specific actions
  Time-limited: fast event generation (small inter-arrival time)
Analyze ground-truth clickstreams for Sybil detection
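A minimal sketch of how log rows in the format shown on this slide might be grouped into per-user clickstreams and turned into inter-arrival times. The `Click` record, the two helper functions, and the extra rows for user 345678 are hypothetical, and the timestamps are assumed to be milliseconds since the Unix epoch.

```python
from collections import defaultdict
from typing import NamedTuple


class Click(NamedTuple):
    user_id: int
    event: str
    timestamp_ms: int  # assumed unit: milliseconds since the Unix epoch


def group_by_user(clicks):
    """Return {user_id: [Click, ...]} with each list sorted by timestamp."""
    streams = defaultdict(list)
    for click in clicks:
        streams[click.user_id].append(click)
    for stream in streams.values():
        stream.sort(key=lambda c: c.timestamp_ms)
    return dict(streams)


def inter_arrival_ms(stream):
    """Gaps between consecutive clicks of one time-sorted clickstream."""
    return [b.timestamp_ms - a.timestamp_ms for a, b in zip(stream, stream[1:])]


# The first two rows come from the slide; the last two are made up.
log = [
    Click(345678, "Send Friend Request_23908", 1303022295242),
    Click(214567, "Visit Profile_12344", 1300784205886),
    Click(345678, "Send Friend Request_23911", 1303022295942),  # hypothetical
    Click(345678, "Visit Profile_23911", 1303022297242),  # hypothetical
]
streams = group_by_user(log)
gaps = inter_arrival_ms(streams[345678])
print(gaps)
```

Small gaps in a stream dominated by a single event type are the kind of goal-oriented, time-limited signal the slide describes.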
Slide8
Outline
Motivation
Clickstream Similarity Graph
  Ground-truth Dataset
  Modeling User Clickstreams
  Generating Behavioral Clusters
Real-time Sybil Detection
Slide9
Ground-truth Dataset
Renren social network
  A large online social network in China (280M+ users)
  The "Chinese Facebook"
Ground truth
  Provided by Renren's security team
  16K users, clickstreams over two months in 2011, 6.8M clicks

  Dataset   Users   Sessions   Clicks      Date (2011)
  Sybil     9,994   113,595    1,008,031   Feb. 28-Apr. 30
  Normal    5,998   467,179    5,856,941   Mar. 31-Apr. 30

*Our study is IRB approved.
Slide10
Basic Analysis: Click Transitions
Normal users use many social network features
Sybils focus on a few actions (e.g., friend invite, browse profiles)
[Figure: click-transition diagrams. Sybil clickstream states: Initial, Friend Invite, Photo, Browse Profiles, Final, with Spammers and Crawlers annotated. Normal clickstream states: Initial, Photo, Share, Blog, Notification, Browse Profiles, Final. The per-edge transition percentages are not recoverable from the extracted text.]
Sybils and normal users have very different click patterns!
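Transition diagrams of this kind can be estimated by counting consecutive click pairs. The sketch below is an illustration under assumptions: the function name, the `INITIAL`/`FINAL` state labels, and the sample sessions are hypothetical and are not taken from the Renren data.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next state | current state) from click-category sequences.

    Each session is wrapped in INITIAL and FINAL states before counting.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        states = ["INITIAL", *session, "FINAL"]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for current, nexts in counts.items()
    }


# Hypothetical sessions dominated by friend invites.
sybil_sessions = [
    ["friend_invite", "friend_invite", "friend_invite"],
    ["friend_invite", "browse_profile"],
]
probs = transition_probabilities(sybil_sessions)
print(probs["friend_invite"])
```

A state whose outgoing probability mass concentrates on a self-loop, as `friend_invite` does here, corresponds to the narrow, repetitive behavior attributed to Sybils.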
Slide11
Identifying Sybils From Normal Users
Goal: quantify the differences in user behaviors
  Measure the similarity between user clickstreams
Approach: map users' clickstreams to a similarity graph
  Clickstreams are nodes
  Edge weights indicate the similarity of two clickstreams
Clusters in the similarity graph capture user behaviors
  Each cluster represents a certain type of click/behavior pattern
Hypothesis: Sybils and normal users fall into different clusters
Slide12
[Figure: system overview with two phases, Model Training and Detection. Labels: Clickstream Log (Legit, Sybils), Similarity Graph, Behavior Clusters, Labeled Clusters (Good Clusters, Sybil Cluster), Unknown User Clickstream.]
Slide13
Capturing User Clickstreams
Click Sequence Model: order of click events
  e.g., ABCDA…
Time-based Model: sequence of inter-arrival times
  e.g., {t1, t2, t3, …}
Complete Model: sequence of click events with time
  e.g., A(t1)B(t2)C(t3)D(t4)A…
[Figure: click-event timelines for User1 and User2]
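The three models can be read as three encodings of the same clickstream. The sketch below assumes a stream is a list of `(event, timestamp)` pairs; the function names and the sample stream are hypothetical.

```python
def click_sequence(stream):
    """Click Sequence Model: event order only."""
    return [event for event, _ in stream]


def time_sequence(stream):
    """Time-based Model: inter-arrival times only."""
    times = [t for _, t in stream]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def complete_sequence(stream):
    """Complete Model: each event paired with the gap that follows it.

    The last event has no following gap, so it is paired with None.
    """
    gaps = time_sequence(stream) + [None]
    return list(zip(click_sequence(stream), gaps))


# Hypothetical stream: (event, timestamp in seconds).
stream = [("A", 0), ("B", 2), ("C", 3), ("D", 10), ("A", 11)]
print(click_sequence(stream))
print(time_sequence(stream))
print(complete_sequence(stream))
```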
Slide14
Clickstream Similarity Functions
Similarity of sequences
  Common subsequence
    S1 = AAB, S2 = AAC
    ngram1 = {A, B, AA, AB, AAB}
    ngram2 = {A, C, AA, AC, AAC}
  Common subsequence with counts
    S1 = AAB, S2 = AAC
    ngram1 = {A(2), B(1), AA(1), AB(1), AAB(1)}
    ngram2 = {A(2), C(1), AA(1), AC(1), AAC(1)}
    V1 = (2,1,0,1,0,1,1,0)
    V2 = (2,0,1,1,1,0,0,1)
    Euclidean distance between V1 and V2
Adding "time" to the sequence
  Bucketize inter-arrival time, encode time into the sequence
  Apply the same sequence similarity function
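The AAB/AAC example can be reproduced with a short sketch. Several parts are assumptions: Jaccard overlap is used here as one plausible reading of "common subsequence", the bucket edges in `with_time_buckets` are arbitrary, and all function names are hypothetical. The vectors on the slide correspond to the component order (A, B, C, AA, AC, AB, AAB, AAC).

```python
import math
from bisect import bisect_right
from collections import Counter


def ngram_counts(seq, max_n=None):
    """Count every contiguous subsequence of seq up to length max_n."""
    max_n = len(seq) if max_n is None else max_n
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def jaccard_similarity(c1, c2):
    """Overlap of the two n-gram sets, ignoring counts (assumed metric)."""
    s1, s2 = set(c1), set(c2)
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 1.0


def euclidean_distance(c1, c2):
    """Distance between count vectors over the union of n-grams."""
    keys = set(c1) | set(c2)
    return math.sqrt(sum((c1[k] - c2[k]) ** 2 for k in keys))


def with_time_buckets(events, gaps, edges=(1, 10, 60)):
    """Fuse each event with the bucket index of its inter-arrival gap.

    The bucket edges (in seconds) are arbitrary illustration values.
    """
    return tuple(f"{e}{bisect_right(edges, g)}" for e, g in zip(events, gaps))


c1 = ngram_counts("AAB")
c2 = ngram_counts("AAC")
sim = jaccard_similarity(c1, c2)
dist = euclidean_distance(c1, c2)
timed = with_time_buckets("AAB", [0.5, 5, 100])
print(sim, dist, timed)
```

Because `ngram_counts` only slices its input, the same function works on the tuple of time-tagged tokens, which is what "apply the same sequence similarity function" amounts to in this sketch.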
Slide15
Clickstream Clustering
Similarity graph (fully connected)
  Nodes: users' clickstreams
  Edges: weighted by the similarity score of two users' clickstreams
Clustering similar clickstreams together
  Minimum edge-weight cut
  Graph partitioning using METIS
Perform clustering on ground-truth data
  Complete model produces very accurate behavior clusters
  3% false negatives (Sybils in normal clusters) and 1% false positives (normal users in Sybil clusters)
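A rough sketch of the graph-then-cluster step. It does not call METIS: as a plainly simpler stand-in, it drops edges below a weight threshold and takes connected components. The toy similarity function, the threshold, and the user data are all hypothetical.

```python
from itertools import combinations


def similarity_graph(streams, similarity):
    """Fully connected graph as {(u, v): weight} over all user pairs."""
    return {
        (u, v): similarity(streams[u], streams[v])
        for u, v in combinations(sorted(streams), 2)
    }


def threshold_clusters(nodes, graph, min_weight):
    """Drop edges lighter than min_weight, then return connected components.

    A crude stand-in for min-cut partitioning with METIS, not METIS itself.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (u, v), weight in graph.items():
        if weight >= min_weight:
            parent[find(u)] = find(v)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=min)


def shared_event_ratio(a, b):
    """Toy similarity: Jaccard overlap of the event types in two streams."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


# Hypothetical users; each letter stands for a click category.
streams = {"u1": "FFFP", "u2": "FPFF", "u3": "SBNP", "u4": "BSNB"}
graph = similarity_graph(streams, shared_event_ratio)
clusters = threshold_clusters(streams, graph, min_weight=0.6)
print(clusters)
```

A real partitioner balances cluster sizes and minimizes the total weight of cut edges, which a fixed threshold does not attempt.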
Slide16
Outline
Motivation
Clickstream Similarity Graph
Real-time Sybil Detection
  Sybil Detection Using Similarity Graph
  Unsupervised Approach
Slide17
Detection in a Nutshell
Sybil detection methodology
  Assign the unclassified clickstream to the "nearest" cluster
  If the nearest cluster is a Sybil cluster, then the user is a Sybil
Assigning clickstreams to clusters
  K nearest neighbor (KNN)
  Nearest cluster (NC)
  Nearest cluster with center (NCC): fastest, scalable
[Figure: new clickstreams matched against the clustered similarity graph, with clusters labeled Normal and Sybil]
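The three assignment strategies can be contrasted in a few lines. This is a sketch under assumptions: NC is read here as "smallest mean distance to a cluster's members", each cluster's center is taken as precomputed, and a one-dimensional feature with absolute difference stands in for a clickstream distance. None of these choices is confirmed by the slides.

```python
from collections import Counter


def knn_label(x, labeled, dist, k=3):
    """KNN: majority label among the k training items closest to x."""
    nearest = sorted(labeled, key=lambda item: dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def nearest_cluster_label(x, clusters, dist):
    """NC: cluster with the smallest mean distance to its members (assumed)."""
    def mean_dist(cluster):
        members = cluster["members"]
        return sum(dist(x, m) for m in members) / len(members)
    return min(clusters, key=mean_dist)["label"]


def nearest_center_label(x, clusters, dist):
    """NCC: compare x with one precomputed center per cluster."""
    return min(clusters, key=lambda c: dist(x, c["center"]))["label"]


def abs_diff(a, b):
    return abs(a - b)


# Hypothetical 1-D feature, e.g. the share of clicks that are friend invites.
clusters = [
    {"label": "sybil", "members": [0.9, 0.8, 0.95], "center": 0.9},
    {"label": "normal", "members": [0.1, 0.2, 0.3], "center": 0.2},
]
labeled = [(m, c["label"]) for c in clusters for m in c["members"]]

for x in (0.7, 0.25):
    print(x, knn_label(x, labeled, abs_diff),
          nearest_cluster_label(x, clusters, abs_diff),
          nearest_center_label(x, clusters, abs_diff))
```

NCC needs one distance computation per cluster rather than one per training clickstream, which is consistent with it being described as the fastest option.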
Slide18
Detection Evaluation
Split 12K clickstreams into training and testing datasets
  Train initial clusters with 3K Sybil + 3K normal users
  Classify the remaining 6K testing clickstreams
[Figure: error rates for K-nearest neighbor, Nearest Cluster, and Nearest Cluster (center)]
NCC (fastest) is as good as the others
< 0.7% false positive rate
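For reference, the two error rates can be computed from held-out labels as below. The function name and the eight-user example are hypothetical; a false positive is taken to mean a normal user flagged as a Sybil, and a false negative a Sybil classified as normal.

```python
def error_rates(truth, predicted):
    """Return (false_positive_rate, false_negative_rate).

    Assumes both classes are present in truth and that predicted
    has a label for every user in truth.
    """
    normal = [u for u, label in truth.items() if label == "normal"]
    sybil = [u for u, label in truth.items() if label == "sybil"]
    fp = sum(predicted[u] == "sybil" for u in normal)
    fn = sum(predicted[u] == "normal" for u in sybil)
    return fp / len(normal), fn / len(sybil)


# Hypothetical test set: four normal users and four Sybils.
truth = {f"n{i}": "normal" for i in range(4)}
truth.update({f"s{i}": "sybil" for i in range(4)})
predicted = dict(truth)
predicted["n0"] = "sybil"   # one false positive
predicted["s0"] = "normal"  # two false negatives
predicted["s1"] = "normal"

fpr, fnr = error_rates(truth, predicted)
print(fpr, fnr)
```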
Slide19
(Semi) Unsupervised Approach
What if we don't have a big ground-truth dataset?
  Need a method to label clusters
Use a (small) set of known-good users to color clusters
  Add known users to existing clusters
  Clusters that contain good users are "good" clusters
400 random good users are enough to color all behavior clusters
  For an unknown dataset, add good users until diminishing returns
  Still achieves high detection accuracy (1% fp, 4% fn)
[Figure: known good users placed into clusters, yielding good clusters and a Sybil cluster]
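The coloring step, including the "add good users until diminishing returns" rule, might look like the following sketch. The batch size, the `patience` stopping rule, the `assign` callback, and the sample users are all hypothetical choices made for illustration.

```python
def color_until_stable(cluster_ids, assign, good_users, batch=2, patience=2):
    """Color clusters with known-good users, stopping at diminishing returns.

    assign(user) returns the id of the cluster the user falls into.
    Stops once `patience` consecutive batches color no new cluster.
    Returns (labels, number_of_good_users_consumed).
    """
    good, idle, used = set(), 0, 0
    for start in range(0, len(good_users), batch):
        chunk = good_users[start:start + batch]
        before = len(good)
        good |= {assign(user) for user in chunk}
        used += len(chunk)
        idle = idle + 1 if len(good) == before else 0
        if idle >= patience:
            break
    labels = {cid: ("good" if cid in good else "sybil") for cid in cluster_ids}
    return labels, used


# Hypothetical setup: three clusters, eight known-good users.
home = {"g1": 0, "g2": 1, "g3": 0, "g4": 1, "g5": 0, "g6": 1, "g7": 0, "g8": 1}
labels, used = color_until_stable([0, 1, 2], home.get, list(home))
print(labels, used)
```

Any cluster that never receives a known-good user is treated as a Sybil cluster, so stopping too early risks mislabeling a rare but legitimate behavior pattern.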
Slide20
Real-world Experiments
Deploy system prototypes onto social networks
  Shipped our prototype code to Renren and LinkedIn
  All user data remained on-site
Scanned 40K ground-truth users' clickstreams
  Flagged 200 previously unknown Sybils
Scanned 1M users' clickstreams
  Flagged 22K suspicious users
  Identified a new attack: "image" spammers
    Embed spam content in images
    Easily evade text/URL-based detectors
Slide21
Evasion and Challenges
In order to evade our system, Sybils may …
  Slow down their click speed
  Generate "normal" actions as cover traffic
Force Sybils to mimic normal users = Win
Practical challenges
  How to update behavior clusters over time (incrementally)?
  How to integrate with other existing detection techniques (e.g., profile- and content-based detectors)?
Slide22
Thank You!
Questions?