Slide1
Design and Evaluation of a Real-Time URL Spam Filtering Service
Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, Dawn Song
University of California, Berkeley
International Computer Science Institute
Slide2
Motivation
Social Networks
(Facebook, Twitter)
Web Mail
(Gmail, Live Mail)
Blogs, Services
(Blogger, Yelp)
Spam
Slide3
Motivation
Existing solutions:
  Blacklists
  Service-specific, account heuristics
Develop new spam filter service:
  Filter spam: scams, phishing, malware
  Real-time, fine-grained, generalizable
Slide4
Overview
Our system – Monarch:
  Accepts millions of URLs from web service
  Crawls, labels each URL in real-time
Spam Classification
  Decision based on URL content, page behavior, hosting
  Large-scale; distributed collection, classification
  Implemented as a cloud service
Slide5
Monarch in Action
Social Network
1. Spam Message
Spam Account
URL
Slide6
Monarch in Action
Monarch
Social Network
1. Spam Message
2. Message URL
Spam Account
URL
Slide7
Monarch in Action
Monarch
Social Network
1. Spam Message
2. Message URL
3. Fetch Content
Spam URL Content
Spam Account
URL
Slide8
Monarch in Action
Monarch
Social Network
1. Spam Message
2. Message URL
3. Fetch Content
4. Decision
Spam URL Content
Spam Account
URL
Slide9
Monarch in Action
Monarch
Social Network
Message Recipients
1. Spam Message
2. Message URL
3. Fetch Content
4. Decision
Spam URL Content
Spam Account
URL
Slide10
Challenges
Accuracy
Real-Time
Scalability
Tolerant to Feature Evolution
Slide11
Outline
Architecture
Results & Performance
Limitations
Conclusion
Slide12
System Architecture
Slide13
System Architecture
Slide14
System Architecture
Slide15
System Architecture
Slide16
URL Aggregation
Source                      Sample Size
Spam email URLs             1.25 million
Blacklisted Twitter URLs    567,000
Non-spam Twitter URLs       9 million
Collection period: 9/8/2010 – 10/29/2010
Slide17
Feature Collection
High Fidelity Browser
Navigation
  Lexical features of URLs (length, subdomains)
  Obfuscation (directory operations, nested encoding)
Hosting
  IP/ASN
  A, NS, MX records
  Country, city if available
Slide18
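The lexical and obfuscation features listed under Navigation on the first Feature Collection slide can be approximated with the Python standard library. The following is a minimal sketch, not Monarch's extractor: the function name `lexical_features`, the feature names, the "last two labels are the registered domain" rule, and the decoding cap of 5 are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, unquote


def lexical_features(url: str) -> dict:
    """Hypothetical lexical/obfuscation features for a single URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    # Crude assumption: the last two labels form the registered domain,
    # so everything before them counts as a subdomain.
    subdomain_count = max(len(labels) - 2, 0)

    # Directory operations: "." and ".." segments in the raw path.
    segments = parts.path.split("/")
    directory_ops = sum(1 for seg in segments if seg in (".", ".."))

    # Nested encoding: how many rounds of percent-decoding still change the URL.
    depth, current = 0, url
    while depth < 5:  # arbitrary cap chosen for this sketch
        decoded = unquote(current)
        if decoded == current:
            break
        current, depth = decoded, depth + 1

    return {
        "url_length": len(url),
        "host_length": len(host),
        "subdomain_count": subdomain_count,
        "directory_ops": directory_ops,
        "encoding_depth": depth,
    }


example = lexical_features("http://a.b.example.com/x/../y/%252e")
```

For the example URL, the sketch counts two subdomains, one directory operation, and two layers of percent-encoding.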
Feature Collection
Content
  Common HTML templates, keywords
  Search engine optimization
  Content of request, response headers
Behavior
  Prevent navigating away
  Pop-up windows
  Plugin, JavaScript redirects
Slide19
Classification
Distributed Logistic Regression
  Data overload for single machine
Slide20
Classification
Distributed Logistic Regression
  Data overload for single machine
L1-regularization
  Reduces feature space, over-fitting
  50 million features -> 100,000 features
Slide21
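To illustrate why L1-regularization shrinks the feature space, here is a small single-machine sketch of logistic regression trained by proximal gradient descent (soft-thresholding). It is not the distributed trainer described on the Classification slide; the toy data, the function names, and the hyperparameters `lam`, `lr`, and `epochs` are assumptions chosen so the effect is easy to see.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    # Proximal step for the L1 penalty: shrink toward zero, clamp small values to 0.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_logreg(X, y, lam, lr=0.5, epochs=300):
    """Full-batch proximal gradient descent; lam is the L1 penalty weight."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            residual = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += residual * xi[j] / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Made-up toy data: feature 0 matches the label, feature 1 is only weakly correlated.
X = [[1, 1], [1, 1], [1, 1], [1, -1], [-1, 1], [-1, 1], [-1, -1], [-1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w_plain = train_logreg(X, y, lam=0.0)   # no penalty: both weights end up non-zero
w_sparse = train_logreg(X, y, lam=0.2)  # L1 penalty: the weak feature stays at exactly 0
```

With the penalty, the weakly correlated feature never earns a non-zero weight, which is the same mechanism that lets the model discard most of a very large feature set.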
Implementation
System implemented as a cloud service on Amazon EC2
  Aggregation: 1 machine
  Feature Collection: 20 machines
    Firefox, extension + modified source
  Classification & Feature Extraction: 50 machines
    Hadoop - Spark, Mesos
Straightforward to scale the architecture
Slide22
Result Overview
High-level summary:
  Performance
  Overall accuracy
  Highlight important features
  Feature evolution
  Spam independence between services
Slide23
Performance
Rate: 638,000 URLs/day
  Cost: $1,600/mo
  Process time: 5.54 sec
  Network delay: 5.46 sec
Can scale to 15 million URLs/day
  Estimated $22,000/mo
Slide24
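The throughput and cost figures on the Performance slide can be compared on a per-URL basis. The arithmetic below is a rough sketch that assumes a 30-day month (the slide does not state one); `cost_per_million` is a hypothetical helper name.

```python
DAYS_PER_MONTH = 30  # assumption: the slide does not say how a month is counted


def cost_per_million(urls_per_day: float, dollars_per_month: float) -> float:
    """Monthly cost divided by monthly volume, expressed per million URLs."""
    urls_per_month = urls_per_day * DAYS_PER_MONTH
    return dollars_per_month / urls_per_month * 1_000_000


current = cost_per_million(638_000, 1_600)        # measured deployment
projected = cost_per_million(15_000_000, 22_000)  # estimated scaled deployment

throughput_factor = 15_000_000 / 638_000  # how much more traffic is handled
cost_factor = 22_000 / 1_600              # how much more is spent
```

Under that assumption the measured deployment works out to roughly $84 per million URLs and the projected one to roughly $49, since traffic grows about 23.5x while cost grows 13.75x.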
Measuring Accuracy
Dataset: 12 million URLs (<2 million spam)
  Sample 500K spam (half tweets, half email)
  Sample 500K non-spam
Training, Testing
  5-fold validation
  Vary training folds non-spam:spam ratio
  Test fold equal parts spam, non-spam
Slide25
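The evaluation protocol on the Measuring Accuracy slide (5 folds, a varied non-spam:spam ratio in training, a balanced test fold) could be set up along these lines. This is only a sketch: the slide does not say how the ratio was varied, so downsampling spam in the training folds is an assumption, and `make_folds` and `split` are hypothetical helpers operating on made-up URL strings.

```python
import random


def make_folds(items, k, rng):
    """Shuffle once, then deal items into k folds of (near) equal size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def split(spam_folds, ham_folds, test_fold, ratio, rng):
    """Return (train, test) lists of (url, label) pairs, label 1 = spam.

    The test fold keeps spam and non-spam in equal parts; the training folds
    are subsampled so that non-spam:spam equals ratio:1 (assumed method).
    """
    test = [(u, 1) for u in spam_folds[test_fold]]
    test += [(u, 0) for u in ham_folds[test_fold]]

    train_spam = [u for i, f in enumerate(spam_folds) if i != test_fold for u in f]
    train_ham = [u for i, f in enumerate(ham_folds) if i != test_fold for u in f]
    n_spam = min(len(train_spam), len(train_ham) // ratio)
    train = [(u, 1) for u in rng.sample(train_spam, n_spam)]
    train += [(u, 0) for u in rng.sample(train_ham, n_spam * ratio)]
    return train, test


rng = random.Random(0)
spam_folds = make_folds([f"spam-{i}" for i in range(100)], 5, rng)
ham_folds = make_folds([f"ham-{i}" for i in range(100)], 5, rng)
train, test = split(spam_folds, ham_folds, test_fold=0, ratio=4, rng=rng)
```

Repeating `split` for each `test_fold` from 0 to 4 gives the five train/test pairs of one validation run at a given ratio.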
Overall Accuracy
Training Ratio    Accuracy    False Positive Rate    False Negative Rate
1:1               94%         4.23%                  7.5%
4:1               91%         0.87%                  17.6%
10:1              87%         0.29%                  26.5%
False Positive Rate: non-spam labeled as spam
False Negative Rate: spam labeled as non-spam
Accuracy: correctly labeled samples
Slide26
Overall Accuracy
Training Ratio    Accuracy    False Positive Rate    False Negative Rate
1:1               94%         4.23%                  7.5%
4:1               91%         0.87%                  17.6%
10:1              87%         0.29%                  26.5%
False Positive Rate: non-spam labeled as spam
False Negative Rate: spam labeled as non-spam
Accuracy: correctly labeled samples
Slide27
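Because the test fold holds equal parts spam and non-spam, the accuracy column of the Overall Accuracy table follows from the two error rates: accuracy = 1 - (false positive rate + false negative rate) / 2. The short check below uses the rates from the table; `balanced_accuracy` is a name introduced here for illustration.

```python
def balanced_accuracy(fpr: float, fnr: float) -> float:
    """Accuracy on a test set that is half spam, half non-spam."""
    return 1.0 - (fpr + fnr) / 2.0


# (false positive rate, false negative rate) per training ratio, from the table.
rates = {
    "1:1": (0.0423, 0.075),
    "4:1": (0.0087, 0.176),
    "10:1": (0.0029, 0.265),
}

accuracy_pct = {
    ratio: round(100 * balanced_accuracy(fpr, fnr))
    for ratio, (fpr, fnr) in rates.items()
}
```

The computed values round to 94%, 91%, and 87%, matching the table, and show the trade-off directly: heavier non-spam weighting in training cuts false positives but raises false negatives faster.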
Error by Feature
[Figure: error (%) by feature; Error = 1 - Accuracy]
Slide28
Error by Feature
[Figure: error (%) by feature; Error = 1 - Accuracy]
Slide29
Error by Feature
[Figure: error (%) by feature; Error = 1 - Accuracy]
Slide30
Feature Evolution – Retraining Required
[Figure: accuracy (%)]
Slide31
Spam Independence
Unexpected result: Twitter, email spam qualitatively different
Training Set    Testing Set    Accuracy    False Negatives
Twitter         Twitter        94%         22%
Twitter         Email          81%         88%
Email           Twitter        80%         99%
Email           Email          99%         4%
Slide32
Spam Independence
Unexpected result: Twitter, email spam qualitatively different
Training Set    Testing Set    Accuracy    False Negatives
Twitter         Twitter        94%         22%
Twitter         Email          81%         88%
Email           Twitter        80%         99%
Email           Email          99%         4%
Slide33
Distinct Email, Twitter Features
Slide34
Email Features Shorter Lived
Slide35
Limitations
Adversarial Machine Learning
  We provide oracle to spammers
  Can adversaries tweak content until passing?
Time-based Evasion
  Change content after URL submitted for verification
Crawler Fingerprinting
  Identify IP space of Monarch, fingerprint Monarch browser client
  Dual-personality DNS, page behavior
Slide36
Related Work
C. Whittaker, B. Ryner, and M. Nazif, “Large-Scale Automatic Classification of Phishing Pages”
J. Ma, L. Saul, S. Savage, and G. Voelker, “Identifying suspicious URLs: an application of large-scale online learning”
Y. Zhang, J. Hong, and L. Cranor, “Cantina: a content-based approach to detecting phishing web sites”
M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code”
Slide37
Conclusion
Monarch provides:
  Real-time scam, phishing, malware detection
  Experiments show 91% accuracy, 0.87% false positives
  Readily scalable cloud service
  Applicable to all URL-based spam
Spam not guaranteed to overlap between web services
  Twitter, email qualitatively different
  Despite overlap, can still provide generalizable filtering
  Require training data from each service