Slide 1
On the Strength of Weak Identities in Social Computing Systems
Krishna P. Gummadi
Max Planck Institute for Software Systems
Slide 2: Social computing systems
Online systems that allow people to interact
Examples:
- Social networking sites: Facebook, Google+
- Blogging sites: Twitter, LiveJournal
- Content-sharing sites: YouTube, Flickr
- Social bookmarking sites: Delicious, Reddit
- Crowd-sourced opinions: Yelp, eBay seller ratings
- Peer-production sites: Wikipedia, AMT
Distributed systems of people
Slide 3: The Achilles Heel of Social Computing Systems
Slide 4: Identity infrastructures
Weak identity infrastructure:
- No verification by trusted authorities required; fill out a simple profile to create an account
Pros:
- Provides some level of anonymity
- Low entry barrier
Cons:
- Lacks accountability
- Vulnerable to fake (Sybil) identity attacks
Most platforms use a weak identity infrastructure
Slide 5: Sybil attacks: attacks using fake identities
A fundamental problem in systems with weak user ids
Numerous real-world examples:
- Facebook: fake likes and ad clicks for businesses and celebrities
- Twitter: fake followers and tweet-popularity manipulation
- YouTube, Reddit: content owners manipulate popularity
- Yelp: restaurants buy fake reviews
- AMT, Freelancer: offer Sybil identities for hire
Slide 6: Sybil attacks are a growing menace
There is an incentive to manipulate the popularity of ids and information
Slide 7: The emergence of Abuse-as-a-Service
Slide 8: Sybil identities are a growing menace
40% of all newly created Twitter ids are fake!
Slide 9: Sybil identities are a growing menace
50% of all newly created Yelp ids are fake!
Slide 10: The Strength of Weak Identities
Slide 11: Strength of a weak identity
The effort needed to forge the weak identity
- Weak ids come with zero external references
- Strength is the effort needed to forge the id's activities
- And thereby, the id's reputation
Idea: could we measure ids' strength by their black-market prices?
Slide 12
Slide 13
Slide 14: Key observation
Attackers cannot tamper with the timestamps of activities
- E.g., join dates, id-creation timestamps
Older ids are less likely to be fake than newer ids:
- Attackers do not target sites until they reach critical mass
- Over time, older ids are more curated than newer ids
- Spam filters have had more time to check older ids
Slide 15: Most active fakes are new ids
Older ids are less likely to be fake than newer ids
Slide 16: Assessing strength of weak identities
Leverage the temporal evolution of reputation scores
[Figure: evolution of the reputation score of a single participant over time, from the join-date timestamp (t0) through the evolution time (t50) to the current reputation score (t100). The attacker cannot forge the timestamps at which the reputation score changed!]
Slide 17: Trustworthiness of Weak Identities
Slide 18: Trustworthiness of an identity
The probability that its activities comply with the online site's ToS
How to assess trustworthiness?
- Ability to hold the user behind the identity accountable
  - Via non-anonymous strong ids
- Economic incentives vs. costs of the attack
  - The strength of the weak id determines the attacker's costs
- Leverage social behavioral stereotypes
Slide 19: Traditional Sybil defense approaches
Catch & suspend ids with bad activities
- By checking for spam content in posts
- Can't catch manipulation of genuine content's popularity
Profile identities to detect suspicious-looking ids
- Before they even commit fraudulent activities
- Analyze info available about individual ids, such as:
  - Demographic and activity-related info
  - Social network links
Slide 20: Lots of recent work
Gather a ground-truth set of Sybil and non-Sybil ids
- Social Turing tests: human verification of accounts to determine Sybils [NSDI '10, NDSS '13]
- Automatically flagging anomalous (rare) user behaviors [USENIX Sec. '14]
Train ML classifiers to distinguish between them [CEAS '10]
- Classifiers trained to flag ids with similar profile features
- Like humans, they look for features that arouse suspicion:
  - Does it have a profile photo? Does it have friends who look real? Do the posts look real?
Slide 21: Key idea behind id profiling
For many profile attributes, the values assumed by Sybils & non-Sybils tend to be different
Slide 22: Key idea behind id profiling
For many profile attributes, the values assumed by Sybils & non-Sybils tend to be different:
- The location field is not set for >90% of Sybils, but <40% of non-Sybils
- Lots of Sybils have a low follower-to-following ratio
- A much smaller fraction of Sybils have more than 100,000 followers
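The heuristics on this slide can be sketched as a simple rule-based scorer. This is a minimal illustration, not any deployed classifier: the profile field names and thresholds are assumptions chosen to mirror the stated statistics.

```python
def suspicion_score(profile):
    """Score a profile on features that tend to differ between Sybil and
    genuine identities (higher = more suspicious). Thresholds are
    illustrative only, not values used by any real system."""
    score = 0
    # The location field is unset for >90% of Sybils but <40% of non-Sybils
    if not profile.get("location"):
        score += 1
    # Many Sybils follow far more identities than follow them back
    following = profile.get("following", 0)
    followers = profile.get("followers", 0)
    if following > 0 and followers / following < 0.1:
        score += 1
    # Very few Sybils accumulate a large organic follower base
    if followers > 100_000:
        score -= 1
    return score
```

In practice such scores would feed a trained ML classifier rather than a hand-set threshold, as the previous slide notes.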
Slide 23: Limitations of profiling identities
Potential discrimination against good users
- With rare behaviors that are flagged as anomalous
- With profile attributes that match those of Sybils
Sets up a rat race with attackers
- Sybils can avoid detection by assuming the likely attribute values of good nodes
  - Sybils can set location attributes and lower their follower-to-following ratios
- Or by attacking with new ids that have no prior activity history
Slide 24: Attacks with newly created Sybils
All the fake followers we bought were newly created!
Existing spam defenses cannot block them
Slide 25: Robust Tamper Detection in Crowd Computations
Slide 26: Is a crowd computation tampered with?
Does a large computation involve a sizeable fraction of Sybil participants?
- Twitter profile + followers: a user with a tampered follower count
- Business page + reviewers: a business with a tampered rating
Slide 27: Are the following problems equivalent?
1. Detect whether a crowd computation is tampered with
   - Does the computation involve a sizeable fraction of Sybil participants?
2. Detect whether an identity is Sybil
Slide 28: Are the following problems equivalent?
1. Detect whether a crowd computation is tampered with
   - Does the computation involve a sizeable fraction of Sybil participants?
2. Detect whether an identity is Sybil
Our Stamper project: NO!
Claim: we can robustly detect tampered computations even when we cannot detect fake ids
Slide 29: Stamper: detecting tampered crowds
Idea: analyze the join-date distributions of participants
- In organic crowds, a significant fraction of identities date all the way back to the inception of the site
- The entropy of tampered computations tends to be lower
- More generally, leverage the temporal evolution of reputation scores
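The entropy observation above can be made concrete with a short sketch: bucket each participant's join date (here by month, an arbitrary choice) and compute the Shannon entropy of the resulting distribution. The example crowds are fabricated for illustration.

```python
import math
from collections import Counter

def join_date_entropy(join_months):
    """Shannon entropy (in bits) of the join-date distribution of a crowd's
    participants. Organic crowds draw ids from across the site's lifetime
    (high entropy); Sybil-boosted crowds are dominated by ids created in a
    narrow window (low entropy)."""
    counts = Counter(join_months)
    total = len(join_months)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# An organic crowd spread over the site's lifetime vs. a burst of new ids
organic = ["2010-01", "2011-06", "2012-03", "2013-09"]
burst = ["2013-09", "2013-09", "2013-09", "2013-09"]
```

Here `join_date_entropy(organic)` is 2.0 bits (four equally likely buckets) while `join_date_entropy(burst)` is 0: all participants joined in the same window, the signature of a Sybil-tampered crowd.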
Slide 30: Robustness against adaptive attackers
Stamper can fundamentally alter the arms race with attackers:
- Any malicious identity that gets suspended causes near-permanent damage to the attacker's power!
What about attacks using compromised or colluding identities?
- Compromised/colluding identities have to be selected so that they match the reference distribution
Slide 31: TrulyFollowing: a prototype system
Detects popular users (politicians) with fake followers
trulyfollowing.app-ns.mpi-sws.org
Slide 32: TrulyTweeting: a prototype system
Detects popular hashtags, URLs, and tweets with fake promoters
trulytweeting.app-ns.mpi-sws.org
Slide 33: DEMO
Slide 34: Detection by Stamper: how it works
Assume unbiased participation in a computation:
- The join-date distribution of ids in any large-scale crowd computation must match that of a large random sample of ids on the site
- Any deviation indicates Sybil tampering
- The greater the deviation, the more likely the tampering
Deviation is calculated using KL-divergence:
- Rank computations by their divergence
- Flag the most anomalous computations
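The detection step described on this slide can be sketched as follows. This is a minimal illustration of the KL-divergence ranking idea, not the Stamper implementation; the histograms are fabricated, and the smoothing constant is an assumption to keep empty reference buckets from producing division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) between a computation's join-date histogram P and the
    site-wide reference histogram Q. Both inputs are normalized first;
    eps guards against empty reference buckets."""
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / max(qi / sq, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Reference: ids joined roughly uniformly over four periods
reference = [25, 25, 25, 25]
organic   = [24, 26, 25, 25]   # close to reference -> small divergence
tampered  = [2, 3, 5, 90]      # burst of new ids  -> large divergence

scores = {"organic": kl_divergence(organic, reference),
          "tampered": kl_divergence(tampered, reference)}
# Rank computations by divergence and flag the most anomalous one
flagged = max(scores, key=scores.get)
```

The tampered crowd, dominated by recently created ids, has a join-date histogram far from the site-wide reference and therefore the highest divergence.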
Slide 35: Dealing with computations with biased participation
When nodes come from a biased user population:
- All computations suffer high deviations
- Making tamper detection less effective
Solution: compute the join dates' reference distribution from a similarly biased sample user population
- I.e., select a user population with similar demographics
- Has the potential to improve accuracy further
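One way to realize the matched-reference idea above is to build the reference join-date histogram only from site users who share a demographic attribute with the crowd's participants. This is a sketch under assumed data shapes; the field names (`country`, `join_month`) are hypothetical, not from any real dataset schema.

```python
def matched_reference(all_users, crowd_users, key="country"):
    """Build a join-date reference distribution from site users who share
    a demographic attribute with the crowd's participants, so a genuinely
    biased crowd (e.g. one country's users) is not flagged merely for
    deviating from the global population. Field names are illustrative."""
    crowd_values = {u[key] for u in crowd_users}
    matched = [u for u in all_users if u[key] in crowd_values]
    counts = {}
    for u in matched:
        counts[u["join_month"]] = counts.get(u["join_month"], 0) + 1
    return counts

site = [{"country": "DE", "join_month": "2010-01"},
        {"country": "DE", "join_month": "2011-02"},
        {"country": "US", "join_month": "2010-01"}]
crowd = [{"country": "DE", "join_month": "2011-02"}]
reference = matched_reference(site, crowd)
```

The resulting `reference` histogram would then replace the site-wide sample in the divergence computation of the previous slide.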
Slide 36: Detection accuracy: Yelp case study
Case study: find businesses with tampered reviews on Yelp
Experimental setup: 3,579 businesses with more than 100 reviews
"Ground truth" obtained using Yelp's review filter
Results:
- Stamper flags >97% of highly tampered crowds
- Stamper flags only 3/54 (5.6%) of normal crowds
- Stamper flags 362 businesses (83% of all those with more than 30% tampering)
Slide 37: Take-away lesson
Ids are increasingly being profiled to detect Sybils
Don't profile individual identities!
- Accuracy would be low
- Can't prevent tampering of computations
Profile groups of ids participating in a computation
- After all, the goal is to prevent tampering of computations
Slide 38: Take-away questions
What should a site do after detecting tampering?
- How do we know who tampered with the computation?
- Could a politician or business slander competing politicians or businesses by buying fake endorsements for them?
Can we eliminate the effects of tampering?
- Is it possible to discount tampered votes?
Slide 39: Take-away questions
In practice, users have weak identities across multiple sites
- Such weak ids are increasingly being linked
Can we transfer trust between a user's weak identities across domains?
- Can Gmail help Facebook assess trust in Facebook ids created using Gmail ids?
- Can a collection of a user's weak user ids substitute for a strong user id?