LinkedIn Endorsements: Reputation, Virality, and Social Tagging. O’Reilly Strata, February 28, 2013. Sam Shah (@sam_shah), Pete Skomoroch (@peteskomoroch). ©2012 LinkedIn Corporation. All Rights Reserved.
Slide1
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
O’Reilly Strata - February 28, 2013
Sam Shah @sam_shah
Pete Skomoroch @peteskomoroch
©2012 LinkedIn Corporation. All Rights Reserved.
Slide2
Sam Shah
Principal Engineer and Engineering Manager
@sam_shah
www.linkedin.com/in/shahsam
Peter Skomoroch
Principal Data Scientist
@peteskomoroch
www.linkedin.com/in/peterskomoroch
Slide3
LinkedIn: The Professional Profile of Record
200+M Members
200M Member Profiles
Slide4
LinkedIn’s Latest Data Product: Skill Endorsements
Slide5
Viral Growth: 800M Endorsements in 4 Months
Slide6
Data Amplifies Desire
Desire + Social Proof
Viral Loops + Network Effects
Data Foundation + Recommendation Algorithms
Slide7
1) Desire & Social Proof
Slide8
2) Viral Loops & Network Effects
A endorses B
B notified
B “accepts” endorsement
B endorses C
B endorses D
Endorsement recommendations
Email
Notification
News Feed
Slide9
3) Data Foundation: Skills & Suggested Skills
Slide10
Data Foundation: LinkedIn Skills
Slide11
Social Tagging Accelerates Adoption
Suggested endorsements
Skill recommendations
Skill marketing
Virality only
Slide12
Outline
Skill discovery
Skill tagging
Skill recommendations
Suggested endorsements
Slide13
Unsupervised Topic Discovery from Profiles
Extract
Slide14
What is the skills dictionary?
A growing taxonomy of skills
Generated by mining profiles and maintained by the Skills team at LinkedIn
Created using clustering and crowdsourcing
Multiple phrases, acronyms, and misspellings map to a single standardized skill
250+ different phrases map to “Microsoft Office”
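The many-to-one mapping described above (multiple phrases, acronyms, and misspellings resolving to one standardized skill) can be pictured as a lookup table. The following is only a minimal sketch: the table layout and the `normalize` and `standardize` helpers are hypothetical names introduced for illustration, and the real dictionary's storage format is not described in the slides. The phrase variants and Skill ID 366 are the Microsoft Office example from this deck.

```python
import re

# Hypothetical miniature dictionary: surface phrase -> (skill ID, standardized name).
# The layout is an assumption for illustration; only the Microsoft Office
# variants and its ID (366) are taken from the deck.
SKILLS_DICTIONARY = {
    "ms office": (366, "Microsoft Office"),
    "ms office suite": (366, "Microsoft Office"),
    "microsoft ofice": (366, "Microsoft Office"),
    "microsoft office suits": (366, "Microsoft Office"),
    "mac office": (366, "Microsoft Office"),
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore trivial differences."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def standardize(phrase: str):
    """Return (skill_id, name) for a known phrase variant, or None if unknown."""
    return SKILLS_DICTIONARY.get(normalize(phrase))


print(standardize("MS  Office"))
print(standardize("microsoft ofice"))
print(standardize("underwater basket weaving"))
```

A flat table like this only covers exact variants after normalization; the deck attributes discovery of new variants to clustering and crowdsourcing, which this sketch does not attempt.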
Building the Skills Dictionary
Profile (specialties) → Tokenization → Clustering → Crowdsourcing → Taxonomy
Slide15
Topic Clustering & Phrase Sense Disambiguation
Slide16
Skills Dictionary: Microsoft Office
Microsoft Office (Skill ID = 366)
ms office
ms office suite
computer skills including ms office
office 97
microsoft office user
mac office
microsoft office 2003 & 2007
microsoft office suits
microsoft ofice
microsoft ofiice
ms office certified
office 98
…
Slide17
Deduplication Signals from Mechanical Turk
Slide18
Sample Task for Mechanical Turk Workers
Slide19
Skill Phrase Deduplication
Slide20
Outline
Skill discovery
Skill tagging
Skill recommendations
Suggested endorsements
Slide21
Skills Classification
Use skill dictionary metadata to tag, standardize and infer skills
Run classifiers for each skill on member profiles
Public Speaking
Ruby on Rails
Entrepreneurship
Microsoft Office
AP Style
Slide22
Tagging Skill Phrases
Example text: “Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR.”
Tagging: Extract potential skill phrases from text
Standardize unambiguous phrase variants
JavaScript
RoR
SaaS
Python
ror
rubyonrails
ruby on rails
development
ruby
rails
ruby on rail
Ruby on Rails
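The tagging and standardization steps above can be sketched as a scan over word windows of up to six words, looking each window up in a variant table. This is only an illustration under assumptions: the `VARIANTS` table, the tokenizer regex, and the greedy longest-match strategy are hypothetical choices, not the tagger described by the speakers.

```python
import re

MAX_PHRASE_WORDS = 6  # the slide's pipeline lists phrases of up to 6 words

# Hypothetical variant table: lowercase phrase -> standardized skill name.
VARIANTS = {
    "python": "Python",
    "javascript": "JavaScript",
    "saas": "SaaS",
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
}


def tokenize(text):
    """Split text into lowercase word tokens (keeps + and # for names like C++)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text):
    """Return standardized skills in order of first appearance, without duplicates."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest window first so "ruby on rails" wins over shorter matches.
        for n in range(min(MAX_PHRASE_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in VARIANTS:
                skill = VARIANTS[phrase]
                if skill not in found:
                    found.append(skill)
                match_len = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += match_len if match_len else 1
    return found


text = "Developed over 20 SaaS custom applications using Python, Javascript and RoR."
print(tag_skills(text))  # ['SaaS', 'Python', 'JavaScript', 'Ruby on Rails']
```

The table deliberately leaves out bare "ruby" and "rails": on their own they are ambiguous, and the slide restricts standardization to unambiguous variants.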
Document (ex: Profile)
Tokenization
Skills Tagger
Phrases (up to 6 words)
Skills Classifier
Skills (unordered)
Skills (ranked by relevance)
Slide23
Outline
Skill discovery
Skill tagging
Skill recommendations
Suggested endorsements
Slide24
Skills Classification on Member Profiles
The skills classifier computes the likelihood that a member has a skill, based on the member’s profile, other profiles that share common attributes, and their connections.
Tagging: Tokenize free text into phrase tags
Standardization: Transform tags into potential skills
Inference: Rank skills by likelihood
Profile text
Profile attributes & network signals
Slide25
Skill Inference
How suggested/inferred skills work:
Profiles with skills help build a massive dataset of (attribute: skills).
Profile
Extract attributes
- Company ID
- Title ID
- Groups ID
- Industry ID
- …
Skills Classifier
Skills (ranked by likelihood)
Feature Vectors
Example with a title:
Title              Skill  Occurrences
Software Engineer  Java   100 000
Software Engineer  C++    88 000
…
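The (attribute: skills) dataset above amounts to co-occurrence counts between an attribute value, such as a title, and the skills listed on profiles with that value. A minimal sketch follows, assuming profiles are plain dictionaries; the toy profiles, the "Journalist" title, and the `count_skill_occurrences` helper are hypothetical, and the counts are tiny stand-ins for the slide's figures of roughly 100 000.

```python
from collections import Counter, defaultdict

# Hypothetical toy profiles; the slide's real counts are on the order of 100 000.
profiles = [
    {"title": "Software Engineer", "skills": ["Java", "C++"]},
    {"title": "Software Engineer", "skills": ["Java"]},
    {"title": "Software Engineer", "skills": ["Java", "Python"]},
    {"title": "Journalist", "skills": ["AP Style"]},
]


def count_skill_occurrences(profiles, attribute):
    """Count how often each skill appears alongside each value of an attribute."""
    counts = defaultdict(Counter)
    for profile in profiles:
        value = profile.get(attribute)
        if value is None:
            continue  # profile does not carry this attribute
        # set() ensures a skill listed twice on one profile is counted once.
        counts[value].update(set(profile["skills"]))
    return counts


title_skill_counts = count_skill_occurrences(profiles, "title")
for skill, n in title_skill_counts["Software Engineer"].most_common():
    print("Software Engineer", skill, n)
```

The same function could be called with "company", "industry", or any other extracted attribute to build one table per attribute.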
Slide26
Skill Inference
How suggested/inferred skills work:
The skill likelihood is a conditional model
Probabilities are combined using a Naïve Bayes Classifier
If you are an engineer at Apple, you probably know about iPhone Development.
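To make the Naïve Bayes combination concrete, here is a minimal sketch that scores each skill by multiplying a prior with one conditional probability per profile attribute. The training profiles, the class name `NaiveBayesSkillModel`, and the add-one smoothing are all assumptions made for illustration; the slides state only that a Naïve Bayes classifier combines the probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training profiles: attribute values plus the skills each member lists.
PROFILES = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Java"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Designer"}, {"Public Speaking"}),
    ({"company": "Acme", "title": "Engineer"}, {"Java"}),
    ({"company": "Acme", "title": "Writer"}, {"AP Style"}),
]


class NaiveBayesSkillModel:
    """Scores P(skill | attributes) up to a factor shared by all skills, assuming
    attributes are conditionally independent given the skill."""

    def __init__(self, profiles, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing: an assumption, not from the slides
        self.total = len(profiles)
        self.skill_counts = Counter()
        # feature_counts[skill][(attribute, value)] = profiles with both
        self.feature_counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen per attribute
        for attrs, skills in profiles:
            for name, value in attrs.items():
                self.values[name].add(value)
            for skill in skills:
                self.skill_counts[skill] += 1
                for item in attrs.items():
                    self.feature_counts[skill][item] += 1

    def log_score(self, skill, attrs):
        """log P(skill) + sum over attributes of log P(value | skill)."""
        n_skill = self.skill_counts[skill]
        score = math.log((n_skill + self.alpha) / (self.total + 2 * self.alpha))
        for name, value in attrs.items():
            num = self.feature_counts[skill][(name, value)] + self.alpha
            den = n_skill + self.alpha * len(self.values[name])
            score += math.log(num / den)
        return score

    def rank(self, attrs):
        """Return known skills sorted from most to least likely for these attributes."""
        return sorted(self.skill_counts, key=lambda s: self.log_score(s, attrs), reverse=True)


model = NaiveBayesSkillModel(PROFILES)
print(model.rank({"company": "Apple", "title": "Engineer"}))
```

On this toy data an Apple engineer ranks iPhone Development first, mirroring the slide's example. Working in log space avoids underflow when many attributes are combined.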
Profile
Extract attributes
- Company ID
- Title ID
- Groups ID
- Industry ID
- …
Skills Classifier
Skills (ranked by likelihood)
Feature Vectors
Slide27
Slide28
Slide29
Skill Suggestions for Your LinkedIn Profile
49% Conversion
4% Conversion
Slide30
Outline
Skill discovery
Skill tagging
Skill recommendations
Suggested endorsements
Slide31
Social Tagging via Skill Endorsements
Slide32
Suggesting Endorsements
People-skill combinations in a member’s network
Binary classification
Features
Skill inference score
Company overlap
School overlap
Group overlap
Industry and functional area similarity
Title similarity
Site interactions
Co-interactions
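The binary classification over people-skill combinations can be sketched as scoring each candidate pair and ranking by the estimated probability of an endorsement. The slides do not name the classifier; the logistic model, the hand-set `WEIGHTS` and `BIAS`, and the `Candidate` record below are assumptions chosen purely for illustration, using a few of the features listed above.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One (connection, skill) pair with a few of the features named on the slide."""
    member: str
    skill: str
    skill_inference_score: float  # skill classifier output, assumed to lie in [0, 1]
    company_overlap: int          # 1 if viewer and member shared a company, else 0
    school_overlap: int
    group_overlap: int


# Hypothetical weights; in practice they would be learned from labeled
# endorsed / not-endorsed examples rather than set by hand.
WEIGHTS = {
    "skill_inference_score": 3.0,
    "company_overlap": 1.5,
    "school_overlap": 0.5,
    "group_overlap": 0.5,
}
BIAS = -3.0


def endorse_probability(c: Candidate) -> float:
    """Logistic score: estimated probability that the viewer endorses this pair."""
    z = BIAS + sum(w * getattr(c, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def suggest(candidates, top_k=2):
    """Rank candidate (member, skill) pairs by estimated endorsement probability."""
    return sorted(candidates, key=endorse_probability, reverse=True)[:top_k]


candidates = [
    Candidate("B", "Ruby on Rails", 0.9, 1, 0, 1),
    Candidate("C", "Public Speaking", 0.4, 0, 1, 0),
    Candidate("D", "Microsoft Office", 0.7, 1, 1, 0),
]
for c in suggest(candidates):
    print(c.member, c.skill, round(endorse_probability(c), 3))
```

Feeding the skill inference score in as a feature keeps skill inference and endorsement suggestion decoupled: the second model only needs the first model's output, not its internals.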
Candidate generation
- Company
- Title
- Groups
- Industry
- …
Classifier
Suggested Endorsements (ranked by likelihood)
Feature Vectors
Slide33
Social Tagging Accelerates Adoption
Skill endorsements
Skill recommendations
Skill marketing
Slide34
Can We Find Influencers In Venture Capital?
Slide35
Which Skills Are Important for a Data Scientist?
Slide36
What Technologies are Professionals Adopting?
Slide37
Data Amplifies Desire
Desire + Social Proof
Viral Loops + Network Effects
Data Catalyst + Recommendation Algorithms
Slide38
Infrastructure
Apache Hadoop: Parallel processing architecture
Apache Kafka: Ingress pipes
Azkaban: Hadoop scheduler
Voldemort: Egress database
Apache Pig: High-level MR language
DataFu: Convenience routines
http://data.linkedin.com
R. Sumbaly, J. Kreps, and S. Shah. “The ‘Big Data’ ecosystem at LinkedIn”. In SIGMOD 2013 (to appear).
Slide39
Learning More
data.linkedin.com