Microsoft Instant Messenger Communication Network
How does the world communicate?
Jure Leskovec (jure@cs.cmu.edu)
Machine Learning Department
http://www.cs.cmu.edu/~jure
Joint work with: Eric Horvitz, Microsoft Research
Networks: Why?
Today: large online systems leave detailed records of social activity
Online communities: MySpace, Facebook
Email, blogging, instant messaging
Online publication repositories: arXiv, MEDLINE
Emerging behavior (needs lots of data):
Actions of individual nodes are independent, but global patterns and regularities emerge
The Largest Social Network
What is the largest social network in the world (that we can relatively easily obtain)?
For the first time, we had a chance to look at the complete (anonymized) communication of the whole planet (using the Microsoft MSN instant messenger network)
Instant Messaging
[Figure: contact (buddy) list and messaging window]
Instant Messaging as a Network
[Figure: IM network with buddy and conversation links]
IM: Phenomena at planetary scale
Observe social phenomena at planetary scale:
How does communication change with user demographics (distance, age, sex)?
How does geography affect communication?
What is the structure of the communication network?
Communication data
The record of communication:
Presence data: user status events (login, status change)
Communication data: who talks to whom
Demographics data: user age, sex, location
Data description: Presence
Events:
Login, logout
Is this the first-ever login?
Add/remove/block buddy
Add unregistered buddy (invite new user)
Change of status (busy, away, BRB, idle, …)
For each event:
User ID
Time
Data description: Communication
For every conversation (session), we have a list of the users who participated in it
There can be multiple people per conversation
For each conversation and each user:
User ID
Time joined
Time left
Number of messages sent
Number of messages received
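To make the record structure concrete, here is a minimal Python sketch that groups per-user records by conversation and derives the number of participants, the duration, and the message count. The `Participation` class, its field names, and `summarize_conversations` are hypothetical (the actual log schema is not shown), and times are assumed to be plain numbers such as seconds.

```python
from dataclasses import dataclass

@dataclass
class Participation:
    """One (conversation, user) record with the fields listed on the slide.

    Field names are hypothetical; times are assumed to be plain numbers (e.g. seconds).
    """
    conversation_id: int
    user_id: int
    time_joined: float
    time_left: float
    messages_sent: int
    messages_received: int

def summarize_conversations(rows):
    """Group records by conversation; return participants, duration and message count."""
    by_conversation = {}
    for row in rows:
        by_conversation.setdefault(row.conversation_id, []).append(row)

    summary = {}
    for conv_id, parts in by_conversation.items():
        summary[conv_id] = {
            "people": len({p.user_id for p in parts}),
            # The conversation spans from the first join to the last departure.
            "duration": max(p.time_left for p in parts) - min(p.time_joined for p in parts),
            # Each message has exactly one sender, so summing messages_sent counts
            # every message once (messages_received could double-count in group chats).
            "messages": sum(p.messages_sent for p in parts),
        }
    return summary

rows = [
    Participation(1, 10, 0.0, 120.0, 5, 4),
    Participation(1, 11, 5.0, 100.0, 4, 5),
    Participation(2, 10, 200.0, 260.0, 2, 0),
    Participation(2, 12, 210.0, 300.0, 0, 2),
]
print(summarize_conversations(rows))
```

Summing sent rather than received messages is the design choice that keeps multi-person conversations from being over-counted.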
Data description: Demographics
For every user (self-reported):
Age
Gender
Location (country, ZIP code)
Language
IP address (we can do a reverse geo-IP lookup)
Data collection
Log size: 150 GB/day
Just copying over the network takes 8 to 10 h
Parsing and processing takes another 4 to 6 h
After parsing and compressing: ~45 GB/day
Collected data for the 30 days of June 2006:
Total: 1.3 TB of compressed data
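A back-of-the-envelope check of the collection figures, assuming decimal units (1 TB = 1000 GB) and that "Gb" on the original slide meant gigabytes: 45 GB/day over 30 days gives about 1.35 TB, consistent with the ~1.3 TB total, and going from 150 GB to 45 GB is roughly a 3.3x reduction.

```python
# Figures from the slide; decimal units (1 TB = 1000 GB) are an assumption.
RAW_GB_PER_DAY = 150
COMPRESSED_GB_PER_DAY = 45
DAYS = 30
COPY_HOURS = (8, 10)    # copying over the network
PARSE_HOURS = (4, 6)    # parsing and processing

compression_ratio = RAW_GB_PER_DAY / COMPRESSED_GB_PER_DAY
total_raw_tb = RAW_GB_PER_DAY * DAYS / 1000
total_compressed_tb = COMPRESSED_GB_PER_DAY * DAYS / 1000
hours_per_day_of_logs = (COPY_HOURS[0] + PARSE_HOURS[0], COPY_HOURS[1] + PARSE_HOURS[1])

print(f"compression ratio: ~{compression_ratio:.1f}x")
print(f"raw logs for {DAYS} days: ~{total_raw_tb:.1f} TB")
print(f"compressed total: ~{total_compressed_tb:.2f} TB")
print(f"pipeline time per day of logs: {hours_per_day_of_logs[0]}-{hours_per_day_of_logs[1]} h")
```

So each day of logs takes 12 to 16 hours to copy and process, which still fits within a day.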
Network: Conversations
[Figure: conversation network]
Data statistics
Activity over June 2006 (30 days):
245 million users logged in
180 million users engaged in conversations
17.5 million new accounts activated
More than 30 billion conversations
Data statistics per day
Activity on June 1, 2006:
1 billion conversations
93 million users log in
65 million different users talk (exchange messages)
1.5 million invitations for new accounts sent
User characteristics: age
Age pyramid: MSN vs. the world
Conversation: Who talks to whom?
Cross-gender edges:
300 million male-male and 235 million female-female edges
640 million female-male edges
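Counts like these come from classifying each edge by the self-reported genders of its two endpoints. A minimal sketch on hypothetical toy data (not the MSN logs); `edge_types_by_gender` is an illustrative name, and edges touching a user with no reported gender are counted separately.

```python
from collections import Counter

def edge_types_by_gender(edges, gender):
    """Count undirected edges by the unordered gender pair of their endpoints."""
    counts = Counter()
    for u, v in edges:
        gu, gv = gender.get(u), gender.get(v)
        if gu is None or gv is None:
            counts["unknown"] += 1                     # gender not reported for an endpoint
        else:
            counts["-".join(sorted((gu, gv)))] += 1    # e.g. "female-male"
    return counts

# Hypothetical toy data, not taken from the MSN logs.
gender = {"a": "male", "b": "female", "c": "male", "d": "female"}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]
counts = edge_types_by_gender(edges, gender)
print(counts)
```

Sorting the pair makes `("male", "female")` and `("female", "male")` the same key, which is what an undirected edge count needs.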
Number of people per conversation
Max number of people simultaneously talking is 20, but a conversation can have more people
Conversation duration
Most conversations are short
Conversations: number of messages
Sessions between fewer people run out of steam
Time between conversations
Individuals are highly diverse
What is the probability of logging into the system after t minutes?
Power law with exponent 1.5
Task queuing model [Barabasi]
My email, Darwin’s and Einstein’s letters follow the same pattern
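The claim is that the time t between logins follows $p(t) \propto t^{-1.5}$. One standard way to estimate such an exponent is the continuous maximum-likelihood estimator $\hat\alpha = 1 + n / \sum_i \ln(t_i / t_{\min})$. The sketch below checks it on synthetic samples drawn with a true exponent of 1.5; it is an illustration, not necessarily the fitting method used in the study, and the cutoff $t_{\min} = 1$ is an arbitrary choice.

```python
import math
import random

def sample_power_law(alpha, t_min, n, rng):
    """Draw n samples with density proportional to t**-alpha for t >= t_min (inverse-CDF method)."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_power_law_exponent(samples, t_min):
    """Continuous maximum-likelihood estimate: alpha = 1 + n / sum(ln(t / t_min))."""
    tail = [t for t in samples if t >= t_min]
    return 1.0 + len(tail) / sum(math.log(t / t_min) for t in tail)

rng = random.Random(42)
gaps = sample_power_law(alpha=1.5, t_min=1.0, n=100_000, rng=rng)  # synthetic "minutes between logins"
alpha_hat = fit_power_law_exponent(gaps, t_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

With 100,000 samples the estimate lands very close to 1.5; on real data, choosing $t_{\min}$ well matters more than the estimator itself.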
Age: Number of conversations
[Figure: heat map by users' self-reported age; color scale from low to high]
Age: Total conversation duration
[Figure: heat map by users' self-reported age; color scale from low to high]
Age: Messages per conversation
[Figure: heat map by users' self-reported age; color scale from low to high]
Age: Messages per unit time
[Figure: heat map by users' self-reported age; color scale from low to high]
Who talks to whom: Number of conversations
Who talks to whom: Conversation duration
Geography and communication
Count the number of users logging in from each location on Earth
How is Europe talking?
Logins from Europe
Users per geo location
Blue circles have more than 1 million logins.
Users per capita
Fraction of population using MSN:
Iceland: 35%
Spain: 28%
Netherlands, Canada, Sweden, Norway: 26%
France, UK: 18%
USA, Brazil: 8%
Communication heat map
For each conversation between geo points (A, B), we increase the intensity on the line between A and B
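A minimal sketch of this accumulation, assuming each location has already been projected to an integer cell of a map grid (the projection is not shown) and drawing straight lines on that grid rather than great-circle arcs. Bresenham's line algorithm is used here as one simple way to enumerate the cells on each line; `line_cells` and `heat_map` are hypothetical names.

```python
def line_cells(x0, y0, x1, y1):
    """Bresenham's line algorithm: integer grid cells from (x0, y0) to (x1, y1), inclusive."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells = []
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return cells
        e2 = 2 * err
        if e2 >= dy:      # step in x
            err += dy
            x0 += sx
        if e2 <= dx:      # step in y
            err += dx
            y0 += sy

def heat_map(conversations, width, height):
    """Add 1 to every grid cell on the line between the endpoints of each conversation."""
    grid = [[0] * width for _ in range(height)]
    for (ax, ay), (bx, by) in conversations:
        for x, y in line_cells(ax, ay, bx, by):
            grid[y][x] += 1
    return grid

# Hypothetical conversations between already-projected grid cells.
conversations = [((0, 0), (3, 0)), ((0, 0), (0, 2)), ((0, 0), (3, 2))]
for row in heat_map(conversations, width=4, height=3):
    print(row)
```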
Homophily ("gliha vkup štriha", a Slovenian saying: birds of a feather flock together)
[Figure: age vs. age; panels: correlation, probability]
Per country statistics
On a typical day:
Country    Logins       Users        Messages      Messages per user
USA        38,319,363   13,261,337   412,729,278   31.12
Brazil     20,582,613   7,864,424    467,972,522   59.50
France     19,163,131   6,475,858    518,931,785   80.13
Unknown    18,444,352   6,872,347    191,167,085   27.81
Spain      16,868,549   6,140,895    503,759,240   82.03
UK         16,659,009   5,724,826    487,018,470   85.07
Canada     14,558,692   5,021,185    160,249,686   31.91
China      14,225,163   5,314,463    101,003,729   19.00
Turkey     13,619,789   4,696,555    353,540,475   75.27
Mexico     10,756,989   4,359,932    209,195,100   47.98
Note that global usage and market share statistics are higher if we accumulate data over longer time periods.
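The last column is messages divided by users. The sketch below recomputes it for a few rows of the per-country table using exact integer arithmetic. The table's two-decimal values appear to be truncated rather than rounded (Turkey's ratio is about 75.277 but is shown as 75.27); `messages_per_user_truncated` is a hypothetical helper.

```python
# (country, users, messages) for a few rows of the per-country table
ROWS = [
    ("USA", 13_261_337, 412_729_278),
    ("Brazil", 7_864_424, 467_972_522),
    ("France", 6_475_858, 518_931_785),
    ("Spain", 6_140_895, 503_759_240),
    ("UK", 5_724_826, 487_018_470),
    ("Turkey", 4_696_555, 353_540_475),
]

def messages_per_user_truncated(messages, users):
    """Messages per user, truncated (not rounded) to two decimals, as a string."""
    hundredths = messages * 100 // users   # exact integer division, no float error
    return f"{hundredths // 100}.{hundredths % 100:02d}"

for country, users, messages in ROWS:
    print(f"{country:<8} {messages_per_user_truncated(messages, users)}")
```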
Per typical user per country
On a typical day, messages sent per MSN user, by country (sorted by messages per user):
Country      Logins on a particular day   Users on a particular day   Messages sent   Messages per user
Slovenia     364,988                      130,884                     15,919,892      121.6335992
Malta        122,846                      41,829                      4,993,316       119.3745009
Hungary      1,214,268                    427,320                     47,623,604      111.4471684
Bosnia       105,584                      35,689                      3,254,170       91.18131637
Reunion      100,335                      33,399                      3,041,635       91.0696428
Gibraltar    19,096                       6,452                       581,195         90.07982021
UK           16,659,009                   5,724,826                   487,018,470     85.07131396
Macedonia    126,729                      43,754                      3,669,977       83.87751977
Netherlands  7,399,160                    2,696,669                   221,300,210     82.06428375
Spain        16,868,549                   6,140,895                   503,759,240     82.03352117
What about Slovenia (per capita)?
Statistic                   Number        Rank (per capita)
Conversations inside        19,868,886    22
Conversations to outside    7,868,483     48
Total conversations         27,737,369    29
Avg. time inside            309.49        147
Avg. time outside           314.39        80
Avg. time inside (pct.)     0.4960
Messages sent inside        9.78          32
Messages sent outside       9.46          19
Messages inside (pct.)      0.5083
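The "(pct.)" rows are fractions rather than percentages, and they appear to be the inside value's share of the inside-plus-outside pair: 309.49 / (309.49 + 314.39) ≈ 0.496. This reading is inferred from the numbers, not stated on the slide; the message rows look like averages per conversation (9.78 also appears as the Slovenia row of the "Who is Slovenia talking to?" table). A small sketch reproducing the derived rows:

```python
inside_conversations, outside_conversations = 19_868_886, 7_868_483
avg_time_inside, avg_time_outside = 309.49, 314.39
avg_msgs_inside, avg_msgs_outside = 9.78, 9.46

total_conversations = inside_conversations + outside_conversations
# Inferred interpretation: "(pct.)" = inside value / (inside + outside value)
time_inside_share = avg_time_inside / (avg_time_inside + avg_time_outside)
msgs_inside_share = avg_msgs_inside / (avg_msgs_inside + avg_msgs_outside)
conversations_inside_share = inside_conversations / total_conversations

print(total_conversations)   # 27737369, matches the table
print(f"{time_inside_share:.4f} {msgs_inside_share:.4f} {conversations_inside_share:.4f}")
```

By the same arithmetic, roughly 72% of the conversations counted here stay inside Slovenia.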
Who is Slovenia talking to?
Rank  Target country  Pairs of people  Number of conversations  Avg. time per conv.  Avg. # of messages
1     Slovenia        1,341,250        19,868,886               309.4                9.78
2     USA             61,794           922,527                  303.4                9.14
3     Spain           27,650           310,357                  289.4                7.97
4     UK              14,709           204,335                  325.4                9.02
5     Germany         9,047            129,551                  350.3                10.20
6     Bosnia          9,956            114,509                  385.9                14.62
7     Yugoslavia      8,194            104,270                  381.7                12.55
8     Italy           8,612            100,698                  358.8                9.89
9     Croatia         6,838            84,362                   359.0                11.00
10    Turkey          10,763           77,651                   292.4                8.08
11    Albania         9,517            76,440                   320.7                10.88
12    Sweden          5,083            69,019                   306.9                8.34
13    Netherlands     5,061            68,287                   315.9                8.87
14    Canada          5,003            60,617                   301.8                7.38
Instant Messaging as a Network
[Figure: buddy network]
IM Communication Network
Buddy graph:
240 million people (people who logged in during June 2006)
9.1 billion edges (friendship links)
Communication graph:
There is an edge if the users exchanged at least one message in June 2006
180 million people
1.3 billion edges
30 billion conversations
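A minimal sketch of turning conversation records into a communication graph and computing its average degree. The linking rule for multi-party conversations (connect every pair of participants who each sent at least one message) is an assumption for illustration, as are the data layout and function names. For the reported totals, if the 1.3 billion edges are undirected, the average degree is $2E/N = 2 \times 1.3\,\text{billion} / 180\,\text{million} \approx 14.4$.

```python
from itertools import combinations

def communication_graph(conversations):
    """Undirected adjacency sets built from {conversation_id: {user_id: messages_sent}}.

    Hypothetical rule: link every pair of participants who each sent at least
    one message in the same conversation.
    """
    adj = {}
    for participants in conversations.values():
        active = sorted(u for u, sent in participants.items() if sent > 0)
        for u, v in combinations(active, 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """2E / N for an undirected graph stored as adjacency sets."""
    num_edges = sum(len(vs) for vs in adj.values()) // 2
    return 2 * num_edges / len(adj) if adj else 0.0

conversations = {
    1: {"a": 3, "b": 2},
    2: {"a": 1, "b": 0},          # only one side spoke: no edge under this rule
    3: {"b": 1, "c": 4, "d": 2},  # three-person conversation: links b-c, b-d, c-d
}
graph = communication_graph(conversations)
print(sorted((u, sorted(vs)) for u, vs in graph.items()))
print(average_degree(graph))
```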
Buddy network: Number of buddies
Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user)
Communication Network: Degree
Number of people a user talks to in a month
Network: Small-world
6 degrees of separation [Milgram ’60s]
Average distance 5.5
90% of nodes can be reached in < 8 hops
Hops  Nodes
1     10
2     78
3     396
4     8648
5     3299252
6     28395849
7     79059497
8     52995778
9     10321008
10    1955007
11    518410
12    149945
13    44616
14    13740
15    4476
16    1542
17    536
18    167
19    71
20    29
21    16
22    10
23    3
24    2
25    3
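A hop table of this kind can be produced with breadth-first search: starting from a node, count how many nodes are first reached at each distance. The slide does not say whether its table comes from one starting node or from many sampled ones; this sketch (hypothetical names, toy graph) shows the single-source version, plus the mean distance and the smallest hop count covering 90% of the reached nodes.

```python
from collections import Counter, deque

def hop_histogram(adj, source):
    """BFS from `source`: number of nodes first reached at each hop distance (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return Counter(d for d in dist.values() if d > 0)

def hop_summary(hist, fraction=0.9):
    """Mean hop distance and the smallest h such that `fraction` of reached nodes are within h hops."""
    total = sum(hist.values())
    if total == 0:
        return 0.0, 0
    mean = sum(h * n for h, n in hist.items()) / total
    covered = 0
    for h in sorted(hist):
        covered += hist[h]
        if covered >= fraction * total:
            return mean, h
    return mean, max(hist)

# Hypothetical toy graph: path a-b-c-d plus a pendant node e on a.
adj = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": ["a"]}
hist = hop_histogram(adj, "a")
print(sorted(hist.items()))   # [(1, 2), (2, 1), (3, 1)]
print(hop_summary(hist))      # (1.75, 3)
```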
Network: Searchability
Milgram’s experiment showed:
(1) short paths exist in networks
(2) humans are able to find them
Assume the following setting:
Nodes are scattered on a plane
Given a starting node u, we want to reach a target node v
Algorithm: always navigate to the neighbor that is geographically closest to the target node v
Surprise: geo-routing finds short paths (for an appropriate distance measure)
[Figure: greedy geographic routing from u to v]
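The greedy geo-routing rule fits in a few lines. This is a minimal illustration on a hypothetical toy graph with planar coordinates and Euclidean distance; stopping when no neighbor is strictly closer to the target (a dead end) is a detail the slide does not specify.

```python
import math

def geo_greedy_route(adj, pos, source, target):
    """Greedy geographic routing: repeatedly move to the neighbor closest to the target.

    Returns the path as a list of nodes, or None if no neighbor is strictly
    closer than the current node (a dead end).
    """
    def dist(a, b):
        (xa, ya), (xb, yb) = pos[a], pos[b]
        return math.hypot(xa - xb, ya - yb)

    path = [source]
    current = source
    while current != target:
        neighbors = adj.get(current, [])
        if not neighbors:
            return None
        nxt = min(neighbors, key=lambda n: dist(n, target))
        if dist(nxt, target) >= dist(current, target):
            return None
        path.append(nxt)
        current = nxt
    return path

# Hypothetical toy network with planar coordinates.
pos = {"u": (0, 0), "a": (1, 0), "b": (0, 1), "c": (2, 1), "v": (3, 1)}
adj = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a", "v"], "v": ["c"]}
print(geo_greedy_route(adj, pos, "u", "v"))   # ['u', 'a', 'c', 'v']
```

Because every step strictly decreases the distance to the target, the loop always terminates.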
Communication network: Clustering
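How many triangles are closed?
The clustering coefficient normally decays with degree k as $C(k) \propto k^{-1}$
The communication network is highly clustered: $C(k) \propto k^{-0.37}$
[Figure: clustering coefficient vs. degree; regions of high and low clustering]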
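A node's clustering coefficient is the fraction of pairs of its neighbors that are themselves linked (closed triangles), and $C(k)$ is its average over nodes of degree $k$. A minimal sketch, assuming an undirected graph stored as adjacency sets and the common convention $C = 0$ for nodes with fewer than two neighbors. The decay exponents above would come from fitting a line to $\log C(k)$ versus $\log k$, which is not shown.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0   # convention for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_by_degree(adj):
    """Average local clustering C(k) over all nodes of each degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u in adj:
        k = len(adj[u])
        sums[k] += local_clustering(adj, u)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

# Hypothetical toy graph: triangle a-b-c plus a pendant node d on a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_by_degree(adj))   # {1: 0.0, 2: 1.0, 3: 0.3333333333333333}
```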
Communication Network Connectivity
k-Cores decomposition
What is the structure of the core of the network?
k-Cores: core of the network
People with k < 20 are the periphery
The core is composed of 79 people, each having 68 edges among them
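A k-core is the maximal subgraph in which every node has at least k neighbors inside the subgraph. Repeatedly peeling off low-degree nodes yields each node's core number (the largest k whose core contains it). The sketch below is a simple, unoptimized version of that peeling on a hypothetical toy graph; linear-time bucket-based variants exist for graphs of this size.

```python
def core_numbers(adj):
    """Core number of every node in an undirected graph given as adjacency sets."""
    degree = {u: len(vs) for u, vs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Raise k to the smallest remaining degree, then peel every node with degree <= k.
        k = max(k, min(degree[u] for u in remaining))
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue   # already peeled via another push
            remaining.remove(u)
            core[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return core

# Hypothetical toy graph: a 4-clique {1, 2, 3, 4}, node 5 linked to 1 and 2, node 6 linked to 5.
adj = {
    1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3},
    5: {1, 2, 6}, 6: {5},
}
cores = core_numbers(adj)
k = 3
print(sorted(u for u, c in cores.items() if c >= k))   # [1, 2, 3, 4]
```

Nodes whose core number is below a chosen threshold (k < 20 on the slide) form the periphery; the innermost core is the set with the maximum core number.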
Network robustness
We delete nodes (in some order) and observe how the network falls apart:
Number of edges deleted
Size of the largest connected component
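The deletion experiment can be sketched as follows: remove nodes in a chosen order, and after each removal record the cumulative number of deleted edges and the size of the largest connected component. The two orders shown (highest degree first, and random) are examples of "some order"; the function names and the toy graph are hypothetical.

```python
import random

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)   # treat removed nodes as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def robustness_curve(adj, order):
    """Delete nodes in `order`; after each deletion record (edges deleted so far, largest component)."""
    removed = set()
    edges_deleted = 0
    curve = []
    for u in order:
        # Edges to still-present neighbors disappear now (each edge is counted once).
        edges_deleted += sum(1 for v in adj[u] if v not in removed)
        removed.add(u)
        curve.append((edges_deleted, largest_component_size(adj, removed)))
    return curve

# Hypothetical toy graph: hub h with four neighbors, plus an edge a-b.
adj = {"h": {"a", "b", "c", "d"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"}, "d": {"h"}}
by_degree = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
random_order = list(adj)
random.Random(0).shuffle(random_order)
print(robustness_curve(adj, by_degree))
print(robustness_curve(adj, random_order))
```

Removing the hub first fragments the toy graph at once, while a random first deletion usually leaves most of it connected.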
Robustness: Nodes vs. Edges
Robustness: Connectivity
Conclusion
A first look at a planetary-scale social network
The largest social network analyzed
Strong presence of homophily: people who communicate share attributes
Well connected: in only a few hops one can reach most of the network
Very robust: many (random) people can be removed and the network is still connected
References
Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, 2007. http://www.cs.cmu.edu/~jure