Slide 1
Privacy in Social Networks: Introduction
Slide 2
Model: Social Graph
From SIGMOD'11 tutorial

Slide 3
Model: Social Graph
From SIGMOD'11 tutorial

Slide 4
Model: Social Graph
From SIGMOD'11 tutorial
Slide 5
Model: Social Graph
Facebook graph from: http://www.flickr.com/photos/greenem/11696663/
Slide 6
Model: Social Graph
Twitter graph from: http://www.connectedaction.net/2009/03/30/social-networks-in-the-news/
Slide 7
Model: Social Graph
Social networks model social relationships by graph structures using vertices and edges. Vertices model individual social actors in a network, while edges model relationships between social actors.
Labels (types of edges and vertices); directed or undirected.
G = (V, E, L, L_V, L_E): V is the set of vertices (nodes); E ⊆ V × V is the set of edges; L is a set of labels; L_V: V → L and L_E: E → L assign labels to vertices and edges.
Bipartite graphs: Tag - Document - Users
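The labeled-graph model above can be written down directly; a minimal sketch in Python, with toy names and data invented for illustration:

```python
# Labeled graph G = (V, E, L, L_V, L_E); all concrete values here are toy data.
V = {"alice", "bob", "carol"}                    # vertices: social actors
E = {("alice", "bob"), ("bob", "carol")}         # edges: relationships, E ⊆ V × V
L = {"person", "friend", "colleague"}            # label set
L_V = {"alice": "person", "bob": "person", "carol": "person"}      # L_V: V -> L
L_E = {("alice", "bob"): "friend", ("bob", "carol"): "colleague"}  # L_E: E -> L

# The labeling functions must be defined on all of V and E and map into L.
assert set(L_V) == V and set(L_E) == E
assert set(L_V.values()) <= L and set(L_E.values()) <= L
```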
Slide 8

Slide 9
Privacy Preserving Publishing
Digital traces in a wide variety of on-line settings are rich sources of data for large-scale studies of social networks. Some datasets are built from publicly crawlable blogging and social-networking sites, where users have explicitly "chosen" to publish their links to others.
Slide 10
Privacy Preserving Publishing
User
Attacker
Background knowledge: participation in many networks, or a specific attack
Types of attacks: structural; active vs. passive
Quasi-identifiers
Analysts
Utility
Graph properties: number of nodes/edges; average path length, network diameter; clustering coefficient; average degree, degree distribution
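The utility measures listed for analysts can be computed directly; a minimal sketch for an undirected graph stored as an adjacency dict (function names are illustrative):

```python
from collections import Counter

def average_degree(adj):
    """Mean number of neighbours over all vertices."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def degree_distribution(adj):
    """Histogram: degree -> number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

# Toy graph: a triangle 1-2-3 plus a pendant vertex 4 attached to 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

For the toy graph, the average degree is 2.0 and vertex 1's clustering coefficient is 1.0 (both of its neighbours are connected).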
Slide 11
Mappings that preserve the graph structure
A graph homomorphism f from a graph G = (V, E) to a graph G' = (V', E') is a mapping f: V → V' from the vertex set of G to the vertex set of G' such that (u, u') ∈ E ⇒ (f(u), f(u')) ∈ E'.
If the homomorphism is a bijection whose inverse function is also a graph homomorphism, then f is a graph isomorphism [(u, u') ∈ E ⇔ (f(u), f(u')) ∈ E'].
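A small sketch of these definitions (helper names are invented): check whether a vertex mapping is a homomorphism, and whether it is an isomorphism, over directed edge sets:

```python
def is_homomorphism(f, E1, E2):
    """f preserves edges: (u, v) in E1  =>  (f(u), f(v)) in E2."""
    return all((f[u], f[v]) in E2 for (u, v) in E1)

def is_isomorphism(f, V1, E1, V2, E2):
    """Bijection whose inverse is also a homomorphism: edges map both ways."""
    if set(f) != set(V1) or set(f.values()) != set(V2) \
            or len(set(f.values())) != len(f):
        return False                      # not a bijection V1 -> V2
    inv = {w: u for u, w in f.items()}
    return is_homomorphism(f, E1, E2) and is_homomorphism(inv, E2, E1)

# Directed path a->b->c mapped onto the path 1->2->3.
V1, E1 = {"a", "b", "c"}, {("a", "b"), ("b", "c")}
V2, E2 = {1, 2, 3}, {(1, 2), (2, 3)}
f = {"a": 1, "b": 2, "c": 3}
```

Here f is an isomorphism, while the reversed map {"a": 3, "b": 2, "c": 1} is a bijection that fails to preserve edges.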
Slide 12

Slide 13
The general subgraph isomorphism problem, determining whether a graph contains a subgraph isomorphic to another graph, is NP-complete; deciding whether two whole graphs are isomorphic is not known to be solvable in polynomial time.
Slide 14
Privacy Preserving Publishing

Slide 15
Mappings that preserve the graph structure
A graph automorphism is a graph isomorphism of a graph with itself, i.e., a mapping from the vertices of the given graph G back to vertices of G such that the resulting graph is isomorphic with G. An automorphism f is non-trivial if it is not the identity function.
A bijection, or a bijective function, is a function f from a set X to a set Y with the property that, for every y in Y, there is exactly one x in X such that f(x) = y. Alternatively, f is bijective if it is a one-to-one correspondence between those sets, i.e., both one-to-one (injective) and onto (surjective).
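Since the attack later requires H to have no non-trivial automorphisms, a brute-force enumeration makes the definition concrete (an illustrative sketch, feasible only for small graphs; names are assumptions):

```python
from itertools import permutations

def automorphisms(V, E):
    """All vertex bijections of (V, E) onto itself that preserve adjacency."""
    V = list(V)
    und = {frozenset(e) for e in E}          # undirected edges
    autos = []
    for perm in permutations(V):
        f = dict(zip(V, perm))
        if {frozenset((f[u], f[v])) for (u, v) in und} == und:
            autos.append(f)
    return autos

# A path 1-2-3 has exactly two automorphisms: identity and the flip 1<->3.
V, E = [1, 2, 3], {(1, 2), (2, 3)}
```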
Slide 16
Privacy Models
Social networks: privacy breaches are classified into
vertex existence
identity disclosure
link or edge disclosure
vertex (or link) attribute disclosure (sensitive or non-sensitive attributes)
content disclosure: the sensitive data associated with each vertex is compromised, for example the email messages sent and/or received by the individuals in an email communication network
property disclosure
Relational data: identify the sensitive attribute of an individual.
Background knowledge and attack model: the attacker knows the values of the quasi-identifiers, and attacks re-identify individuals from those quasi-identifiers.
Slide 17
Anonymization Methods
Clustering-based or generalization-based approaches: cluster vertices and edges into groups and replace a subgraph with a super-vertex.
Graph-modification approaches: modify (insert or delete) edges and vertices in the graph (perturbation).
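A generic perturbation of the second kind can be sketched as follows. This is an illustrative random edge-swap, not a specific published algorithm; the function name and parameters are assumptions:

```python
import random

def perturb(V, E, n_del, n_add, seed=0):
    """Randomly delete n_del existing edges and insert n_add absent ones."""
    rng = random.Random(seed)
    E = {frozenset(e) for e in E}            # undirected edges
    doomed = rng.sample(sorted(E, key=sorted), min(n_del, len(E)))
    E -= set(doomed)
    non_edges = [frozenset((u, v)) for i, u in enumerate(V) for v in V[i + 1:]
                 if frozenset((u, v)) not in E]
    E |= set(rng.sample(non_edges, min(n_add, len(non_edges))))
    return E

V = [1, 2, 3, 4]
E = {(1, 2), (2, 3), (3, 4)}
E_pert = perturb(V, E, n_del=1, n_add=1)     # same edge count, different edges
```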
Slide 18
Some Graph-Related Definitions
A subgraph H of a graph G is said to be induced if, for any pair of vertices x and y of H, (x, y) is an edge of H if and only if (x, y) is an edge of G. In other words, H is an induced subgraph of G if it has exactly the edges that appear in G over the same vertex set. If the vertex set of H is the subset S of V(G), then H can be written as G[S] and is said to be induced by S.
Neighborhood
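The definition translates directly into code; a one-line sketch over edge sets:

```python
def induced_subgraph(E, S):
    """Edges of the subgraph G[S] induced by vertex set S."""
    S = set(S)
    return {(u, v) for (u, v) in E if u in S and v in S}

# A 4-cycle 1-2-3-4-1.
E = {(1, 2), (2, 3), (3, 4), (1, 4)}
```

G[{1, 2, 3}] keeps exactly the two cycle edges inside that vertex set.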
Slide 19
Next
Publishing
Assessing the risk (privacy score, analysis)
Access control (tools, etc.)
Active attack: example of publishing
Slide 20
Types of Attacks

Slide 21
Active and Passive Attacks
Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th International Conference on World Wide Web (WWW'07), 2007.
Slide 22
Model
Purest form of a social network: nodes corresponding to individuals, edges indicating social interactions (no labels, no directions, no annotations).
Simple anonymization. Can this work?
Slide 23
Walk-based Active Attack
Three requirements for the construction of H
Slide 24
Experiments
Data: network of friends on LiveJournal; 4.4·10^6 nodes, 77·10^6 edges.
Uniqueness: with 7 nodes, an average of 70 nodes can be de-anonymized, although log(4.4·10^6) ≈ 15.
Efficiency: |T| is typically ~9·10^4.
Detectability: only 7 nodes, yet many subgraphs of 7 nodes in G are dense and well-connected.
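The logarithm quoted above is easy to sanity-check (assuming, as the ≈ 15 suggests, that the slide means the natural logarithm):

```python
import math

n = 4.4e6                # nodes in the LiveJournal friendship network
print(math.log(n))       # natural log ≈ 15.3, matching the slide's "≈ 15"
print(math.log2(n))      # base-2 log ≈ 22.1
```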
Slide 25
Probability that H is Unique
Slide 26
Efficient recovery
Detectability: only 7 nodes, internal structure
Slide 27
Passive Attack
H is a coalition, recovered by the same search algorithm. Nothing guaranteed, but works in practice.
Slide 28
Passive Attack
Slide 29
Passive Attacks
An adversary tries to learn the identities of the nodes only after the anonymized network has been released. Users simply try to find themselves in the released network, and from this discover the existence of edges among the users to whom they are linked. A user can collude with a coalition of k-1 friends after the release.
Active Attacks
An adversary tries to compromise privacy by strategically creating new user accounts and links before the anonymized network is released.
Active attacks work with high probability in any network; passive attacks rely on the chance that a user can uniquely find themselves after the network is released. Passive attacks can only compromise the privacy of users linked to the attacker, but they involve no observable wrong-doing.
Slide 30

Slide 31
Additional material from various presentations of this paper
Slide 32
Note that the adversary may be a user of the system being anonymized.
Focus of the paper: identify types of attacks by which, even from a single anonymized copy of a social network, an adversary can learn whether edges exist between specific targeted pairs of nodes.
Privacy threat: de-anonymize 2 nodes and learn whether they are connected.
Slide 33

Slide 34

Slide 35
Active Attacks - Challenges
Let G be the network and H the attacker's subgraph. With high probability, H must be:
Uniquely identifiable in G, for any G
Efficiently locatable: a tractable instance of subgraph isomorphism
But undetectable from the point of view of the data curator
Slide 36
Active Attacks - Approaches
Basic idea: H is randomly generated. Start with k nodes, add edges independently at random.
Two variants:
k = Θ(log n) de-anonymizes Θ(log^2 n) users
k = Θ(√log n) de-anonymizes Θ(√log n) users; H needs to be "more unique", achieved by a "thin" attachment of H to G
The "walk-based" attack: better in practice
The "cut-based" attack: matches the theoretical bound
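The basic idea above, generating H as a random graph on k = Θ(log n) nodes with independent edges, can be sketched as follows (the edge probability p = 1/2 and the constant 2 in front of log n are assumptions for illustration):

```python
import math
import random

def random_H(k, p=0.5, seed=7):
    """Random subgraph H on k new accounts: each edge present with prob. p."""
    rng = random.Random(seed)
    nodes = [f"x{i}" for i in range(1, k + 1)]   # new (fake) accounts x1..xk
    edges = {(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
             if rng.random() < p}
    return nodes, edges

n = 4_400_000                     # e.g. a LiveJournal-sized network
k = math.ceil(2 * math.log(n))    # k = Θ(log n) new accounts suffice
H_nodes, H_edges = random_H(k)
```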
Slide 37
Outline
Attacks on anonymized networks: high-level description
The Walk-Based active attack: description, analysis, experiments
Passive attack
Slide 38
The Walk-Based Attack - Simplified Version
Construction:
Pick target users W = {w1,…,wk}
Create new users X = {x1,…,xk} and a random subgraph G[X] = H
Add edges (xi, wi)
Recovery:
Find H in G ↔ no other subgraph of G is isomorphic to H
Label H as x1,…,xk ↔ H has no non-trivial automorphisms
Find w1,…,wk
[Figure: targets w1, w2 attached to new nodes x1, x2]
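The construction and recovery above can be sketched end to end on a toy graph. This is an illustrative miniature, not the paper's algorithm: H is a fixed triangle rather than a random subgraph, the host network is a triangle-free cycle so that H is guaranteed unique, and recovery is a brute-force scan over 3-node subsets instead of the efficient tree search:

```python
from itertools import combinations

def run_attack():
    # Host network G: a triangle-free 6-cycle (undirected edges as frozensets).
    G = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]}
    W = [1, 3, 5]                  # targeted users w1, w2, w3
    X = ["x1", "x2", "x3"]         # new accounts created by the attacker
    H = {frozenset(p) for p in [("x1", "x2"), ("x2", "x3"), ("x1", "x3")]}
    # Released (anonymized) graph: G plus H plus one edge (xi, wi) per target.
    released = G | H | {frozenset((x, w)) for x, w in zip(X, W)}

    # Recovery: G is triangle-free and the attachment edges create no new
    # triangles, so H is the unique triangle; finding it re-identifies X, and
    # the neighbour of each xi outside X reveals the corresponding target wi.
    nodes = sorted({v for e in released for v in e}, key=str)
    triangles = [set(S) for S in combinations(nodes, 3)
                 if all(frozenset(p) in released
                        for p in [(S[0], S[1]), (S[1], S[2]), (S[0], S[2])])]
    return triangles, set(X)

triangles, Xset = run_attack()
```

With a real random H and k = Θ(log n), uniqueness holds only with high probability rather than by construction, which is exactly what Theorem 1 later establishes.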
Slide 39
The Walk-Based Attack - Full Version
Construction:
Pick target users W = {w1,…,wb}
Create new users X = {x1,…,xk} and H
Connect wi to a unique subset Ni of X
Between H and G - H: add Δi edges from xi, where d0 ≤ Δi ≤ d1 = O(log n)
Inside H: add edges (xi, xi+1) to help find H
[Figure: new nodes x1, x2, x3]
Slide 40
[Figure: construction of H. k = (2+δ)log n new nodes x1,…,xk are attached to O(log^2 n) target nodes w1,…; each wi connects to a subset Ni; Δi edges go from xi into G; the total degree of xi is Δ'i.]
Slide 41

Slide 42

Slide 43
Recovering H
Search G based on:
the degrees Δ'i
the internal structure of H
[Figure: search tree T over G, with a root, a path α1,…,αl in H mapped to f(α1),…,f(αl) in G, and a candidate node v for the next vertex β]
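A condensed sketch of this search (illustrative code, not the paper's implementation): partial assignments f(x1), f(x2), … are grown one node at a time, pruning any candidate whose total degree or edges back to already-placed nodes disagree with what the attacker knows about H. The toy graph and helper names below are assumptions:

```python
def search(adj, deg_target, H_edges, X):
    """Return all complete assignments [f(x1), ..., f(xk)] consistent with H."""
    k, results = len(X), []

    def extend(path):
        if len(path) == k:
            results.append(list(path))
            return
        i = len(path)                          # next attacker node X[i] to place
        for v in adj:
            if v in path or len(adj[v]) != deg_target[X[i]]:
                continue                       # prune: total degree must match Δ'_i
            # prune: edges back to already-placed nodes must match H exactly
            if all(((X[j], X[i]) in H_edges or (X[i], X[j]) in H_edges)
                   == (path[j] in adj[v]) for j in range(i)):
                extend(path + [v])

    extend([])
    return results

# Toy released graph: H is the path x1-x2-x3, with external edges chosen so the
# degree sequence (2, 3, 4) pins H down uniquely.
adj = {
    "x1": {"x2", "a"},
    "x2": {"x1", "x3", "b"},
    "x3": {"x2", "c", "d", "e"},
    "a": {"x1", "b"}, "b": {"x2", "a", "f"},
    "c": {"x3"}, "d": {"x3"}, "e": {"x3"}, "f": {"b"},
}
deg_target = {"x1": 2, "x2": 3, "x3": 4}
H_edges = {("x1", "x2"), ("x2", "x3")}
X = ["x1", "x2", "x3"]
```

On this toy instance the search tree stays tiny and exactly one full assignment survives; Theorem 2 below is the statement that the tree stays near-linear on a real network.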
Slide 44

Slide 45
Analysis
Theorem 1 [Correctness]: With high probability, H is unique in G. Formally:
H is a random subgraph
G is arbitrary
Edges between H and G - H are arbitrary
There are edges (xi, xi+1)
Then WHP no other subgraph of G is isomorphic to H.
Theorem 2 [Efficiency]: The search tree T does not grow too large. Formally: for every ε, WHP the size of T is O(n^(1+ε)).
Slide 46
Theorem 1 [Correctness]: H is unique in G. Two cases:
For no disjoint subset S is G[S] isomorphic to H
For no overlapping subset S is G[S] isomorphic to H
Case 1: S = <s1,…,sk> nodes in G - H; εS is the event that si ↔ xi is an isomorphism. By the union bound, the probability that some such εS holds is small.
Slide 47
Theorem 1 continued
Case 2: S and X overlap. Observation: H does not have much internal symmetry.
Claim (a): WHP, there are no disjoint isomorphic subgraphs of size c1 log k in H. Assume this from now on.
Claim (b): Most of A goes to B under f (except c1 log k nodes), and most of Y is fixed under f (except c2 log k nodes).
[Figure: subsets A, B of X and a set Y in G, with the mapping f]
Slide 48
Theorem 1 - Proof
What is the probability of an overlapping second copy of H in G?
fABCD: A ∪ Y → B ∪ Y
Let j = |A| = |B| = |C|.
εABCD: the event that fABCD is an isomorphism.
#random edges inside C ≥ j(j-1)/2 - (j-1)
#random edges between C and Y' ≥ |Y'|·j - 2j
Probability that the random edges match those of A: Pr[εABCD] ≤ 2^(-#random edges)
[Figure: X partitioned into A, D, B, C, with Y' outside X; f maps A to B and C]
Slide 49
Theorem 2 [Efficiency]
Claim: the size of the search tree T is near-linear.
The proof uses similar methods. Define random variables: #nodes in T = Γ, and Γ = Γ' + Γ'' = #paths in G - H + #paths passing through H.
This time we bound E(Γ') [and similarly E(Γ'')]:
The number of paths of length j with max degree d1 is bounded
The probability of such a path having the correct internal structure is bounded
E(Γ') ≤ (#paths) · Pr[correct internal structure]
Slide 50
Slide 51
Slide 52
Slide 53
Slide 54
Slide 55
Slide 56
Slide 57
Slide 58
Slide 59

Slide 60
Outline
Attacks on anonymized networks: high-level description
The Walk-Based active attack: description, analysis, experiments
Passive attack
Slide 61

Slide 62

Slide 63
Passive Attack - Results
Slide 64
Passive Attack
H is a coalition, recovered by the same search algorithm. Nothing guaranteed, but works in practice.
65Slide66
66
Potential SolutionsSlide67
67