Privacy in Social Networks


Presentation Transcript

Slide 1

Privacy in Social Networks: Introduction

Slide 2

Model: Social Graph

From SIGMOD11-tutorial

Slide 3

Model: Social Graph

From SIGMOD11-tutorial

Slide 4

Model: Social Graph

From SIGMOD11-tutorial

Slide 5

Model: Social Graph

Facebook graph from: http://www.flickr.com/photos/greenem/11696663/

Slide 6

Model: Social Graph

Twitter graph from: http://www.connectedaction.net/2009/03/30/social-networks-in-the-news/

Slide 7

Model: Social Graph

Social networks model social relationships by graph structures using vertices and edges. Vertices model individual social actors in a network, while edges model relationships between social actors.

Labels (types of edges, vertices)

Directed/undirected

G = (V, E, L, L_V, L_E): V is the set of vertices (nodes), E ⊆ V × V is the set of edges, L is the set of labels, and L_V: V → L, L_E: E → L are the labeling functions.

Bipartite graphs: Tag – Document – Users
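As a concrete illustration, here is a minimal sketch of this labeled-graph model with plain Python containers; the node names and labels are invented for the example and are not from the slides.

```python
# Minimal sketch of G = (V, E, L, L_V, L_E).
V = {"alice", "bob", "carol"}                    # vertices: social actors
E = {("alice", "bob"), ("bob", "carol")}         # edges: relationships, E ⊆ V × V
L_V = {v: "person" for v in V}                   # vertex labeling L_V: V -> L
L_E = {("alice", "bob"): "friend",               # edge labeling  L_E: E -> L
       ("bob", "carol"): "follows"}

def neighbors(v):
    """Vertices adjacent to v, treating edges as undirected."""
    return {u for e in E for u in e if v in e and u != v}

print(neighbors("bob"))  # {'alice', 'carol'}
```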

Slide 8

Slide 9

Privacy Preserving Publishing

Digital traces in a wide variety of on-line settings => rich sources of data for large-scale studies of social networks.

Some are based on publicly crawlable blogging and social networking sites => users have explicitly "chosen" to publish their links to others.

Slide 10

Privacy Preserving Publishing

User

Attacker: background knowledge (participation in many networks, or a specific attack); types of attacks (structural; active vs. passive); quasi-identifiers

Analysts: utility, measured by graph properties:

number of nodes/edges

average path length, network diameter

clustering coefficient

average degree, degree distribution
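The sketch below, assuming the networkx library and one of its bundled example graphs, shows how an analyst might compute these utility measures on a released graph.

```python
# Utility measures an analyst might compare before and after anonymization.
import networkx as nx

G = nx.karate_club_graph()  # stand-in for a released social graph

print("nodes, edges:", G.number_of_nodes(), G.number_of_edges())
print("average path length:", nx.average_shortest_path_length(G))
print("diameter:", nx.diameter(G))
print("clustering coefficient:", nx.average_clustering(G))
degrees = [d for _, d in G.degree()]
print("average degree:", sum(degrees) / len(degrees))
```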

Slide 11

Mappings that preserve the graph structure

A graph homomorphism f from a graph G = (V, E) to a graph G' = (V', E') is a mapping f: V → V' from the vertex set of G to the vertex set of G' such that (u, u') ∈ E ⇒ (f(u), f(u')) ∈ E'.

If the homomorphism is a bijection whose inverse function is also a graph homomorphism, then f is a graph isomorphism [(u, u') ∈ E ⇔ (f(u), f(u')) ∈ E'].
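A minimal sketch of the homomorphism condition in Python, using edge sets and a vertex map (all names illustrative):

```python
# f is a homomorphism iff every edge (u, v) of G maps to an edge of G'.
def is_homomorphism(f, E, E_prime):
    """f: dict V(G) -> V(G'); E, E_prime: sets of edges as tuples."""
    return all((f[u], f[v]) in E_prime for (u, v) in E)

E = {(1, 2), (2, 3)}
E_prime = {("a", "b"), ("b", "a")}
f = {1: "a", 2: "b", 3: "a"}           # folds the path 1-2-3 onto edge a-b
print(is_homomorphism(f, E, E_prime))  # True
```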

Slide 12

Slide 13

The general graph isomorphism problem, which determines whether two graphs are isomorphic, is not known to be solvable in polynomial time (and the related subgraph isomorphism problem is NP-complete).

Slide 14

Privacy Preserving Publishing

Slide 15

Mappings that preserve the graph structure

A graph automorphism is a graph isomorphism of a graph with itself, i.e., a mapping from the vertices of the given graph G back to the vertices of G such that the resulting graph is isomorphic with G. An automorphism f is non-trivial if it is not the identity function.

A bijection, or a bijective function, is a function f from a set X to a set Y with the property that, for every y in Y, there is exactly one x in X such that f(x) = y. Alternatively, f is bijective if it is a one-to-one correspondence between those sets, i.e., both one-to-one (injective) and onto (surjective).
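A brute-force sketch, feasible only for small graphs, of finding a non-trivial automorphism (relevant later: the attack subgraph H must have none):

```python
# Enumerate vertex permutations; a permutation f is an automorphism iff it
# maps every edge of E to an edge of E (being a bijection on vertices, it
# then maps non-edges to non-edges as well).
from itertools import permutations

def nontrivial_automorphism(V, E):
    V = sorted(V)
    for perm in permutations(V):
        f = dict(zip(V, perm))
        if any(f[v] != v for v in V) and all((f[u], f[w]) in E for (u, w) in E):
            return f
    return None

# The undirected path 1-2-3 has the non-trivial automorphism swapping 1 and 3.
E = {(1, 2), (2, 1), (2, 3), (3, 2)}
print(nontrivial_automorphism({1, 2, 3}, E))  # {1: 3, 2: 2, 3: 1}
```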

Slide 16

Privacy Models

Relational data: identify the sensitive attribute of an individual. Background knowledge and attack model: the adversary knows the values of the quasi-identifiers, and attacks come from identifying individuals by their quasi-identifiers.

Social networks: privacy classified into

vertex existence

identity disclosure

link or edge disclosure

vertex (or link) attribute disclosure (sensitive or non-sensitive attributes)

content disclosure: the sensitive data associated with each vertex is compromised, for example, the email messages sent and/or received by the individuals in an email communication network

property disclosure

Slide 17

Anonymization Methods

Clustering-based or generalization-based approaches: cluster vertices and edges into groups and replace a subgraph with a super-vertex.

Graph modification approaches: modify (insert or delete) edges and vertices in the graph (perturbation). A naive version is sketched below.
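A hedged sketch of a naive perturbation step, deleting m random edges and inserting m random non-edges; the parameter m and the helper name are illustrative, not a published scheme:

```python
import random

def perturb(V, E, m, seed=0):
    """Delete m random edges, then insert m random non-edges (undirected)."""
    rng = random.Random(seed)
    E = set(E)
    E -= set(rng.sample(sorted(E), m))                # deletions
    non_edges = [(u, v) for u in V for v in V
                 if u < v and (u, v) not in E]
    E |= set(rng.sample(non_edges, m))                # insertions
    return E

V = range(6)
E = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
print(sorted(perturb(V, E, m=2)))
```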

Slide 18

Some Graph-Related Definitions

A subgraph H of a graph G is said to be induced if, for any pair of vertices x and y of H, (x, y) is an edge of H if and only if (x, y) is an edge of G. In other words, H is an induced subgraph of G if it has exactly the edges that appear in G over the same vertex set. If the vertex set of H is the subset S of V(G), then H can be written as G[S] and is said to be induced by S.

Neighborhood
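A one-function sketch of the induced subgraph G[S] defined above, keeping exactly the edges of G with both endpoints in S:

```python
def induced_subgraph(E, S):
    """Edges of G[S]: those edges of G whose endpoints both lie in S."""
    S = set(S)
    return {(u, v) for (u, v) in E if u in S and v in S}

E = {(1, 2), (2, 3), (3, 4), (4, 1)}
print(induced_subgraph(E, {1, 2, 3}))  # {(1, 2), (2, 3)}
```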

Slide 19

Next

Publishing

Assessing the risk (privacy score, analysis)

Access control (tools, etc.)

Active attack: an example of publishing

Slide 20

Types of Attacks

Slide 21

Active and Passive Attacks

Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th International Conference on World Wide Web, 2007 (WWW '07).

Slide 22

Model

Purest form of social network:

nodes corresponding to individuals

edges indicating social interactions

(no labels, no directions, no annotations)

Simple anonymization. Can this work?

Slide 23

Walk-based Active Attack

Three requirements for the construction of H

Slide 24

Experiments

Data: network of friends on LiveJournal; 4.4·10^6 nodes, 77·10^6 edges.

Uniqueness: with 7 new nodes, an average of 70 nodes can be de-anonymized, although log(4.4·10^6) ≈ 15.

Efficiency: |T| is typically ~9·10^4.

Detectability: only 7 nodes, yet many subgraphs of 7 nodes in G are dense and well-connected.

Slide 25

Probability that H is Unique

Slide 26

Efficient recovery

Detectability: only 7 nodes; internal structure

Slide 27

Passive Attack

H is a coalition, recovered by the same search algorithm. Nothing is guaranteed, but it works in practice.

Slide 28

is a coalition, recovered by same search algorithmNothing guaranteed, but works in practiceSlide28

28

Passive AttackSlide29

Passive Attacks

An adversary tries to learn the identities of the nodes only after the anonymized network has been released.

Users simply try to find themselves in the released network, and from this discover the existence of edges among the users to whom they are linked.

A user can collude with a coalition of k-1 friends after the release.

Active Attacks

An adversary tries to compromise privacy by strategically creating new user accounts and links before the anonymized network is released.

Active attacks work with high probability in any network; passive attacks rely on the chance that a user can uniquely find themselves after the network is released.

Passive attacks can only compromise the privacy of users linked to the attacker.

Passive attacks involve no observable wrongdoing.

Slide 30

Slide 31

Additional material from various presentations of this paper

Slide 32

Note that the adversary may be a user of the system being anonymized.

Focus of the paper: identify types of attacks such that, even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.

Privacy threat: de-anonymize 2 nodes and learn whether they are connected.

Slide 33

Slide 34

Slide 35

Active Attacks – Challenges

Let G be the network and H the subgraph. With high probability, H must be:

uniquely identifiable in G, for any G

efficiently locatable (a tractable instance of subgraph isomorphism)

but undetectable from the point of view of the data curator

Slide 36

Active Attacks – Approaches

Basic idea: H is randomly generated. Start with k nodes, add edges independently at random.

Two variants:

k = Θ(log n) de-anonymizes Θ(log^2 n) users

k = Θ(√log n) de-anonymizes Θ(√log n) users; here H needs to be "more unique", achieved by "thin" attachment of H to G

The "walk-based" attack – better in practice

The "cut-based" attack – matches the theoretical bound

Slide 37

Outline

Attacks on anonymized networks – high-level description

The walk-based active attack: description, analysis, experiments

Passive attack

Slide 38

The Walk-Based Attack – Simplified Version

Construction (a sketch follows below):

Pick target users W = {w_1, ..., w_k}.

Create new users X = {x_1, ..., x_k} and a random subgraph G[X] = H.

Add the edges (x_i, w_i).

Recovery:

Find H in G ↔ no other subgraph of G is isomorphic to H.

Label H as x_1, ..., x_k ↔ H has no non-trivial automorphisms.

Find w_1, ..., w_k.

[Figure: a small example with attacker nodes x_1, x_2 attached to targets w_1, w_2.]
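A hedged sketch of this construction, assuming the attacker can create accounts before release; function and variable names are illustrative, not from the paper:

```python
import itertools
import random

def plant_H(G, targets, k, p=0.5, seed=1):
    """Create k new users, wire a random subgraph H among them (each internal
    edge tossed with probability p), and attach x_i to target w_i."""
    rng = random.Random(seed)
    X = [f"x{i}" for i in range(k)]
    H = {frozenset(e) for e in itertools.combinations(X, 2) if rng.random() < p}
    G_out = set(G) | H
    for x, w in zip(X, targets):       # one attachment edge per target
        G_out.add(frozenset((x, w)))
    return G_out, X, H

G0 = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}
G1, X, H = plant_H(G0, targets=[1, 4], k=2)
print(len(G1) - len(G0), "edges added")  # internal edges of H + 2 attachments
```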

Slide 39

The Walk-Based Attack – Full Version

Construction:

Pick target users W = {w_1, ..., w_b}.

Create new users X = {x_1, ..., x_k} and H.

Connect w_i to a unique subset N_i of X.

Between H and G – H: add Δ_i edges from x_i, where d_0 ≤ Δ_i ≤ d_1 = O(log n).

Inside H: add the edges (x_i, x_{i+1}), to help find H.

Slide 40

[Figure: construction of H. With k = (2+δ)log n new nodes x_1, x_2, x_3, ..., up to O(log^2 n) targets w_1, w_2, ... can be attacked; each w_i is connected to a unique subset N_i of X, Δ_i edges leave x_i for G – H, and the total degree of x_i is Δ'_i.]

Slide 41

Slide 42

Slide 43

Recovering H

Search G based on:

the degrees Δ'_i

the internal structure of H

[Figure: a search tree T over G; a root-to-node path α_1, ..., α_l in T corresponds to a candidate image f(α_1), ..., f(α_l) of x_1, ..., x_l in G.]
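A sketch of this search, assuming the released graph is given as an adjacency dict; candidates are pruned by degree and by back-edges to already-placed nodes (names illustrative):

```python
def find_H(adj, deg_seq, internal):
    """adj: node -> set of neighbors in the released graph G.
    deg_seq[i]: required degree Δ'_(i+1) of x_(i+1).
    internal[i]: the set of j < i such that (x_(j+1), x_(i+1)) is an edge of H.
    Returns all root-to-leaf paths of the search tree T placing x_1..x_k."""
    k, results = len(deg_seq), []

    def extend(path):
        i = len(path)
        if i == k:
            results.append(list(path))
            return
        # x_(i+1) must be adjacent to x_i via the path edges of H
        cands = adj[path[-1]] if i else adj.keys()
        for v in cands:
            if v in path or len(adj[v]) != deg_seq[i]:
                continue  # prune by degree
            if all((path[j] in adj[v]) == (j in internal[i]) for j in range(i)):
                path.append(v)   # back-edges consistent with H: recurse
                extend(path)
                path.pop()

    extend([])
    return results

adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(find_H(adj, deg_seq=[1, 2], internal=[set(), {0}]))  # [[1, 2], [4, 3]]
```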

Slide 44

Slide 45

Analysis

Theorem 1 [Correctness]: With high probability, H is unique in G. Formally:

H is a random subgraph

G is arbitrary

the edges between H and G – H are arbitrary

there are edges (x_i, x_{i+1})

Then, WHP, no other subgraph of G is isomorphic to H.

Theorem 2 [Efficiency]: The search tree T does not grow too large. Formally: for every ε, WHP the size of T is O(n^(1+ε)).

Slide 46

Theorem 1 [Correctness]: H is unique in G. Two cases:

for no disjoint subset S is G[S] isomorphic to H

for no overlapping S is G[S] isomorphic to H

Case 1: Let S = <s_1, ..., s_k> be nodes in G – H, and let ε_S be the event that s_i ↔ x_i is an isomorphism. Conclude by the union bound over all such S (a reconstruction is sketched below).
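The slide's formula was an image; the following is a hedged reconstruction of the union-bound step. Each ε_S forces all C(k,2) independent fair coin flips that define H to match G[S], so:

```latex
\Pr\Big[\bigcup_{S}\varepsilon_S\Big]
  \le \sum_{S}\Pr[\varepsilon_S]
  \le n^{k}\,2^{-\binom{k}{2}}
  = 2^{\,k\left(\log_2 n-\frac{k-1}{2}\right)}
  \longrightarrow 0
  \qquad\text{for } k \ge (2+\delta)\log_2 n .
```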

Slide 47

Theorem 1, continued

Case 2: S and X overlap. Observation: H does not have much internal symmetry.

Claim (a): WHP there are no disjoint isomorphic subgraphs of size c_1 log k in H. Assume this from now on.

Claim (b): Most of A goes to B (except c_1 log k nodes), and most of Y is fixed under f (except c_2 log k nodes).

[Figure: X partitioned into A, B, Y; the isomorphism f maps A to B and fixes Y.]

Slide 48

Theorem 1 – Proof

What is the probability of an overlapping second copy of H in G?

f_ABCD: A ∪ Y → B ∪ Y = X. Let j = |A| = |B| = |C|, and let ε_ABCD be the event that f_ABCD is an isomorphism.

#random edges inside C ≥ j(j-1)/2 – (j-1)

#random edges between C and Y' ≥ |Y'|·j – 2j

Probability that the random edges match those of A: Pr[ε_ABCD] ≤ 2^(–#random edges).

[Figure: X split into parts A, B, C, D, Y'.]

Slide 49

Theorem 2 [Efficiency]

Claim: the size of the search tree T is near-linear. The proof uses similar methods.

Define random variables: #nodes in T = Γ, where Γ = Γ' + Γ'' = #paths in G – H + #paths passing through H.

This time we bound E(Γ') [and similarly E(Γ'')]:

The number of paths of length j with max degree d_1 is bounded.

The probability that such a path has the correct internal structure is bounded.

E(Γ') ≤ #paths · Pr[correct internal structure] (a reconstruction is sketched below).
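The slide's bound was also an image; a hedged reconstruction from the stated ingredients (at most n·d_1^(j-1) candidate paths of length j, and each non-path pair inside the candidate must match an independent coin flip of H) might read:

```latex
\mathbb{E}(\Gamma')
  \le \sum_{j=1}^{k}
      \underbrace{n\,d_1^{\,j-1}}_{\#\text{paths of length } j}\;
      \underbrace{2^{-\binom{j-1}{2}}}_{\Pr[\text{internal structure matches}]}
  = n\sum_{j=1}^{k} 2^{(j-1)\log_2 d_1-\binom{j-1}{2}}
  = O\!\big(n^{1+\varepsilon}\big),
  \qquad d_1 = O(\log n).
```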

Slides 50–59

Slide 60

Outline

Attacks on anonymized networks – high-level description

The walk-based active attack: description, analysis, experiments

Passive attack

Slide 61

Slide 62

Slide 63

Passive Attack – Results

Slide 64

Passive Attack

H is a coalition, recovered by the same search algorithm. Nothing is guaranteed, but it works in practice.

Slide 65

Slide 66

Potential Solutions

Slide 67