Slide1
Collaborative, Privacy-Preserving Data Aggregation at Scale
Michael J. Freedman
Princeton University
Joint work with: Benny Applebaum, Haakon Ringberg, Matthew Caesar, and Jennifer Rexford
Slide2
Problem: Network Anomaly Detection
Slide3
Collaborative anomaly detection
Some attacks look like normal traffic, e.g., SQL-injection, application-level DoS [Srivatsa TWEB ‘08]
Is it a DDoS attack or a flash crowd? [Jung WWW ‘02]
Yahoo!, Google, Bing (each): “I’m not sure about Beasty!”
Slide4
Collaborative anomaly detection
Targets (victims) could correlate attacks/attackers [Katti IMC ’05], [Allman Hotnets ‘06], [Kannan SRUTI ‘06], [Moore INFOC ‘03]
Yahoo!, Google, Bing
“Fool us once, shame on you. Fool us N times, shame on us.”
Slide5
Problem: Network Anomaly Detection
Solution:
Aggregate suspect IPs from many ISPs
Flag those IPs that appear > threshold τ
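The solution on this slide can be sketched, without any privacy protection yet, as a simple count-and-threshold step. This is a minimal illustration only: it assumes each ISP reports a set of suspect IPs, and the function name `flag_suspects` and the sample addresses are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def flag_suspects(reports: Iterable[Set[str]], tau: int) -> Set[str]:
    """Return the IPs that appear in more than `tau` ISP reports."""
    counts = Counter()
    for isp_report in reports:
        # Each report is a set, so an ISP contributes at most one vote per IP.
        counts.update(isp_report)
    return {ip for ip, n in counts.items() if n > tau}


# Hypothetical reports from three ISPs.
reports = [
    {"1.1.1.1", "2.2.2.2"},
    {"2.2.2.2", "3.3.3.3"},
    {"2.2.2.2"},
]
flagged = flag_suspects(reports, tau=2)
```

With these sample reports only 2.2.2.2 appears more than τ = 2 times, so it is the only address flagged.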
Slide6
Problem: Distributed Ranking
Solution:
Collect domain statistics from many users
Aggregate data by domain
Slide7
Problem: …
Solution:
Aggregate (id, data) from many sources
Analyze data grouped by id
Slide8
But what about privacy?
What inputs are submitted?
Who submitted what?
Slide9
Data Aggregation Problem
Many participants, each with (key, value) observation
Goal: Aggregate observations by key

Key   Values
k1    ( va, vb )        A
k2    ( vi, vj, vk )    A
…
kn    ( vx )            A
Slide10
Data Aggregation Problem
Many participants, each with (key, value) observation
Goal: Aggregate observations by key

Key   Values
k1    ( va, vb )        F( A )
k2    ( vi, vj, vk )    F( A )
…
kn    ( vx )            F( A )

PDA: Only release the value column
CR-PDA: Plus keys whose values satisfy some function
Slide11
Data Aggregation Problem
Many participants, each with (key, value) observation
Goal: Aggregate observations by key

Key   Values
k1    ( 1, 1 )       Σ    ≥ τ ?
k2    ( 1, 1, 1 )    Σ    ≥ τ ?
…
kn    ( 1 )          Σ    ≥ τ ?

PDA: Only release the value column
CR-PDA: Plus keys whose values satisfy some function
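The two output definitions above can be written down as plain functions that say what is released, ignoring for now how it is computed privately. This is a sketch of the intended outputs only, not of the protocol. The names `aggregate`, `pda`, and `cr_pda` are hypothetical, and the value column is sorted here so that row order gives no hint about keys.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Observation = Tuple[str, int]


def aggregate(observations: Iterable[Observation],
              agg: Callable[[List[int]], int]) -> Dict[str, int]:
    """Group values by key and apply the aggregation function to each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in observations:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def pda(observations: Iterable[Observation], agg=sum) -> List[int]:
    """PDA output: the aggregated value column only, with keys withheld."""
    return sorted(aggregate(observations, agg).values())


def cr_pda(observations: Iterable[Observation],
           release: Callable[[int], bool], agg=sum):
    """CR-PDA output: the value column, plus keys whose aggregate passes `release`."""
    table = aggregate(observations, agg)
    released = {k: v for k, v in table.items() if release(v)}
    return sorted(table.values()), released


# Each observation carries the value 1, so the sum counts reports per key.
obs = [("1.1.1.1", 1), ("2.2.2.2", 1), ("2.2.2.2", 1), ("2.2.2.2", 1)]
tau = 3
values, released_keys = cr_pda(obs, release=lambda total: total >= tau)
```

Here `values` holds both sums, while `released_keys` contains only the key whose sum reaches τ.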
Slide12
Goals
Keyword privacy: No party learns anything about keys
Participant privacy: No party learns who submitted what
Efficiency: Scale to many participants, each with many inputs
Flexibility: Support variety of computations over values
Lack of coordination: No synchrony required, individuals cannot prevent progress
All participants need not be online at same time
Slide13
Potential solutions

Approach                     Keyword Privacy  Participant Privacy  Efficiency  Flexibility  Lack of Coord
Decentralized:
Garbled Circuit Evaluation   Yes              Yes                  Very Poor   Yes          No
Multiparty Set Intersection  Yes              Yes                  Poor        No           No
Slide14
Security
Efficiency
Weaken security assumptions?
Assume honest but curious participants?
Assume no collusion among malicious participants?
In large/open setting, easy to operate multiple nodes (so-called “Sybil attack”)
Slide15
Towards Centralization?
DB
Participants
Slide16
Potential solutions

Approach                     Keyword Privacy  Participant Privacy  Efficiency  Flexibility  Lack of Coord
Decentralized:
Garbled Circuit Evaluation   Yes              Yes                  Very Poor   Yes          No
Multiparty Set Intersection  Yes              Yes                  Poor        No           No
Centralized:
Hashing Inputs               No               No                   Very Good   Yes          Yes
Network Anonymization        No               Yes                  Very Good   Yes          Yes
Slide17
Towards semi-centralization
Participants
Proxy
DB
Assumption: Proxy and DB do not collude
Slide18
Potential solutions

Approach                     Keyword Privacy  Participant Privacy  Efficiency  Flexibility  Lack of Coord
Decentralized:
Garbled Circuit Evaluation   Yes              Yes                  Very Poor   Yes          No
Multiparty Set Intersection  Yes              Yes                  Poor        No           No
Centralized:
Hashing Inputs               No               No                   Very Good   Yes          Yes
Network Anonymization        No               Yes                  Very Good   Yes          Yes
This Work                    Yes              Yes                  Good        Yes          Yes
Slide19
Privacy Guarantees
Privacy of PDA against malicious entities and participants
Malicious participant may collude with either malicious proxy or DB, but not both
May violate correctness in almost arbitrary ways
Privacy of CR-PDA against honest-but-curious entities and malicious participants
Slide20
PDA Strawman #0
Participant, Proxy, DB
Client sends input k
Participant → Proxy: k
Proxy → DB: k
Slide21
PDA Strawman #1
Participant, Proxy, DB
Client sends encrypted input k
Proxy batches and retransmits
DB decrypts input
Participant → Proxy: E_DB(k)
Proxy → DB: E_DB(k)

k          #
1.1.1.1    1
2.2.2.2    9

Violates keyword privacy
Slide22
PDA Strawman #2
Participant, Proxy, DB
Client sends hashes of k
Proxy batches and retransmits
DB decrypts input
Participant → Proxy: E_DB( H(k) )
Proxy → DB: E_DB( H(k) )

H(k)          #
H(1.1.1.1)    1
H(2.2.2.2)    9

Still violates keyword privacy: IPs drawn from small domains
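Why hashing alone fails can be seen by enumerating the input domain. The sketch below assumes SHA-256 as the hash H and a hypothetical observed address; it searches a single /16 network to keep the run short, but the full IPv4 space has only 2^32 addresses, so the same loop would cover it given more time.

```python
import hashlib
import ipaddress
from typing import Optional


def h(ip: str) -> str:
    """Unkeyed hash of an IP address, as in Strawman #2 (SHA-256 is an assumption)."""
    return hashlib.sha256(ip.encode()).hexdigest()


def invert_by_enumeration(digest: str, network: str) -> Optional[str]:
    """Recover the IP behind `digest` by hashing every address in `network`."""
    for addr in ipaddress.ip_network(network):
        if h(str(addr)) == digest:
            return str(addr)
    return None


observed = h("10.0.7.42")   # what the DB holds after decrypting E_DB( H(k) )
recovered = invert_by_enumeration(observed, "10.0.0.0/16")   # 65,536 candidates
```

Anyone who sees H(k) can run this search, which is why the next strawman moves to a keyed hash.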
Slide23
PDA Strawman #3
Participant, Proxy (holds secret s), DB
Client sends keyed hashes of k
Keyed hash function (PRF)
Key s known only by proxy
Participant → Proxy: E_DB( F_s(k) )
Proxy → DB: E_DB( F_s(k) )

F_s(k)          #
F_s(1.1.1.1)    1
F_s(2.2.2.2)    9

But how do clients learn F_s(IP)?
5. Proxy recovers k from E_PRX(k)
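A keyed hash closes the enumeration hole because the DB cannot evaluate F_s without s. The sketch below uses HMAC-SHA256 as a stand-in PRF; that choice, the function name `f`, and the sample values are assumptions for illustration, not the construction used in the talk.

```python
import hashlib
import hmac
import secrets


def f(s: bytes, k: str) -> str:
    """Keyed hash F_s(k); HMAC-SHA256 stands in for the PRF (an assumption)."""
    return hmac.new(s, k.encode(), hashlib.sha256).hexdigest()


s = secrets.token_bytes(32)       # secret key, known only to the proxy
tag = f(s, "1.1.1.1")             # the row label the DB would store

# The DB can still enumerate candidate IPs, but without s it has to guess
# the key as well, so its recomputed tag does not match the stored one.
guessed_key = secrets.token_bytes(32)
db_guess = f(guessed_key, "1.1.1.1")
```

Equal keys still map to equal tags under the same s, so the DB can count rows. The open question on the slide remains: the client needs F_s(k) without learning s.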
Slide24
Our Basic PDA Protocol
Participant, Proxy (holds secret s), DB
Client sends keyed hashes of k
F_s(x) learned by client through Oblivious PRF protocol
Proxy batches and retransmits keyed hash
DB decrypts input
Participant ↔ Proxy: OPRF
Participant → Proxy: E_DB( F_s(k) )
Proxy → DB: E_DB( F_s(k) )
DB recovers: F_s(k)

F_s(k)          #
F_s(1.1.1.1)    1
F_s(2.2.2.2)    9
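The oblivious-PRF idea, where the client obtains F_s(k) while the proxy sees neither k nor the result, can be illustrated with a blinded-exponentiation construction. Note that this is a different and simpler OPRF than the oblivious-transfer-based one the talk describes later. The toy group (p = 2039, q = 1019) is far too small for real use, and every name below is hypothetical.

```python
import hashlib
import secrets

P = 2039   # toy safe prime, P = 2*Q + 1; far too small for real use
Q = 1019   # prime order of the subgroup of squares mod P


def hash_to_group(k: str) -> int:
    """Map k to a non-identity element of the order-Q subgroup."""
    digest = int.from_bytes(hashlib.sha256(k.encode()).digest(), "big")
    base = digest % (P - 3) + 2        # in [2, P-2], so never 0, 1, or -1
    return pow(base, 2, P)             # squaring lands in the subgroup of squares


def client_blind(k: str):
    """Client hides H(k) behind a random exponent r."""
    r = secrets.randbelow(Q - 1) + 1   # r in [1, Q-1]
    return r, pow(hash_to_group(k), r, P)


def proxy_evaluate(s: int, blinded: int) -> int:
    """Proxy applies its secret s; it sees only H(k)^r, not k."""
    return pow(blinded, s, P)


def client_unblind(r: int, evaluated: int) -> int:
    """Client strips r: (H(k)^(r*s))^(1/r) = H(k)^s."""
    return pow(evaluated, pow(r, -1, Q), P)


s = secrets.randbelow(Q - 1) + 1       # proxy's secret
r, blinded = client_blind("1.1.1.1")
tag = client_unblind(r, proxy_evaluate(s, blinded))
```

The unblinded `tag` equals H(k)^s mod p, the value the proxy would compute directly if it knew k. In a group this small, distinct keys collide often, which is one reason real parameters are much larger.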
Slide25
Basic CR-PDA Protocol
Participant, Proxy (holds secret s), DB
Client sends keyed hashes of k, and encrypted k for recovery
Proxy retransmits keyed hash
DB decrypts input
Identify rows to release and transmit E_PRX(k) to proxy
Proxy decrypts k and releases
Participant → Proxy → DB: E_DB( F_s(k) ), E_DB( E_PRX(k) )
DB recovers: F_s(k), E_PRX(k)
DB → Proxy: E_PRX(k)

F_s(k)          #    Enc’d k
F_s(1.1.1.1)    1    E_PRX(1.1.1.1)
F_s(2.2.2.2)    9    E_PRX(2.2.2.2)
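The message flow of the basic CR-PDA protocol can be traced with a small simulation. This is a structural sketch only and performs no real cryptography: the `Sealed` class is a placeholder for public-key encryption, HMAC stands in for F_s, and the participant is handed s directly where the protocol would run an OPRF. All names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Sealed:
    """Placeholder for public-key encryption: only `recipient` may open it."""
    recipient: str
    payload: Any

    def open(self, who: str) -> Any:
        if who != self.recipient:
            raise PermissionError(f"{who} cannot decrypt a message for {self.recipient}")
        return self.payload


def f(s: bytes, k: str) -> str:
    # HMAC stands in for F_s(k); in the protocol the client gets it via OPRF.
    return hmac.new(s, k.encode(), hashlib.sha256).hexdigest()


def participant_submit(s: bytes, k: str) -> Sealed:
    # Sent to the proxy: E_DB( F_s(k) ) together with E_DB( E_PRX(k) ).
    return Sealed("DB", (f(s, k), Sealed("PRX", k)))


def db_select(batch: List[Sealed], tau: int) -> List[Sealed]:
    """DB decrypts, counts rows per F_s(k), and returns E_PRX(k) for rows >= tau."""
    counts: Dict[str, int] = defaultdict(int)
    enc_key: Dict[str, Sealed] = {}
    for msg in batch:
        tag, sealed_k = msg.open("DB")
        counts[tag] += 1
        enc_key[tag] = sealed_k
    return [enc_key[t] for t, n in counts.items() if n >= tau]


s = b"proxy-secret"   # hypothetical secret held by the proxy
inputs = ["1.1.1.1", "2.2.2.2", "2.2.2.2", "2.2.2.2"]
batch = [participant_submit(s, k) for k in inputs]   # proxy batches, forwards to DB
to_release = db_select(batch, tau=3)                 # DB picks rows to release
released = [c.open("PRX") for c in to_release]       # proxy decrypts k and releases
```

In this run only 2.2.2.2 reaches the threshold, so it is the only key the proxy recovers. The proxy cannot open the batch it forwards, and the DB cannot open E_PRX(k).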
Slide26
Privacy Properties
Participant, Proxy (holds secret s), DB
Messages: E_DB( F_s(k) ), F_s(k), E_DB( E_PRX(k) ), E_PRX(k)
Any coalition of HBC participants
HBC coalition of proxy and participants
HBC database
Keyword privacy: Nothing learned about unreleased keys
Participant privacy: Key ↔ Participant not learned
Slide27
Privacy Properties
Participant, Proxy (holds secret s), DB
Messages: E_DB( F_s(k) ), F_s(k), E_DB( E_PRX(k) ), E_PRX(k)
Any coalition of HBC participants
HBC coalition of proxy and participants
HBC database
Malicious participants
HBC coalition of DB and participants
Keyword privacy: Nothing learned about unreleased keys
Participant privacy: Key ↔ Participant not learned
Slide28
More Robust PDA Protocol
Participant, Proxy (holds secret s), DB
Participant ↔ Proxy: OPRF
Messages: E_DB( F_s(k) ), F_s(k), E_DB( E_PRX(k) ), E_PRX(k)
Any coalition of HBC participants
HBC coalition of proxy and participants
HBC database
Malicious participants
HBC coalition of DB and participants
Encrypted OPRF Protocol
Ciphertext re-randomization by proxy
Proof by participant that submitted k’s match
Slide29
Encrypted-OPRF protocol
Problem: in basic OPRF protocol, participant learns F_s(k)
Encrypted-OPRF protocol:
Client learns blinded F_s(k)
Client encrypts to DB
Proxy can unblind F_s(k) “under the encryption”
$\left( \mathrm{Enc}\left( F_s(k)^{r} \right) \right)^{r^{-1}}$
$F_s(k) = g^{\prod_{k_i = 1} s_i} \bmod p$
ElGamal
Slide30
Encrypted-OPRF protocol
Problem: in basic OPRF protocol, participant learns F_s(k)
Encrypted-OPRF protocol:
Client learns blinded F_s(k)
Client encrypts to DB
Proxy can unblind F_s(k) “under the encryption”
OPRF runs OT protocol for each bit of input k
OT protocols expensive, so use batch OT protocol [Ishai et al]
$\left( \mathrm{Enc}\left( F_s(k)^{r} \right) \right)^{r^{-1}}$
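Unblinding “under the encryption” relies on a property of ElGamal: raising both ciphertext components to an exponent e turns an encryption of m into an encryption of m^e. The sketch below is one reading of the slide, assuming the proxy chose the blinding exponent r during the OPRF, so that the client only ever holds F_s(k)^r. The toy group parameters and all function names are hypothetical, and F_s(k) is replaced by an arbitrary subgroup element.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy group: G generates the order-Q subgroup mod P


def keygen():
    """Return an ElGamal (private, public) key pair."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(pub: int, m: int):
    """Encrypt subgroup element m as (g^t, m * pub^t)."""
    t = secrets.randbelow(Q - 1) + 1
    return pow(G, t, P), (m * pow(pub, t, P)) % P


def decrypt(priv: int, ct):
    """Recover m = c2 * c1^(-priv)."""
    c1, c2 = ct
    return (c2 * pow(c1, -priv, P)) % P


def raise_ciphertext(ct, e: int):
    """Turn an encryption of m into an encryption of m^e without decrypting."""
    c1, c2 = ct
    return pow(c1, e, P), pow(c2, e, P)


db_priv, db_pub = keygen()
f_value = pow(G, 123, P)              # stand-in for F_s(k), a subgroup element
r = secrets.randbelow(Q - 1) + 1      # proxy's blinding exponent
blinded = pow(f_value, r, P)          # what the client learns: F_s(k)^r

ct = encrypt(db_pub, blinded)                        # client encrypts to the DB
ct_unblinded = raise_ciphertext(ct, pow(r, -1, Q))   # proxy unblinds, sees no plaintext
recovered = decrypt(db_priv, ct_unblinded)           # DB obtains F_s(k)
```

After the proxy's step the DB decrypts to F_s(k) itself, although the client held only the blinded value and the proxy handled only ciphertext.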
Slide31
Scalable Protocol Architecture
Participants
Client-Facing Proxies: Share secret s
Proxy Decryption Oracles: Share PRX key
Front-End DB Tier: Share DB key
Back-End DB Storage: Partition F_s keyspace
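One simple way to partition the F_s keyspace across back-end storage nodes is to derive a shard index from the keyed hash itself. The slide does not say how the partitioning is done, so the scheme below (leading bytes modulo the shard count), the HMAC stand-in for F_s, and all names are assumptions.

```python
import hashlib
import hmac


def f(s: bytes, k: str) -> bytes:
    # HMAC-SHA256 stands in for F_s(k) (an assumption, as is the secret below).
    return hmac.new(s, k.encode(), hashlib.sha256).digest()


def shard_for(tag: bytes, num_shards: int) -> int:
    """Pick a back-end shard from the leading 8 bytes of F_s(k)."""
    return int.from_bytes(tag[:8], "big") % num_shards


s = b"proxy-secret"   # hypothetical shared proxy secret
ips = ["1.1.1.1", "2.2.2.2", "1.1.1.1"]
shards = [shard_for(f(s, ip), num_shards=4) for ip in ips]
```

Because F_s is deterministic, all rows for the same key land on the same shard, so each storage node can count its own keys independently.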
Slide32
Evaluation
Scalable architecture implemented
Basic CR-PDA / PDA protocol and encrypted-OPRF protocol w/ Batch OT
~5000 lines of threaded C++, GnuPG for crypto
Testbed of 2 GHz Linux machines

Algorithm             Parameter    Value
RSA / ElGamal         key size     1024 bits
Oblivious Transfer    k            80
AES                   key size     256 bits
Slide33
Throughput vs. participant batch size
Single CPU core for DB and proxy each
Slide34
Maximum throughput per server
Four CPU cores for DB and proxy (each)
Slide35
Throughput scalability
Number CPU cores per DB and proxy (each)
Slide36
Summary
Privacy-Preserving Data Aggregation protects:
Participants: Do not reveal who submitted what
Keywords: Only reveal values / released keys
Novel composition of crypto primitives
Based on assumption that 2+ known parties don’t collude
Efficient implementation of architecture
Scales linearly with computing resources
Ex: Millions of suspected IPs in hours
Of independent interest…
Introduced encrypted OPRF protocol
First implementation/validation of Batch OT protocol