Slide1
SecureML: A System for Scalable Privacy-Preserving Machine Learning
Payman Mohassel and Yupeng Zhang
Slide2
Machine Learning
More data → Better Models
Image processing
Speech recognition
Ad recommendation
Playing Go
Slide3
Machine Learning
More data → Better Models
Image processing
Speech recognition
Ad recommendation
Playing Go
Data Privacy?
Slide4
Example: Fraud Detection
Card # | Time | Location | Amount
xxxxxxxx | 8/8/2016 | CA, USA | xx.xx
xxxxxxxx | x/xx/xxxx | xx,xxx | xx.xx
……
Name | SSN
Alice | xxxxx
……
Alice | xxxxx | …..
Products | xxx | ……
Slide5
Privacy-preserving Machine Learning
Decision trees [LP00, …]
k-means clustering [JW05, BO07, …]
SVM classification [YVJ06, VYJ08, …]
Linear regression [DA01, DHC04, SKLR04, NWI+13, GSB+16, GLL+16, …]
Logistic regression [SNT07, WTK+13, AHTW16, …]
Neural networks [SS15, GDL+16, …]
……
Slide6
Two-server Model
[Figure: a user splits its data into two shares (data = share + share), one for each server; the two servers run two-party computation to obtain the model]
More efficient than MPC and FHE
Users can be offline during the training
Used in many prior works [NWI+13, NIW+13, GSB+16, …]
Slide7
Our Contributions
New protocols for linear regression, logistic regression and neural network training
Secret sharing and arithmetic with precomputed triplets + garbled circuits
System:
54-1270× faster than prior work
Scales to large datasets (1 million records, 5000 features for logistic regression)
Slide8
Linear Regression
Slide9
Linear Regression
[Figure: plot with axes x and y]
Input: data value pairs (x, y)s
Output: model w
Stochastic Gradient Descent (SGD):
Initialize w randomly
Select a random sample (x, y)
Update w
Slide10
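As a plaintext reference for the SGD loop on the Linear Regression slide, here is a minimal Python sketch. The update formula did not survive on the slide, so the rule used here, w := w - alpha * (x·w - y) * x (the standard least-squares SGD step), is an assumption, as are the learning rate `alpha`, the function name `sgd_linear_regression`, and the toy data.

```python
import random

def sgd_linear_regression(samples, dim, alpha=0.01, steps=5000, seed=0):
    """Plain (non-private) SGD for least-squares linear regression."""
    rng = random.Random(seed)
    # Initialize w randomly.
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        # Select a random sample (x, y).
        x, y = rng.choice(samples)
        # Prediction error x.w - y for this sample.
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        # Assumed update rule: w := w - alpha * err * x.
        w = [wj - alpha * err * xj for wj, xj in zip(w, x)]
    return w

# Hypothetical toy data generated from y = 2*x1 + 3*x2.
data = [((a, b), 2.0 * a + 3.0 * b) for a in range(-3, 4) for b in range(-3, 4)]
w = sgd_linear_regression(data, dim=2)
```

On this noise-free toy data the learned `w` should approach (2, 3). The privacy-preserving protocol runs the same loop, but on secret-shared values.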
Secret Sharing
[Figure: a value a is split into shares $a_0$ and $a_1$, one per server]
$a_0 = a - r \bmod p$
$a_1 = r \bmod p$
Slide11
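The sharing rule on the Secret Sharing slide ($a_0 = a - r \bmod p$, $a_1 = r \bmod p$) can be sketched in a few lines of Python. The modulus `P` and the function names `share` and `reconstruct` are assumptions chosen for illustration; the slide only names the modulus p.

```python
import secrets

P = 2**61 - 1  # hypothetical modulus; the slide only calls it p

def share(a, p=P):
    """Split a into two additive shares: a0 = a - r mod p, a1 = r mod p."""
    r = secrets.randbelow(p)
    return (a - r) % p, r % p

def reconstruct(a0, a1, p=P):
    """Recombine the two shares: a = a0 + a1 mod p."""
    return (a0 + a1) % p

a0, a1 = share(42)
recovered = reconstruct(a0, a1)
```

Each share on its own is uniformly distributed modulo p, so a single server learns nothing about a.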
Secret Sharing and Addition
Server 0 holds $a_0, b_0$; server 1 holds $a_1, b_1$
$c_0 = a_0 + b_0$
$c_1 = a_1 + b_1$
$c_0 + c_1 = a + b$
Slide12
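Addition of shared values from the Secret Sharing and Addition slide needs no communication: each server adds the shares it already holds. A minimal sketch, with the modulus `P` and the helper names as assumptions:

```python
import secrets

P = 2**61 - 1  # hypothetical modulus

def share(v, p=P):
    """Additive sharing: (v - r mod p, r)."""
    r = secrets.randbelow(p)
    return (v - r) % p, r

def add_local(x_i, y_i, p=P):
    """What one server computes on its own shares: c_i = a_i + b_i mod p."""
    return (x_i + y_i) % p

a0, a1 = share(15)
b0, b1 = share(27)
c0 = add_local(a0, b0)  # server 0
c1 = add_local(a1, b1)  # server 1
total = (c0 + c1) % P   # reconstructs a + b
```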
Secret Sharing and Multiplication Triplets
Server 0 holds $a_0, b_0$ and triplet shares $u_0, v_0, z_0$; server 1 holds $a_1, b_1$ and $u_1, v_1, z_1$
$(u_0 + u_1) \times (v_0 + v_1) = (z_0 + z_1)$
Server 0: $a_0 - u_0$, $b_0 - v_0$; server 1: $a_1 - u_1$, $b_1 - v_1$
Both servers: $e = a - u$, $f = b - v$
$c_0 = -ef + a_0 f + e b_0 + z_0$
$c_1 = a_1 f + e b_1 + z_1$
$c_0 + c_1 = a \times b$
Slide13
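The triplet-based multiplication from the Multiplication Triplets slide can be checked with a short simulation. This sketch runs both servers in one process. The function `make_triplet` plays a trusted dealer standing in for the offline phase, which is an assumption for illustration only, as are the modulus `P` and all function names.

```python
import secrets

P = 2**61 - 1  # hypothetical modulus

def share(v, p=P):
    r = secrets.randbelow(p)
    return (v - r) % p, r

def make_triplet(p=P):
    """Dealer-style triplet with u * v = z (stand-in for the offline phase)."""
    u = secrets.randbelow(p)
    v = secrets.randbelow(p)
    return share(u, p), share(v, p), share((u * v) % p, p)

def multiply_shared(a_sh, b_sh, triplet, p=P):
    (a0, a1), (b0, b1) = a_sh, b_sh
    (u0, u1), (v0, v1), (z0, z1) = triplet
    # Each server masks its shares; the masked values are exchanged and summed.
    e = ((a0 - u0) + (a1 - u1)) % p  # e = a - u
    f = ((b0 - v0) + (b1 - v1)) % p  # f = b - v
    c0 = (-e * f + a0 * f + e * b0 + z0) % p  # server 0
    c1 = (a1 * f + e * b1 + z1) % p           # server 1
    return c0, c1

c0, c1 = multiply_shared(share(6), share(7), make_triplet())
product = (c0 + c1) % P
```

Expanding $c_0 + c_1 = -ef + af + eb + uv$ with $e = a - u$ and $f = b - v$ leaves exactly $ab$, which is what the sketch reconstructs.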
Privacy-preserving Linear Regression
SGD:
Users secret share data and values (x, y)
Servers initialize and secret share the model w
Run SGD using pre-computed multiplication triplets
Decimal numbers?
Slide14
Decimal Multiplications in Integer Fields
Fixed-point multiplication:
a (16-bit decimal part) × b (16-bit decimal part) = c (32-bit decimal part)
Same as integer multiplication
Decimal part grows → overflow
Truncation: c (32-bit decimal part) → c (16-bit decimal part)
Slide15
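The fixed-point idea from the Decimal Multiplications slide, sketched on plain (unshared) integers: multiplying two values with 16 fractional bits gives 32 fractional bits, and truncation drops the lowest 16 again. The names `encode`, `decode` and `fixed_mul` are assumptions for illustration.

```python
FRAC_BITS = 16  # size of the decimal part, as on the slide

def encode(x, frac_bits=FRAC_BITS):
    """Represent a decimal number as an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def decode(v, frac_bits=FRAC_BITS):
    return v / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    c = a * b              # plain integer multiplication: 32 fractional bits
    return c >> frac_bits  # truncation: back to 16 fractional bits

a = encode(1.5)
b = encode(2.25)
c = fixed_mul(a, b)
value = decode(c)  # 1.5 * 2.25
```

Without the truncation step, the decimal part would double after every multiplication and eventually overflow the field.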
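Truncation on shared values
[Figure: shares $a_0, b_0$ and $a_1, b_1$ are multiplied into shares $c_0$ and $c_1$; truncating $c_0$ and $c_1$ separately gives shares of the truncated $c$]
Truncation: the result is off by +1, +0 or -1 on the last bit, with high probability
Slide16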
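The claim on the Truncation on shared values slide can be tried out numerically. The sketch below makes several assumptions not stated on the slide: values live in the ring of integers modulo 2^64, server 0 simply shifts its share, and server 1 negates its share, shifts, and negates back. The function names are hypothetical.

```python
import random

L_BITS = 64
MOD = 1 << L_BITS  # assumed ring size
FRAC_BITS = 16

def share(x, rng):
    c0 = rng.randrange(MOD)
    return c0, (x - c0) % MOD

def truncate_shares(c0, c1, d=FRAC_BITS):
    """Each server truncates its own share, with no interaction."""
    t0 = c0 >> d                          # server 0: drop the low d bits
    t1 = (MOD - ((MOD - c1) >> d)) % MOD  # server 1: negate, shift, negate
    return t0, t1

def to_signed(v):
    """Interpret a ring element as a signed integer."""
    return v - MOD if v >= MOD // 2 else v

rng = random.Random(7)
c = 98304 * 147456  # 1.5 * 2.25 with 32 fractional bits
c0, c1 = share(c, rng)
t0, t1 = truncate_shares(c0, c1)
approx = to_signed((t0 + t1) % MOD)
exact = c >> FRAC_BITS
```

For values that are small compared to the ring size, `approx` differs from `exact` by at most 1 in the last position, except with very small probability over the random share.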
Privacy-preserving Linear Regression
SGD:
Users secret share data and values (x, y)
Servers initialize and secret share the model w
Run SGD using pre-computed multiplication triplets
Truncate the shares after every multiplication
Slide17
Effects of Our Technique
4-8× faster than fixed-point multiplication in a garbled circuit
Slide18
Logistic Regression
Slide19
Logistic Regression
Input: data value pairs (x, y)s, y = 0 or 1
Output: model w
Slide20
Privacy-preserving Logistic Regression
[Figure: logistic function compared with its degree 10 and degree 2 polynomial approximations]
Slide21
Privacy-preserving Logistic Regression
[Figure: logistic function compared with our function]
Almost the same accuracy as logistic function
Much faster than polynomial approximation
Secure-computation-friendly activation function
Slide22
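The slides do not spell out "our function". As an illustration only, the sketch below assumes a piecewise-linear form (0 below -1/2, x + 1/2 in between, 1 above 1/2), which needs only comparisons and additions and is therefore cheap inside a garbled circuit. The name `mpc_friendly_activation` and the exact breakpoints are assumptions.

```python
import math

def logistic(x):
    """The standard logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def mpc_friendly_activation(x):
    """Assumed piecewise-linear replacement for the logistic function."""
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return x + 0.5

# Side-by-side values on a small grid of inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, round(logistic(x), 4), mpc_friendly_activation(x))
```

Both functions are monotone, pass through 1/2 at x = 0, and saturate towards 0 and 1.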
Privacy-preserving Logistic Regression
[Figure: logistic function compared with our function]
Run our protocol for linear regression
Switch to garbled circuit for f [DSZ15]
Switch back to arithmetic secret sharing
Slide23
Vectorization
Mini-batch SGD:
Take a batch of B records and update w by their average
Converges faster and more smoothly
Fast matrix-vector/matrix-matrix multiplication
Slide24
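A plaintext sketch of the mini-batch idea from the Vectorization slide: each step takes B records and moves w by the average of their gradients. The least-squares gradient, the learning rate `alpha`, the function name `minibatch_sgd` and the toy data are assumptions for illustration.

```python
import random

def minibatch_sgd(samples, dim, batch_size=8, alpha=0.02, steps=3000, seed=0):
    """Plain (non-private) mini-batch SGD for least-squares regression."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        # Take a batch of B records.
        batch = rng.sample(samples, batch_size)
        # Errors X_B * w - Y_B for the whole batch (a matrix-vector product).
        errs = [sum(wj * xj for wj, xj in zip(w, x)) - y for x, y in batch]
        # Averaged gradient X_B^T * (X_B * w - Y_B) / B.
        grad = [sum(e * x[j] for e, (x, _) in zip(errs, batch)) / batch_size
                for j in range(dim)]
        # Update w by the batch average.
        w = [wj - alpha * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data generated from y = 2*x1 + 3*x2.
data = [((a, b), 2.0 * a + 3.0 * b) for a in range(-3, 4) for b in range(-3, 4)]
w = minibatch_sgd(data, dim=2)
```

Written this way, every step is two matrix-vector products, which is what makes the matrix form of multiplication triplets applicable.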
Vectorization
Mini-batch SGD:
Multiplication triplets for matrix-vector/matrix-matrix multiplications
2× online computational overhead compared to plaintext training
4-66× offline speedup
Slide25
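The scalar triplet formulas carry over to matrices when the order of the factors is kept: with E = A - U and F = B - V, the shares are C0 = -EF + A0 F + E B0 + Z0 and C1 = A1 F + E B1 + Z1. The sketch below simulates this in one process. The ring size `MOD`, the dealer-style `matrix_triplet`, and all helper names are assumptions for illustration.

```python
import secrets

MOD = 1 << 64  # assumed ring size

def rand_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]

def add(X, Y):
    return [[(a + b) % MOD for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[(a - b) % MOD for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) % MOD for col in cols]
            for row in X]

def share(M):
    R = rand_matrix(len(M), len(M[0]))
    return sub(M, R), R

def matrix_triplet(n, m, k):
    """Dealer-style matrix triplet with U * V = Z (stand-in for the offline phase)."""
    U, V = rand_matrix(n, m), rand_matrix(m, k)
    return share(U), share(V), share(mul(U, V))

def multiply_shared(A_sh, B_sh, triplet):
    (A0, A1), (B0, B1) = A_sh, B_sh
    (U0, U1), (V0, V1), (Z0, Z1) = triplet
    E = add(sub(A0, U0), sub(A1, U1))  # E = A - U, opened to both servers
    F = add(sub(B0, V0), sub(B1, V1))  # F = B - V, opened to both servers
    C0 = add(sub(add(mul(A0, F), mul(E, B0)), mul(E, F)), Z0)  # server 0
    C1 = add(add(mul(A1, F), mul(E, B1)), Z1)                  # server 1
    return C0, C1

X = [[1, 2, 3], [4, 5, 6]]  # a batch of 2 records with 3 features
w = [[1], [0], [2]]         # model as a 3x1 column
C0, C1 = multiply_shared(share(X), share(w), matrix_triplet(2, 3, 1))
result = add(C0, C1)        # reconstructs X * w
```

One matrix triplet covers the whole batch, so the masked matrices E and F are exchanged once per product rather than once per scalar multiplication.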
Neural Networks
Slide26
Neural Networks
Mini-batch SGD: coefficient matrices are updated by closed-form formulas using matrix/element-wise multiplications
Slide27
Experimental Results
Slide28
Experimental Results: Linear Regression
100,000 records, 500 features
54-1270× faster than systems in [NWI+13, GSB+16]
Supports arbitrary partitioning of data
[Figure: bar chart of offline and online time (s), log scale from 1 to 10,000, for LAN (1.2GB/s, delay 0.17ms) and WAN (9MB/s, delay 72ms), each also with client-aided triplets. Bar labels in extraction order: 378, 1.4, 8782, 20, 20, 4.9, 465, 141]
Slide29
Experimental Results: Logistic Regression
100,000 records, 500 features
[Figure: bar chart of offline and online time (s), log scale from 1 to 10,000, for LAN (1.2GB/s, delay 0.17ms) and WAN (9MB/s, delay 72ms), each also with client-aided triplets. Bar labels in extraction order: 378, 9.6, 8782, 20, 20, 11.5, 652, 422]
Scales to 1 million records and 5,000 features
Slide30
Experiments: Neural Networks
2 hidden layers with 128 neurons each
LAN: 25,200 sec online + offline
Plaintext training: 700 sec. 35× overhead.
WAN: 220,000* sec online + offline
Slide31
Summary
Privacy-preserving linear regression, logistic regression and neural networks
Decimal arithmetic on integer fields
Secure-computation-friendly activation functions
Vectorization (mini-batch SGD)
System:
Orders of magnitude faster than prior work
Scales to large datasets
Slide32
Future Work
Privacy-preserving Neural Networks
Accuracy: softmax, convolutional neural networks, etc.
Efficiency: partitioning, parallelization, etc.
Multi-party model
Thank you!!!
Q&A
Slide33
Large Scale Logistic Regression
1,000,000 records, 5,000 features
LAN: 2,500 sec client-aided offline, 623.5 sec online
Slide34
Garbled Circuits
AND gate with inputs a, b and output c
Truth Table
a | b | c
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
[Figure: garbled table for the same gate, with four entries for c indexed by a and b]
Slide35
Garbled Circuits
server 0
server 1
[Figure: garbled circuit between the two servers, with output key $k_b$ and keys $k_0, k_1$]
$b_0$, $b_1$: $b_0 + b_1 = b$
Slide36
Switching Between Secret Sharing and GC
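server 0 holds $x_0$; server 1 holds $x_1$
$C(x_0, x_1)$: modulo addition circuit, then output the most significant bit
[Figure: garbled circuit C, with output key $k_b$ and keys $k_0, k_1$; output bit shared as $b_0 + b_1 = b$]
$m_0 = x_0 b_0 + r$, $m_1 = x_0 (1 - b_0) + r$; OT($b_1$): $m = x_0 b + r$
$m_0 = x_1 b_1 + r'$, $m_1 = x_1 (1 - b_1) + r'$; OT($b_0$): $m = x_1 b + r'$
Offsets kept by the senders: $-r$, $-r'$
$f(x) = x \times (x > 0)$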
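The OT messages on the Switching Between Secret Sharing and GC slide can be simulated to check that the four pieces add up to x × b. This sketch makes several assumptions: the "+" in b0 + b1 = b is read as addition modulo 2 (XOR), the ring is the integers modulo 2^64, `ideal_ot` is an idealized stand-in rather than a real oblivious transfer protocol, and the comparison bit is computed in the clear where the protocol would use the garbled circuit C.

```python
import secrets

MOD = 1 << 64  # assumed ring size

def ideal_ot(m0, m1, choice):
    """Idealized 1-out-of-2 OT: the receiver obtains only m_choice."""
    return m1 if choice else m0

def mul_by_shared_bit(x0, x1, b0, b1):
    """Turn shares of x and an XOR-shared bit b into additive shares of x * b."""
    # Server 0 sends for x0; server 1 chooses with b1 and gets x0*b + r.
    r = secrets.randbelow(MOD)
    m_s1 = ideal_ot((x0 * b0 + r) % MOD, (x0 * (1 - b0) + r) % MOD, b1)
    # Server 1 sends for x1; server 0 chooses with b0 and gets x1*b + r'.
    r2 = secrets.randbelow(MOD)
    m_s0 = ideal_ot((x1 * b1 + r2) % MOD, (x1 * (1 - b1) + r2) % MOD, b0)
    y0 = (m_s0 - r) % MOD   # server 0: received message plus its offset -r
    y1 = (m_s1 - r2) % MOD  # server 1: received message plus its offset -r'
    return y0, y1

def shared_f(x):
    """Simulate f(x) = x * (x > 0) on shares and return the reconstructed value."""
    x0 = secrets.randbelow(MOD)
    x1 = (x - x0) % MOD
    b = 1 if x > 0 else 0  # stand-in for the garbled circuit's output bit
    b0 = secrets.randbelow(2)
    b1 = b ^ b0
    y0, y1 = mul_by_shared_bit(x0, x1, b0, b1)
    return (y0 + y1) % MOD

positive = shared_f(5)
negative = shared_f(-3)
```

Because each OT message is masked by a fresh random r or r', the receiver's output is a uniformly random ring element and reveals nothing about the other server's share.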