
Airavat : Security and Privacy for - PowerPoint Presentation

Uploaded On 2024-02-02

Presentation Transcript

1. Airavat: Security and Privacy for MapReduce. Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel. The University of Texas at Austin.

2. Computing in the year 201X. Illusion of infinite resources. Pay only for resources used. Quickly scale up or scale down … [Diagram label: Data]

3. Programming model in year 201X. Frameworks are available to ease cloud programming. MapReduce: parallel processing on clusters of machines. Applications: data mining, genomic computation, social networks. [Diagram: Data, Map, Reduce, Output]

4. Programming model in year 201X. Thousands of users upload their data: healthcare, shopping transactions, census, click stream. Multiple third parties mine the data for better service. Example: healthcare data. Incentive to contribute: cheaper insurance policies, new drug research, inventory control in drugstores… Fear: what if someone targets my personal data? An insurance company can find my illness and increase my premium.

5. Privacy in the year 201X? [Diagram: health data; untrusted MapReduce program (data mining, genomic computation, social networks); output. Information leak?]

6. Use de-identification? Achieves ‘privacy’ by syntactic transformations: scrubbing, k-anonymity … Insecure against attackers with external information. Privacy fiascoes: AOL search logs, Netflix dataset. Run untrusted code on the original data? How do we ensure privacy of the users?

7. Audit the untrusted code? Audit all MapReduce programs for correctness? Hard to do! Enlightenment? Also, where is the source code? Aim: confine the code instead of auditing.

8. This talk: Airavat. Framework for privacy-preserving MapReduce computations with untrusted code. Airavat is the elephant of the clouds (Indian mythology). [Diagram: untrusted program, Airavat, protected data]

9. Airavat guarantee. Bounded information leak* about any individual data after performing a MapReduce computation. *Differential privacy. [Diagram: untrusted program, Airavat, protected data]

10. Outline. Motivation. Overview. Enforcing privacy. Evaluation. Summary.

11. Background: MapReduce. map(k1, v1) → list(k2, v2). reduce(k2, list(v2)) → list(v2). [Diagram: Data 1, Data 2, Data 3, Data 4; map phase; reduce phase; Output]

12. MapReduce example: counts the number of iPads sold. Map(input) { if (input has iPad) print (iPad, 1) } Reduce(key, list(v)) { print (key + “,” + SUM(v)) } [Diagram: inputs iPad, Tablet PC, iPad, Laptop; the map phase emits (iPad, 1) twice; the reduce phase applies SUM and outputs (iPad, 2)]
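The slide's pseudocode can be sketched as a small single-process Python program. This is only an illustration of the map and reduce steps, not the Hadoop-based system the talk describes; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions made for the example.

```python
from collections import defaultdict

def map_fn(record):
    """Emit ("iPad", 1) for each input record that mentions an iPad."""
    if "iPad" in record:
        yield ("iPad", 1)

def reduce_fn(key, values):
    """Sum all counts emitted for one key."""
    return (key, sum(values))

def run_mapreduce(records):
    """Hypothetical in-memory driver: map every record, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in groups.items()]

sales = ["iPad", "Tablet PC", "iPad", "Laptop"]
print(run_mapreduce(sales))  # [('iPad', 2)]
```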

13. Airavat model. The Airavat framework runs on the cloud infrastructure. Cloud infrastructure: hardware + VM. Airavat: modified MapReduce + DFS + JVM + SELinux. [Diagram: cloud infrastructure; Airavat framework (1); label: Trusted]

14. Airavat model. The data provider uploads her data on Airavat and sets up certain privacy parameters. [Diagram: cloud infrastructure; Airavat framework (1); data provider (2); label: Trusted]

15. Airavat model. The computation provider writes the data mining algorithm. Untrusted, possibly malicious. [Diagram: cloud infrastructure; Airavat framework (1); data provider (2); computation provider (3); program; output; label: Trusted]

16. Threat model. Airavat runs the computation, and still protects the privacy of the data providers. [Diagram: cloud infrastructure; Airavat framework (1); data provider (2); computation provider (3); program; output; labels: Trusted, Threat]

17. Roadmap. What is the programming model? How do we enforce privacy? What computations can be supported in Airavat?

18. Programming model. MapReduce program for data mining. Split MapReduce into untrusted mapper + trusted reducer. No need to audit. Limited set of stock reducers. [Diagram: data; Airavat with untrusted mapper and trusted reducer; data]

19. Programming model. MapReduce program for data mining. No need to audit. Need to confine the mappers! Guarantee: protect the privacy of data providers. [Diagram: data; Airavat with untrusted mapper and trusted reducer; data]

20. Challenge 1: Untrusted mapper. Untrusted mapper code copies data, sends it over the network. Leaks using system resources. [Diagram: data for Peter, Meg, and Chris; Map; Reduce; Peter]

21. Challenge 2: Untrusted mapper. Output of the computation is also an information channel. Example: output 1 million if Peter bought Vi*gra. [Diagram: data for Peter, Meg, and Chris; Map; Reduce]

22. Airavat mechanisms. Mandatory access control: prevent leaks through storage channels like network connections, files… Differential privacy: prevent leaks through the output of the computation. [Diagram: Data, Map, Reduce, Output]

23. Back to the roadmap. What is the programming model? Untrusted mapper + trusted reducer. How do we enforce privacy? Leaks through system resources. Leaks through the output. What computations can be supported in Airavat?

24. Airavat confines the untrusted code. Untrusted program: given by the computation provider. MapReduce + DFS: add mandatory access control (MAC). SELinux: add MAC policy. [Diagram: Airavat stack]

25. Airavat confines the untrusted code. We add mandatory access control to the MapReduce framework. Label input, intermediate values, output. Malicious code cannot leak labeled data. [Diagram: Data 1, Data 2, Data 3, each with an access control label; Map; Reduce; Output; stack: untrusted program, MapReduce + DFS, SELinux]

26. Airavat confines the untrusted code. SELinux policy to enforce MAC. Creates trusted and untrusted domains. Processes and files are labeled to restrict interaction. Mappers reside in the untrusted domain: denied network access, limited file system interaction. [Diagram stack: untrusted program, MapReduce + DFS, SELinux]

27. But access control is not enough. Labels can prevent the output from being read. When can we remove the labels? Example: a mapper containing if (input belongs-to Peter) print (iPad, 1000000) makes the reduce phase SUM to (iPad, 1000002) instead of (iPad, 2). The output leaks the presence of Peter! [Diagram: inputs iPad, Tablet PC, iPad, Laptop with access control labels; map phase; reduce phase]
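The leak on this slide can be reproduced in a few lines of Python. This is a sketch under assumptions: records are modeled as hypothetical `(owner, item)` pairs, and the owner "Ann" is invented for the example; only the malicious condition and the resulting totals come from the slide.

```python
def malicious_map(record):
    """Count iPads honestly, but add a huge value whenever Peter's record appears."""
    owner, item = record
    if item == "iPad":
        yield ("iPad", 1)
    if owner == "Peter":
        yield ("iPad", 1000000)  # encodes "Peter is in the data" in the output

def summed_output(records):
    """SUM reducer over everything the mapper emitted for the key iPad."""
    return sum(value for record in records for _, value in malicious_map(record))

with_peter = [("Meg", "iPad"), ("Chris", "Tablet PC"), ("Peter", "iPad"), ("Ann", "Laptop")]
without_peter = [r for r in with_peter if r[0] != "Peter"]

print(summed_output(with_peter))     # 1000002
print(summed_output(without_peter))  # 1
```

Anyone who sees the released sum can tell whether Peter's record was present, even though the mapper never touched the network or the file system.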

28. But access control is not enough. Need mechanisms to enforce that the output does not violate an individual’s privacy.

29. Background: Differential privacy. A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not. [Cynthia Dwork. Differential Privacy. ICALP 2006]

30. Differential privacy (intuition). A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not. [Diagram: inputs A, B, C; F(x); output distribution] [Cynthia Dwork. Differential Privacy. ICALP 2006]

31. Differential privacy (intuition). A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not. Similar output distributions: bounded risk for D if she includes her data! [Diagram: F(x) over inputs A, B, C and F(x) over inputs A, B, C, D] [Cynthia Dwork. Differential Privacy. ICALP 2006]
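The intuition above corresponds to the standard formal statement from the cited Dwork paper. The symbols below (mechanism K, privacy parameter ε, data sets D1 and D2) follow the usual textbook notation and do not appear on the slides.

```latex
% A randomized mechanism K is \varepsilon-differentially private if,
% for all data sets D_1, D_2 that differ in at most one element
% and for every set of outputs S \subseteq \mathrm{Range}(K):
\Pr[\,K(D_1) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,K(D_2) \in S\,]
```

A smaller ε forces the two output distributions closer together, which is the "bounded risk for D" pictured on the slide.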

32. Achieving differential privacy. A simple differentially private mechanism: a user asks “Tell me f(x)” over inputs x1 … xn and receives f(x) + noise. How much noise should one add?

33. Achieving differential privacy. Function sensitivity (intuition): maximum effect of any single input on the output. Aim: need to conceal this effect to preserve privacy. Example: computing the average height of the people in this room has low sensitivity, because any single person’s height does not affect the final average by too much. Calculating the maximum height has high sensitivity.
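The height example can be checked numerically. The sketch below measures how far a function's result moves when one element of a small data set is replaced by an extreme value; the helper `max_change`, the sample heights, and the assumed height range of 100 to 250 cm are all made up for the illustration.

```python
def mean(values):
    return sum(values) / len(values)

def max_change(f, data, lo, hi):
    """Largest change in f(data) when any single element is replaced by lo or hi."""
    base = f(data)
    worst = 0.0
    for i in range(len(data)):
        for replacement in (lo, hi):
            changed = data[:i] + [replacement] + data[i + 1:]
            worst = max(worst, abs(f(changed) - base))
    return worst

heights = [160, 165, 170, 175, 180]  # hypothetical heights in cm
mean_effect = max_change(mean, heights, lo=100, hi=250)
max_effect = max_change(max, heights, lo=100, hi=250)
print(mean_effect, max_effect)
```

For the mean, the effect of one person is at most (hi - lo) / n and shrinks as the room fills up; for the maximum, a single very tall person can move the result by almost the whole range regardless of n.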

34. Achieving differential privacy. Function sensitivity (intuition): maximum effect of any single input on the output. Aim: need to conceal this effect to preserve privacy. Example: SUM over input elements drawn from [0, M]. Sensitivity = M: the maximum effect of any input element is M. [Diagram: X1, X2, X3, X4; SUM]
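The claim "Sensitivity = M" can be written out as a short derivation. The notation (neighboring inputs x and x' that differ only in position j) is an assumption added here; the slide states only the result.

```latex
% Inputs x = (x_1, \dots, x_n) and x' differ only in position j,
% and every element lies in [0, M].
\Delta(\mathrm{SUM})
  = \max_{x,\,x'} \Bigl|\, \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x'_i \Bigr|
  = \max_{x_j,\,x'_j \in [0, M]} \bigl| x_j - x'_j \bigr|
  = M
```

All terms other than position j cancel, so the worst case is one element moving from 0 to M (or the reverse).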

35. Achieving differential privacy. A simple differentially private mechanism: asked “Tell me f(x)” over inputs x1 … xn, return f(x) + Lap(∆(f)). Intuition: noise needed to mask the effect of a single input. Lap = Laplace distribution. ∆(f) = sensitivity.
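A minimal Python sketch of this mechanism for SUM follows. The slide writes the noise as Lap(∆(f)); the usual calibration in the differential privacy literature uses scale ∆(f)/ε, where ε is the privacy parameter. The parameter name `epsilon` and the function names are assumptions for this example, and setting `epsilon=1` gives the slide's form.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sum(values, sensitivity, epsilon=1.0, rng=None):
    """Return sum(values) plus Laplace noise with scale sensitivity / epsilon."""
    rng = rng or random.Random()
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)

# Four inputs drawn from [0, M] with M = 10, so SUM has sensitivity 10.
print(noisy_sum([3, 7, 10, 0], sensitivity=10, epsilon=1.0))
```

The printed value differs from run to run; the noise has mean zero and a typical magnitude equal to the scale, which is what masks the contribution of any single input.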

36. Back to the roadmap. What is the programming model? Untrusted mapper + trusted reducer. How do we enforce privacy? Leaks through system resources: MAC. Leaks through the output. What computations can be supported in Airavat?

37. Enforcing differential privacy. Mapper can be any piece of Java code (“black box”) but… Range of mapper outputs must be declared in advance. Used to estimate “sensitivity” (how much does a single input influence the output?). Determines how much noise is added to outputs to ensure differential privacy. Example: consider mapper range [0, M]. SUM has the estimated sensitivity of M.

38. Enforcing differential privacy. Malicious mappers may output values outside the range. If a mapper produces a value outside the range, it is replaced by a value inside the range. User not notified… otherwise possible information leak. Ensures that code is not more sensitive than declared. [Diagram: Data 1, Data 2, Data 3, Data 4; mappers; range enforcers; reducer; noise]
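A range enforcer can be sketched in Python as follows. The slide says only that an out-of-range value is replaced by some value inside the range; clamping to the nearest bound is one possible choice and is an assumption here, as are the function names.

```python
def enforce_range(value, lo, hi):
    """Silently replace an out-of-range mapper output with the nearest bound."""
    return min(max(value, lo), hi)

def enforced_sum(mapper_outputs, lo, hi):
    """Sum mapper outputs after each one has passed through the range enforcer."""
    return sum(enforce_range(v, lo, hi) for v in mapper_outputs)

# Declared range [0, 1]: a malicious output of 1000000 contributes at most 1.
print(enforced_sum([1, 0, 1000000, 0], lo=0, hi=1))  # 2
```

No error is raised and nothing is reported back to the computation provider, which matches the slide's point that a notification could itself leak information.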

39. Enforcing sensitivity. All mapper invocations must be independent. Mapper may not store an input and use it later when processing another input. Otherwise, range-based sensitivity estimates may be incorrect. We modify JVM to enforce mapper independence. Each object is assigned an invocation number. JVM instrumentation prevents reuse of objects from previous invocation.
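The following Python sketch shows why independence matters; it does not model the JVM instrumentation itself. The class `StatefulMapper`, the `(owner, item)` record shape, and the generated user names are hypothetical. Every output stays inside the declared range [0, 1], yet one input changes the SUM by far more than 1.

```python
class StatefulMapper:
    """A mapper that remembers an earlier input, violating invocation independence."""

    def __init__(self):
        self.seen_peter = False

    def __call__(self, record):
        owner, _item = record
        if owner == "Peter":
            self.seen_peter = True
        # Each individual output is 0 or 1, so it passes a [0, 1] range check.
        return 1 if self.seen_peter else 0

def run_sum(records):
    mapper = StatefulMapper()
    return sum(mapper(record) for record in records)

others = [("user%d" % i, "Laptop") for i in range(99)]
with_peter = [("Peter", "iPad")] + others
print(run_sum(with_peter), run_sum(others))  # 100 0
```

Removing Peter's single record shifts the sum by 100, so noise calibrated to a sensitivity of 1 would not hide him. Preventing state from surviving between invocations rules this out.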

40. Roadmap. One last time. What is the programming model? Untrusted mapper + trusted reducer. How do we enforce privacy? Leaks through system resources: MAC. Leaks through the output: differential privacy. What computations can be supported in Airavat?

41. What can we compute? Reducers are responsible for enforcing privacy: they add an appropriate amount of random noise to the outputs. Reducers must be trusted. Sample reducers: SUM, COUNT, THRESHOLD. Sufficient to perform data mining algorithms, search log processing, recommender systems, etc. With trusted mappers, more general computations are possible: use exact sensitivity instead of range-based estimates.

42. Sample computations. Many queries can be done with untrusted mappers. How many iPads were sold today? (Sum) What is the average score of male students at UT? (Mean) Output the frequency of security books that sold more than 25 copies today. (Threshold) … others require trusted mapper code. List all items and their quantity sold: a malicious mapper can encode information in item names.

43. Revisiting Airavat guarantees. Allows differentially private MapReduce computations, even when the code is untrusted. Differential privacy => mathematical bound on information leak. What is a safe bound on information leak? Depends on the context, dataset. Not our problem.

44. Outline. Motivation. Overview. Enforcing privacy. Evaluation. Summary.

45. Implementation details. [Diagram: component sizes of 450 LoC, 5000 LoC, and 500 LoC. LoC = lines of code]

46. Evaluation: Our benchmarks. Experiments on 100 Amazon EC2 instances: 1.2 GHz, 7.5 GB RAM, running Fedora 8.
Benchmark | Privacy grouping | Reducer primitive | MapReduce operations | Accuracy metric
AOL queries | Users | THRESHOLD, SUM | Multiple | % queries released
kNN recommender | Individual rating | COUNT, SUM | Multiple | RMSE
K-Means | Individual points | COUNT, SUM | Multiple, till convergence | Intra-cluster variance
Naïve Bayes | Individual articles | SUM | Multiple | Misclassification rate

47. Performance overhead. Overheads are less than 32%. [Plot: normalized execution time]

48. Evaluation: accuracy. Accuracy increases with decrease in privacy guarantee. Reducer: COUNT, SUM. [Plot: accuracy (%) versus privacy parameter; annotations: no information leak, decrease in privacy guarantee] *Refer to the paper for remaining benchmark results.

49. Related work: PINQ [McSherry SIGMOD 2009]. Set of trusted LINQ primitives. Airavat confines untrusted code and ensures that its outputs preserve privacy; PINQ requires rewriting code with trusted primitives. Airavat provides end-to-end guarantee across the software stack; PINQ guarantees are language level.

50. Airavat in brief. Airavat is a framework for privacy-preserving MapReduce computations. Confines untrusted code. First to integrate mandatory access control with differential privacy for end-to-end enforcement. [Diagram: untrusted program, Airavat, protected data]

51. Thank you