Understanding Hierarchical Methods for Differentially Private Histograms

Wahbeh Qardaji, Weining Yang, Ninghui Li
Purdue University
305 N. University Street, West Lafayette, IN 47907, USA
{wqardaji, yang469, ninghui}@cs.purdue.edu

ABSTRACT

In recent years, many approaches to publishing histograms in a differentially private manner have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, we examine the factors affecting the accuracy of hierarchical approaches by studying the mean squared error (MSE) when answering range queries. We start with one-dimensional histograms, and analyze how the MSE changes with different branching factors, after employing constrained inference, and with different methods to allocate the privacy budget among hierarchy levels. Our analysis and experimental results show that combining the choice of a good branching factor with constrained inference outperforms the current state of the art. Finally, we extend our analysis to multi-dimensional histograms. We show that the benefits from employing hierarchical methods beyond a single dimension are significantly diminished, and when there are 3 or more dimensions, it is almost always better to use the Flat method instead of a hierarchy.

1. INTRODUCTION

A histogram is an important tool for summarizing data. In recent years, significant improvements have been made in the problem of publishing histograms while satisfying differential privacy. The utility goal is to ensure that range queries over the private histogram can be answered as accurately as possible. The naive approach, which we call the "flat" method, is to issue a count query for each unit bin in the histogram, and answer these queries using the Laplacian mechanism introduced by Dwork et al. [8]. Because each such query has sensitivity 1, the magnitude of the added noise is small. While this works well with histograms that have a small number of unit bins, the error due to this method increases substantially as the number of bins increases. In particular, when answering range queries using the private histogram, the mean squared error increases linearly in the size of the query range.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 39th International Conference on Very Large Data Bases, August 26th-30th 2013, Riva del Garda, Trento, Italy.
Proceedings of the VLDB Endowment, Vol. 6, No. 14
Copyright 2013 VLDB Endowment 2150-8097/13/14... $10.00.

Hay et al. [14] introduced the hierarchical method for optimizing differentially private histograms. In this approach, in addition to asking for counts of unit-length intervals, one also asks for counts of larger intervals. Conceptually, one can arrange all queried intervals into a tree, where the unit-length intervals are the leaves. The benefit of this method is that a range query which includes many unit bins can be answered using a small number of sub-intervals which exactly cover the query range. The tradeoff over the "flat" method is that when queries are issued at multiple levels, each query must satisfy differential privacy for a smaller ε. We refer to this ε as the privacy budget allocated to (or consumed by) the query. Hay et al. [14] also introduced novel constrained inference techniques to improve over the basic hierarchical method, which exploit the observation that query results at different levels should satisfy certain consistency relationships to obtain improved estimates. This hierarchical method with constrained inference has been adopted by other researchers [5, 4].

A number of other methods have also been developed and applied to the histogram problem. For example, Xiao et al. [21] introduced the Privlet method, which decomposes the original histogram using the Haar wavelet, then adds noise to the decomposed coefficients, and finally reconstructs the histogram using the noisy coefficients. The benefit of this method is that, when answering range queries, noise added to the coefficients can be partially cancelled because of the nature of Haar wavelet processing. Li et al. [15] introduced the matrix mechanism, which tries to optimize answers for a fixed set of counting queries. As solving the optimization problem is computationally infeasible, several approximation techniques have been developed, such as the eigen-select algorithm [16] and the low-rank approximation algorithm [24].

We observe that hierarchical methods can be parameterized in several different ways, and different parameter values may significantly affect the accuracy of the results. Such parameters include the branching factor for the hierarchy, as well as the division of privacy budget among its levels. Furthermore, several other parameters also affect the relative strength of each method, including the total number of histogram bins, the number of dimensions, and the distribution of the queries. To thoroughly understand the strengths and weaknesses of these methods, we need to understand how these parameters affect the accuracy.

In this paper, we combine combinatorial analysis of how the different parameters affect the accuracy of each method, with experimental evaluation that validates the theoretical