Privacy-preserving Queries over Relational Databases
Femi Olumofin and Ian Goldberg
Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada N2L 3G1
{fgolumof, iang}@cs.uwaterloo.ca

Abstract. We explore how Private Information Retrieval (PIR) can help users keep their sensitive information from being leaked in an SQL query. We show how to retrieve data from a relational database with PIR by hiding sensitive constants contained in the predicates of a query. Experimental results and microbenchmarking tests show that our approach incurs reasonable storage overhead for the added privacy benefit and performs between 7 and 480 times faster than previous work.

Keywords: Private information retrieval, relational databases, SQL

An extended version of this paper is available [22].

1 Introduction

Most software systems request sensitive information from users to construct a query, but privacy concerns can make a user unwilling to provide such information. The problem addressed by private information retrieval (PIR) [3,9] is to provide such a user with the means to retrieve data from a database without the database (or the database administrator) learning any information about the particular item that was retrieved. Development of practical PIR schemes is crucial to maintaining user privacy in important application domains such as patent databases, pharmaceutical databases, online censuses, real-time stock quotes, location-based services, and Internet domain registration. For instance, the current process for Internet domain name registration requires a user to first disclose the name of the new domain to an Internet domain registrar. The registrar could then use this inside information to preemptively register the new domain and thereby deprive the user of the registration privilege for that domain. This practice is known as front running [17]. Many users, therefore, find it unacceptable to disclose the sensitive information contained in their queries by the simple act of querying a server. Users' concern for query privacy, and our proposed approach to address it, are by no means limited to domain names; they apply to publicly accessible databases in several application domains, as the examples above suggest.
Although ICANN claims the practice of domain front running has subsided [17], we use the domain name example in this paper to enable head-to-head performance comparisons with a similar approach by Reardon et al. [23], which is based on the same example.

While today's most developed and deployed privacy techniques, such as onion routers and mix networks, offer anonymizing protection for users' identities, they cannot preserve the privacy of the users' queries. In the front-running example, the user could tunnel the query through Tor [12] to preserve the privacy of his or her network address. Nevertheless, the server could still observe the user's desired domain name and launch a successful front-running attack.

The development of a practical PIR-based technique for protecting query privacy offers users and service providers an attractive value proposition. Users are increasingly aware of the problem of privacy and the need to maintain privacy in their online activities. This growing awareness is partly due to increased dependence on the Internet for daily activities (including online banking, Twittering, and social networking) and partly due to the rising trend of online privacy invasion. Privacy-conscious users will accept a service built on PIR for query privacy protection because no currently deployed security or privacy mechanism offers the needed protection; they will likely be willing to trade off query performance for query privacy, and even pay to subscribe to such a service. Similarly, service providers may adopt such a system because of its potential for revenue generation through subscriptions and ad displays. As more Internet users value privacy, most online businesses would be motivated to embrace privacy-preserving technologies that improve their competitiveness in winning this growing user population. Since PIR does not address the protection of a user's identity, existing service models that rely on service providers being able to identify a user for targeted ads will not be disabled by this proposal. In other words, protecting query privacy provides additional revenue generation opportunities for these service providers, while still allowing them to use information collected through other means to send targeted ads to users. Thus, users and service providers have plausible incentives to use a PIR-based solution for maintaining query privacy. In addition, the very existence of a practical privacy-preserving database query technique could be enough to persuade privacy legislators that it is reasonable to demand that certain sorts of databases enforce privacy policies, since these techniques can be deployed without severely limiting the utility of such databases.

However, the rudimentary data access models of PIR are a limiting factor in deploying successful PIR-based systems. These models are limited to retrieving a single bit, a block of bits [3,9,18], or a textual keyword [8]. There is therefore a need to extend PIR to a more expressive data access model, and to a model that enables data retrieval from structured data sources such as relational databases. We address this need by integrating PIR with the widely deployed SQL.
A dynamic SQL statement is an incomplete SQL statement within a software system, meant to be fully constructed and executed at runtime [26]. It requires only a single compilation, which prepares it for subsequent executions. It is therefore a flexible, efficient, and secure way of using SQL in software systems. We observe that the shape, or textual content, of an SQL query prepared within a system is not private, but the constants the user supplies at runtime are private and must be protected. For domain name registration, the textual content of the query is exposed to the database, but only the textual keyword for the domain name is really private. For example, the shape of the dynamic query in Listing 1 is not private; the question mark ? is a placeholder for a private value to be provided before the query is executed at runtime.

Listing 1. Example dynamic SQL query (database schema as in [22])

SELECT t1.domain, t1.expiry, t2.contact
FROM regdomains t1, registrar t2
WHERE (t1.reg_id = t2.reg_id) AND (t1.domain = ?)

Our approach to preserving query privacy over a relational database is based on hiding such private constants of a query. The client sends a desensitized version of the prepared SQL query, modified to remove private information. The database executes this public SQL query and generates cached indices to support further rounds of interaction with the client. The client then performs a number of keyword-based PIR operations [8] against the indices, using the values for the placeholders, to obtain the result of the query.

None of the existing proposals for enabling privacy-preserving queries and robust data access models for private information retrieval makes this observation about the privacy of constants within an otherwise-public query. These include techniques that eliminate database optimization by localizing query processing to the user's computer [23], work on querying in the Database-as-a-Service model [16,15], techniques that require an encrypted database before permitting private data access [25], and techniques restricted to simple keyword search on textual data sources [4]. This observation is crucial for preserving the expressiveness and benefits of SQL, and for keeping the interface between a database and existing software systems unchanged while building in support for user query privacy. Our approach improves over previous work with additional database optimization opportunities and fewer PIR operations needed to retrieve data. To the best of our knowledge, we are the first to propose a practical technique that leverages PIR to preserve the privacy of sensitive information in an SQL query over existing commercial and open-source relational database systems.

Our contributions. We address the problem of preserving the privacy of sensitive information within an SQL query using PIR. In doing so, we address two obstacles to deploying successful PIR-based systems. First, we develop a generic data access model for private information retrieval from a relational database using SQL. We show how to hide sensitive data within a query and how to use PIR to retrieve data from a relational database. Second, we develop an approach for embedding PIR schemes into the well-established context and organization of relational database systems. It has been argued that performing a trivial PIR operation, in which the database sends its entire data to the user and the user selects the item of interest, is more efficient than running a computational PIR scheme [1,27]; however, information-theoretic PIR schemes are much more efficient. We show how the latter PIR schemes can be applied in realistic scenarios, achieving both efficiency and query expressivity. Since relational databases and SQL are the most influential of all database models and query languages, we argue that many realistic systems needing query privacy protection will find our approach quite useful.

The rest of this paper is organized as follows. Section 2 provides background information on PIR and database indexing. Section 3 discusses related work, while Section 4 details the threat model, security, and assumptions for the paper. Section 5 describes our approach. Section 6 gives an overview of the prototype implementation, and presents the microbenchmarking results and the experiment used to evaluate the prototype in greater depth. Section 7 concludes the paper and suggests some future work.

2 Preliminaries

2.1 Private Information Retrieval (PIR)

PIR provides a means to retrieve data from a database without revealing any information about which item is retrieved. In its simplest form, the database stores an $n$-bit string $X$, organized as $r$ data blocks, each of size $b$ bits. The user's private input, or query, is an index $i \in \{1, \ldots, r\}$ representing the $i$th data block. A trivial solution for PIR is for the database to send all $r$ blocks to the user and have the user select the block of interest at index $i$ (i.e., $X_i$), but this has very poor communication complexity.

The three important requirements for any PIR scheme are correctness (it returns the correct block $X_i$ to the user), privacy (it leaks no information to the database about $i$ and $X_i$), and non-triviality (its communication complexity is sublinear in $n$) [10]. An additional requirement, which is not often addressed in the published literature, is implementation (i.e., computational) efficiency [1,27]. While the performance of information-theoretic PIR schemes is generally better [14], this neglect of computational overhead has led to single-database PIR schemes that are slow for large databases [27]. On the other hand, multi-server information-theoretic PIR schemes are much more efficient than the trivial solution, and their use is justified in situations where the user lacks the bandwidth and local storage for the trivial download of data. Recent attempts at building practical single-database PIR [31] using general-purpose secure coprocessors offer several orders of magnitude improvement in performance. Nevertheless, the potential of PIR in several practical domains remains largely unrealized, with no "fruitful" or "real world" practical application.
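To make the block model of Section 2.1 concrete, the following is a minimal sketch of the classic two-server information-theoretic XOR scheme of Chor, Goldreich, Kushilevitz, and Sudan. It is an illustration only, not the scheme implemented by the Percy++ library used in our prototype. Blocks are 0-indexed here (the text uses $1, \ldots, r$), the function names are hypothetical, and privacy holds only if the two servers do not collude.

```python
import secrets


def make_queries(i: int, r: int) -> tuple[list[int], list[int]]:
    """Client: build one r-bit selection vector per server for block index i (0-based)."""
    q1 = [secrets.randbits(1) for _ in range(r)]  # uniformly random subset of blocks
    q2 = list(q1)
    q2[i] ^= 1  # flip only position i, so q1 and q2 differ in exactly one block
    return q1, q2


def answer(db: list[bytes], q: list[int]) -> bytes:
    """Server: XOR together every block whose bit in q is set."""
    acc = bytearray(len(db[0]))  # all blocks are assumed to have the same size b
    for block, bit in zip(db, q):
        if bit:
            for j, byte in enumerate(block):
                acc[j] ^= byte
    return bytes(acc)


def reconstruct(a1: bytes, a2: bytes) -> bytes:
    """Client: every selected block except X_i appears in both answers and cancels out."""
    return bytes(x ^ y for x, y in zip(a1, a2))


db = [f"block-{k:02d}".encode() for k in range(8)]  # r = 8 blocks of b = 8 bytes
q1, q2 = make_queries(5, len(db))
assert reconstruct(answer(db, q1), answer(db, q2)) == db[5]
```

Each server receives $r$ bits and returns $b$ bits, so choosing $r \approx b \approx \sqrt{n}$ gives $O(\sqrt{n})$ communication, which meets the non-triviality requirement. Each server alone sees a uniformly random bit vector, which is why a single non-colluding server learns nothing about $i$.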
A cryptographic construction related to PIR is oblivious transfer (OT) [20,21]. In OT, a database (or sender) transmits some of its items to a user (or chooser) in a manner that preserves their mutual privacy. The database has assurance that the user does not learn any information beyond what he or she is entitled to, and the user has assurance that the database does not learn which particular items the user received. OT and the related Symmetric PIR (SPIR) [19] can thus be seen as generalizations of PIR. Those protocols could easily be used in place of PIR in our work, at the cost of the extra computation they entail.

2.2 Indexing

Data can be indexed by a key formed either from the values of one or more attributes or from hashes (generally not cryptographic hashes) of those values. Indices are typically organized into tree structures, such as B+trees, in which internal (non-leaf) nodes do not contain data; they only maintain references to child or leaf nodes. Data are either stored in the leaf nodes, or the leaf nodes maintain references to the corresponding tuples (i.e., records) in the database. Furthermore, the leaf nodes of B+trees may be linked together to enable sequential data access during range queries over the index; range queries return all data with key values in a specified range. Hashed indices are specifically useful for point queries, which return a single data item for a given key. In many situations where efficient retrieval over a set of unique keys is needed, hashed indices are preferred over B+tree indices. However, it is challenging to generate hash functions that hash each key to a unique hash value. For this reason, many hashed indices used in commercial databases use data partitioning (bucketization) [16] techniques to hash a range of values to a single bucket, instead of to individual buckets. Recent advances [5,6] in perfect hash functions (PHF) have produced a family of hash functions that can efficiently map a large set of n key values (on the order of billions) to a set of m integers without collisions, where n is less than or equal to m.

3 Related Work

A common assumption for PIR schemes is that the user knows the index or address of the item to be retrieved. However, Chor et al. [8] proposed a way to access data with PIR using keyword searches over three data structures: binary search trees, tries, and perfect hashing. Our work extends keyword-based PIR to B+trees and PHF. In addition, we provide an implemented system and combine the technique with the expressive SQL. The technique in [8] neither explores B+trees nor considers executing SQL queries using keyword-based PIR.

Reardon et al. [23] similarly explored using SQL for private information retrieval, and proposed the TransPIR prototype system. This work is the closest to our proposal and will be used as the basis for comparisons. TransPIR performs traditional database functions (such as parsing and optimization) locally on the client; it uses PIR for data block retrieval from the database server, whose function has been reduced to that of a block-serving PIR server. The benefit of TransPIR is that the database does not learn any information, even about the textual content of the user's query. The drawbacks are poor query performance, because the database is unable to perform any optimization, and the lack of interoperability with any existing relational database system.

An interesting attempt to build a practical pseudonymous message retrieval system using PIR is presented in [24]. The system, known as the Pynchon Gate, helps preserve the anonymity of users as they privately retrieve messages using pseudonyms from a centralized server. Unlike our use of PIR to preserve a user's query privacy, the goal of the Pynchon Gate is to maintain privacy for users' identities. It does this by ensuring that the messages a user retrieves cannot be linked to his or her pseudonym. The construction resists traffic analysis, though users may need to perform some dummy PIR queries to prevent a passive observer from learning the number of messages they have received.

4 Threat Model, Security and Assumptions

4.1 Security and adversary capabilities

Our main assumption is that the shape of the SQL queries submitted by users is public or known to the database administrator. Applicable practical scenarios include design-time specification of dynamic SQL by programmers, who expect users to supply sensitive constants at runtime. Moreover, the database schema and all dynamic SQL queries expected to be submitted to, for example, a patent database are not really hidden from the patent database administrator. Simultaneous protection of both the shape and the constants of a query is outside the scope of this work, and would likely require treating the database management system as other than a black box.

The approach presented in this paper is sufficiently generic to allow an application to rely on any block-based PIR system, including single-server, multi-server, and coprocessor-assisted variants. We assume an adversary with the same capabilities as those assumed for the underlying PIR protocol. The two common adversary capabilities considered in theoretical private information retrieval schemes are the curious passive adversary and the byzantine adversary [3,9]. Either of these adversaries can be a database administrator or any other insider to a PIR server.

A curious passive adversary can observe PIR-encoded queries, but should be incapable of decoding their content. In addition, it should not be possible for this adversary to differentiate between queries or to identify the data that makes up the result of a query. In our context, the information this adversary can observe is the desensitized SQL query from the client and the PIR queries. The information obtained from the desensitized query does not compromise the privacy of the user's query, since it does not contain any private constants. Similarly, the adversary cannot obtain any information from the PIR queries, because PIR protocols are designed to be resistant to an adversary of this capability.
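The keyword-based PIR over perfect hashing described in Sections 2.2 and 3 reduces a point query to a single block index: the server stores each tuple in the slot given by a perfect hash of its key, and the client hashes its private key locally to learn which block to request. The sketch below illustrates this under stated assumptions. It finds a collision-free seed by naive randomized search, which is only practical for tiny key sets and a table with spare slots; it is not one of the algorithms in the CMPH library, which builds minimal perfect hash functions (m = n) efficiently for very large key sets. All names, including the treatment of (seed, m) as the index metadata, are hypothetical, and a plain list lookup stands in for the PIR retrieval.

```python
import hashlib


def slot_of(key: str, seed: int, m: int) -> int:
    """Seeded hash of a key into [0, m); sha256 keeps it stable across processes."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def find_perfect_seed(keys: list[str], m: int, max_tries: int = 100_000) -> int:
    """Naive randomized search for a seed that maps all keys to distinct slots."""
    if len(set(keys)) != len(keys):
        raise ValueError("a perfect-hash index requires unique keys")
    for seed in range(max_tries):
        if len({slot_of(k, seed, m) for k in keys}) == len(keys):
            return seed
    raise ValueError("no collision-free seed found; try a larger m")


def build_hash_index(rows: list[dict], key_attr: str, m: int):
    """Server side: one row per slot; (seed, m) plays the role of the PHF metadata."""
    seed = find_perfect_seed([row[key_attr] for row in rows], m)
    table = [None] * m  # in practice, empty slots would be padded to full-size blocks
    for row in rows:
        table[slot_of(row[key_attr], seed, m)] = row
    return seed, table


def point_lookup(table, seed: int, m: int, key_attr: str, private_key: str):
    """Client side: the slot number is the block index that would be fed to PIR."""
    block = slot_of(private_key, seed, m)
    row = table[block]  # stand-in for a single PIR block retrieval
    # A PHF is collision-free only on the indexed keys, so a key that is absent
    # from the index may land on another row's slot: always compare the keys.
    if row is not None and row[key_attr] == private_key:
        return row
    return None


rows = [
    {"domain": "anydomain.com", "expiry": 20100315},
    {"domain": "other.net", "expiry": 20110101},
    {"domain": "example.org", "expiry": 20120601},
    {"domain": "sample.info", "expiry": 20090901},
]
seed, table = build_hash_index(rows, "domain", m=8)
hit = point_lookup(table, seed, 8, "domain", "anydomain.com")
miss = point_lookup(table, seed, 8, "domain", "unregistered.com")
```

The final key comparison matters: a perfect hash function gives no guarantee for keys outside the indexed set, so without it a query for an unregistered domain could silently return another domain's tuple. The unique-key check also mirrors why a hash index cannot serve queries whose key matches several tuples.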
A byzantine adversary with additional capabilities is assumed for some multi-server PIR protocols [3,14]. In this model, the data on some of the servers could be outdated, or some of the servers could be down, malfunctioning, or malicious. Nevertheless, the client is still able to compute the correct result and determine which servers misbehaved, and the servers are still unable to learn the client's query. In our specific context, the adversary may compromise some of the servers in a multi-server PIR scenario, either by generating and returning the result of a substitute fake query, or by executing the original query on these servers but arbitrarily modifying some of the tuples in the results. The adversary may respond to a PIR request with a corrupted query result, or even refuse to act on the request. Nevertheless, all of these active attack scenarios can be effectively mitigated with a byzantine-robust multi-server PIR scheme.

4.2 Data size assumptions

We service PIR requests using indexed data extracted from relational databases. The size of these data depends on the number of tuples resulting from the desensitized query. We note that even if this desensitized query yields a small number of tuples (including just one), the privacy of the sensitive part of the SQL query is not compromised. The properties of PIR ensure that the adversary gains no information about the sensitive constants from observing the PIR protocol beyond what he already knew from observing the desensitized query. On the other hand, many database schemas are designed so that a number of relations contain very few rows of data, all of which are meant to be retrieved and used by every user. It is therefore pointless to perform PIR operations on these items, since every user is expected to retrieve them all at some point. The adversary does not violate a user's query privacy by observing this public retrieval.

4.3 Avoiding server collusion

Information-theoretic PIR is generally more computationally efficient than computational PIR, but requires that the servers not collude if privacy is to be preserved; this is the same assumption commonly made in other privacy-preserving technologies, such as mix networks [7] and Tor [12]. We present scenarios in which collusion among servers is unlikely, yielding an opportunity to use the more efficient information-theoretic PIR.

The first scenario is when several independent service providers host a copy of the database. This applies to naturally distributed databases, such as Internet domain registries. In this particular instance, the problem of colluding servers is mitigated by practical business concerns. Realistically, the Internet domain database is maintained by different, geographically dispersed organizations that are independent of the registrars a user may query. However, different registrars would be responsible for distributing the content to end users, as well as for integrating partners through banner ads and promotions. Since the registrars operate in the same line of business, compete to win users and deliver domain registry services, and have their own advertising models to reap economic benefits, they have no real incentive to collude in order to break the privacy of any user. In this model, it is feasible for a user to perform a domain name registration query on multiple registrars' servers concurrently. The user would then combine the results without fear that the queries reveal their content. Additionally, individual service agreements can foreclose any chance of collusion with a third party on legal grounds. Users then enjoy greater confidence in using the service, and the registrars in turn can capitalize on revenue generation opportunities such as pay-per-use subscriptions and revenue-sharing ad opportunities.

The second scenario with less danger of collusion is when the query needs to be private only for a short time. In this case, the user may be comfortable knowing that by the time the servers collude to learn her query, the query's privacy is no longer required. Note that even in scenarios where collusion cannot be forestalled, our system can still use any computational PIR protocol; recent such protocols [1,31] offer considerable efficiency improvements over previous work in the area.

5 Hiding Sensitive Constants

5.1 Overview

Our approach preserves the privacy of sensitive data within the WHERE and HAVING predicates of an SQL query. For brevity, we focus on the WHERE clause; a similar procedure applies to the HAVING clause. This may require the user (or application) to specify which constants are sensitive. For the example query in Listing 2, the domain name and the creation date may be sensitive. Our approach splits the processing of SQL queries containing sensitive data into two stages. In the first stage, the client computes a public subquery, which is simply the original query stripped of the predicate conditions containing sensitive data. The client sends this subquery to the server, and the server executes it to obtain a result for the subquery. The desired result for the original query is contained within the subquery result, but the database is not aware of the particular tuples that are of interest. In the second stage, the client performs PIR operations to retrieve the tuples of interest from the subquery result. To enable this, the database creates a cached index on the subquery result and sends metadata for querying the index to the client. The client subsequently performs PIR retrievals on the index and finally combines the retrieved items to build the result for the original query.

Listing 2. Example query with a WHERE clause featuring sensitive constants.

SELECT t1.contact, t1.email, t2.created, t2.expiry
FROM registrar t1, regdomains t2
WHERE (t1.reg_id = t2.reg_id) AND (t2.created > 20090101) AND (t2.domain = 'anydomain.com')

[Fig. 1. A sequence diagram for evaluating Alice's private SQL query using PIR. Recoverable labels: Client, PIR Server, subquery, index helper data, PIR retrieval of q(i), PIR result.]

The important benefits of this approach compared with the previous approach [23] are the optimizations realizable from having the database execute the non-private subquery, and the smaller number of PIR operations required to retrieve the data of interest. In addition, the PIR operations are performed against a cached index, which will usually be smaller than the complete database. This is particularly true if there are joins and non-private conditions in the WHERE clause that constrain the tuples in the query result. In particular, a single PIR query is needed for point queries on hash table indices, while range queries on B+tree indices are performed on fewer data blocks. Figure 1 illustrates the sequence of events during a query evaluation.

We note that the non-private subqueries will often be common to many users, and the database does not need to execute them every time a user makes a request. Nevertheless, our algorithm details, presented next in Section 5.2, show the steps for processing a subquery and generating indices. Such details are useful in an ad hoc environment, where the shape of a query is unknown to the database a priori and each user writes his or her own query as needed. Our assumption is that revealing the shape of a query does not violate users' privacy (see Section 4).

5.2 Algorithm

We describe our algorithm with an example, assuming an information-theoretic PIR setup with two replicated servers. We focus on hiding sensitive constants in the predicates of the WHERE clause. The algorithm details for the SELECT query in Listing 2 follow. We assume the date 20090101 and the domain anydomain.com are private.

Step 1: The client builds an attribute list, a constraint list, and a desensitized SELECT query, using the attribute names and the WHERE conditions of the input query. We refer to the desensitized query as a subquery.
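The two-stage split of Section 5.1, together with the Step 1 bookkeeping detailed below, can be illustrated end to end with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for the server's relational database, a hypothetical miniature of the Listing 2 schema, and a hypothetical convention in which a public WHERE condition is an SQL string and a private one is an (attribute, constant, operator) tuple. For brevity, the second stage downloads the whole subquery result, which the server does when that is cheaper than PIR (Step 3); otherwise it would build a cached index and the client would retrieve from it with PIR.

```python
import operator
import sqlite3

# Hypothetical miniature of the schema behind Listing 2; dates are YYYYMMDD integers.
SCHEMA = """
CREATE TABLE registrar  (reg_id INTEGER, contact TEXT, email TEXT);
CREATE TABLE regdomains (reg_id INTEGER, domain TEXT, created INTEGER, expiry INTEGER);
"""
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def desensitize(select_attrs, from_clause, conditions):
    """Step 1 sketch: a str condition is public SQL; an (attribute, constant, op)
    tuple is private and goes to the constraint list instead of the subquery."""
    attrs, constraints, public = list(select_attrs), [], []
    for cond in conditions:
        if isinstance(cond, tuple):
            if cond[0] not in attrs:
                attrs.append(cond[0])  # needed later for local filtering
            constraints.append(cond)
        else:
            public.append(cond)
    subquery = f"SELECT {', '.join(attrs)} FROM {from_clause}"
    if public:
        subquery += " WHERE " + " AND ".join(public)
    return attrs, constraints, subquery


def private_query(conn, select_attrs, from_clause, conditions):
    attrs, constraints, subquery = desensitize(select_attrs, from_clause, conditions)
    rows = conn.execute(subquery).fetchall()  # stage 1: the server sees only the subquery
    # Stage 2 stand-in: the whole (small) result is downloaded instead of using PIR,
    # then the private constraints are applied locally on the client.
    pos = {a: k for k, a in enumerate(attrs)}
    kept = [r for r in rows if all(OPS[op](r[pos[a]], v) for a, v, op in constraints)]
    return subquery, [r[: len(select_attrs)] for r in kept]  # drop helper columns


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO registrar VALUES (?, ?, ?)",
                 [(1, "Alice", "alice@example.org"), (2, "Bob", "bob@example.org")])
conn.executemany("INSERT INTO regdomains VALUES (?, ?, ?, ?)",
                 [(1, "anydomain.com", 20090315, 20100315),
                  (2, "other.net", 20080101, 20110101)])
sql, result = private_query(
    conn,
    ["t1.contact", "t1.email", "t2.created", "t2.expiry"],
    "registrar t1, regdomains t2",
    ["(t1.reg_id = t2.reg_id)",
     ("t2.created", 20090101, ">"),
     ("t2.domain", "anydomain.com", "=")],
)
```

The subquery that reaches the database is the one derived in Step 1, with t2.domain appended to the SELECT list and neither private constant present; only the join condition remains in its WHERE clause.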
Tobegin,initializetheattributelisttotheattributenamesinthequery'sSE-LECTclause,theconstraintlisttobeempty,andthesubquerytotheSELECTandFROMclausesoftheoriginalquery.{Attributelist:ft1.contact,t1.email,t2.created,t2.expiryg{Constraintlist:fg{Subquery:SELECTt1.contact,t1.email,t2.created,t2.expiryFROMregistrart1,regdomainst2Next,considereachWHEREconditioninturn.Ifaconditionfeaturesaprivateconstant,thenaddtheattributenametotheattributelist(ifnotalreadyinthelist),andadd(attributename,constantvalue,operator)totheconstraintlist.Otherwise,addtheconditiontothesubquery.Oncompletingtheabovesteps,theattributelist,constraintlist,andsub-querywithreducedconditionsfortheinputquerybecome:{Att.list:ft1.contact,t1.email,t2.created,t2.expiry,t2.domaing{Con.list:f(t2.created,20090101,�),(t2.domain,'anydomain.com',=)g{Subquery:SELECTt1.contact,t1.email,t2.created,t2.expiry,t2.domainFROMregistrart1,regdomainst2WHERE(t1.reg id=t2.reg id)Step2:Theclientsendsthesubquery,akeyattributename,andanindex letypetoeachserver.Thekeyattributenameisselectedfromtheattributenamesinthecon-straintlist|t2.created,t2.domaininourexample.Thechoicemayeitherberandom,madebytheapplicationdesigner,ordeterminedbyaclientopti-mizercomponentwithsomedomainknowledgethatcouldenableittomakeanoptimalchoice.Onewaytomakeagoodchoiceistoconsidertheselectivity|theratioofthenumberofdistinctvaluestakentothetotalnumberoftuples|expectedforeachconstraintlistattribute,andthenchoosetheonethatismostselective.Thisensurestheselectionofattributeswithuniquekeyvaluesbeforelessselectiveattributes.Forexample,inapatentdatabase,thepatentnumberisabetterchoiceforakeythantheauthor'sgender.ApoorchoiceofkeycanleadtomoreroundsofPIRqueriesthannecessary.PointqueriesonauniquekeyattributecanbecompletedwithasinglePIRquery.Similarly,agoodchoiceofkeywillreducethenumberofPIRqueriesforrangequeries.Fortheexamplequery,wechooset2.domainasthekeyattributename.Fortheindex letype,eitheraPHForaB+treeindextypeisspeci 
ed.Otherindexstructuresmaybepossible,withadditionalinvestigation,butthesearetheoneswecurrentlysupport.Moredetailsontheselectionofindextypesisprovidedbelow.Step3:Eachserver:executesthesubqueryonitsrelationaldatabase,generatesacachedindexofthespeci edtypeonthesubqueryresult,usingthekeyattributename,andreturnsmetadataforsearchingtheindicestotheclient.Theservercomputesthesizeofthesubqueryresult.IfitcansendtheentireresultmorecheaplythanperformingPIRoperationsonit,itdoesso.Otherwise, itproceedswiththeindexgeneration.Forhashtableindices,theserver rstcom-putestheperfecthashfunctionsforthekeyattributevalues.Thenitevaluateseachkeyandinsertseachtupleintoahashtable.Themetadatathatisreturnedtotheclientforhash-basedindicesconsistsofthePHFparameters,thecountoftuplesinthehashtable,andsomePIR-speci cinitializationparameters.ForB+treeindices,theserverbulkinsertsthesubqueryresultintoanewB+treeindex le.B+treebulkinsertionalgorithmsprovideahigh-speedtech-niqueforbuildingatreefromexistingdata[2].Theserveralsoreturnsmetadatatotheclient,includingthesizeofthetreeandits rstdatablock(theroot).Generatedindicesarestoredinadiskcacheexternaltothedatabase.Step4:Theclientreceivestheresponsesfromtheserversandveri 
estheyareoftheappropriatelength.Forabyzantinerobustmulti-serverPIR,aclientmaychoosetoproceedinspiteoferrorsresultingfromnon-respondingserversorfromresponsesthatareofinconsistentlength.Next,theclientperformsoneormorekeyword-basedPIRqueries,usingthevalueassociatedwiththekeyattributenamefromtheconstraintlist,andbuildsthedesiredqueryresultfromthedataretrievedwithPIR.TheencodingofaprivateconstantinaPIRqueryproceedsasfollows.ForPIRqueriesoverahash-basedindex,theclientcomputesthehashfortheprivateconstantusingthePHFfunctionsderivedfromthemetadata1.Thishashisalsotheblocknumberinthehashtableindexontheservers.ThisblocknumberisinputtothePIRschemetocomputethePIRqueryforeachserver.ForaB+treeindex,theusercomparestheprivatevalueforthekeyattributewiththevaluesintherootofthetree.Therootofthetreeisextractedfromthemetadataitreceivesfromtheserver.Eachkeyvalueinthisrootmaintainsblocknumbersforthechildrenblocksornodes.TheblocknumbercorrespondingtotheappropriatechildnodewillbetheinputtothePIRscheme.Forhash-basedindices,asinglePIRqueryissucienttoretrievetheblockcontainingthedataofinterestfromthehashtable.ForB+treeindices,however,theclientusesPIRtotraversethetree.Eachblockcanholdsomenumbermofkeys,andatablocklevel,theB+treecanbeconsideredanm-arytree.Theclienthasalreadybeensenttherootblockofthetree,whichcontainsthetopmkeys.Usingthisinformation,theclientcanperformasinglePIRblockquerytofetchoneofthemblockssoreferenced.Itrepeatsthisprocessuntilitreachestheleavesofthetree,atwhichpointitfetchestherequireddatawithfurtherPIRqueries.TheactualnumberofPIRqueriesdependsontheheightofthe(balanced)tree,andthenumberoftuplesintheresultset.TraversalsofB+treeindiceswithourapproachareobliviousinthattheyleaknoinformationaboutnodes'accesspattern;werealizeretrievalofanode'sdataasaPIRoperationoverthedatasetofallnodesinthetree.Inotherwords,itdoesnotmatterwhichparticularbranchofaB+treeisthelocationforthenextblocktoberetrieved.WedonotrestrictPIRoperationstothesubsetofblocksinthesubtreerooted 
1UsingtheCMPHLibrary[5]forexample,theclientsavesthePHFdatafromthemetadataintoa le.Itreopensthis leandusesittocomputeahashbyfollowingappropriateAPIcallsequences. atthatbranch.Instead,eachPIRoperationconsidersthesetofblocksintheentireB+tree.Rangequeriesthatretrievedatafromdi erentsubtreesleaknoinformationabouttowhichsubtreeaparticularpieceofdatabelongs.Theonlyinformationtheserverlearnsisthenumberofblocksretrievedbysuchaquery.Therefore,speci cimplementationsmayutilizedummyqueriestopreventtheserverfromleaningtheamountofusefuldataretrievedbyaquery[24].Tocomputethe nalqueryresult,theclientappliestheotherprivatecon-ditionsintheconstraintlisttotheresultobtainedwithPIR.Fortheexamplequery,theclient ltersoutalltupleswitht2.creatednotgreaterthan20090101fromthetupledatareturnedwithPIR.Theremainingtuplesgivethe nalqueryresult.Capabilitiesfordealingwithcomplexqueriescanbebuiltintotheclient.Forexample,itmaybemoreecienttorequestasingleindexkeyedonthecon-catenationoftwoattributesthanseparateindices.Iftheclientrequestsseparateindices,itwillsubsequentlyperformPIRqueriesoneachofthoseindices,usingtheprivatevalueassociatedwitheachattributefromtheconstraintlist.Finally,theclientcombinesthepartialresultsobtainedfromthequerieswithsetopera-tions(union,intersection),andperformslocal lteringonthecombinedresult,usingprivateconstantvaluesforanyremainingconditionsintheconstraintlisttocomputethe nalqueryresult.Theclientthusneedsquery-optimizationca-pabilitiesinadditiontotheregularqueryoptimizationperformedbytheserver.6ImplementationandMicrobenchmarks6.1ImplementationWedevelopedaprototypeimplementationofouralgorithmtohidethesensi-tiveportionsofSQLqueriesusinggenerallyavailableopensourceC++librariesanddatabases.Wedevelopedacommand-linetooltoactastheclient,andaserver-sidedatabaseadaptertoprovidethefunctionsofaPIRserver.ForthePIRfunctions,weusedthePercy++PIRLibrary[13,14],whicho 
ersthreeva-rietiesofprivacyprotection:computational,informationtheoreticandhybrid(acombinationofboth).WeextendedPercy++tosupportkeyword-basedPIR.Forgeneratinghashtableindicesforpointqueries,weusedtheCMinimalPerfectHash(CMPH)Library[5,6],version0.9.WeusedtheAPIforCMPHtogener-ateminimumperfecthashfunctionsforlargedatasetsfromqueryresults;theseperfecthashfunctionsrequiresmallamountsofdiskstorageperkey.Forbuild-ingB+treeindicesforrangequeriesonlargedatasets,weusedtheTransparentParallelI/OEnvironment(TPIE)Library[11,30].Finally,webasetheimple-mentationontheMySQL[28]relationaldatabase,version5.1.37-1ubuntu5.1.6.2ExperimentalsetupWebeganevaluatingourprototypeimplementationusingasetofsixwhois-stylequeriesfromReardonetal.[23],whichisthemostappropriateexisting microbenchmarkforourapproach.Weexploredtestsusingindustry-standarddatabasebenchmarks,suchastheTransactionProcessingPerformanceCoun-cil(TPC)[29]benchmarks,andopen-sourcebenchmarkingkitssuchasOpenSourceDevelopmentLabsDatabaseTestSuite(OSDLDTS)[32],butnoneofthetestsfromthesebenchmarksissuitableforevaluatingourprototype,astheirtestdatabasescannotbereadily 
fitted into a scenario that would make applying PIR meaningful. For example, a database schema based on completing online orders would serve only a very limited purpose toward our goal of protecting the privacy of sensitive information within a query. We ran the microbenchmark tests using two whois-style datasets, similar to those generated for the evaluation of TransPIR [23]. The smaller dataset consists of 10^6 domain name registration tuples and 0.75 × 10^6 registrar and registrant contact information tuples. The second dataset similarly consists of 4 × 10^6 and 3 × 10^6 tuples, respectively. We describe the two database relations and the evaluation queries, as well as the results for the smaller dataset, in the extended version [22].

In addition to the microbenchmarks, we performed an experiment to evaluate the behaviour of our prototype on complex input queries, such as aggregate queries, BETWEEN and LIKE queries, and queries with multiple WHERE clause conditions and joins. Each of these complex queries has varying privacy requirements for its sensitive constants. We ran all the experiments on a server with two quad-core 2.50 GHz Intel Xeon E5420 CPUs and 8 GB RAM, running Ubuntu Linux 9.10. We used the information-theoretic PIR support of Percy++, with two database replicas. The server also runs a local installation of a MySQL database.

6.3 Result overview

The results from our evaluation indicate that while our current prototype incurs some storage and computational costs over non-private queries, the costs seem entirely acceptable for the added privacy benefit (see Tables 1 and 2). In addition to being able to deal with complex queries and leverage database optimization opportunities, our prototype performs much better than the TransPIR prototype from Reardon et al. [23]: between 7 and 480 times faster for equivalent datasets. The factor most indicative of our prototype's performance improvement is, in most cases, the reduction in the number of PIR queries. Other factors that may affect the validity of the result, such as variations in implementation libraries, are assumed to have negligible impact on performance; our work uses the same underlying PIR library as TransPIR [23]. Our comparison is based on measurements we took by compiling and running the code for TransPIR on the same experimental hardware platform as our prototype.

Table 1. Experimental results for microbenchmark tests compared with those of Reardon et al. [23]. BTREE = timing for our B+tree prototype, HASH = timing for our hash table prototype, and TransPIR = timing from TransPIR [23]; Time = time to evaluate private query, PIRs = number of PIR operations performed, Tuples = count of rows in query result, QI = timing for subquery execution and index generation, Xfer = total data transfer between the client and the two PIR servers.

Query  Approach  Time (s)  PIRs  Tuples  QI (s)  Xfer (KB)
Q1     HASH      2         1     1       16      128
       BTREE     4         3     1       38      384
       TransPIR  25        2     1       1,017   256
Q2     BTREE     5         4     80      32      512
       TransPIR  999       83    80      1,017   10,624
Q3     BTREE     5         4     168     32      512
       TransPIR  2,055     171   168     1,017   21,888
Q4     BTREE     6         5     236     37      640
       TransPIR  2,885     240   236     1,017   30,720
Q5     BTREE     5         3     1       67      384
       TransPIR  37        3     1       1,017   384
Q6†    BTREE     5         4     168     66      512
       TransPIR  3,087     253   127     -       32,384

6.4 Microbenchmark and complex query experiments

For the benchmark tests, we obtained measurements for the time to execute the private query, the number of PIR queries performed, the number of tuples in the query results, the time to execute the subquery and generate the cached index, and the total data transfer between the client and the two PIR servers. Table 1 shows the results of the experiment. The cost of indexing (QI) can be amortized over multiple queries. The indexing measurements for BTREE (and HASH) consist of the time spent retrieving data from the database (subquery execution), writing the data (subquery result) to a file, and building an index from this file. Since TransPIR is not integrated with any relational database, it does not incur the same database retrieval and
file writing costs. However, TransPIR incurs a one-time preprocessing cost (QI), which prepares the database for subsequent query runs. Comparing this cost with its indexing counterpart in our BTREE and HASH prototypes (1,017 s versus 16 to 67 s) shows that our methods are over an order of magnitude faster. For the experiment on queries with complex conditions, we used a number of synthetic query scenarios with different privacy requirements (see [22] for details). The measurements, reported in Table 2, show the execution duration for the original query without privacy provision over the MySQL database, and several other measurements taken from within our prototype using a B+tree index.

† We reproduced TransPIR's measurements from [23] for query Q6 because we could not get TransPIR to run Q6 due to program errors. The '-' under QI indicates measurements missing from [23].

6.5 Discussion

The empirical results for the benchmark tests reflect the benefit of our approach. For all of the tests, we mostly base our comparison on the timings for query evaluation with PIR (Time), and sometimes on the index generation timings (QI). The time to transfer data between the client and the servers is directly proportional to the amount of data (Xfer), but we do not use it for comparison purposes because the test queries were not run over a network.

Our hash index (HASH) prototype performs best for query Q1, followed by our B+tree (BTREE) prototype. Q1 is a point query with a single condition on the domain name attribute. Query Q2 is a point query on the expiry date attribute, with the query result expected to have multiple tuples. The number of PIR queries required to evaluate Q2 with BTREE is 5% of the number required by TransPIR. A similar trend is repeated for Q3, Q4 and Q6. Note that the HASH prototype could not be used for Q2 because hash indices accept unique keys only; a hash index can only return a single tuple in its query result.

Query Q3 is a range query on expiry
date. Our BTREE prototype was approximately 411 times faster than TransPIR (5 s versus 2,055 s). Of note is the large number of PIR queries that TransPIR needs to evaluate the query; our BTREE prototype requires only 2% of that number. We observed a similar trend for Q4, where BTREE was 480 times faster. This query features two conditions in the SQL WHERE clause. The combined measured time for BTREE (the time taken both to build an index to support the query and to run the query itself) is still 67 times faster than the time it takes TransPIR to execute the query alone.

Query Q5 is a point query with a single join. It took BTREE only about 14% of the time it took TransPIR. We observed that the time our BTREE prototype spent executing the subquery dominated; only a small fraction of the time is spent building the B+tree index. Our BTREE prototype similarly performs faster for Q6, with a speedup of the same order of magnitude as for Q2, Q3, and Q4.

In all of the benchmark queries, the proposed approach performs better than TransPIR because it leverages database optimization opportunities, such as those for the processing of subqueries. In contrast, TransPIR assumes a type of block-serving database that offers no optimization opportunities. Therefore, in our system, the client is relieved from having to perform many traditional database functions, such as query processing, in addition to its regular PIR client functions.

Results for queries with complex conditions. We see from Table 2 that in most cases, the cost to evaluate the subquery and create the index dominates the total time to privately evaluate the query (BTREE), while the time to evaluate the query on the already-built index (Time) is minor. An exception is CQ2, which has a relatively small subquery result (rTuples), while having to do dozens of (consequently smaller) PIR operations to return thousands of results to the overall range query. Note that in all but CQ2, the time to privately evaluate the query on the already-built index is at most a few seconds longer than performing the query with no privacy at all; this underscores the advantage of using cached indices.

Table 2. Measurements taken from executing five complex SQL queries with varying requirements for privacy. oQm = timing for executing original query directly against the database, BTREE = overall timing for meeting privacy requirements with our B+tree prototype, Time = time to evaluate private query within BTREE, PIRs = number of PIR operations performed, Tuples = number of records in final query result, rTuples = number of indexed records in subquery result, Xfer = total data transfer between the client and the two PIR servers, Size = storage for index.

Query  oQm (s)  BTREE (s)  Time (s)  PIRs  Tuples  rTuples    Xfer (KB)  Size (MB)
CQ1    2        31         2         3     1       1,753,144  384        579.63
CQ2    1        15         13        41    3,716   72,568     5,248      25.13
CQ3    0        80         3         3     1       631,806    384        209.38
CQ4    2        25         5         3     1       1,050,300  384        348.63
CQ5    2        69         3         3     6       4,000,000  384        1,324.13

We note from our results that it is much more costly to have the client simply download the cached indices. We observe, for example, that it would take about 5 times as long for a user with 10 Mbps download bandwidth to download the index for CQ5. Moreover, such a trivial download of data is impractical for devices with low bandwidth and storage (e.g., mobile devices).

One way to improve query performance is to reveal a prefix or suffix of the sensitive keyword in a query. Revealing a substring of a keyword helps to constrain the result set that will be indexed and retrieved with PIR. Making this trade-off decision in a privacy-friendly manner necessarily requires some knowledge of the data distribution, in terms of the number of tuples for each value in the domain of values for a sensitive constant. This information can be included in the metadata a server sends to the client, and the client can make this trade-off
decision on behalf of the user based on the user's preset preferences. We are considering this extension as part of our future work.

6.6 Limitations

Our approach can preserve the privacy of sensitive data within the WHERE and HAVING clauses of an SQL query, with the exception of complex LIKE query expressions, negated conditions with sensitive constants, and SELECT queries nested within a WHERE clause. Complex search strings for LIKE queries, such as (LIKE 'do%abs%.c%m'), and negated WHERE clause conditions, such as (NOT registrant = 45444), are beyond the current capability of keyword-based PIR. Our solution for dealing with these conditions in a privacy-friendly manner is to compute them on the client, after the data for the computation has been retrieved with PIR; converting NOT = queries into their equivalent range queries is generally less efficient than our proposed client-based evaluation method. In addition, our prototype cannot process a nested query within a WHERE clause. We propose that the same processing described for a general SQL query be applied recursively to nested queries in the WHERE clause. The result obtained from a nested query then becomes an input to the client optimizer, which computes the enclosing query in the next round. Further investigation of this approach is needed for nested queries returning large result sets and for deeply nested queries.

7 Conclusion and Future Work

We have provided a privacy mechanism that leverages private information retrieval to preserve the privacy of sensitive constants in an SQL query. We described techniques to hide sensitive constants found in the WHERE clause of an SQL query, and to retrieve data from hash table and B+tree indices using a private information retrieval scheme. We developed a prototype privacy mechanism for our approach offering practical keyword-based PIR, enabling a practical transition from bit- and block-based PIR to SQL-enabled PIR. We evaluated the feasibility of our approach with experiments. The results indicate that our approach incurs reasonable performance and storage demands, considering the added advantage of being able to perform private SQL queries. We hope that our work will provide valuable insight into how to preserve the privacy of sensitive information for many existing and future database applications.

Future work can address some limitations of our prototype, such as the processing of nested queries, and can enhance the client to use statistical information on the data distribution to improve privacy. The technique proposed in this paper can also be extended to preserve the privacy of sensitive information for other query systems, such as URL query, XQuery, SPARQL and LINQ.

Acknowledgments

We would like to thank Urs Hengartner, Ryan Henry, Aniket Kate, Can Tang, Mashael AlSabah, John Akinyemi, Carol Fung, Meredith L. Patterson, and the anonymous reviewers for their helpful comments for improving this paper. We also gratefully acknowledge NSERC and MITACS for funding this research.

References

1. C. Aguilar-Melchor and P. Gaborit. A Lattice-Based Computationally-Efficient Private Information Retrieval Protocol. Cryptol. ePrint Arch., Report 446, 2007.
2. L. Arge, O. Procopiuc, and J. S. Vitter. Implementing I/O-efficient Data Structures Using TPIE. In Annual European Symposium on Algorithms, pages 88-100, 2002.
3. A. Beimel and Y. Stahl. Robust Information-Theoretic Private Information Retrieval. J. Cryptol., 20(3):295-321, 2007.
4. J. Bethencourt, D. Song, and B. Waters. New Techniques for Private Stream Searching. ACM Trans. Inf. Syst. Secur., 12(3):1-32, 2009.
5. F. C. Botelho, D. Reis, and N. Ziviani. CMPH: C minimal perfect hashing library on SourceForge. http://cmph.sourceforge.net/.
6. F. C. Botelho and N. Ziviani. External perfect hashing for very large key sets. In ACM CIKM, pages 653-662, 2007.
7. D. L. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM, 24(2):84-90, 1981.
8. B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords. Technical Report TR CS0917, Dept. of Computer Science, Technion, Israel, 1997.
9. B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In FOCS, pages 41-50, Oct. 1995.
10. G. D. Crescenzo. Towards Practical Private Information Retrieval. Achieving Practical Private Information Retrieval (Panel @ Securecomm 2006), Aug. 2006.
11. Department of Computer Science at Duke University. The TPIE (Templated Portable I/O Environment). http://madalgo.au.dk/Trac-tpie/.
12. R. Dingledine, N. Mathewson, and P. Syverson. Tor: the second-generation onion router. In USENIX Security Symposium, pages 21-21, 2004.
13. I. Goldberg. Percy++ project on SourceForge. http://percy.sourceforge.net/.
14. I. Goldberg. Improving the Robustness of Private Information Retrieval. In IEEE Symposium on Security and Privacy, pages 131-148, 2007.
15. H. Hacigumus, B. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In ACM SIGMOD, pages 216-227, 2002.
16. B. Hore, S. Mehrotra, and G. Tsudik. A privacy-preserving index for range queries. In VLDB, pages 720-731, 2004.
17. ICANN Security and Stability Advisory Committee (SSAC). Report on Domain Name Front Running, February 2008.
18. E. Kushilevitz and R. Ostrovsky. Replication is not needed: single database, computationally-private information retrieval. In FOCS, page 364, 1997.
19. S. K. Mishra and P. Sarkar. Symmetrically Private Information Retrieval. In INDOCRYPT, pages 225-236, 2000.
20. M. Naor and B. Pinkas. Oblivious transfer and polynomial evaluation. In ACM Symposium on Theory of Computing, pages 245-254, 1999.
21. M. Naor and B. Pinkas. Efficient oblivious transfer protocols. In ACM-SIAM SODA, pages 448-457, 2001.
22. F. Olumofin and I. Goldberg. Privacy-preserving Queries over Relational Databases. Technical report, CACR 2009-37, University of Waterloo, 2009.
23. J. Reardon, J. Pound, and I. Goldberg. Relational-Complete Private Information Retrieval. Technical report, CACR 2007-34, University of Waterloo, 2007.
24. L. Sassaman, B. Cohen, and N. Mathewson. The Pynchon Gate: a Secure Method of Pseudonymous Mail Retrieval. In ACM WPES, pages 1-9, 2005.
25. E. Shi, J. Bethencourt, T.-H. H. Chan, D. Song, and A. Perrig. Multi-Dimensional Range Query over Encrypted Data. In IEEE SSP, pages 350-364, 2007.
26. A. Silberschatz, H. F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill, Inc., New York, NY, USA, 5th edition, 2005.
27. R. Sion and B. Carbunar. On the Computational Practicality of Private Information Retrieval. In Network and Distributed Systems Security Symposium, 2007.
28. Sun Microsystems. MySQL. http://www.mysql.com/.
29. Transaction Processing Performance Council. Benchmark C. http://www.tpc.org/.
30. D. E. Vengroff and J. Scott Vitter. Supporting I/O-efficient scientific computation in TPIE. In IEEE Symp. on Parallel and Distributed Processing, page 74, 1995.
31. P. Williams and R. Sion. Usable PIR. In Network and Distributed System Security Symposium. The Internet Society, 2008.
32. M. Wong and C. Thomas. Database Test Suite project on SourceForge. http://osdldbt.sourceforge.net/.
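The private point lookups evaluated in Section 6 fetch blocks of a cached index from two database replicas using information-theoretic PIR. The Python below is a minimal sketch of that idea under stated assumptions; it is not the prototype's code. Percy++'s protocol is replaced by the simpler two-server XOR scheme of Chor et al. [9], the multi-level B+tree is flattened into one sorted list of fixed-size blocks, and the first key of each block is assumed to reach the client as index metadata. BLOCK_SIZE, build_blocks, pir_fetch and private_lookup are hypothetical names, not part of Percy++, CMPH or TPIE. Provided the two replicas do not collude, each one sees only a uniformly random bit vector, so neither learns which block (and hence which key) the client requested.

```python
import bisect
import secrets
from typing import List, Optional, Tuple

BLOCK_SIZE = 64  # bytes per index block (hypothetical value for this sketch)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(blocks: List[bytes], query: List[int]) -> bytes:
    """One replica XORs together every block whose query bit is 1."""
    acc = bytes(BLOCK_SIZE)
    for bit, block in zip(query, blocks):
        if bit:
            acc = xor_bytes(acc, block)
    return acc


def pir_fetch(replica1: List[bytes], replica2: List[bytes], index: int) -> bytes:
    """Two-server XOR PIR in the style of Chor et al.: retrieve blocks[index]."""
    q1 = [secrets.randbits(1) for _ in range(len(replica1))]
    q2 = list(q1)
    q2[index] ^= 1  # the two queries differ only at the wanted position
    # Every other block is in both answers or in neither, so it cancels out.
    return xor_bytes(server_answer(replica1, q1), server_answer(replica2, q2))


def build_blocks(records: List[Tuple[str, str]], per_block: int):
    """Sort (key, value) records and pack them into zero-padded fixed-size blocks.

    Returns the blocks and the first key of each block; the latter is the
    (assumed) metadata the server sends to the client.
    """
    records = sorted(records)
    blocks, first_keys = [], []
    for start in range(0, len(records), per_block):
        chunk = records[start:start + per_block]
        text = "\n".join(f"{k}\t{v}" for k, v in chunk).encode()
        if len(text) > BLOCK_SIZE:
            raise ValueError("block overflow; use a smaller per_block")
        blocks.append(text.ljust(BLOCK_SIZE, b"\0"))
        first_keys.append(chunk[0][0])
    return blocks, first_keys


def private_lookup(replica1, replica2, first_keys: List[str], key: str) -> Optional[str]:
    """Point query: choose the block locally, fetch it with PIR, search it locally."""
    i = max(bisect.bisect_right(first_keys, key) - 1, 0)  # servers never see `key`
    block = pir_fetch(replica1, replica2, i).rstrip(b"\0").decode()
    for line in block.split("\n"):
        k, v = line.split("\t", 1)
        if k == key:
            return v
    return None


if __name__ == "__main__":
    records = [("example.com", "registrant 17"), ("waterloo.ca", "registrant 45444"),
               ("foo.org", "registrant 3"), ("bar.net", "registrant 9")]
    blocks, first_keys = build_blocks(records, per_block=2)
    # Both replicas hold identical copies of the index blocks.
    print(private_lookup(blocks, list(blocks), first_keys, "waterloo.ca"))  # registrant 45444
```

In this flattened sketch every lookup costs exactly one PIR query, because the block position is computed locally from the metadata; a deeper tree would presumably need one PIR query per level the client descends.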