Abstract. We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations [...]
(a) Historical cracking efficiency, raw dictionary size
(b) Historical cracking efficiency, equivalent dictionary size

Figure 1. The size of cracking dictionaries is plotted logarithmically against the success rate achieved in Figure 1a. In Figure 1b, the dictionary sizes are adjusted to incorporate the inherent need for more guesses to crack more passwords. Circles and solid lines represent operating system user passwords, squares and dashed lines represent web passwords.

II. HISTORICAL EVALUATIONS OF PASSWORD SECURITY

It has long been of interest to analyze how secure passwords are against guessing attacks, dating at least to Morris and Thompson's seminal 1979 analysis of 3,000 passwords [3]. They performed a rudimentary dictionary attack using the system dictionary and all 6-character strings and recovered 84% of available passwords. They also reported some basic statistics such as password lengths (71% were 6 characters or fewer) and frequency of non-alphanumeric characters (14% of passwords). These two approaches, password cracking and semantic evaluation, have been the basis for dozens of studies in the thirty years since.

A. Cracking evaluation

The famous 1988 Morris worm propagated in part by guessing passwords using a 350-word password dictionary and several rules to modify passwords [9]. The publicity surrounding the worm motivated independent studies by Klein and Spafford which re-visited password guessing [4], [5]. Both studies broke 22–24% of passwords using more sophisticated dictionaries such as lists of names, sports teams, movies and so forth.

Password cracking evolved rapidly in the years after these studies, with dedicated software tools like John the Ripper emerging in the 1990s which utilize mangling rules to turn a single password like "john" into variants like "John", "J0HN", and "nhoj" [10]. Research on mangling rules has continued to evolve; the current state of the art by Weir et al. [11] automatically generates mangling rules from a large training set of known passwords. Later studies have often utilized these tools to perform dictionary attacks as a secondary goal, such as Wu's study of password cracking against Kerberos tickets in 1999 [12] and Kuo et al.'s study of mnemonic passwords in 2006 [13], which recovered 8% and 11% of passwords, respectively.

Recently, large-scale password leaks from compromised websites have provided a new source of data for cracking evaluations. For example, Schneier analyzed about 50,000 passwords obtained via phishing from MySpace in 2006 [6]. A more in-depth study was conducted by Dell'Amico et al., who studied the MySpace passwords as well as those of two other websites using a large variety of different dictionaries [7]. A very large data set of 32M passwords leaked from RockYou in 2009, which Weir et al. studied to examine the effects of password-composition rules on cracking efficiency [8].

Reported numbers on password cracking efficiency vary substantially between different studies, as shown in Figure 1. Most studies have broken 20–50% of accounts with dictionary sizes in the range of $2^{20}$–$2^{30}$. All studies see diminishing returns for larger dictionaries. This is clear in Figure 1b, which adjusts dictionary sizes based on the percentage of passwords cracked so that the degree of upward slope reflects only decreasing efficiency. This concept will motivate our statistical guessing metrics in Section III-E.

There is little data on the efficiency of small dictionaries as most studies employ the largest dictionary they can process. Klein's study, which attempted to identify highly efficient sub-dictionaries, is a notable exception [4]. There is also little data on the size of dictionary required to break a large majority of passwords: only Morris and Thompson broke more than 50% of available passwords [fn 1] and their results may be too dated to apply to modern passwords.

[fn 1] A 2007 study by Cazier and Medlin claimed to break 99% of passwords at an e-commerce website, but details of the dictionary weren't given [14].

B. Semantic evaluations

In addition to cracking research, there have been many studies on the semantics of passwords, with psychologists and linguists being interested as well as computer security researchers. This approach can be difficult as it either requires user surveys, which may produce unrealistic password choices, or direct access to unhashed passwords, which carries privacy concerns. Riddle et al. performed linguistic analysis of 6,226 passwords in 1989, classifying them into categories such as names, dictionary words, or seemingly random strings [15]. Cazier et al. repeated this process in 2006 and found that hard-to-classify passwords were also the hardest to crack [14].

year   study                      length   % digits   % special
1989   Riddle et al. [15]         4.4      3.5        -
1992   Spafford [5]               6.8      31.7       14.8
1999   Wu [12]                    7.5      25.7       4.1
1999   Zviran and Haga [18]       5.7      19.2       0.7
2006   Cazier and Medlin [14]     7.4      35.0       1.3
2009   RockYou leak [19]          7.9      54.0       3.7

Table I. COMMONLY ESTIMATED ATTRIBUTES OF PASSWORDS

Password structure was formally modeled by Weir et al. [11] using a context-free grammar to model the probability of different constructions being chosen. Password creation has also been modeled as a character-by-character Markov process, first by Narayanan and Shmatikov [16] for password cracking and later by Castelluccia et al. [17] to train a pro-active password checker.

Thus methodology for analyzing password structure has varied greatly, but a few basic data points like average length and types of characters used are typically reported, as summarized in Table I. The estimates vary so widely that it is difficult to infer much which is useful in systems design. The main trends are a tendency towards 6-8 characters of length and a strong dislike of non-alphanumeric characters in passwords. [fn 2]

Many studies have also attempted to determine the number of users which appear to be choosing random passwords, or at least passwords without any obvious meaning to a human examiner. Methodologies for estimating this vary as well, but most studies put it in the 10–40% range.

Elements of password structure, such as length or the presence of digits, upper-case, or non-alphanumeric characters, can be used to estimate the "strength" of a password, often measured in bits and often referred to imprecisely as "entropy". [fn 3] This usage was cemented by the 2006 FIPS Electronic Authentication Guideline [20], which provided a rough rule of thumb for estimating entropy from password characteristics such as length and type of characters used. This standard has been used in several password studies with too few samples to compute statistics on the entire distribution [21]–[23]. More systematic formulas have been proposed, such as one by Shay et al. [22] which adds entropy from different elements of a password's structure.

[fn 2] It is often suggested that users avoid characters which require multiple keys to type, but this doesn't seem to have been formally established.

[fn 3] This terminology is mathematically incorrect because entropy (see Sections III-A and III-B) measures a complete probability distribution, not a single event (password). The correct metric for a single event is self-information (or surprisal). This is perhaps disfavored because it is counter-intuitive: passwords should avoid including information like names or addresses, so high-information passwords sound weak.

C. Problems with previous approaches

Three decades of work on password guessing has produced sophisticated cracking tools and many disparate data points, but a number of methodological problems continue to limit scientific understanding of password security:

1) Comparability: Authors rarely report cracking results in a format which is straightforward to compare with previous benchmarks. To our knowledge, Figure 1 is the first comparison of different data points of dictionary size and success rate, though direct comparison is difficult since authors all report efficiency rates for different dictionary sizes. Password cracking tools only loosely attempt to guess passwords in decreasing order of likeliness, introducing imprecision into reported dictionary sizes. Worse, some studies report the running time of cracking software instead of dictionary size [14], [24], [25], making comparison difficult.

2) Repeatability: Precisely reproducing password cracking results is difficult. John the Ripper [10], used in most publications of the past decade, has been released in 21 different versions since 2001 and makes available 20 separate word lists for use (along with many proprietary ones), in addition to many configuration options. Other studies have used proprietary password-cracking software which isn't available to the research community [6], [14]. Thus nearly all studies use dictionaries varying in content and ordering, making it difficult to exactly re-create a published attack to compare its effectiveness against a new data set.

3) Evaluator dependency: Password-cracking results are inherently dependent on the appropriateness of the dictionary and mangling rules to the data set under study. Dell'Amico et al. [7] demonstrated this problem by applying language-specific dictionaries to data sets of passwords in different languages and seeing efficiency vary by 2–3 orders of magnitude. They also evaluated the same data set as Schneier three years earlier [6] and achieved two orders of magnitude better efficiency simply by choosing a better word list. Thus it is difficult to separate the effects of more-carefully chosen passwords from the use of a less appropriate dictionary. This is particularly challenging in data-slicing experiments [8], [23] which require simulating an equally good dictionary attack against each subpopulation.

4) Unsoundness: Estimating the entropy of a password distribution from structural characteristics is mathematically dubious, as we will demonstrate in Section III-D, and inherently requires making many assumptions about password selection. In practice, entropy estimates have performed poorly as predictors of empirical cracking difficulty [8], [23].
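As an illustration of the mangling rules described in Section II-A, the following minimal Python sketch expands one dictionary word into a few variants. The three rules (capitalization, upper-casing with an o-to-0 substitution, reversal) are chosen only to reproduce the "john" example from the text; the function name `mangle` is hypothetical, and this is not John the Ripper's rule syntax or rule set.

```python
def mangle(word):
    """Return candidate guesses derived from a single dictionary word.

    The rule list is an assumption made for illustration only.
    """
    rules = [
        lambda w: w,                            # the word itself
        lambda w: w.capitalize(),               # john -> John
        lambda w: w.upper().replace("O", "0"),  # john -> J0HN
        lambda w: w[::-1],                      # john -> nhoj
    ]
    candidates = []
    for rule in rules:
        candidate = rule(word)
        if candidate not in candidates:  # preserve order, drop duplicates
            candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    print(mangle("john"))  # ['john', 'John', 'J0HN', 'nhoj']
```

Each rule adds at most one guess per base word, so the effective dictionary size depends on the rule set as well as the word list, which is part of why reported dictionary sizes are hard to compare across studies.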
III. MATHEMATICAL METRICS OF GUESSING DIFFICULTY

Due to the problems inherent to password cracking simulations or semantic evaluation, we advocate security metrics that rely only on the statistical distribution of passwords. While this approach requires large data sets, it eliminates bias from password-cracking software by always modeling a best-case attacker, allowing us to assess and compare the inherent security of a given distribution.

Mathematical notation: We denote a probability distribution with a calligraphic letter, such as $\mathcal{X}$. We use lower-case $x$ to refer to a specific event in the distribution (an individual password). The probability of $x$ is denoted $p_x$. Formally, a distribution is a set of events $x \in \mathcal{X}$, each with an associated probability $0 < p_x \le 1$, such that $\sum p_x = 1$. We use $N$ to denote the total number of possible events in $\mathcal{X}$.

We often refer to events by their index $i$, that is, their rank by probability in the distribution, with the most probable having index 1 and the least probable having index $N$. We refer to the $i$-th most common event as $x_i$ and call its probability $p_i$. Thus, the probabilities of the events in $\mathcal{X}$ form a monotonically decreasing sequence $p_1 \ge p_2 \ge \ldots \ge p_N$. We denote an unknown variable as $X$, denoting $X \xleftarrow{R} \mathcal{X}$ if it is drawn at random from $\mathcal{X}$.

Guessing model: We model password selection as a random draw $X \xleftarrow{R} \mathcal{X}$ from an underlying password distribution $\mathcal{X}$. Though $\mathcal{X}$ will vary depending on the population of users, we assume that $\mathcal{X}$ is completely known to the attacker. Given a (possibly singleton) set of unknown passwords $\{X_1, X_2, \ldots, X_k\}$, we wish to evaluate the efficiency of an attacker trying to identify the unknown passwords $X_i$ given access to an oracle for queries of the form "is $X_i = x$?"

A. Shannon entropy

Intuitively, we may first think of the Shannon entropy:

$$H_1(\mathcal{X}) = \sum_{i=1}^{N} -p_i \lg p_i \tag{1}$$

as a measure of the "uncertainty" of $X$ to an attacker. Introduced by Shannon in 1948 [26], entropy appears to have been ported from cryptographic literature into studies of passwords before being used in FIPS guidelines [20].

It has been demonstrated that $H_1$ is mathematically inappropriate as a measure of guessing difficulty [27]–[30]. It in fact quantifies the average number of subset membership queries of the form "Is $X \in S$?" for arbitrary subsets $S \subseteq \mathcal{X}$ needed to identify $X$. [fn 4] For an attacker who must guess individual passwords, Shannon entropy has no direct correlation to guessing difficulty. [fn 5]

[fn 4] The proof of this is a straightforward consequence of Shannon's source coding theorem [26]. Symbols $X \xleftarrow{R} \mathcal{X}$ can be encoded using a Huffman code with average bit length $\le H_1(\mathcal{X}) + 1$, of which the adversary can learn one bit at a time with subset membership queries.

[fn 5] $H_1$ has further been claimed to correlate poorly with password cracking difficulty [8], [23], though the estimates of $H_1$ used cannot be relied upon.

B. Rényi entropy and its variants

Rényi entropy $H_n$ is a generalization of Shannon entropy [31] parametrized by a real number $n \ge 0$: [fn 6]

$$H_n(\mathcal{X}) = \frac{1}{1-n} \lg \left( \sum_{i=1}^{N} p_i^n \right) \tag{2}$$

In the limit as $n \to 1$, Rényi entropy converges to Shannon entropy, which explains why Shannon entropy is denoted $H_1$. Note that $H_n$ is a monotonically decreasing function of $n$. We are most interested in two special cases:

1) Hartley entropy $H_0$: For $n = 0$, Rényi entropy is:

$$H_0 = \lg N \tag{3}$$

Introduced prior to Shannon entropy [32], $H_0$ measures only the size of a distribution and ignores the probabilities.

2) Min-entropy $H_\infty$: As $n \to \infty$, Rényi entropy is:

$$H_\infty = -\lg p_1 \tag{4}$$

This metric is only influenced by the probability of the most likely symbol in the distribution, hence the name. This is a useful worst-case security metric for human-chosen distributions, demonstrating security against an attacker who only guesses the most likely password before giving up. $H_\infty$ is a lower bound for all other Rényi entropies and indeed all of the metrics we will define.

[fn 6] Rényi entropy is traditionally denoted $H_\alpha$; we use $H_n$ to avoid confusion with our primary use of $\alpha$ as a desired success rate.

C. Guesswork

A more applicable metric is the expected number of guesses required to find $X$ if the attacker proceeds in optimal order, known as guesswork or guessing entropy [27], [30]:

$$G(\mathcal{X}) = E\left[\#\mathrm{guesses}(X \xleftarrow{R} \mathcal{X})\right] = \sum_{i=1}^{N} p_i \cdot i \tag{5}$$

Because $G$ includes all probabilities in $\mathcal{X}$, it models an attacker who will exhaustively guess even exceedingly unlikely events, which can produce absurd results. For example, in the RockYou data set over twenty users (more than 1 in $2^{21}$) appear to use 128-bit pseudorandom hexadecimal strings as passwords. These passwords alone ensure that $G(\mathrm{RockYou}) \ge 2^{106}$. Thus $G$ provides little insight into practical attacks and furthermore is difficult to estimate from sampled data (see Section V).

D. Partial guessing metrics

Guesswork and entropy metrics fail to model the tendency of real-world attackers to cease guessing against the most difficult accounts. As discussed in Section II, cracking evaluations typically report the fraction of accounts broken by a given attack and explicitly look for weak subspaces of passwords to attack. Having many accounts to attack is an [...]
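The metrics of Equations 1 through 5 can be computed directly from a histogram of password counts. The following is a minimal sketch in Python, assuming the counts describe the distribution completely (no correction for sample size, which Section V shows to matter in practice). The function names are illustrative and do not come from the original study.

```python
import math


def from_counts(counts):
    """Turn raw counts into probabilities sorted in decreasing order."""
    total = sum(counts)
    return sorted((c / total for c in counts), reverse=True)


def shannon_entropy(p):
    """H_1 (Equation 1): sum of -p_i * lg(p_i)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def renyi_entropy(p, n):
    """H_n (Equation 2), with the limits n -> 1 and n -> infinity handled."""
    if n == 1:
        return shannon_entropy(p)          # limit as n -> 1
    if math.isinf(n):
        return -math.log2(max(p))          # min-entropy, Equation 4
    return math.log2(sum(pi ** n for pi in p if pi > 0)) / (1 - n)


def guesswork(p):
    """G (Equation 5): expected guesses when guessing in optimal order."""
    ordered = sorted(p, reverse=True)
    return sum(i * pi for i, pi in enumerate(ordered, start=1))


if __name__ == "__main__":
    p = from_counts([4, 2, 1, 1])          # toy histogram, not real data
    print("H0   =", renyi_entropy(p, 0))   # lg N = 2.0
    print("H1   =", shannon_entropy(p))    # 1.75
    print("Hinf =", renyi_entropy(p, math.inf))  # -lg p_1 = 1.0
    print("G    =", guesswork(p))          # 1.875
```

On this toy histogram the values decrease from $H_0$ through $H_1$ to $H_\infty$, matching the statement that $H_n$ is monotonically decreasing in $n$. For a uniform distribution all three entropies coincide at $\lg N$.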
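(a) $\mu_\alpha$, $G_\alpha$ (number of guesses)
(b) $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ (effective key-length)

Figure 2. Two ways of comparing the guessing difficulty of user-chosen 4-digit PINs [33] against uniform distributions of size 10,000 and 1,000 ($\mathcal{U}_{10^4}$ and $\mathcal{U}_{10^3}$, respectively). Fig. 2a plots the dictionary size $\mu_\alpha$ needed to have a chance of success $\alpha$ as well as the expected number of guesses per account $G_\alpha$. Fig. 2b converts both metrics into an effective key-length, enabling visual comparison across the entire range of $\alpha$. Traditional single-point metrics $H_0$, $H_1$, $H_2$, $H_\infty$ and $\tilde{G}$ are also marked for comparison. Note that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ are horizontal lines for uniform distributions; an attacker gains no efficiency advantage from lowering his desired success rate $\alpha$.

[...] $m \le H_1(\mathcal{X})$ and $\tilde{\mu}_\alpha(\mathcal{X}) + m \le \tilde{G}_1(\mathcal{X})$ for any separation parameter $m$. Furthermore, for any $\alpha_1 < \alpha_2$ a distribution $\mathcal{X}$ can be found with $\tilde{\mu}_{\alpha_1}(\mathcal{X}) + m \le \tilde{\mu}_{\alpha_2}(\mathcal{X})$ for any $m$. These results easily extend to $\tilde{G}_\alpha$ using the bounds listed in Table II, and related results can be proved for $\tilde{\lambda}_\beta(\mathcal{X})$.

equivalences
  $\forall n \; H_n(\mathcal{U}_N) = \lg N$                                   all metrics equal for $\mathcal{U}$
  $\forall \alpha \; \tilde{\mu}_\alpha(\mathcal{U}_N) = \lg N$               all metrics equal for $\mathcal{U}$
  $\forall \beta \; \tilde{\lambda}_\beta(\mathcal{U}_N) = \lg N$             all metrics equal for $\mathcal{U}$
  $\forall \alpha \; \tilde{G}_\alpha(\mathcal{U}_N) = \lg N$                 all metrics equal for $\mathcal{U}$
  $H_0 = \tilde{\mu}_1 = \tilde{\lambda}_N = \lg N$                           metrics depending only on $N$
  $H_\infty = \tilde{\mu}_{p_1} = \tilde{\lambda}_1 = -\lg p_1$               metrics depending only on $p_1$
bounds
  $H_\infty \le \tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta$  $H_\infty$ is abs. lower bound
  $\tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta \le H_0$       $H_0$ is abs. upper bound
  $\tilde{G}_\alpha \le \tilde{\mu}_\alpha$                                   straightforward proof
  $\tilde{G}_\alpha - \tilde{\mu}_\alpha \ge \lg(1 - \alpha)$                 straightforward proof
monotonicity
  $H_\infty \le \ldots \le H_1 \le H_0$                                       $H_n$ decreasing with $n$
  $\tilde{\mu}_\alpha \le \tilde{\mu}_{\alpha+\epsilon}$                      $\tilde{\mu}$ increasing with $\alpha$
  $\tilde{\lambda}_\beta \le \tilde{\lambda}_{\beta+\epsilon}$                $\tilde{\lambda}$ increasing with $\beta$
  $\tilde{G}_\alpha \le \tilde{G}_{\alpha+\epsilon}$                          $\tilde{G}$ increasing with $\alpha$

Table II. RELATIONS BETWEEN GUESSING METRICS

G. Application in practical security evaluation

For an online attacker we can use $\tilde{\lambda}_\beta$ with $\beta$ equal to the guessing limits imposed by the system. There is no standard for $\beta$, with 10 guesses recommended by usability studies [35], 3 by FIPS guidelines [20], and a variety of values (often $\infty$) seen in practice [36]. Sophisticated rate-limiting schemes may allow a probabilistic number of guesses [37]. We consider $\tilde{\lambda}_{10}$ a reasonable benchmark for resistance to online guessing, though $\tilde{\lambda}_1 = H_\infty$ is a conservative choice as a lower bound for all metrics proposed.

The separation results of Section III-F mean that for brute-force attacks we can't rely on any single value of $\alpha$; each value provides information about a fundamentally different attack scenario. For a complete picture, we can consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ across all values of $\alpha$. We can plot this as the guessing curve for a distribution, as seen in Figure 2.

For offline attacks, where an adversary is limited only by time and computing power, we might consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ for a standard value such as 0.5 as a benchmark ($\tilde{\mu}_{0.5}$ was originally suggested by [29]). While $\tilde{G}_\alpha$ more directly measures the efficiency of a guessing attack, $\tilde{\mu}_\alpha$ can be advantageous in practice because it is simpler to compute. In particular, it can be computed using previously published cracking results reported as "a dictionary of size $\mu$ compromised a fraction $\alpha$ of available accounts," as plotted in Figure 1b. Furthermore, the difference between the metrics is only significant for higher values of $\alpha$; for $\alpha \le 0.5$ the two will never differ by more than 1 bit (from the bound in Table II).

IV. PRIVACY-PRESERVING EXPERIMENTAL SETUP

By using statistical guessing metrics to evaluate passwords, we are freed from the need to access passwords in their original form. Users may be willing to provide passwords to researchers with ethics oversight [4], [23] but this approach does not scale and the validity of the collected passwords is questionable. In contrast, leaked data sets provide unquestionably valid data but there are ethical questions with using stolen password data and its availability shouldn't be relied on [38]. There is also no control over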
the size or composition of leaked data sets. Thus far, for example, no leaked sources have included demographic data.

We addressed both problems with a novel experimental setup and explicit cooperation from Yahoo!, which maintains a single password system to authenticate users for its diverse suite of online services. Our experimental data collection was performed by a proxy server situated in front of live login servers. This is required as long-term password storage should include account-specific salting and iterated hashing, which prevent constructing a histogram of common choices just as they mitigate pre-computed dictionary attacks [39].

Our proxy server sees a stream of pairs $(u, \mathrm{password}_u)$ for each user $u$ logging into any Yahoo! service. Our goal is to approximate distinct password distributions $\mathcal{X}_{f_i}$ for a series of demographic predicates $f_i$. Each predicate, such as "does this user have a webmail account?", will typically require a database query based on $u$. A simplistic solution would be for the proxy to emit a stream of tuples $(H(\mathrm{password}_u), f_1(u), f_2(u), \ldots)$, removing user identifiers $u$ to prevent trivial access to real accounts and using a cryptographic hash function $H$ to mask the values of individual passwords. [fn 8] There are two major problems to address:

[fn 8] Note that $H$ cannot incorporate any user-specific salt; doing so would occlude the frequency of repeated passwords.

A. Preventing password cracking

If a user $u$ can be re-identified by the uniqueness of his or her demographic predicates [40], then the value $H(\mathrm{password}_u)$ could be used as an oracle to perform an offline dictionary attack. Such a re-identification attack was demonstrated on a data set of movie reviews superficially anonymized for research purposes [41] and would almost certainly be possible for most users given the number and detail of predicates we would like to study.

This risk can be effectively mitigated by prepending the same cryptographically random nonce $r$ to each password prior to hashing. The proxy server must generate $r$ at the beginning of the study and destroy it prior to making data available to researchers. By choosing $r$ sufficiently long to prevent brute-force (128 bits is a conservative choice) and ensuring it is destroyed, $H(r \| \mathrm{password}_u)$ is useless for an attacker attempting to recover $\mathrm{password}_u$, but the distribution of hash values will remain exactly isomorphic to the underlying distribution of passwords seen.

B. Preventing cross-account compromise

While including a nonce prevents offline search, an attacker performing large-scale re-identification can still identify sets of users which have a password in common. This decreases security for all users in a group which share a password, as an attacker may then gain access to all accounts in the group by recovering just one user's password by auxiliary means such as phishing, malware, or compromise of an external website for which the password was re-used.

Figure 3. Changing estimates of guessing metrics with increasing sample size $M$. Estimates for $H_\infty$ and $\tilde{\lambda}_{10}$ converge very quickly; estimates for $\tilde{\mu}_{0.25}$ converge around $M = 2^{22}$ (marked) as predicted in Section V-A. Estimates for $H_0$, $H_1$, and $\tilde{G}$ are not close to converging.

Solving this problem requires preventing re-identification by not emitting vectors of predicates for each user. Instead, the proxy server maintains a histogram $H_i$ of observed hash values for each predicate $f_i$. For each pair $(u, \mathrm{password}_u)$ observed, the proxy server adds $H(r \| \mathrm{password}_u)$ to each histogram $H_i$ for which $f_i(u)$ is true. An additional list is stored of all previously seen hashed usernames $H(r \| u)$ to prevent double-counting users.

C. Deployment details

The collection code, consisting of a few dozen lines of Perl, was audited and $r$ generated using a seed provided by a Yahoo! manager and machine-generated entropy. The experiment was approved by Yahoo!'s legal team as well as the responsible ethics committee at the University of Cambridge. We deployed our experiment on a random subset of Yahoo! servers for a 48 hour period from May 23–25, 2011, observing 69,301,337 unique users and constructing separate histograms for 328 different predicate functions. Of these, many did not achieve a sufficient sample size to be useful and were discarded.

V. EFFECTS OF SAMPLE SIZE

In our mathematical treatment of guessing difficulty, we assumed complete information is available about the underlying probability distribution of passwords $\mathcal{X}$. In practice, we will need to approximate $\mathcal{X}$ with empirical data. [fn 9] We assume that we have $M$ independent samples $X_1, \ldots, X_M \xleftarrow{R} \mathcal{X}$ and we wish to calculate properties of $\mathcal{X}$.

The simplest approach is to compute metrics using the distribution of samples directly, which we denote $\hat{\mathcal{X}}$. [fn 10] As shown in Figure 3, this approach produces substantial and systematic under-estimates of most metrics, most prominently $\hat{H}_0 = \lg \hat{N}$, which increases nearly continuously with increasing sample size $M$, indicating that new passwords are still being seen often even at our massive sample size.

The maximum-likelihood estimation of the growth rate $\frac{d\hat{N}}{dM}$ has been shown to be exactly $\frac{V(1,M)}{M}$, the proportion of passwords in the sample observed only once [42]. [fn 11] This can be seen because in exactly $\frac{V(1,M)}{M}$ of all possible orderings that the sample may have been collected will the last observation have been a new item. For our full sample, $\frac{V(1,M)}{M} = 42.5\%$, indicating that a larger sample would continue to find many new passwords and hence larger estimates for $H_0$, $H_1$, $G_1$ etc. Similarly, for a random subsample of our data, many passwords will be missed and estimates of these metrics will decrease.

[fn 9] It is possible that an attacker knows the precise distribution of passwords in a given database, but typically in this case she or he would also know per-user passwords and would not be guessing statistically.

[fn 10] We use the hat symbol ^ for any metric estimated from sampled data.

[fn 11] Events observed only once in a sample are called hapax legomena in linguistics, Greek for "said only once."

Interpreting hapax legomena is a fundamental problem in statistics and there are no known non-parametric techniques for estimating the true distribution size $N$ [42]. This is not merely a theoretical restriction; in the case of passwords, determining that apparently pseudorandom passwords really are 128-bit random strings would require an utterly intractable sample size many times greater than $2^{128}$. Good-Turing techniques [43] aren't helpful for the distribution-wide statistics we are interested in; they can only estimate the cumulative probability of all unobserved events (the "missing mass") and provide damped maximum-likelihood estimates of the probability of individual events.

Fortunately, in practice we can usefully approximate our guessing metrics from reasonably-sized samples, though these estimations implicitly rely on assumptions about the underlying nature of the password distribution. As seen in Figure 3, partial guessing metrics which rely only on the more-frequent items in the distribution are the easiest to approximate, while those which rely on a summation over the entire distribution such as $H_0$, $H_1$ and $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ for large values of $\alpha$ will be the most difficult.

A. The region of stability

We can reliably estimate $p_i$ for events with observed frequency $f_i \gg 1$ due to the law of large numbers. Estimating $H_\infty$ requires estimating only $p_1$, the probability of the most common password, which was 1.08% in our data set. Gaussian statistics can be used to estimate the standard error of the maximum-likelihood estimate $\hat{p}_i$:

$$\mathrm{error}(\hat{p}_i) = \sqrt{\frac{p_i (1 - p_i)}{M}} \cdot \frac{1}{p_i} \approx \sqrt{\frac{f_i}{M^2}} \cdot \frac{M}{f_i} = \frac{1}{\sqrt{f_i}}$$

For our data set, this gives a standard error of under 0.1 bit in $\hat{H}_\infty$ for $M \ge 2^{14}$. This argument extends to $\hat{\tilde{\lambda}}_\beta$ for small values of $\beta$, and in practice we can measure resistance to online guessing with relatively modest sample size.

Figure 4. Estimated guessing curves with reduced sample size $M$. Subsamples were computed randomly without replacement, to simulate having stopped the collection experiment earlier. After the maximum confidence point $\alpha_6$ there are two (almost indistinguishable) dashed plots representing the 1st and 99th percentiles from 1,000 random samples.

Reasoning about the error in $\hat{\tilde{\mu}}_\alpha$ and $\hat{\tilde{G}}_\alpha$ for values of $\alpha$ which represent realistic brute-force attacks is more difficult. Fortunately, we observe that for our password data set the number of events $V(f, M)$ which occur $f$ times in a sample of size $M$ is very consistent for small $f$ and provides a reasonable estimate of the number of events with probability $\frac{f - 0.5}{M} \le p \le \frac{f + 0.5}{M}$ in our full data set. [fn 12] This enables a useful heuristic that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ will be well approximated when $\alpha$ is small enough to only rely on events occurring greater than some small frequency $f$. Calling $\alpha_f$ the cumulative estimated probability of all events occurring at least $f$ times, we took 1,000 random samples of our corpus with $M = 2^{19}$ and observed the following values in the 1st and 99th percentiles:

$f$                                                       6              7              8
$\alpha_f$                                                0.162–0.163    0.153–0.154    0.145–0.146
$\tilde{\mu}_{\alpha_f} - \hat{\tilde{\mu}}_{\alpha_f}$   0.157–0.180    0.125–0.148    0.103–0.127
$\tilde{G}_{\alpha_f} - \hat{\tilde{G}}_{\alpha_f}$       0.155–0.176    0.123–0.146    0.101–0.126

We observed very similar values for larger values of $M$. Thus, we will use $\hat{\tilde{\mu}}_\alpha$, $\hat{\tilde{G}}_\alpha$ directly for $\alpha \le \alpha_6$ for random subsamples of our data. The utility of this heuristic is seen in Figure 3, where it accurately predicts the point at which $\tilde{\mu}_{0.25}$ stabilizes, and in Figure 4, where it marks the point below which $\tilde{\mu}_\alpha$ is inaccurate for varying $M$.

[fn 12] $V(f, M)$ will almost always overestimate this value because more low-probability events will be randomly over-represented than the converse.

B. Parametric extension of our approximations

Estimating $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ for higher $\alpha$ requires directly assuming a model for the underlying password distribution. Passwords have been conjectured to follow a power-law distribution [fn 13] where:

$$\Pr[p(x) > y] \propto y^{1-a} \tag{12}$$

Unfortunately, using a power-law distribution is problematic for two reasons. First, estimates for the scale parameter $a$ are known to decrease significantly with sample size [42]. Using maximum-likelihood fitting techniques [44] for our observed count data we get the following estimates:

$M$          69M     10M     1M      100k
$\hat{a}$    2.99    3.23    3.70    4.21

A second problem is that this model fits our observed, integer counts. To correctly estimate $\tilde{\mu}_\alpha$ from samples, we need to model the presence of passwords for which $p_i \cdot M < 1$. Power-law distributions require assuming a non-zero minimum password probability a priori [44], which we have no meaningful way of doing.

Instead we need a model $\psi(p)$ for the distribution of password probabilities, an approach taken by linguists for modeling word frequencies [45]. We model the probability of observing a password $k$ times using a mixture model: first we draw a password probability $p$ randomly according to the probability density function $\psi(p)$, then we draw from a Poisson distribution with expectation $p \cdot M$ to model the number of times we observe this password:

$$\Pr[k \text{ obs.}] = \frac{\int_0^1 \frac{(pM)^k e^{-pM}}{k!} \psi(p)\,dp}{1 - \int_0^1 e^{-pM} \psi(p)\,dp} \tag{13}$$

The numerator integrates the possibility of seeing a password with probability $p$ exactly $k$ times, weighted by the probability $\psi(p)$ of a password having probability $p$. The denominator corrects for the probability of not observing a password at all. This formulation allows us to take a set of counts from a sample $\{f_1, f_2, \ldots\}$ and find the parameters for $\psi(p)$ which maximize the likelihood of our observations:

$$\mathrm{Likelihood} = \prod_{i=1}^{\hat{N}} \Pr[f_i \text{ obs.}] \tag{14}$$

This model has been effectively applied to word frequencies using the generalized inverse-Gaussian distribution: [fn 14]

$$\psi(p \mid b, c, g) = \frac{2^{g-1} p^{g-1} e^{-\frac{p}{c} - \frac{b^2 c}{4p}}}{(bc)^g K_g(b)} \tag{15}$$

where $K_g$ is the modified Bessel function of the second kind.

[fn 13] Power-law distributions are also called Pareto or Zipfian distributions, which can all be shown to be equivalent formulations [42].

[fn 14] The combined generalized inverse-Gaussian-Poisson model which we adopt is also called the Sichel distribution after its initial use by Sichel in 1975 to model word frequencies [46].

Figure 5. Extrapolated estimates for $\tilde{\mu}_\alpha$ using the generalized inverse Gaussian-Poisson distribution. Compared to naive estimates (Figure 4) the effects of sample size are mitigated. Each plot shows the 99% confidence interval from 1,000 random subsamples. Error from lack of fit of the model dwarfs error due to the randomness of each sample.

The generalized inverse-Gaussian is useful because it blends both power-law ($p^{g-1}$) and exponential ($e^{-\frac{p}{c} - \frac{b^2 c}{4p}}$) behavior and produces a well-formed probability distribution. By plugging Equation 15 into Equation 13 for $\psi$ and solving the integral, we obtain:

$$\Pr[k \mid b, c, g] = \frac{\left( \frac{\frac{1}{2} b c n}{\sqrt{1 + cn}} \right)^r K_{r+g}\!\left( b \sqrt{1 + cn} \right)}{r! \left( (1 + cn)^{\frac{g}{2}} K_g(b) - K_g\!\left( b \sqrt{1 + cn} \right) \right)} \tag{16}$$

Though unwieldy, we can compute Equation 14 using Equation 15 for different parameters of $b$, $c$, $g$. Fortunately, for $b > 0$, $c > 0$, $g < 0$ there is only one maximum of this function [45], which enables approximation of the maximum-likelihood fit efficiently by gradient descent.

We can use this model to produce an extrapolated distribution, removing all observed passwords with $f_i < 6$ to leave the well-approximated region of the distribution unchanged and adding synthetic passwords according to our estimated model $\psi(p)$. This is achieved by dividing the region $\left(0, \frac{6}{M}\right)$ into discrete bins, with increasingly small bins near the value $p^+$ which maximizes
$\psi(p^+)$. Into each bin $(p_j, p_{j+1})$ we insert $\hat{N} \cdot \int_{p_j}^{p_{j+1}} \psi(p)\,dp$ events of observed frequency $\frac{p_j + p_{j+1}}{2} \cdot M$. We then normalize the probability of all synthetic events by multiplying the correction factor $\frac{1 - \alpha_6}{\int_0^{6/M} \psi(p)\,dp}$ to leave the head of the distribution intact.

Figure 5 plots the 1st and 99th percentile of $\tilde{\mu}_\alpha$ for extrapolations of random subsamples of our data. We use $\tilde{\mu}_\alpha$ because it is strictly less well approximated than $\tilde{G}_\alpha$, which is weighted slightly more towards well-approximated events in the distribution. Some key values are: [...]

Rows give the target population; columns give the dictionary used.

target       Chinese  German  Greek   English  French  Indonesian  Italian  Korean  Portuguese  Spanish  Vietnamese  global  minimax
Chinese      4.4%     1.9%    2.7%    2.4%     1.7%    2.0%        2.0%     2.9%    1.8%        1.7%     2.0%        2.9%    2.7%
German       2.0%     6.5%    2.1%    3.3%     2.9%    2.2%        2.8%     1.6%    2.1%        2.6%     1.6%        3.5%    3.4%
Greek        9.3%     7.7%    13.4%   8.4%     7.4%    8.1%        8.0%     8.0%    7.7%        7.8%     7.7%        8.6%    8.9%
English      4.4%     4.6%    3.9%    8.0%     4.3%    4.5%        4.3%     3.4%    3.5%        4.2%     3.5%        7.9%    7.7%
French       2.7%     4.0%    2.9%    4.2%     10.0%   2.9%        3.2%     2.2%    3.1%        3.4%     2.1%        5.0%    4.9%
Indonesian   6.7%     6.3%    6.5%    8.7%     6.3%    14.9%       6.2%     5.8%    6.0%        6.2%     5.9%        9.3%    9.6%
Italian      4.0%     6.0%    4.6%    6.3%     5.3%    4.6%        14.6%    3.3%    5.7%        6.8%     3.2%        7.2%    7.1%
Korean       3.7%     2.0%    3.0%    2.6%     1.8%    2.3%        2.0%     5.8%    2.4%        1.9%     2.2%        2.8%    3.0%
Portuguese   3.9%     3.9%    4.0%    4.3%     3.8%    3.9%        4.4%     3.5%    11.1%       5.8%     2.9%        5.1%    5.3%
Spanish      3.6%     5.0%    4.0%    5.6%     4.6%    4.1%        6.1%     3.1%    6.3%        12.1%    2.9%        6.9%    7.0%
Vietnamese   7.0%     5.7%    6.2%    7.7%     5.8%    6.3%        5.7%     6.0%    5.8%        5.5%     14.3%       7.8%    8.3%
Table V. Language dependency of password guessing. Each cell indicates the success rate of a guessing attack with 1000 attempts using a dictionary optimal for users registered at Yahoo! with different preferred languages.

most subsets. The greatest efficiency loss for any subset when using the global list is only a factor of 2.2, for Portuguese language passwords. We can improve this slightly further by constructing a special dictionary to be effective against all subsets. We do this by repeatedly choosing the password for which the lowest popularity in any subset is maximal and call it the minimax dictionary, also seen in Table V. This dictionary performs very similarly to the global dictionary, reducing the maximum efficiency loss to a factor of 2.1, also for Portuguese language passwords.

Digging into our data we find global passwords which are popular across all subgroups we observed. The single most popular password we observed, for example, occurred with probability at least 0.14% in every subpopulation. Some overall popular passwords were very rare in certain subpopulations. For example, the third most common password, with overall probability 0.1%, occurred nearly 100 times less frequently in some subpopulations. However, there were eight passwords which occurred with probability at least 0.01% in every subpopulation. Without access to the raw passwords, we can only speculate that these are numeric passwords, as these are popular²¹ and internationalize well.

²¹ Within the RockYou dataset, 123456 was the most popular password and 5 other number-only passwords were amongst the top ten.

Despite the existence of globally popular passwords, however, we still conclude that dictionary specificity can have surprisingly large results. For example, the following table shows efficiency losses of up to 25% from dictionaries tailored to people from different English-speaking countries:

(rows: target population; columns: dictionary)
target        us      uk      ca      au  global
us          8.2%    6.6%    7.4%    7.2%    8.1%
uk          5.4%    6.9%    5.5%    5.6%    5.5%
ca          8.8%    7.9%    9.9%    8.7%    8.8%
au          7.4%    7.2%    7.6%    8.8%    7.5%

We observe comparable efficiency losses based on age:

(rows: target population; columns: dictionary)
target     13–20   21–34   35–54     55+  global
13–20       8.4%    7.8%    7.1%    6.5%    7.9%
21–34       7.3%    7.9%    7.3%    6.7%    7.8%
35–54       5.4%    5.8%    6.4%    6.1%    6.2%
55+         5.4%    5.8%    6.8%    7.3%    6.5%

We even observe efficiency losses based on service usage:

(rows: target population; columns: dictionary)
target    retail    chat   media    mail  global
retail      7.0%    5.6%    6.6%    5.6%    6.0%
chat        6.9%    8.4%    7.8%    8.3%    8.3%
media       5.7%    5.6%    6.0%    5.6%    5.8%
mail        6.7%    8.0%    7.5%    8.2%    8.1%

VII. CONCLUDING REMARKS

By establishing sound metrics and rigorously analyzing the largest password corpus to date, we hope to have contributed both tools and numbers of lasting significance. As a rule of thumb for security engineers, passwords provide roughly equivalent security to 10-bit random strings against an optimal online attacker trying a few popular guesses for a large list of accounts. In other words, an attacker who can manage 10 guesses per account, typically within the realm of rate-limiting mechanisms, will compromise around 1% of accounts, just as they would against random 10-bit strings. Against an optimal attacker performing unrestricted brute force and wanting to break half of all available accounts, passwords appear to be roughly equivalent to 20-bit random strings. This means that no practical amount of iterated hashing can prevent an adversary from breaking a large number of accounts given the opportunity for offline

[20] W. E. Burr, D. F. Dodson, and W. T. Polk, "Electronic Authentication Guideline," NIST Special Publication 800-63, 2006.
[21] D. Florencio and C. Herley, "A large-scale study of web password habits," in WWW '07: Proceedings of the 16th International Conference on the World Wide Web. ACM, 2007, pp. 657–666.
[22] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, L. Bauer, N. Christin, and L. F. Cranor, "Encountering Stronger Password Requirements: User Attitudes and Behaviors," in SOUPS '10: Proceedings of the 6th Symposium on Usable Privacy and Security. ACM, 2010.
[23] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, "Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms," Carnegie Mellon University, Tech. Rep. CMU-CyLab-11-008, 2011.
[24] J. Yan, A. Blackwell, R. Anderson, and A. Grant, "Password Memorability and Security: Empirical Results," IEEE Security and Privacy Magazine, vol. 2, no. 5, pp. 25–34, 2004.
[25] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna, "Your botnet is my botnet: Analysis of a botnet takeover," in CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM, 2009, pp. 635–647.
[26] C. E. Shannon, "A Mathematical Theory of Communication," in Bell System Technical Journal, vol. 27, 1948, pp. 379–423.
[27] C. Cachin, "Entropy measures and unconditional security in cryptography," Ph.D. dissertation, ETH Zürich, 1997.
[28] J. O. Pliam, "On the Incomparability of Entropy and Marginal Guesswork in Brute-Force Attacks," in Progress in Cryptology - INDOCRYPT 2000, 2000.
[29] S. Boztas, "Entropies, Guessing, and Cryptography," Department of Mathematics, Royal Melbourne Institute of Technology, Tech. Rep. 6, 1999.
[30] J. L. Massey, "Guessing and Entropy," in Proceedings of the 1994 IEEE International Symposium on Information Theory, 1994, p. 204.
[31] A. Rényi, "On measures of information and entropy," Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561, 1961.
[32] R. V. Hartley, "Transmission of Information," Bell System Technical Journal, vol. 7, no. 3, pp. 535–563, 1928.
[33] J. Bonneau, S. Preibusch, and R. Anderson, "A birthday present every eleven wallets? The security of customer-chosen banking PINs," FC '12: The 16th International Conference on Financial Cryptography and Data Security, 2012.
[34] J. Bonneau, M. Just, and G. Matthews, "What's in a name? Evaluating statistical attacks against personal knowledge questions," FC '10: The 14th International Conference on Financial Cryptography and Data Security, 2010.
[35] S. Brostoff and A. Sasse, "Ten strikes and you're out: Increasing the number of login attempts can improve password usability," in Proceedings of CHI 2003 Workshop on HCI and Security Systems. John Wiley, 2003.
[36] J. Bonneau and S. Preibusch, "The password thicket: technical and market failures in human authentication on the web," WEIS '10: Proceedings of the 9th Workshop on the Economics of Information Security, 2010.
[37] M. Alsaleh, M. Mannan, and P. van Oorschot, "Revisiting Defenses Against Large-Scale Online Password Guessing Attacks," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 1, pp. 128–141, 2012.
[38] S. Egelman, J. Bonneau, S. Chiasson, D. Dittrich, and S. Schechter, "It's Not Stealing If You Need It: On the ethics of performing research using public data of illicit origin (panel discussion)," WECSR '12: The 3rd Workshop on Ethics in Computer Security Research, 2012.
[39] B. Kaliski, "RFC 2898: PKCS #5: Password-Based Cryptography Specification Version 2.0," IETF, 2000.
[40] D. E. Denning and P. J. Denning, "The tracker: a threat to statistical database security," ACM Transactions on Database Systems, vol. 4, pp. 76–96, 1979.
[41] A. Narayanan and V. Shmatikov, "How To Break Anonymity of the Netflix Prize Dataset," eprint arXiv:cs/0610105, 2006.
[42] H. R. Baayen, Word Frequency Distributions, ser. Text, Speech and Language Technology. Springer, 2001.
[43] W. A. Gale, "Good-Turing smoothing without tears," Journal of Quantitative Linguistics, vol. 2, 1995.
[44] A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-Law Distributions in Empirical Data," SIAM Rev., vol. 51, pp. 661–703, 2009.
[45] M. Font, X. Puig, and J. Ginebra, "A Bayesian analysis of frequency count data," Journal of Statistical Computation and Simulation, 2011.
[46] H. Sichel, "On a distribution law for word frequencies," Journal of the American Statistical Association, 1975.
[47] D. Davis, F. Monrose, and M. K. Reiter, "On User Choice in Graphical Password Schemes," in Proceedings of the 13th USENIX Security Symposium, 2004.
[48] S. Wiedenbeck, J. Waters, J.-C. Birget, A. Brodskiy, and N. Memon, "PassPoints: design and longitudinal evaluation of a graphical password system," International Journal of Human-Computer Studies, vol. 63, pp. 102–127, 2005.
[49] Y. Zhang, F. Monrose, and M. K. Reiter, "The security of modern password expiration: an algorithmic framework and empirical analysis," in CCS '10: Proceedings of the 17th ACM Conference on Computer and Communications Security. ACM, 2010, pp. 176–186.
[50] C. Herley, P. van Oorschot, and A. S. Patrick, "Passwords: If We're So Smart, Why Are We Still Using Them?" FC '09: The 13th International Conference on Financial Cryptography and Data Security, 2009.
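
The minimax dictionary reported in Table V is built by repeatedly choosing the password whose lowest popularity in any subset is maximal. The following Python sketch illustrates that selection rule under stated assumptions: each subpopulation is taken to be a mapping from password to probability, and the function names and toy data are hypothetical, not taken from the study. Because a password's worst-case popularity does not change as other passwords are chosen, the repeated choice reduces here to a single sort.

```python
def worst_case_popularity(password, subsets):
    """Lowest probability of `password` across all subpopulations."""
    return min(probs.get(password, 0.0) for probs in subsets.values())


def minimax_dictionary(subsets, size):
    """Return `size` passwords ordered by best worst-case popularity."""
    candidates = set().union(*subsets.values())
    ranked = sorted(
        candidates,
        key=lambda pw: (-worst_case_popularity(pw, subsets), pw),  # ties by name
    )
    return ranked[:size]


def optimal_dictionary(probs, size):
    """The `size` most probable passwords within one subpopulation."""
    return sorted(probs, key=lambda pw: (-probs[pw], pw))[:size]


def success_rate(dictionary, probs):
    """Fraction of one subpopulation's accounts covered by the dictionary."""
    return sum(probs.get(pw, 0.0) for pw in dictionary)


# Hypothetical toy subpopulations (assumed data, for illustration only).
subsets = {
    "A": {"123456": 0.010, "password": 0.008, "alpha": 0.020, "qwerty": 0.004},
    "B": {"123456": 0.012, "password": 0.002, "beta": 0.030, "qwerty": 0.005},
}

minimax = minimax_dictionary(subsets, 2)  # ["123456", "qwerty"]

for name, probs in subsets.items():
    best = success_rate(optimal_dictionary(probs, 2), probs)
    got = success_rate(minimax, probs)
    # Efficiency loss as a factor: tailored success rate over minimax success rate.
    print(f"{name}: minimax {got:.3f}, tailored {best:.3f}, loss factor {best / got:.2f}")
```

The loss factor printed for each subset is the same kind of ratio as the factors of 2.2 and 2.1 quoted for Portuguese-language passwords, which follow from dividing the tailored success rate in Table V (11.1%) by the global (5.1%) and minimax (5.3%) rates.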