Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations ...
Figure 1. (a) Historical cracking efficiency, raw dictionary size. (b) Historical cracking efficiency, equivalent dictionary size. The size of cracking dictionaries is plotted logarithmically against the success rate achieved in Figure 1a. In Figure 1b, the dictionary sizes are adjusted to incorporate the inherent need for more guesses to crack more passwords. Circles and solid lines represent operating system user passwords, squares and dashed lines represent web passwords.

II. HISTORICAL EVALUATIONS OF PASSWORD SECURITY

It has long been of interest to analyze how secure passwords are against guessing attacks, dating at least to Morris and Thompson's seminal 1979 analysis of 3,000 passwords [3]. They performed a rudimentary dictionary attack using the system dictionary and all 6-character strings and recovered 84% of available passwords. They also reported some basic statistics such as password lengths (71% were 6 characters or fewer) and frequency of non-alphanumeric characters (14% of passwords). These two approaches, password cracking and semantic evaluation, have been the basis for dozens of studies in the thirty years since.

A. Cracking evaluation

The famous 1988 Morris worm propagated in part by guessing passwords using a 350-word password dictionary and several rules to modify passwords [9]. The publicity surrounding the worm motivated independent studies by Klein and Spafford which re-visited password guessing [4], [5]. Both studies broke 22-24% of passwords using more sophisticated dictionaries such as lists of names, sports teams, movies and so forth. Password cracking evolved rapidly in the years after these studies, with dedicated software tools like John the Ripper emerging in the 1990s which utilize mangling rules to turn a single password like "john" into variants like "John", "J0HN", and "nhoj" [10]. Research on mangling rules has continued to evolve; the current state of the art by Weir et al. [11] automatically generates mangling rules from a large training set of known passwords. Later studies have often utilized these tools to perform dictionary attacks as a secondary goal, such as Wu's study of password cracking against Kerberos tickets in 1999 [12] and Kuo et al.'s study of mnemonic passwords in 2006 [13], which recovered 8% and 11% of passwords, respectively.

Recently, large-scale password leaks from compromised websites have provided a new source of data for cracking evaluations. For example, Schneier analyzed about 50,000 passwords obtained via phishing from MySpace in 2006 [6]. A more in-depth study was conducted by Dell'Amico et al., who studied the MySpace passwords as well as those of two other websites using a large variety of different dictionaries [7]. A very large data set of 32M passwords leaked from RockYou in 2009, which Weir et al. studied to examine the effects of password-composition rules on cracking efficiency [8].

Reported numbers on password cracking efficiency vary substantially between different studies, as shown in Figure 1. Most studies have broken 20-50% of accounts with dictionary sizes in the range of $2^{20}$ to $2^{30}$. All studies see diminishing returns for larger dictionaries. This is clear in Figure 1b, which adjusts dictionary sizes based on the percentage of passwords cracked so that the degree of upward slope reflects only decreasing efficiency. This concept will motivate our statistical guessing metrics in Section III-E.

There is little data on the efficiency of small dictionaries as most studies employ the largest dictionary they can process. Klein's study, which attempted to identify highly efficient sub-dictionaries, is a notable exception [4]. There is also little data on the size of dictionary required to break a large majority of passwords; only Morris and Thompson broke more than 50% of available passwords (fn. 1) and their results may be too dated to apply to modern passwords.

B. Semantic evaluations

In addition to cracking research, there have been many studies on the semantics of passwords with psychologists and linguists being interested as well as computer security researchers. This approach can be difficult as it either requires user surveys, which may produce unrealistic password choices, or direct access to unhashed passwords, which carries privacy concerns. Riddle et al. performed linguistic analysis of 6,226 passwords in 1989, classifying them into categories such as names, dictionary words, or seemingly random strings [15]. Cazier et al. repeated this process in 2006 and found that hard-to-classify passwords were also the hardest to crack [14].

Password structure was formally modeled by Weir et al. [11] using a context-free grammar to model the probability of different constructions being chosen. Password creation has also been modeled as a character-by-character Markov process, first by Narayanan and Shmatikov [16] for password cracking and later by Castelluccia et al. [17] to train a pro-active password checker.

Thus methodology for analyzing password structure has varied greatly, but a few basic data points like average length and types of characters used are typically reported, as summarized in Table I. The estimates vary so widely that it is difficult to infer much which is useful in systems design. The main trends are a tendency towards 6-8 characters of length and a strong dislike of non-alphanumeric characters in passwords. (fn. 2)

    year  study                    length  % digits  % special
    1989  Riddle et al. [15]       4.4     3.5       -
    1992  Spafford [5]             6.8     31.7      14.8
    1999  Wu [12]                  7.5     25.7      4.1
    1999  Zviran and Haga [18]     5.7     19.2      0.7
    2006  Cazier and Medlin [14]   7.4     35.0      1.3
    2009  RockYou leak [19]        7.9     54.0      3.7

Table I. COMMONLY ESTIMATED ATTRIBUTES OF PASSWORDS

Many studies have also attempted to determine the number of users which appear to be choosing random passwords, or at least passwords without any obvious meaning to a human examiner. Methodologies for estimating this vary as well, but most studies put it in the 10-40% range.

Elements of password structure, such as length or the presence of digits, upper-case, or non-alphanumeric characters can be used to estimate the strength of a password, often measured in bits and often referred to imprecisely as entropy. (fn. 3) This usage was cemented by the 2006 NIST Electronic Authentication Guideline [20], which provided a rough rule of thumb for estimating entropy from password characteristics such as length and type of characters used. This standard has been used in several password studies with too few samples to compute statistics on the entire distribution [21]-[23]. More systematic formulas have been proposed, such as one by Shay et al. [22] which adds entropy from different elements of a password's structure.

C. Problems with previous approaches

Three decades of work on password guessing has produced sophisticated cracking tools and many disparate data points, but a number of methodological problems continue to limit scientific understanding of password security:

1) Comparability: Authors rarely report cracking results in a format which is straightforward to compare with previous benchmarks. To our knowledge, Figure 1 is the first comparison of different data points of dictionary size and success rate, though direct comparison is difficult since authors all report efficiency rates for different dictionary sizes. Password cracking tools only loosely attempt to guess passwords in decreasing order of likeliness, introducing imprecision into reported dictionary sizes. Worse, some studies report the running time of cracking software instead of dictionary size [14], [24], [25], making comparison difficult.

2) Repeatability: Precisely reproducing password cracking results is difficult. John the Ripper [10], used in most publications of the past decade, has been released in 21 different versions since 2001 and makes available 20 separate word lists for use (along with many proprietary ones), in addition to many configuration options. Other studies have used proprietary password-cracking software which isn't available to the research community [6], [14]. Thus nearly all studies use dictionaries varying in content and ordering, making it difficult to exactly re-create a published attack to compare its effectiveness against a new data set.

3) Evaluator dependency: Password-cracking results are inherently dependent on the appropriateness of the dictionary and mangling rules to the data set under study. Dell'Amico et al. [7] demonstrated this problem by applying language-specific dictionaries to data sets of passwords in different languages and seeing efficiency vary by 2-3 orders of magnitude. They also evaluated the same data set as Schneier three years earlier [6] and achieved two orders of magnitude better efficiency simply by choosing a better word list. Thus it is difficult to separate the effects of more-carefully chosen passwords from the use of a less appropriate dictionary. This is particularly challenging in data-slicing experiments [8], [23] which require simulating an equally good dictionary attack against each subpopulation.

4) Unsoundness: Estimating the entropy of a password distribution from structural characteristics is mathematically dubious, as we will demonstrate in Section III-D, and inherently requires making many assumptions about password selection. In practice, entropy estimates have performed poorly as predictors of empirical cracking difficulty [8], [23].

Footnote 1: A 2007 study by Cazier and Medlin claimed to break 99% of passwords at an e-commerce website, but details of the dictionary weren't given [14].

Footnote 2: It is often suggested that users avoid characters which require multiple keys to type, but this doesn't seem to have been formally established.

Footnote 3: This terminology is mathematically incorrect because entropy (see Sections III-A and III-B) measures a complete probability distribution, not a single event (password). The correct metric for a single event is self-information (or surprisal). This is perhaps disfavored because it is counter-intuitive: passwords should avoid including information like names or addresses, so high-information passwords sound weak.
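As an illustration of the mangling rules described in Section II-A, the following Python sketch expands a small word list into candidate guesses. It is only a toy under stated assumptions: the names `LEET`, `RULES`, `mangle` and `expand_dictionary` are hypothetical, the rule set is invented for this example, and it does not reproduce the rule syntax or behavior of John the Ripper or of any other cracking tool.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical substitution table, assumed for illustration only;
# real cracking tools ship far larger and more varied rule sets.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

# Each rule maps a base word to exactly one candidate guess.
RULES: List[Callable[[str], str]] = [
    lambda w: w,                          # unchanged:        john -> john
    lambda w: w.capitalize(),             # capitalize:       john -> John
    lambda w: w.translate(LEET).upper(),  # substitute+upper: john -> J0HN
    lambda w: w[::-1],                    # reverse:          john -> nhoj
    lambda w: w + "1",                    # append a digit:   john -> john1
]


def mangle(word: str) -> Iterator[str]:
    """Yield the distinct variants of one word, in rule order."""
    seen = set()
    for rule in RULES:
        candidate = rule(word)
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


def expand_dictionary(words: Iterable[str]) -> List[str]:
    """Expand a word list into an ordered guess list without duplicates."""
    guesses: List[str] = []
    seen = set()
    for word in words:
        for candidate in mangle(word):
            if candidate not in seen:
                seen.add(candidate)
                guesses.append(candidate)
    return guesses
```

Because the number of guesses grows with both the word list and the rule set, two studies that use the same word list but different rules are in effect using different dictionaries, which is one source of the comparability and repeatability problems listed above.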
III. MATHEMATICAL METRICS OF GUESSING DIFFICULTY

Due to the problems inherent to password cracking simulations or semantic evaluation, we advocate security metrics that rely only on the statistical distribution of passwords. While this approach requires large data sets, it eliminates bias from password-cracking software by always modeling a best-case attacker, allowing us to assess and compare the inherent security of a given distribution.

Mathematical notation: We denote a probability distribution with a calligraphic letter, such as $\mathcal{X}$. We use lower-case $x$ to refer to a specific event in the distribution (an individual password). The probability of $x$ is denoted $p_x$. Formally, a distribution is a set of events $x \in \mathcal{X}$, each with an associated probability $0 < p_x \le 1$, such that $\sum p_x = 1$. We use $N$ to denote the total number of possible events in $\mathcal{X}$.

We often refer to events by their index $i$, that is, their rank by probability in the distribution with the most probable having index 1 and the least probable having index $N$. We refer to the $i$th most common event as $x_i$ and call its probability $p_i$. Thus, the probabilities of the events in $\mathcal{X}$ form a monotonically decreasing sequence $p_1 \ge p_2 \ge \ldots \ge p_N$.

We denote an unknown variable as $X$, denoting $X \stackrel{R}{\leftarrow} \mathcal{X}$ if it is drawn at random from $\mathcal{X}$.

Guessing model: We model password selection as a random draw $X \stackrel{R}{\leftarrow} \mathcal{X}$ from an underlying password distribution $\mathcal{X}$. Though $\mathcal{X}$ will vary depending on the population of users, we assume that $\mathcal{X}$ is completely known to the attacker. Given a (possibly singleton) set of unknown passwords $\{X_1, X_2, \ldots, X_k\}$, we wish to evaluate the efficiency of an attacker trying to identify the unknown passwords $X_i$ given access to an oracle for queries of the form "is $X_i = x$?"

A. Shannon entropy

Intuitively, we may first think of the Shannon entropy:

$$H_1(\mathcal{X}) = \sum_{i=1}^{N} -p_i \lg p_i \tag{1}$$

as a measure of the "uncertainty" of $X$ to an attacker. Introduced by Shannon in 1948 [26], entropy appears to have been ported from cryptographic literature into studies of passwords before being used in NIST guidelines [20].

It has been demonstrated that $H_1$ is mathematically inappropriate as a measure of guessing difficulty [27]-[30]. It in fact quantifies the average number of subset membership queries of the form "Is $X \in S$?" for arbitrary subsets $S \subseteq \mathcal{X}$ needed to identify $X$. (fn. 4) For an attacker who must guess individual passwords, Shannon entropy has no direct correlation to guessing difficulty. (fn. 5)

B. Rényi entropy and its variants

Rényi entropy $H_n$ is a generalization of Shannon entropy [31] parametrized by a real number $n \ge 0$: (fn. 6)

$$H_n(\mathcal{X}) = \frac{1}{1-n} \lg \left( \sum_{i=1}^{N} p_i^n \right) \tag{2}$$

In the limit as $n \to 1$, Rényi entropy converges to Shannon entropy, which explains why Shannon entropy is denoted $H_1$. Note that $H_n$ is a monotonically decreasing function of $n$. We are most interested in two special cases:

1) Hartley entropy $H_0$: For $n = 0$, Rényi entropy is:

$$H_0 = \lg N \tag{3}$$

Introduced prior to Shannon entropy [32], $H_0$ measures only the size of a distribution and ignores the probabilities.

2) Min-entropy $H_\infty$: As $n \to \infty$, Rényi entropy is:

$$H_\infty = -\lg p_1 \tag{4}$$

This metric is only influenced by the probability of the most likely symbol in the distribution, hence the name. This is a useful worst-case security metric for human-chosen distributions, demonstrating security against an attacker who only guesses the most likely password before giving up. $H_\infty$ is a lower bound for all other Rényi entropies and indeed all of the metrics we will define.

C. Guesswork

A more applicable metric is the expected number of guesses required to find $X$ if the attacker proceeds in optimal order, known as guesswork or guessing entropy [27], [30]:

$$G(\mathcal{X}) = E\left[\#\text{guesses}(X \stackrel{R}{\leftarrow} \mathcal{X})\right] = \sum_{i=1}^{N} p_i \cdot i \tag{5}$$

Because $G$ includes all probabilities in $\mathcal{X}$, it models an attacker who will exhaustively guess even exceedingly unlikely events, which can produce absurd results. For example, in the RockYou data set over twenty users (more than 1 in $2^{21}$) appear to use 128-bit pseudorandom hexadecimal strings as passwords. These passwords alone ensure that $G(\text{RockYou}) \ge 2^{106}$. Thus $G$ provides little insight into practical attacks and furthermore is difficult to estimate from sampled data (see Section V).

D. Partial guessing metrics

Guesswork and entropy metrics fail to model the tendency of real-world attackers to cease guessing against the most difficult accounts. As discussed in Section II, cracking evaluations typically report the fraction of accounts broken by a given attack and explicitly look for weak subspaces of passwords to attack. Having many accounts to attack is an ...

Footnote 4: The proof of this is a straightforward consequence of Shannon's source coding theorem [26]. Symbols $X \stackrel{R}{\leftarrow} \mathcal{X}$ can be encoded using a Huffman code with average bit length $\le H_1(\mathcal{X}) + 1$, of which the adversary can learn one bit at a time with subset membership queries.

Footnote 5: $H_1$ has further been claimed to correlate poorly with password cracking difficulty [8], [23], though the estimates of $H_1$ used cannot be relied upon.

Footnote 6: Rényi entropy is traditionally denoted $H_\alpha$; we use $H_n$ to avoid confusion with our primary use of $\alpha$ as a desired success rate.
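The single-point metrics of Equations (1) to (5) can be computed directly from a histogram of password counts. The following Python sketch shows one way to do so; the function names and the toy histogram are assumptions made for illustration and are not taken from the collection or analysis code used in this study.

```python
import math
from typing import List, Sequence


def probabilities(counts: Sequence[int]) -> List[float]:
    """Convert raw counts to probabilities sorted so that p_1 >= ... >= p_N."""
    total = sum(counts)
    return sorted((c / total for c in counts), reverse=True)


def shannon_entropy(p: Sequence[float]) -> float:
    """Equation (1): H_1 = sum of -p_i * lg(p_i)."""
    return sum(-pi * math.log2(pi) for pi in p)


def renyi_entropy(p: Sequence[float], n: float) -> float:
    """Equation (2); the case n = 1 is handled as its limit, H_1."""
    if n == 1:
        return shannon_entropy(p)
    return math.log2(sum(pi ** n for pi in p)) / (1 - n)


def hartley_entropy(p: Sequence[float]) -> float:
    """Equation (3): H_0 = lg(N)."""
    return math.log2(len(p))


def min_entropy(p: Sequence[float]) -> float:
    """Equation (4): H_inf = -lg(p_1); p must be sorted in decreasing order."""
    return -math.log2(p[0])


def guesswork(p: Sequence[float]) -> float:
    """Equation (5): expected number of guesses in optimal order."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


# Toy histogram (assumed for illustration): four passwords seen 4, 2, 1, 1 times.
p = probabilities([4, 2, 1, 1])
print(hartley_entropy(p), shannon_entropy(p), renyi_entropy(p, 2), min_entropy(p))
print(guesswork(p))
```

For this toy distribution the printed entropies decrease from $H_0$ through $H_1$ and $H_2$ to $H_\infty$, matching the monotonicity of $H_n$ in $n$ noted above. Adding a single very rare event to the histogram raises `guesswork` and `hartley_entropy` but leaves `min_entropy` unchanged, which is the sensitivity to the tail that Section III-C describes.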
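Figure 2. (a) $\mu_\alpha$, $G_\alpha$ (number of guesses). (b) $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ (effective key-length). Two ways of comparing the guessing difficulty of user-chosen 4-digit PINs [33] against uniform distributions of size 10,000 and 1,000 ($\mathcal{U}_{10^4}$ and $\mathcal{U}_{10^3}$, respectively). Fig. 2a plots the dictionary size $\mu_\alpha$ needed to have a chance of success $\alpha$ as well as the expected number of guesses per account $G_\alpha$. Fig. 2b converts both metrics into an effective key-length, enabling visual comparison across the entire range of $\alpha$. Traditional single-point metrics $H_0$, $H_1$, $H_2$, $H_\infty$ and $\tilde{G}$ are also marked for comparison. Note that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ are horizontal lines for uniform distributions; an attacker gains no efficiency advantage from lowering his desired success rate $\alpha$.

... $+\, m < H_1(\mathcal{X})$ and $\tilde{\mu}_\alpha(\mathcal{X}) + m < \tilde{G}_1(\mathcal{X})$ for any separation parameter $m$. Furthermore, for any $\alpha_1 < \alpha_2$ a distribution $\mathcal{X}$ can be found with $\tilde{\mu}_{\alpha_1}(\mathcal{X}) + m < \tilde{\mu}_{\alpha_2}(\mathcal{X})$ for any $m$. These results easily extend to $\tilde{G}_\alpha$ using the bounds listed in Table II and related results can be proved for $\tilde{\lambda}_\beta(\mathcal{X})$.

    equivalences
      $\forall n:\ H_n(\mathcal{U}_N) = \lg N$                                    all metrics equal for $\mathcal{U}$
      $\forall \alpha:\ \tilde{\mu}_\alpha(\mathcal{U}_N) = \lg N$                all metrics equal for $\mathcal{U}$
      $\forall \beta:\ \tilde{\lambda}_\beta(\mathcal{U}_N) = \lg N$              all metrics equal for $\mathcal{U}$
      $\forall \alpha:\ \tilde{G}_\alpha(\mathcal{U}_N) = \lg N$                  all metrics equal for $\mathcal{U}$
      $H_0 = \tilde{\mu}_1 = \tilde{\lambda}_N = \lg N$                           metrics depending only on $N$
      $H_\infty = \tilde{\mu}_{p_1} = \tilde{\lambda}_1 = -\lg p_1$               metrics depending only on $p_1$
    bounds
      $H_\infty \le \tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta$  $H_\infty$ is abs. lower bound
      $\tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta \le H_0$       $H_0$ is abs. upper bound
      $\tilde{G}_\alpha \le \tilde{\mu}_\alpha$                                   straightforward proof
      $\tilde{G}_\alpha \ge \tilde{\mu}_\alpha + \lg(1-\alpha)$                   straightforward proof
    monotonicity
      $H_\infty \le \ldots \le H_1 \le H_0$                                       $H_n$ decreasing with $n$
      $\tilde{\mu}_\alpha \le \tilde{\mu}_{\alpha+\epsilon}$                      $\tilde{\mu}_\alpha$ increasing with $\alpha$
      $\tilde{\lambda}_\beta \le \tilde{\lambda}_{\beta+\epsilon}$                $\tilde{\lambda}_\beta$ increasing with $\beta$
      $\tilde{G}_\alpha \le \tilde{G}_{\alpha+\epsilon}$                          $\tilde{G}_\alpha$ increasing with $\alpha$

Table II. RELATIONS BETWEEN GUESSING METRICS

G. Application in practical security evaluation

For an online attacker we can use $\tilde{\lambda}_\beta$ with $\beta$ equal to the guessing limits imposed by the system. There is no standard for $\beta$, with 10 guesses recommended by usability studies [35], 3 by NIST guidelines [20], and a variety of values (often $\infty$) seen in practice [36]. Sophisticated rate-limiting schemes may allow a probabilistic number of guesses [37]. We consider $\tilde{\lambda}_{10}$ a reasonable benchmark for resistance to online guessing, though $\tilde{\lambda}_1 = H_\infty$ is a conservative choice as a lower bound for all metrics proposed.

The separation results of Section III-F mean that for brute-force attacks we can't rely on any single value of $\alpha$; each value provides information about a fundamentally different attack scenario. For a complete picture, we can consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ across all values of $\alpha$. We can plot this as the guessing curve for a distribution, as seen in Figure 2.

For offline attacks, where an adversary is limited only by time and computing power, we might consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ for a standard value such as 0.5 as a benchmark ($\tilde{\mu}_{0.5}$ was originally suggested by [29]). While $\tilde{G}_\alpha$ more directly measures the efficiency of a guessing attack, $\tilde{\mu}_\alpha$ can be advantageous in practice because it is simpler to compute. In particular, it can be computed using previously published cracking results reported as "a dictionary of size $\mu$ compromised a fraction $\alpha$ of available accounts," as plotted in Figure 1b. Furthermore, the difference between the metrics is only significant for higher values of $\alpha$; for $\alpha \le 0.5$ the two will never differ by more than 1 bit (from the bound in Table II).

IV. PRIVACY-PRESERVING EXPERIMENTAL SETUP

By using statistical guessing metrics to evaluate passwords, we are freed from the need to access passwords in their original form. Users may be willing to provide passwords to researchers with ethics oversight [4], [23] but this approach does not scale and the validity of the collected passwords is questionable. In contrast, leaked data sets provide unquestionably valid data but there are ethical questions with using stolen password data and its availability shouldn't be relied on [38]. There is also no control over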
the size or composition of leaked data sets. Thus far, for example, no leaked sources have included demographic data.

We addressed both problems with a novel experimental setup and explicit cooperation from Yahoo!, which maintains a single password system to authenticate users for its diverse suite of online services. Our experimental data collection was performed by a proxy server situated in front of live login servers. This is required as long-term password storage should include account-specific salting and iterated hashing which prevent constructing a histogram of common choices, just as they mitigate pre-computed dictionary attacks [39].

Our proxy server sees a stream of pairs $(u, \text{password}_u)$ for each user $u$ logging in to any Yahoo! service. Our goal is to approximate distinct password distributions $\mathcal{X}_{f_i}$ for a series of demographic predicates $f_i$. Each predicate, such as "does this user have a webmail account?", will typically require a database query based on $u$. A simplistic solution would be for the proxy to emit a stream of tuples $(H(\text{password}_u), f_1(u), f_2(u), \ldots)$, removing user identifiers $u$ to prevent trivial access to real accounts and using a cryptographic hash function $H$ to mask the values of individual passwords. (fn. 8) There are two major problems to address:

Footnote 8: Note that $H$ cannot incorporate any user-specific salt; doing so would occlude the frequency of repeated passwords.

A. Preventing password cracking

If a user $u$ can be re-identified by the uniqueness of his or her demographic predicates [40], then the value $H(\text{password}_u)$ could be used as an oracle to perform an offline dictionary attack. Such a re-identification attack was demonstrated on a data set of movie reviews superficially anonymized for research purposes [41] and would almost certainly be possible for most users given the number and detail of predicates we would like to study.

This risk can be effectively mitigated by prepending the same cryptographically random nonce $r$ to each password prior to hashing. The proxy server must generate $r$ at the beginning of the study and destroy it prior to making data available to researchers. By choosing $r$ sufficiently long to prevent brute-force (128 bits is a conservative choice) and ensuring it is destroyed, $H(r \| \text{password}_u)$ is useless for an attacker attempting to recover $\text{password}_u$ but the distribution of hash values will remain exactly isomorphic to the underlying distribution of passwords seen.

B. Preventing cross-account compromise

While including a nonce prevents offline search, an attacker performing large-scale re-identification can still identify sets of users which have a password in common. This decreases security for all users in a group which share a password, as an attacker may then gain access to all accounts in the group by recovering just one user's password by auxiliary means such as phishing, malware, or compromise of an external website for which the password was re-used.

Solving this problem requires preventing re-identification by not emitting vectors of predicates for each user. Instead, the proxy server maintains a histogram $H_i$ of observed hash values for each predicate $f_i$. For each pair $(u, \text{password}_u)$ observed, the proxy server adds $H(r \| \text{password}_u)$ to each histogram $H_i$ for which $f_i(u)$ is true. An additional list is stored of all previously seen hashed usernames $H(r \| u)$ to prevent double-counting users.

Figure 3. Changing estimates of guessing metrics with increasing sample size $M$. Estimates for $H_\infty$ and $\tilde{\lambda}_{10}$ converge very quickly; estimates for $\tilde{\mu}_{0.25}$ converge around $M = 2^{22}$ (marked) as predicted in Section V-A. Estimates for $H_0$, $H_1$, and $\tilde{G}$ are not close to converging.

C. Deployment details

The collection code, consisting of a few dozen lines of Perl, was audited and $r$ was generated using a seed provided by a Yahoo! manager and machine-generated entropy. The experiment was approved by Yahoo!'s legal team as well as the responsible ethics committee at the University of Cambridge.

We deployed our experiment on a random subset of Yahoo! servers for a 48 hour period from May 23-25, 2011, observing 69,301,337 unique users and constructing separate histograms for 328 different predicate functions. Of these, many did not achieve a sufficient sample size to be useful and were discarded.

V. EFFECTS OF SAMPLE SIZE

In our mathematical treatment of guessing difficulty, we assumed complete information is available about the underlying probability distribution of passwords $\mathcal{X}$. In practice, we will need to approximate $\mathcal{X}$ with empirical data. (fn. 9) We assume that we have $M$ independent samples $X_1, \ldots, X_M \stackrel{R}{\leftarrow}$
$\mathcal{X}$ and we wish to calculate properties of $\mathcal{X}$. The simplest approach is to compute metrics using the distribution of samples directly, which we denote $\hat{\mathcal{X}}$. (fn. 10) As shown in Figure 3, this approach produces substantial and systematic under-estimates of most metrics, most prominently $\hat{H}_0 = \lg \hat{N}$, which increases nearly continuously with increasing sample size $M$, indicating that new passwords are still being seen often even at our massive sample size.

Footnote 9: It is possible that an attacker knows the precise distribution of passwords in a given database, but typically in this case she or he would also know per-user passwords and would not be guessing statistically.

Footnote 10: We use the hat symbol ^ for any metric estimated from sampled data.

The maximum-likelihood estimation of the growth rate $\frac{d\hat{N}}{dM}$ has been shown to be exactly $\frac{V(1,M)}{M}$, the proportion of passwords in the sample observed only once [42]. (fn. 11) This can be seen because in exactly $\frac{V(1,M)}{M}$ of all possible orderings in which the sample may have been collected will the last observation have been a new item. For our full sample, $\frac{V(1,M)}{M} = 42.5\%$, indicating that a larger sample would continue to find many new passwords and hence larger estimates for $H_0$, $H_1$, $G_1$ etc. Similarly, for a random subsample of our data, many passwords will be missed and estimates of these metrics will decrease.

Interpreting hapax legomena is a fundamental problem in statistics and there are no known non-parametric techniques for estimating the true distribution size $N$ [42]. This is not merely a theoretical restriction; in the case of passwords, determining that apparently pseudorandom passwords really are 128-bit random strings would require an utterly intractable sample size many times greater than $2^{128}$. Good-Turing techniques [43] aren't helpful for the distribution-wide statistics we are interested in; they can only estimate the cumulative probability of all unobserved events (the "missing mass") and provide damped maximum-likelihood estimates of the probability of individual events.

Fortunately, in practice we can usefully approximate our guessing metrics from reasonably-sized samples, though these estimations implicitly rely on assumptions about the underlying nature of the password distribution. As seen in Figure 3, partial guessing metrics which rely only on the more-frequent items in the distribution are the easiest to approximate, while those which rely on a summation over the entire distribution such as $H_0$, $H_1$ and $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ for large values of $\alpha$ will be the most difficult.

Footnote 11: Events observed only once in a sample are called hapax legomena in linguistics, Greek for "said only once."

A. The region of stability

We can reliably estimate $p_i$ for events with observed frequency $f_i \gg 1$ due to the law of large numbers. Estimating $H_\infty$ requires estimating only $p_1$, the probability of the most common password, which was 1.08% in our data set. Gaussian statistics can be used to estimate the standard error of the maximum-likelihood estimate $\hat{p}_i$:

$$\text{error}(\hat{p}_i) = \sqrt{\frac{p_i (1 - p_i)}{M}} \cdot \frac{1}{p_i} \approx \sqrt{\frac{f_i}{M^2}} \cdot \frac{M}{f_i} = \frac{1}{\sqrt{f_i}}$$

For our data set, this gives a standard error of under 0.1 bit in $\hat{H}_\infty$ for $M \ge 2^{14}$. This argument extends to $\hat{\tilde{\lambda}}_\beta$ for small values of $\beta$ and in practice we can measure resistance to online guessing with relatively modest sample size.

Reasoning about the error in $\hat{\tilde{\mu}}_\alpha$ and $\hat{\tilde{G}}_\alpha$ for values of $\alpha$ which represent realistic brute-force attacks is more difficult. Fortunately, we observe that for our password data set the number of events $V(f,M)$ which occur $f$ times in a sample of size $M$ is very consistent for small $f$ and provides a reasonable estimate of the number of events with probability $\frac{f-0.5}{M} \le p \le \frac{f+0.5}{M}$ in our full data set. (fn. 12) This enables a useful heuristic that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ will be well approximated when $\alpha$ is small enough to only rely on events occurring greater than some small frequency $f$. Calling $\alpha_f$ the cumulative estimated probability of all events occurring at least $f$ times, we took 1,000 random samples of our corpus with $M = 2^{19}$ and observed the following values in the 1st and 99th percentiles:

    f                                                              6              7              8
                                                                   1st    99th    1st    99th    1st    99th
    $\alpha_f$                                                     0.162  0.163   0.153  0.154   0.145  0.146
    $\tilde{\mu}_{\alpha_f} - \hat{\tilde{\mu}}_{\alpha_f}$        0.157  0.180   0.125  0.148   0.103  0.127
    $\tilde{G}_{\alpha_f} - \hat{\tilde{G}}_{\alpha_f}$            0.155  0.176   0.123  0.146   0.101  0.126

We observed very similar values for larger values of $M$. Thus, we will use $\hat{\tilde{\mu}}_\alpha$, $\hat{\tilde{G}}_\alpha$ directly for $\alpha \le \alpha_6$ for random subsamples of our data. The utility of this heuristic is seen in Figure 3, where it accurately predicts the point at which $\tilde{\mu}_{0.25}$ stabilizes, and in Figure 4, where it marks the point below which $\tilde{\mu}_\alpha$ is inaccurate for varying $M$.

Figure 4. Estimated guessing curves with reduced sample size $M$. Subsamples were computed randomly without replacement, to simulate having stopped the collection experiment earlier. After the maximum confidence point $\alpha_6$, there are two (almost indistinguishable) dashed plots representing the 1st and 99th percentiles from 1,000 random samples.

Footnote 12: $V(f,M)$ will almost always overestimate this value because more low-probability events will be randomly over-represented than the converse.

B. Parametric extension of our approximations

Estimating $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ for higher $\alpha$ requires directly assuming a model for the underlying password distribution. Passwords have been conjectured to follow a power-law distribution (fn. 13) where:

$$\Pr[p(x) > y] \propto y^{1-a} \tag{12}$$

Unfortunately, using a power-law distribution is problematic for two reasons. First, estimates for the scale parameter $a$ are known to decrease significantly with sample size [42]. Using maximum-likelihood fitting techniques [44] for our observed count data we get the following estimates:

    M          69M    10M    1M     100k
    $\hat{a}$  2.99   3.23   3.70   4.21

A second problem is that this model fits our observed, integer counts. To correctly estimate $\tilde{\mu}_\alpha$ from samples, we need to model the presence of passwords for which $p_i \cdot M < 1$. Power-law distributions require assuming a non-zero minimum password probability a priori [44], which we have no meaningful way of doing.

Instead we need a model $\psi(p)$ for the distribution of password probabilities, an approach taken by linguists for modeling word frequencies [45]. We model the probability of observing a password $k$ times using a mixture model: first we draw a password probability $p$ randomly according to the probability density function $\psi(p)$, then we draw from a Poisson distribution with expectation $p \cdot M$ to model the number of times we observe this password:

$$\Pr[k \text{ obs.}] = \frac{\int_0^1 \frac{(pM)^k e^{-pM}}{k!} \psi(p)\,dp}{1 - \int_0^1 e^{-pM} \psi(p)\,dp} \tag{13}$$

The numerator integrates the possibility of seeing a password with probability $p$ exactly $k$ times, weighted by the probability $\psi(p)$ of a password having probability $p$. The denominator corrects for the probability of not observing a password at all. This formulation allows us to take a set of counts from a sample $\{f_1, f_2, \ldots\}$ and find the parameters for $\psi(p)$ which maximize the likelihood of our observations:

$$\text{Likelihood} = \prod_{i=1}^{\hat{N}} \Pr[f_i \text{ obs.}] \tag{14}$$

This model has been effectively applied to word frequencies using the generalized inverse-Gaussian distribution: (fn. 14)

$$\psi(p \mid b, c, g) = \frac{2^{g-1}\, p^{g-1}\, e^{-\frac{p}{c} - \frac{b^2 c}{4p}}}{(bc)^g\, K_g(b)} \tag{15}$$

where $K_g$ is the modified Bessel function of the second kind.

The generalized inverse-Gaussian is useful because it blends both power-law ($p^{g-1}$) and exponential ($e^{-\frac{p}{c} - \frac{b^2 c}{4p}}$) behavior and produces a well-formed probability distribution. By plugging Equation 15 into Equation 13 for $\psi$ and solving the integral, we obtain:

$$\Pr[k \mid b, c, g] = \frac{\left(\frac{\frac{1}{2} b c M}{\sqrt{1 + cM}}\right)^{k} K_{k+g}\!\left(b\sqrt{1 + cM}\right)}{k!\left[(1 + cM)^{\frac{g}{2}} K_g(b) - K_g\!\left(b\sqrt{1 + cM}\right)\right]} \tag{16}$$

Though unwieldy, we can compute Equation 14 using Equation 15 for different parameters of $b, c, g$. Fortunately, for $b > 0$, $c > 0$, $g < 0$ there is only one maximum of this function [45], which enables approximation of the maximum-likelihood fit efficiently by gradient descent.

Figure 5. Extrapolated estimates for $\tilde{\mu}_\alpha$ using the generalized inverse Gaussian-Poisson distribution. Compared to naive estimates (Figure 4) the effects of sample size are mitigated. Each plot shows the 99% confidence interval from 1,000 random subsamples. Error from lack of fit of the model dwarfs error due to the randomness of each sample.

Footnote 13: Power-law distributions are also called Pareto or Zipfian distributions, which can all be shown to be equivalent formulations [42].

Footnote 14: The combined generalized inverse-Gaussian-Poisson model which we adopt is also called the Sichel distribution after its initial use by Sichel in 1975 to model word frequencies [46].

We can use this model to produce an extrapolated distribution, removing all observed passwords with $f_i < 6$ to leave the well-approximated region of the distribution unchanged and adding synthetic passwords according to our estimated model $\psi(p)$. This is achieved by dividing the region $\left(0, \frac{6}{M}\right)$ into discrete bins, with increasingly small bins near the value $p^+$ which maximizes
(p+).Intoeachbin(pj;pj+1)weinsert^NRpj+1pj (p)dpeventsofobservedfrequencypj+pj+1 2M.Wethennormalizetheprobabilityofallsyntheticeventsbymultiplyingthecorrectionfactor1 f6R16 M (p)dptoleavetheheadofthedistributionintact.Figure5plotsthe1stand99thpercentileof~forextrapolationsofrandomsubsamplesofourdata.Weuse~becauseitisstrictlyless-wellapproximatedthan~G,whichisweightedslightlymoretowardswell-approximatedeventsinthedistribution.Somekeyvaluesare: dictionary Chinese German Greek English French Indonesian Italian Korean Portuguese Spanish Vietnamese global minimax target Chinese 4.4%1.9%2.7%2.4%1.7%2.0%2.0%2.9%1.8%1.7%2.0% 2.9%2.7% German 2.0%6.5%2.1%3.3%2.9%2.2%2.8%1.6%2.1%2.6%1.6% 3.5%3.4% Greek 9.3%7.7%13.4%8.4%7.4%8.1%8.0%8.0%7.7%7.8%7.7% 8.6%8.9% English 4.4%4.6%3.9%8.0%4.3%4.5%4.3%3.4%3.5%4.2%3.5% 7.9%7.7% French 2.7%4.0%2.9%4.2%10.0%2.9%3.2%2.2%3.1%3.4%2.1% 5.0%4.9% Indonesian 6.7%6.3%6.5%8.7%6.3%14.9%6.2%5.8%6.0%6.2%5.9% 9.3%9.6% Italian 4.0%6.0%4.6%6.3%5.3%4.6%14.6%3.3%5.7%6.8%3.2% 7.2%7.1% Korean 3.7%2.0%3.0%2.6%1.8%2.3%2.0%5.8%2.4%1.9%2.2% 2.8%3.0% Portuguese 3.9%3.9%4.0%4.3%3.8%3.9%4.4%3.5%11.1%5.8%2.9% 5.1%5.3% Spanish 3.6%5.0%4.0%5.6%4.6%4.1%6.1%3.1%6.3%12.1%2.9% 6.9%7.0% Vietnamese 7.0%5.7%6.2%7.7%5.8%6.3%5.7%6.0%5.8%5.5%14.3% 7.8%8.3% 
TableVLANGUAGEDEPENDENCYOFPASSWORDGUESSING.EACHCELLINDICATESTHESUCCESSRATEOFAGUESSINGATTACKWITH1000ATTEMPTSUSINGADICTIONARYOPTIMALFORUSERSREGISTEREDATYAHOO!WITHDIFFERENTPREFERREDLANGUAGES.mostsubsets.Thegreatestefciencylossforanysubsetwhenusingthegloballistisonly2.2,forPortugueselanguagepasswords.Wecanimprovethisslightlyfurtherbyconstructingaspecialdictionarytobeeffectiveagainstallsubsets.Wedothisbyrepeatedlychoosingthepasswordforwhichthelowestpopularityinanysubsetismaximalandcallittheminimaxdictionary,alsoseeninTableV.Thisdictionaryperformsverysimilarlytotheglobaldictionary,reducingthemaximumefciencylosstoafactor2.1,alsoforPortugueselanguagepasswords.Diggingintoourdatawendglobalpasswordswhicharepopularacrossallsubgroupsweobserved.Thesinglemostpopularpasswordweobserved,forexample,occurredwithprobabilityatleast0.14%ineverysubpopulation.Someoverallpopularpasswordswereveryrareincertainsubpop-ulations.Forexample,thethirdmostcommonpassword,withoverallprobability0.1%,occurrednearly100timeslessfrequentlyinsomesubpopulations.However,therewereeightpasswordswhichoccurredwithprobabilityatleast0.01%ineverysubpopulation.Withoutaccesstotherawpasswords,wecanonlyspeculatethatthesearenumericpasswordsasthesearepopular21andinternationalizewell.Despitetheexistenceofgloballypopularpasswords,however,westillconcludethatdictionaryspecicitycanhavesurprisinglylargeresults.Forexample,thefollowingtableshowsefciencylossesofupto25%fromdictionariestailoredtopeoplefromdifferentEnglish-speakingcountries: dictionary global us uk ca au target us 8.2%6.6%7.4%7.2% 8.1% uk 5.4%6.9%5.5%5.6% 5.5% ca 8.8%7.9%9.9%8.7% 8.8% au 7.4%7.2%7.6%8.8% 7.5% 21WithintheRockYoudataset,123456wasthemostpopularpasswordand5othernumber-onlypasswordswereamongstthetopten.Weobservecomparableefciencylossesbasedonage: dictionary 1320 2134 3554 55+ global target 1320 8.4%7.8%7.1%6.5% 7.9% 2134 7.3%7.9%7.3%6.7% 7.8% 3554 5.4%5.8%6.4%6.1% 6.2% 55+ 5.4%5.8%6.8%7.3% 6.5% Weevenobserveefciencylossesbasedonserviceusage: dictionary 
retail chat media mail global target retail 7.0%5.6%6.6%5.6% 6.0% chat 6.9%8.4%7.8%8.3% 8.3% media 5.7%5.6%6.0%5.6% 5.8% mail 6.7%8.0%7.5%8.2% 8.1% VII.CONCLUDINGREMARKSByestablishingsoundmetricsandrigorouslyanalyzingthelargestpasswordcorpustodate,wehopetohavecontributedbothtoolsandnumbersoflastingsignicance.Asaruleofthumbforsecurityengineers,passwordsprovideroughlyequivalentsecurityto10-bitrandomstringsagainstanoptimalonlineattackertryingafewpopularguessesforlargelistofaccounts.Inotherwords,anattackerwhocanmanage10guessesperaccount,typicallywithintherealmofrate-limitingmechanisms,willcompromisearound1%ofaccounts,justastheywouldagainstrandom10-bitstrings.Againstanoptimalattackerperformingunrestrictedbruteforceandwantingtobreakhalfofallavailableaccounts,passwordsappeartoberoughlyequivalentto20-bitrandomstrings.Thismeansthatnopracticalamountofiteratedhashingcanpreventanadversaryfrombreakingalargenumberofaccountsgiventheopportunityforofine [20]W.E.Burr,D.F.Dodson,andW.T.Polk,ElectronicAuthenticationGuideline,NISTSpecialPublication800-63,2006.[21]D.FlorencioandC.Herley,Alarge-scalestudyofwebpasswordhabits,inWWW'07:Proceedingsofthe16thInternationalConferenceontheWorldWideWeb.ACM,2007,pp.657666.[22]R.Shay,S.Komanduri,P.G.Kelley,P.G.Leon,M.L.Mazurek,L.Bauer,N.Christin,andL.F.Cranor,Encoun-teringStrongerPasswordRequirements:UserAttitudesandBehaviors,inSOUPS'10:Proceedingsofthe6thSymposiumonUsablePrivacyandSecurity.ACM,2010.[23]P.G.Kelley,S.Komanduri,M.L.Mazurek,R.Shay,T.Vidas,L.Bauer,N.Christin,L.F.Cranor,andJ.Lopez,Guessagain(andagainandagain):Measuringpasswordstrengthbysimulatingpassword-crackingalgorithms,CarnegieMellonUniversity,Tech.Rep.CMU-CyLab-11-008,2011.[24]J.Yan,A.Blackwell,R.Anderson,andA.Grant,PasswordMemorabilityandSecurity:EmpiricalResults,IEEESecu-rityandPrivacyMagazine,vol.2,no.5,pp.2534,2004.[25]B.Stone-Gross,M.Cova,L.Cavallaro,B.Gilbert,M.Szyd-lowski,R.Kemmerer,C.Kruegel,andG.Vigna,Yourbotnetismybotnet:Analysisofabotnettakeover,inCCS'09:Proceedingsofthe16
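The rule of thumb in the concluding remarks rests on a simple comparison with uniformly random strings: against a uniform k-bit string, an attacker making g guesses succeeds with probability g / 2^k. The Python sketch below works through that arithmetic. It is an illustration only, not the authors' code; the function names are assumptions, and the histogram passed to `online_bits_from_counts` is a hypothetical toy input, not data from the corpus.

```python
import math


def uniform_success_rate(guesses: int, bits: int) -> float:
    """Chance that `guesses` distinct tries hit one uniformly random `bits`-bit string."""
    return min(1.0, guesses / 2 ** bits)


def effective_bits(guesses: int, success_rate: float) -> float:
    """Size k (in bits) of the uniform string for which `guesses` tries
    succeed with probability `success_rate`, i.e. guesses / 2**k == success_rate."""
    return math.log2(guesses / success_rate)


def guesses_to_break(fraction: float, bits: int) -> int:
    """Guesses per account needed to break `fraction` of accounts that are
    protected by uniformly random `bits`-bit strings."""
    return math.ceil(fraction * 2 ** bits)


def online_bits_from_counts(counts: list[int], guesses: int) -> float:
    """Effective bits against an attacker who tries the `guesses` most
    popular passwords, given a histogram of password counts (assumed input)."""
    total = sum(counts)
    top = sorted(counts, reverse=True)[:guesses]
    return effective_bits(guesses, sum(top) / total)


if __name__ == "__main__":
    # 10 guesses compromising 1% of accounts: log2(10 / 0.01) = log2(1000)
    print(f"online:  {effective_bits(10, 0.01):.2f} bits")
    # The same 10 guesses against genuinely random 10-bit strings: 10 / 1024
    print(f"uniform: {uniform_success_rate(10, 10):.4%} of accounts")
    # Breaking half of all accounts protected by random 20-bit strings
    print(f"offline: {guesses_to_break(0.5, 20)} guesses per account")
    # Hypothetical toy histogram: one popular password among a few rare ones
    print(f"toy:     {online_bits_from_counts([4, 2, 1, 1], 1):.2f} bits")
```

For a uniform histogram the last function returns the base-2 logarithm of the number of distinct passwords, which is a quick sanity check that the conversion to bits behaves as expected.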