
The science of guessing: analyzing an anonymized corpus of 70 million passwords

Joseph Bonneau
Computer Laboratory, University of Cambridge
jcb82@cl.cam.ac.uk

Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulat


Figure 1. (a) Historical cracking efficiency, raw dictionary size. (b) Historical cracking efficiency, equivalent dictionary size. The size of cracking dictionaries is plotted logarithmically against the success rate achieved in Figure 1a. In Figure 1b, the dictionary sizes are adjusted to incorporate the inherent need for more guesses to crack more passwords. Circles and solid lines represent operating system user passwords, squares and dashed lines represent web passwords.

II. HISTORICAL EVALUATIONS OF PASSWORD SECURITY

It has long been of interest to analyze how secure passwords are against guessing attacks, dating at least to Morris and Thompson's seminal 1979 analysis of 3,000 passwords [3]. They performed a rudimentary dictionary attack using the system dictionary and all 6-character strings and recovered 84% of available passwords. They also reported some basic statistics such as password lengths (71% were 6 characters or fewer) and frequency of non-alphanumeric characters (14% of passwords). These two approaches, password cracking and semantic evaluation, have been the basis for dozens of studies in the thirty years since.

A. Cracking evaluation

The famous 1988 Morris worm propagated in part by guessing passwords using a 350-word password dictionary and several rules to modify passwords [9]. The publicity surrounding the worm motivated independent studies by Klein and Spafford which re-visited password guessing [4], [5]. Both studies broke 22–24% of passwords using more sophisticated dictionaries such as lists of names, sports teams, movies and so forth.

Password cracking evolved rapidly in the years after these studies, with dedicated software tools like John the Ripper emerging in the 1990s which utilize mangling rules to turn a single password like “john” into variants like “John”, “J0HN”, and “nhoj” [10]. Research on mangling rules has continued to evolve; the current state of the art by Weir et al. [11] automatically generates mangling rules from a large training set of known passwords. Later studies have often utilized these tools to perform dictionary attacks as a secondary goal, such as Wu's study of password cracking against Kerberos tickets in 1999 [12] and Kuo et al.'s study of mnemonic passwords in 2006 [13], which recovered 8% and 11% of passwords, respectively.

Recently, large-scale password leaks from compromised websites have provided a new source of data for cracking evaluations. For example, Schneier analyzed about 50,000 passwords obtained via phishing from MySpace in 2006 [6]. A more in-depth study was conducted by Dell'Amico et al., who studied the MySpace passwords as well as those of two other websites using a large variety of different dictionaries [7]. A very large data set of 32M passwords leaked from RockYou in 2009, which Weir et al. studied to examine the effects of password-composition rules on cracking efficiency [8].

Reported numbers on password cracking efficiency vary substantially between different studies, as shown in Figure 1. Most studies have broken 20–50% of accounts with dictionary sizes in the range of $2^{20}$–$2^{30}$. All studies see diminishing returns for larger dictionaries. This is clear in Figure 1b, which adjusts dictionary sizes based on the percentage of passwords cracked so that the degree of upward slope reflects only decreasing efficiency. This concept will motivate our statistical guessing metrics in Section III-E.

There is little data on the efficiency of small dictionaries as most studies employ the largest dictionary they can process. Klein's study, which attempted to identify highly efficient sub-dictionaries, is a notable exception [4]. There is also little data on the size of dictionary required to break a large majority of passwords: only Morris and Thompson broke more than 50% of available passwords¹ and their results may be too dated to apply to modern passwords.

¹ A 2007 study by Cazier and Medlin claimed to break 99% of passwords at an e-commerce website, but details of the dictionary weren't given [14].

B. Semantic evaluations

In addition to cracking research, there have been many studies on the semantics of passwords, with psychologists and linguists being interested as well as computer security researchers. This approach can be difficult as it either requires user surveys, which may produce unrealistic password choices, or direct access to unhashed passwords, which carries privacy concerns. Riddle et al. performed linguistic analysis of 6,226 passwords in 1989, classifying them into categories such as names, dictionary words, or seemingly random strings [15]. Cazier et al. repeated this process in 2006 and found that hard-to-classify passwords were also the hardest to crack [14].

Password structure was formally modeled by Weir et al. [11] using a context-free grammar to model the probability of different constructions being chosen. Password creation has also been modeled as a character-by-character Markov process, first by Narayanan and Shmatikov [16] for password cracking and later by Castelluccia et al. [17] to train a pro-active password checker.

Thus methodology for analyzing password structure has varied greatly, but a few basic data points like average length and types of characters used are typically reported, as summarized in Table I. The estimates vary so widely that it is difficult to infer much which is useful in systems design. The main trends are a tendency towards 6–8 characters of length and a strong dislike of non-alphanumeric characters in passwords.²

Table I. Commonly estimated attributes of passwords

year  study                    length  % digits  % special
1989  Riddle et al. [15]       4.4     3.5       -
1992  Spafford [5]             6.8     31.7      14.8
1999  Wu [12]                  7.5     25.7      4.1
1999  Zviran and Haga [18]     5.7     19.2      0.7
2006  Cazier and Medlin [14]   7.4     35.0      1.3
2009  RockYou leak [19]        7.9     54.0      3.7

Many studies have also attempted to determine the number of users which appear to be choosing random passwords, or at least passwords without any obvious meaning to a human examiner. Methodologies for estimating this vary as well, but most studies put it in the 10–40% range.

Elements of password structure, such as length or the presence of digits, upper-case, or non-alphanumeric characters, can be used to estimate the “strength” of a password, often measured in bits and often referred to imprecisely as “entropy”.³ This usage was cemented by the 2006 FIPS Electronic Authentication Guideline [20], which provided a “rough rule of thumb” for estimating entropy from password characteristics such as length and type of characters used. This standard has been used in several password studies with too few samples to compute statistics on the entire distribution [21]–[23]. More systematic formulas have been proposed, such as one by Shay et al. [22] which adds entropy from different elements of a password's structure.

² It is often suggested that users avoid characters which require multiple keys to type, but this doesn't seem to have been formally established.

³ This terminology is mathematically incorrect because entropy (see Sections III-A and III-B) measures a complete probability distribution, not a single event (password). The correct metric for a single event is self-information (or surprisal). This is perhaps disfavored because it is counter-intuitive: passwords should avoid including information like names or addresses, so high-information passwords sound weak.

C. Problems with previous approaches

Three decades of work on password guessing have produced sophisticated cracking tools and many disparate data points, but a number of methodological problems continue to limit scientific understanding of password security:

1) Comparability: Authors rarely report cracking results in a format which is straightforward to compare with previous benchmarks. To our knowledge, Figure 1 is the first comparison of different data points of dictionary size and success rate, though direct comparison is difficult since authors all report efficiency rates for different dictionary sizes. Password cracking tools only loosely attempt to guess passwords in decreasing order of likelihood, introducing imprecision into reported dictionary sizes. Worse, some studies report the running time of cracking software instead of dictionary size [14], [24], [25], making comparison difficult.

2) Repeatability: Precisely reproducing password cracking results is difficult. John the Ripper [10], used in most publications of the past decade, has been released in 21 different versions since 2001 and makes available 20 separate word lists for use (along with many proprietary ones), in addition to many configuration options. Other studies have used proprietary password-cracking software which isn't available to the research community [6], [14]. Thus nearly all studies use dictionaries varying in content and ordering, making it difficult to exactly re-create a published attack to compare its effectiveness against a new data set.

3) Evaluator dependency: Password-cracking results are inherently dependent on the appropriateness of the dictionary and mangling rules to the data set under study. Dell'Amico et al. [7] demonstrated this problem by applying language-specific dictionaries to data sets of passwords in different languages and seeing efficiency vary by 2–3 orders of magnitude. They also evaluated the same data set as Schneier three years earlier [6] and achieved two orders of magnitude better efficiency simply by choosing a better word list. Thus it is difficult to separate the effects of more-carefully chosen passwords from the use of a less appropriate dictionary. This is particularly challenging in data-slicing experiments [8], [23] which require simulating an equally good dictionary attack against each subpopulation.

4) Unsoundness: Estimating the entropy of a password distribution from structural characteristics is mathematically dubious, as we will demonstrate in Section III-D, and inherently requires making many assumptions about password selection. In practice, entropy estimates have performed poorly as predictors of empirical cracking difficulty [8], [23].
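As a small illustration of the mangling rules mentioned in Section II-A, the following sketch expands one dictionary word into the three variants quoted there. It is only a toy under stated assumptions: the function name `mangle` and the particular substitution table are hypothetical, and this is not the rule syntax or rule set of John the Ripper or any other cracking tool.

```python
def mangle(word: str) -> list[str]:
    """Return a few illustrative variants of a dictionary word.

    Hypothetical toy rules (capitalize, "leet" substitution plus
    upper-casing, reversal); real cracking tools ship far larger rule sets.
    """
    # Assumed substitution table, chosen only so that "john" -> "j0hn".
    leet = str.maketrans({"o": "0", "i": "1", "e": "3", "a": "4", "s": "5"})
    variants = [
        word,                          # the base word itself
        word.capitalize(),             # "john" -> "John"
        word.translate(leet).upper(),  # "john" -> "J0HN"
        word[::-1],                    # "john" -> "nhoj"
    ]
    # Drop duplicates while preserving order (e.g. for palindromes).
    return list(dict.fromkeys(variants))
```

Because every base word is multiplied by the number of rules, the effective dictionary size depends on both the word list and the rule set, which is one reason reported dictionary sizes are hard to compare across studies.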
III. MATHEMATICAL METRICS OF GUESSING DIFFICULTY

Due to the problems inherent to password cracking simulations or semantic evaluation, we advocate security metrics that rely only on the statistical distribution of passwords. While this approach requires large data sets, it eliminates bias from password-cracking software by always modeling a best-case attacker, allowing us to assess and compare the inherent security of a given distribution.

Mathematical notation: We denote a probability distribution with a calligraphic letter, such as $\mathcal{X}$. We use lower-case $x$ to refer to a specific event in the distribution (an individual password). The probability of $x$ is denoted $p_x$. Formally, a distribution is a set of events $x \in \mathcal{X}$, each with an associated probability $0 < p_x \le 1$, such that $\sum p_x = 1$. We use $N$ to denote the total number of possible events in $\mathcal{X}$.

We often refer to events by their index $i$, that is, their rank by probability in the distribution, with the most probable having index 1 and the least probable having index $N$. We refer to the $i$th most common event as $x_i$ and call its probability $p_i$. Thus, the probabilities of the events in $\mathcal{X}$ form a monotonically decreasing sequence $p_1 \ge p_2 \ge \ldots \ge p_N$. We denote an unknown variable as $X$, denoting $X \stackrel{R}{\leftarrow} \mathcal{X}$ if it is drawn at random from $\mathcal{X}$.

Guessing model: We model password selection as a random draw $X \stackrel{R}{\leftarrow} \mathcal{X}$ from an underlying password distribution $\mathcal{X}$. Though $\mathcal{X}$ will vary depending on the population of users, we assume that $\mathcal{X}$ is completely known to the attacker. Given a (possibly singleton) set of unknown passwords $\{X_1, X_2, \ldots, X_k\}$, we wish to evaluate the efficiency of an attacker trying to identify the unknown passwords $X_i$ given access to an oracle for queries of the form “is $X_i = x$?”

A. Shannon entropy

Intuitively, we may first think of the Shannon entropy:

$$H_1(\mathcal{X}) = \sum_{i=1}^{N} -p_i \lg p_i \tag{1}$$

as a measure of the “uncertainty” of $X$ to an attacker. Introduced by Shannon in 1948 [26], entropy appears to have been ported from cryptographic literature into studies of passwords before being used in FIPS guidelines [20].

It has been demonstrated that $H_1$ is mathematically inappropriate as a measure of guessing difficulty [27]–[30]. It in fact quantifies the average number of subset membership queries of the form “Is $X \in S$?” for arbitrary subsets $S \subseteq \mathcal{X}$ needed to identify $X$.⁴ For an attacker who must guess individual passwords, Shannon entropy has no direct correlation to guessing difficulty.⁵

⁴ The proof of this is a straightforward consequence of Shannon's source coding theorem [26]. Symbols $X \stackrel{R}{\leftarrow} \mathcal{X}$ can be encoded using a Huffman code with average bit length $\le H_1(\mathcal{X}) + 1$, of which the adversary can learn one bit at a time with subset membership queries.

⁵ $H_1$ has further been claimed to correlate poorly with password cracking difficulty [8], [23], though the estimates of $H_1$ used cannot be relied upon.

B. Rényi entropy and its variants

Rényi entropy $H_n$ is a generalization of Shannon entropy [31] parametrized by a real number $n \ge 0$:⁶

$$H_n(\mathcal{X}) = \frac{1}{1-n} \lg \left( \sum_{i=1}^{N} p_i^n \right) \tag{2}$$

In the limit as $n \to 1$, Rényi entropy converges to Shannon entropy, which explains why Shannon entropy is denoted $H_1$. Note that $H_n$ is a monotonically decreasing function of $n$. We are most interested in two special cases:

1) Hartley entropy $H_0$: For $n = 0$, Rényi entropy is:

$$H_0 = \lg N \tag{3}$$

Introduced prior to Shannon entropy [32], $H_0$ measures only the size of a distribution and ignores the probabilities.

2) Min-entropy $H_\infty$: As $n \to \infty$, Rényi entropy is:

$$H_\infty = -\lg p_1 \tag{4}$$

This metric is only influenced by the probability of the most likely symbol in the distribution, hence the name. This is a useful worst-case security metric for human-chosen distributions, demonstrating security against an attacker who only guesses the most likely password before giving up. $H_\infty$ is a lower bound for all other Rényi entropies and indeed all of the metrics we will define.

C. Guesswork

A more applicable metric is the expected number of guesses required to find $X$ if the attacker proceeds in optimal order, known as guesswork or guessing entropy [27], [30]:

$$G(\mathcal{X}) = E\left[\#\text{guesses}(X \stackrel{R}{\leftarrow} \mathcal{X})\right] = \sum_{i=1}^{N} p_i \cdot i \tag{5}$$

Because $G$ includes all probabilities in $\mathcal{X}$, it models an attacker who will exhaustively guess even exceedingly unlikely events, which can produce absurd results. For example, in the RockYou data set over twenty users (more than 1 in $2^{21}$) appear to use 128-bit pseudorandom hexadecimal strings as passwords. These passwords alone ensure that $G(\text{RockYou}) \ge 2^{106}$. Thus $G$ provides little insight into practical attacks and furthermore is difficult to estimate from sampled data (see Section V).

D. Partial guessing metrics

Guesswork and entropy metrics fail to model the tendency of real-world attackers to cease guessing against the most difficult accounts. As discussed in Section II, cracking evaluations typically report the fraction of accounts broken by a given attack and explicitly look for weak subspaces of passwords to attack. Having many accounts to attack is an

⁶ Rényi entropy is traditionally denoted $H_\alpha$; we use $H_n$ to avoid confusion with our primary use of $\alpha$ as a desired success rate.
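The single-value metrics of Sections III-A to III-C can be computed directly from a password histogram. The sketch below is a minimal illustration under stated assumptions, not code from the study: the function names are hypothetical, the input is assumed to be a list of per-password counts, and the resulting empirical probabilities are treated as if they were the true distribution (Section V discusses why that is optimistic for real samples).

```python
import math

def probabilities(counts):
    """Convert raw counts into probabilities p_1 >= p_2 >= ... >= p_N."""
    total = sum(counts)
    return sorted((c / total for c in counts if c > 0), reverse=True)

def shannon_entropy(p):
    """H_1 = sum_i -p_i lg p_i (Equation 1)."""
    return -sum(pi * math.log2(pi) for pi in p)

def renyi_entropy(p, n):
    """H_n = 1/(1-n) * lg(sum_i p_i^n) (Equation 2); n = 1 is the Shannon limit."""
    if n == 1:
        return shannon_entropy(p)
    return math.log2(sum(pi ** n for pi in p)) / (1 - n)

def hartley_entropy(p):
    """H_0 = lg N (Equation 3): depends only on the number of events."""
    return math.log2(len(p))

def min_entropy(p):
    """H_inf = -lg p_1 (Equation 4): depends only on the most likely event."""
    return -math.log2(max(p))

def guesswork(p):
    """G = sum_i p_i * i (Equation 5), guessing in decreasing order of probability."""
    ordered = sorted(p, reverse=True)
    return sum(pi * i for i, pi in enumerate(ordered, start=1))
```

For the toy histogram `[4, 2, 1, 1]` the probabilities are 0.5, 0.25, 0.125 and 0.125, giving $H_0 = 2$, $H_1 = 1.75$ and $H_\infty = 1$ bits and $G = 1.875$ guesses, consistent with $H_n$ decreasing in $n$. For a uniform histogram all of the entropies coincide at $\lg N$.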
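Figure 2. (a) $\mu_\alpha$, $G_\alpha$ (number of guesses). (b) $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ (effective key-length). Two ways of comparing the guessing difficulty of user-chosen 4-digit PINs [33] against uniform distributions of size 10,000 and 1,000 ($U_{10^4}$ and $U_{10^3}$, respectively). Fig. 2a plots the dictionary size $\mu_\alpha$ needed to have a chance of success $\alpha$ as well as the expected number of guesses per account $G_\alpha$. Fig. 2b converts both metrics into an effective key-length, enabling visual comparison across the entire range of $\alpha$. Traditional single-point metrics $H_0$, $H_1$, $H_2$, $H_\infty$ and $\tilde{G}$ are also marked for comparison. Note that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ are horizontal lines for uniform distributions; an attacker gains no efficiency advantage from lowering his desired success rate $\alpha$.

[…] $m \le H_1(\mathcal{X})$ and $\tilde{\mu}_\alpha(\mathcal{X}) + m \le \tilde{G}_1(\mathcal{X})$ for any separation parameter $m$. Furthermore, for any $\alpha_1 < \alpha_2$ a distribution $\mathcal{X}$ can be found with $\tilde{\mu}_{\alpha_1}(\mathcal{X}) + m \le \tilde{\mu}_{\alpha_2}(\mathcal{X})$ for any $m$. These results easily extend to $\tilde{G}_\alpha$ using the bounds listed in Table II, and related results can be proved for $\tilde{\lambda}_\beta(\mathcal{X})$.

Table II. Relations between guessing metrics

equivalences
  $\forall n \; H_n(U_N) = \lg N$                                      all metrics equal for $U$
  $\forall \beta \; \tilde{\lambda}_\beta(U_N) = \lg N$                all metrics equal for $U$
  $\forall \alpha \; \tilde{\mu}_\alpha(U_N) = \lg N$                  all metrics equal for $U$
  $\forall \alpha \; \tilde{G}_\alpha(U_N) = \lg N$                    all metrics equal for $U$
  $H_0 = \tilde{\mu}_1 = \tilde{\lambda}_N = \lg N$                    metrics depending only on $N$
  $H_\infty = \tilde{\mu}_{p_1} = \tilde{\lambda}_1 = -\lg p_1$        metrics depending only on $p_1$
bounds
  $H_\infty \le \tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta$    $H_\infty$ is abs. lower bound
  $\tilde{G}_\alpha, \tilde{\mu}_\alpha, \tilde{\lambda}_\beta \le H_0$         $H_0$ is abs. upper bound
  $\tilde{G}_\alpha \le \tilde{\mu}_\alpha$                                     straightforward proof
  $\tilde{G}_\alpha - \tilde{\mu}_\alpha \ge \lg(1 - \alpha)$                   straightforward proof
monotonicity
  $H_\infty \le \ldots \le H_1 \le H_0$                                $H_n$ decreasing with $n$
  $\tilde{\lambda}_\beta \le \tilde{\lambda}_{\beta+\epsilon}$         increasing with $\beta$
  $\tilde{\mu}_\alpha \le \tilde{\mu}_{\alpha+\epsilon}$               increasing with $\alpha$
  $\tilde{G}_\alpha \le \tilde{G}_{\alpha+\epsilon}$                   increasing with $\alpha$

G. Application in practical security evaluation

For an online attacker we can use $\tilde{\lambda}_\beta$ with $\beta$ equal to the guessing limits imposed by the system. There is no standard for $\beta$, with 10 guesses recommended by usability studies [35], 3 by FIPS guidelines [20], and a variety of values (often $\infty$) seen in practice [36]. Sophisticated rate-limiting schemes may allow a probabilistic number of guesses [37]. We consider $\tilde{\lambda}_{10}$ a reasonable benchmark for resistance to online guessing, though $\tilde{\lambda}_1 = H_\infty$ is a conservative choice as a lower bound for all metrics proposed.

The separation results of Section III-F mean that for brute-force attacks we can't rely on any single value of $\alpha$; each value provides information about a fundamentally different attack scenario. For a complete picture, we can consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ across all values of $\alpha$. We can plot this as the guessing curve for a distribution, as seen in Figure 2.

For offline attacks, where an adversary is limited only by time and computing power, we might consider $\tilde{\mu}_\alpha$ or $\tilde{G}_\alpha$ for a standard value such as 0.5 as a benchmark ($\tilde{\mu}_{0.5}$ was originally suggested by [29]). While $\tilde{G}_\alpha$ more directly measures the efficiency of a guessing attack, $\tilde{\mu}_\alpha$ can be advantageous in practice because it is simpler to compute. In particular, it can be computed using previously published cracking results reported as “a dictionary of size $\mu$ compromised a fraction $\alpha$ of available accounts,” as plotted in Figure 1b. Furthermore, the difference between the metrics is only significant for higher values of $\alpha$; for $\alpha \le 0.5$ the two will never differ by more than 1 bit (from the bound in Table II).

IV. PRIVACY-PRESERVING EXPERIMENTAL SETUP

By using statistical guessing metrics to evaluate passwords, we are freed from the need to access passwords in their original form. Users may be willing to provide passwords to researchers with ethics oversight [4], [23] but this approach does not scale and the validity of the collected passwords is questionable. In contrast, leaked data sets provide unquestionably valid data but there are ethical questions with using stolen password data and its availability shouldn't be relied on [38]. There is also no control over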
the size or composition of leaked data sets. Thus far, for example, no leaked sources have included demographic data.

We addressed both problems with a novel experimental setup and explicit cooperation from Yahoo!, which maintains a single password system to authenticate users for its diverse suite of online services. Our experimental data collection was performed by a proxy server situated in front of live login servers. This is required as long-term password storage should include account-specific salting and iterated hashing, which prevent constructing a histogram of common choices just as they mitigate pre-computed dictionary attacks [39].

Our proxy server sees a stream of pairs $(u, \text{password}_u)$ for each user $u$ logging in to any Yahoo! service. Our goal is to approximate distinct password distributions $\mathcal{X}_{f_i}$ for a series of demographic predicates $f_i$. Each predicate, such as “does this user have a webmail account?”, will typically require a database query based on $u$. A simplistic solution would be for the proxy to emit a stream of tuples $(H(\text{password}_u), f_1(u), f_2(u), \ldots)$, removing user identifiers $u$ to prevent trivial access to real accounts and using a cryptographic hash function $H$ to mask the values of individual passwords.⁸ There are two major problems to address:

⁸ Note that $H$ cannot incorporate any user-specific salt; doing so would occlude the frequency of repeated passwords.

A. Preventing password cracking

If a user $u$ can be re-identified by the uniqueness of his or her demographic predicates [40], then the value $H(\text{password}_u)$ could be used as an oracle to perform an offline dictionary attack. Such a re-identification attack was demonstrated on a data set of movie reviews superficially anonymized for research purposes [41] and would almost certainly be possible for most users given the number and detail of predicates we would like to study.

This risk can be effectively mitigated by prepending the same cryptographically random nonce $r$ to each password prior to hashing. The proxy server must generate $r$ at the beginning of the study and destroy it prior to making data available to researchers. By choosing $r$ sufficiently long to prevent brute-force (128 bits is a conservative choice) and ensuring it is destroyed, $H(r \,\|\, \text{password}_u)$ is useless for an attacker attempting to recover $\text{password}_u$, but the distribution of hash values will remain exactly isomorphic to the underlying distribution of passwords seen.

B. Preventing cross-account compromise

While including a nonce prevents offline search, an attacker performing large-scale re-identification can still identify sets of users which have a password in common. This decreases security for all users in a group which share a password, as an attacker may then gain access to all accounts in the group by recovering just one user's password by auxiliary means such as phishing, malware, or compromise of an external website for which the password was re-used.

Solving this problem requires preventing re-identification by not emitting vectors of predicates for each user. Instead, the proxy server maintains a histogram $H_i$ of observed hash values for each predicate $f_i$. For each pair $(u, \text{password}_u)$ observed, the proxy server adds $H(r \,\|\, \text{password}_u)$ to each histogram $H_i$ for which $f_i(u)$ is true. An additional list is stored of all previously seen hashed user names $H(r \,\|\, u)$ to prevent double-counting users.

Figure 3. Changing estimates of guessing metrics with increasing sample size $M$. Estimates for $H_\infty$ and $\tilde{\lambda}_{10}$ converge very quickly; estimates for $\tilde{\mu}_{0.25}$ converge around $M = 2^{22}$ (marked) as predicted in Section V-A. Estimates for $H_0$, $H_1$, and $\tilde{G}$ are not close to converging.

C. Deployment details

The collection code, consisting of a few dozen lines of Perl, was audited, and $r$ was generated using a seed provided by a Yahoo! manager and machine-generated entropy. The experiment was approved by Yahoo!'s legal team as well as the responsible ethics committee at the University of Cambridge. We deployed our experiment on a random subset of Yahoo! servers for a 48 hour period from May 23–25, 2011, observing 69,301,337 unique users and constructing separate histograms for 328 different predicate functions. Of these, many did not achieve a sufficient sample size to be useful and were discarded.

V. EFFECTS OF SAMPLE SIZE

In our mathematical treatment of guessing difficulty, we assumed complete information is available about the underlying probability distribution of passwords $\mathcal{X}$. In practice, we will need to approximate $\mathcal{X}$ with empirical data.⁹ We assume that we have $M$ independent samples $X_1, \ldots, X_M \stackrel{R}{\leftarrow} \mathcal{X}$ and we wish to calculate properties of $\mathcal{X}$.

The simplest approach is to compute metrics using the distribution of samples directly, which we denote $\hat{\mathcal{X}}$.¹⁰ As shown in Figure 3, this approach produces substantial and systematic under-estimates of most metrics, most prominently $\hat{H}_0 = \lg \hat{N}$, which increases nearly continuously with increasing sample size $M$, indicating that new passwords are still being seen often even at our massive sample size.

The maximum-likelihood estimation of the growth rate $\frac{d\hat{N}}{dM}$ has been shown to be exactly $\frac{V(1,M)}{M}$, the proportion of passwords in the sample observed only once [42].¹¹ This can be seen because in exactly $\frac{V(1,M)}{M}$ of all possible orderings in which the sample may have been collected will the last observation have been a new item. For our full sample, $\frac{V(1,M)}{M} = 42.5\%$, indicating that a larger sample would continue to find many new passwords and hence larger estimates for $H_0$, $H_1$, $G_1$ etc. Similarly, for a random subsample of our data, many passwords will be missed and estimates of these metrics will decrease.

⁹ It is possible that an attacker knows the precise distribution of passwords in a given database, but typically in this case she or he would also know per-user passwords and would not be guessing statistically.

¹⁰ We use the hat symbol ˆ for any metric estimated from sampled data.

¹¹ Events observed only once in a sample are called hapax legomena in linguistics, Greek for “said only once.”

Interpreting hapax legomena is a fundamental problem in statistics and there are no known non-parametric techniques for estimating the true distribution size $N$ [42]. This is not merely a theoretical restriction; in the case of passwords, determining that apparently pseudorandom passwords really are 128-bit random strings would require an utterly intractable sample size many times greater than $2^{128}$. Good-Turing techniques [43] aren't helpful for the distribution-wide statistics we are interested in; they can only estimate the cumulative probability of all unobserved events (the “missing mass”) and provide damped maximum-likelihood estimates of the probability of individual events.

Fortunately, in practice we can usefully approximate our guessing metrics from reasonably-sized samples, though these estimations implicitly rely on assumptions about the underlying nature of the password distribution. As seen in Figure 3, partial guessing metrics which rely only on the more-frequent items in the distribution are the easiest to approximate, while those which rely on a summation over the entire distribution, such as $H_0$, $H_1$ and $\tilde{\mu}_\alpha$, $\tilde{G}_\alpha$ for large values of $\alpha$, will be the most difficult.

A. The region of stability

We can reliably estimate $p_i$ for events with observed frequency $f_i \gg 1$ due to the law of large numbers. Estimating $H_\infty$ requires estimating only $p_1$, the probability of the most common password, which was 1.08% in our data set. Gaussian statistics can be used to estimate the standard error of the maximum-likelihood estimate $\hat{p}_i$:

$$\text{error}(\hat{p}_i) = \sqrt{\frac{p_i (1 - p_i)}{M}} \cdot \frac{1}{p_i} \approx \sqrt{\frac{f_i}{M^2}} \cdot \frac{M}{f_i} = \frac{1}{\sqrt{f_i}}$$

For our data set, this gives a standard error of under 0.1 bit in $\hat{H}_\infty$ for $M \ge 2^{14}$. This argument extends to $\hat{\tilde{\lambda}}_\beta$ for small values of $\beta$, and in practice we can measure resistance to online guessing with relatively modest sample size.

Figure 4. Estimated guessing curves with reduced sample size $M$. Subsamples were computed randomly without replacement, to simulate having stopped the collection experiment earlier. After the maximum confidence point $\alpha_6$, there are two (almost indistinguishable) dashed plots representing the 1st and 99th percentiles from 1,000 random samples.

Reasoning about the error in $\hat{\tilde{\mu}}_\alpha$ and $\hat{\tilde{G}}_\alpha$ for values of $\alpha$ which represent realistic brute-force attacks is more difficult. Fortunately, we observe that for our password data set the number of events $V(f, M)$ which occur $f$ times in a sample of size $M$ is very consistent for small $f$ and provides a reasonable estimate of the number of events with probability $\frac{f - 0.5}{M} \le p \le \frac{f + 0.5}{M}$ in our full data set.¹² This enables a useful heuristic that $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ will be well approximated when $\alpha$ is small enough to only rely on events occurring more often than some small frequency $f$. Calling $\alpha_f$ the cumulative estimated probability of all events occurring at least $f$ times, we took 1,000 random samples of our corpus with $M = 2^{19}$ and observed the following values in the 1st and 99th percentiles:

$f$                                                           6             7             8
$\alpha_f$                                                    0.162–0.163   0.153–0.154   0.145–0.146
$\tilde{\mu}_{\alpha_f} - \hat{\tilde{\mu}}_{\alpha_f}$       0.157–0.180   0.125–0.148   0.103–0.127
$\tilde{G}_{\alpha_f} - \hat{\tilde{G}}_{\alpha_f}$           0.155–0.176   0.123–0.146   0.101–0.126

We observed very similar values for larger values of $M$. Thus, we will use $\hat{\tilde{\mu}}_\alpha$, $\hat{\tilde{G}}_\alpha$ directly for $\alpha \le \alpha_6$ for random subsamples of our data. The utility of this heuristic is seen in Figure 3, where it accurately predicts the point at which $\tilde{\mu}_{0.25}$ stabilizes, and in Figure 4, where it marks the point below which $\tilde{\mu}_\alpha$ is inaccurate for varying $M$.

¹² $V(f, M)$ will almost always overestimate this value because more low-probability events will be randomly over-represented than the converse.

B. Parametric extension of our approximations

Estimating $\tilde{\mu}_\alpha$ and $\tilde{G}_\alpha$ for higher $\alpha$ requires directly assuming a model for the underlying password distribution. Passwords have been conjectured to follow a power-law distribution¹³ where:

$$\Pr[p(x) > y] \propto y^{1-a} \tag{12}$$

Unfortunately, using a power-law distribution is problematic for two reasons. First, estimates for the scale parameter $a$ are known to decrease significantly with sample size [42]. Using maximum-likelihood fitting techniques [44] for our observed count data we get the following estimates:

$M$         69M    10M    1M     100k
$\hat{a}$   2.99   3.23   3.70   4.21

A second problem is that this model fits our observed, integer counts. To correctly estimate $\tilde{\mu}_\alpha$ from samples, we need to model the presence of passwords for which $p_i \cdot M < 1$. Power law distributions require assuming a non-zero minimum password probability a priori [44], which we have no meaningful way of doing.

Instead we need a model $\psi(p)$ for the distribution of password probabilities, an approach taken by linguists for modeling word frequencies [45]. We model the probability of observing a password $k$ times using a mixture model: first we draw a password probability $p$ randomly according to the probability density function $\psi(p)$, then we draw from a Poisson distribution with expectation $p \cdot M$ to model the number of times we observe this password:

$$\Pr[k \text{ obs.}] = \frac{\int_0^1 \frac{(pM)^k e^{-pM}}{k!} \psi(p)\,dp}{1 - \int_0^1 e^{-pM} \psi(p)\,dp} \tag{13}$$

The numerator integrates the possibility of seeing a password with probability $p$ exactly $k$ times, weighted by the probability $\psi(p)$ of a password having probability $p$. The denominator corrects for the probability of not observing a password at all. This formulation allows us to take a set of counts from a sample $\{f_1, f_2, \ldots\}$ and find the parameters for $\psi(p)$ which maximize the likelihood of our observations:

$$\text{Likelihood} = \prod_{i=1}^{\hat{N}} \Pr[f_i \text{ obs.}] \tag{14}$$

This model has been effectively applied to word frequencies using the generalized inverse-Gaussian distribution:¹⁴

$$\psi(p \mid b, c, g) = \frac{2^{g-1} p^{g-1} e^{-\frac{p}{c} - \frac{b^2 c}{4p}}}{(bc)^g K_g(b)} \tag{15}$$

where $K_g$ is the modified Bessel function of the second kind.

¹³ Power-law distributions are also called Pareto or Zipfian distributions, which can all be shown to be equivalent formulations [42].

¹⁴ The combined generalized inverse-Gaussian-Poisson model which we adopt is also called the Sichel distribution after its initial use by Sichel in 1975 to model word frequencies [46].

Figure 5. Extrapolated estimates for $\tilde{\mu}_\alpha$ using the generalized inverse Gaussian-Poisson distribution. Compared to naive estimates (Figure 4) the effects of sample size are mitigated. Each plot shows the 99% confidence interval from 1,000 random subsamples. Error from lack of fit of the model dwarfs error due to the randomness of each sample.

The generalized inverse-Gaussian is useful because it blends both power-law $p^{g-1}$ and exponential $e^{-\frac{p}{c} - \frac{b^2 c}{4p}}$ behavior and produces a well-formed probability distribution. By plugging Equation 15 into Equation 13 for $\psi$ and solving the integral, we obtain:

$$\Pr[k \mid b, c, g] = \frac{\left( \frac{1}{2} \frac{b c M}{\sqrt{1 + cM}} \right)^{k} K_{k+g}\!\left(b \sqrt{1 + cM}\right)}{k! \left( (1 + cM)^{g/2} K_g(b) - K_g\!\left(b \sqrt{1 + cM}\right) \right)} \tag{16}$$

Though unwieldy, we can compute Equation 14 using Equation 16 for different parameters of $b$, $c$, $g$. Fortunately, for $b > 0$, $c > 0$, $g < 0$ there is only one maximum of this function [45], which enables approximation of the maximum-likelihood fit efficiently by gradient descent.

We can use this model to produce an extrapolated distribution, removing all observed passwords with $f_i < 6$ to leave the well-approximated region of the distribution unchanged and adding synthetic passwords according to our estimated model $\psi(p)$. This is achieved by dividing the region $\left(0, \frac{6}{M}\right)$ into discrete bins, with increasingly small bins near the value $p^+$ which maximizes $\psi(p^+)$. Into each bin $(p_j, p_{j+1})$ we insert $\hat{N} \cdot \int_{p_j}^{p_{j+1}} \psi(p)\,dp$ events of observed frequency $\frac{p_j + p_{j+1}}{2} \cdot M$. We then normalize the probability of all synthetic events by multiplying by a correction factor […] to leave the head of the distribution intact.

Figure 5 plots the 1st and 99th percentile of $\tilde{\mu}_\alpha$ for extrapolations of random subsamples of our data. We use $\tilde{\mu}_\alpha$ because it is strictly less well approximated than $\tilde{G}_\alpha$, which is weighted slightly more towards well-approximated events in the distribution. Some key values are:

(rows: target population; columns: dictionary used)
target      Chinese  German  Greek  English  French  Indonesian  Italian  Korean  Portuguese  Spanish  Vietnamese  global  minimax
Chinese     4.4%     1.9%    2.7%   2.4%     1.7%    2.0%        2.0%     2.9%    1.8%        1.7%     2.0%        2.9%    2.7%
German      2.0%     6.5%    2.1%   3.3%     2.9%    2.2%        2.8%     1.6%    2.1%        2.6%     1.6%        3.5%    3.4%
Greek       9.3%     7.7%    13.4%  8.4%     7.4%    8.1%        8.0%     8.0%    7.7%        7.8%     7.7%        8.6%    8.9%
English     4.4%     4.6%    3.9%   8.0%     4.3%    4.5%        4.3%     3.4%    3.5%        4.2%     3.5%        7.9%    7.7%
French      2.7%     4.0%    2.9%   4.2%     10.0%   2.9%        3.2%     2.2%    3.1%        3.4%     2.1%        5.0%    4.9%
Indonesian  6.7%     6.3%    6.5%   8.7%     6.3%    14.9%       6.2%     5.8%    6.0%        6.2%     5.9%        9.3%    9.6%
Italian     4.0%     6.0%    4.6%   6.3%     5.3%    4.6%        14.6%    3.3%    5.7%        6.8%     3.2%        7.2%    7.1%
Korean      3.7%     2.0%    3.0%   2.6%     1.8%    2.3%        2.0%     5.8%    2.4%        1.9%     2.2%        2.8%    3.0%
Portuguese  3.9%     3.9%    4.0%   4.3%     3.8%    3.9%        4.4%     3.5%    11.1%       5.8%     2.9%        5.1%    5.3%
Spanish     3.6%     5.0%    4.0%   5.6%     4.6%    4.1%        6.1%     3.1%    6.3%        12.1%    2.9%        6.9%    7.0%
Vietnamese  7.0%     5.7%    6.2%   7.7%     5.8%    6.3%        5.7%     6.0%    5.8%        5.5%     14.3%       7.8%    8.3%
Table V. Language dependency of password guessing. Each cell indicates the success rate of a guessing attack with 1000 attempts using a dictionary optimal for users registered at Yahoo! with different preferred languages.

[…] most subsets. The greatest efficiency loss for any subset when using the global list is only a factor of 2.2, for Portuguese language passwords. We can improve this slightly further by constructing a special dictionary to be effective against all subsets. We do this by repeatedly choosing the password for which the lowest popularity in any subset is maximal and call it the “minimax” dictionary, also seen in Table V. This dictionary performs very similarly to the global dictionary, reducing the maximum efficiency loss to a factor of 2.1, also for Portuguese language passwords.

Digging into our data we find “global passwords” which are popular across all subgroups we observed. The single most popular password we observed, for example, occurred with probability at least 0.14% in every subpopulation. Some overall popular passwords were very rare in certain subpopulations. For example, the third most common password, with overall probability 0.1%, occurred nearly 100 times less frequently in some subpopulations. However, there were eight passwords which occurred with probability at least 0.01% in every subpopulation. Without access to the raw passwords, we can only speculate that these are numeric passwords as these are popular²¹ and internationalize well.

²¹ Within the RockYou data set, 123456 was the most popular password and 5 other number-only passwords were amongst the top ten.

Despite the existence of globally popular passwords, however, we still conclude that dictionary specificity can have surprisingly large results. For example, the following table shows efficiency losses of up to 25% from dictionaries tailored to people from different English-speaking countries:

(rows: target population; columns: dictionary used)
target  us     uk     ca     au     global
us      8.2%   6.6%   7.4%   7.2%   8.1%
uk      5.4%   6.9%   5.5%   5.6%   5.5%
ca      8.8%   7.9%   9.9%   8.7%   8.8%
au      7.4%   7.2%   7.6%   8.8%   7.5%

We observe comparable efficiency losses based on age:

(rows: target population; columns: dictionary used)
target  13–20  21–34  35–54  55+    global
13–20   8.4%   7.8%   7.1%   6.5%   7.9%
21–34   7.3%   7.9%   7.3%   6.7%   7.8%
35–54   5.4%   5.8%   6.4%   6.1%   6.2%
55+     5.4%   5.8%   6.8%   7.3%   6.5%

We even observe efficiency losses based on service usage:

(rows: target population; columns: dictionary used)
target  retail  chat   media  mail   global
retail  7.0%    5.6%   6.6%   5.6%   6.0%
chat    6.9%    8.4%   7.8%   8.3%   8.3%
media   5.7%    5.6%   6.0%   5.6%   5.8%
mail    6.7%    8.0%   7.5%   8.2%   8.1%

VII. CONCLUDING REMARKS

By establishing sound metrics and rigorously analyzing the largest password corpus to date, we hope to have contributed both tools and numbers of lasting significance. As a rule of thumb for security engineers, passwords provide roughly equivalent security to 10-bit random strings against an optimal online attacker trying a few popular guesses for a large list of accounts. In other words, an attacker who can manage 10 guesses per account, typically within the realm of rate-limiting mechanisms, will compromise around 1% of accounts, just as they would against random 10-bit strings. Against an optimal attacker performing unrestricted brute force and wanting to break half of all available accounts, passwords appear to be roughly equivalent to 20-bit random strings. This means that no practical amount of iterated hashing can prevent an adversary from breaking a large number of accounts given the opportunity for offline

[20] W. E. Burr, D. F. Dodson, and W. T. Polk, “Electronic Authentication Guideline,” NIST Special Publication 800-63, 2006.
[21] D. Florêncio and C. Herley, “A large-scale study of web password habits,” in WWW '07: Proceedings of the 16th International Conference on the World Wide Web. ACM, 2007, pp. 657–666.
[22] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, L. Bauer, N. Christin, and L. F. Cranor, “Encountering Stronger Password Requirements: User Attitudes and Behaviors,” in SOUPS '10: Proceedings of the 6th Symposium on Usable Privacy and Security. ACM, 2010.
[23] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” Carnegie Mellon University, Tech. Rep. CMU-CyLab-11-008, 2011.
[24] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “Password Memorability and Security: Empirical Results,” IEEE Security and Privacy Magazine, vol. 2, no. 5, pp. 25–34, 2004.
[25] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna, “Your botnet is my botnet: Analysis of a botnet takeover,” in CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM, 2009, pp. 635–647.
[26] C. E. Shannon, “A Mathematical Theory of Communication,” in Bell System Technical Journal, vol. 27, 1948, pp. 379–423.
[27] C. Cachin, “Entropy measures and unconditional security in cryptography,” Ph.D. dissertation, ETH Zürich, 1997.
[28] J. O. Pliam, “On the Incomparability of Entropy and Marginal Guesswork in Brute-Force Attacks,” in Progress in Cryptology - INDOCRYPT 2000, 2000.
[29] S. Boztas, “Entropies, Guessing, and Cryptography,” Department of Mathematics, Royal Melbourne Institute of Technology, Tech. Rep. 6, 1999.
[30] J. L. Massey, “Guessing and Entropy,” in Proceedings of the 1994 IEEE International Symposium on Information Theory, 1994, p. 204.
[31] A. Rényi, “On measures of information and entropy,” Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561, 1961.
[32] R. V. Hartley, “Transmission of Information,” Bell System Technical Journal, vol. 7, no. 3, pp. 535–563, 1928.
[33] J. Bonneau, S. Preibusch, and R. Anderson, “A birthday present every eleven wallets? The security of customer-chosen banking PINs,” FC '12: The 16th International Conference on Financial Cryptography and Data Security, 2012.
[34] J. Bonneau, M. Just, and G. Matthews, “What's in a name? Evaluating statistical attacks against personal knowledge questions,” FC '10: The 14th International Conference on Financial Cryptography and Data Security, 2010.
[35] S. Brostoff and A. Sasse, ““Ten strikes and you're out”: Increasing the number of login attempts can improve password usability,” in Proceedings of CHI 2003 Workshop on HCI and Security Systems. John Wiley, 2003.
[36] J. Bonneau and S. Preibusch, “The password thicket: technical and market failures in human authentication on the web,” WEIS '10: Proceedings of the 9th Workshop on the Economics of Information Security, 2010.
[37] M. Alsaleh, M. Mannan, and P. van Oorschot, “Revisiting Defenses Against Large-Scale Online Password Guessing Attacks,” IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 1, pp. 128–141, 2012.
[38] S. Egelman, J. Bonneau, S. Chiasson, D. Dittrich, and S. Schechter, “Its Not Stealing If You Need It: On the ethics of performing research using public data of illicit origi
n(paneldiscussion),”WECSR'12:The3rdWorkshoponEthicsinComputerSecurityResearch,2012.[39]B.Kaliski,RFC2898:PKCS#5:Password-BasedCryptog-raphySpecicationVersion2.0,IETF,2000.[40]D.E.DenningandP.J.Denning,“Thetracker:athreattostatisticaldatabasesecurity,”ACMTransactionsonDatabaseSystems,vol.4,pp.76–96,1979.[41]A.NarayananandV.Shmatikov,“HowToBreakAnonymityoftheNetixPrizeDataset,”eprintarXiv:cs/0610105,2006.[42]H.R.Baayen,WordFrequencyDistributions,ser.Text,SpeechandLanguageTechnology.Springer,2001.[43]W.A.Gale,“Good-Turingsmoothingwithouttears,”JournalofQuantitativeLinguistics,vol.2,1995.[44]A.Clauset,C.R.Shalizi,andM.E.J.Newman,“Power-LawDistributionsinEmpiricalData,”SIAMRev.,vol.51,pp.661–703,2009.[45]M.Font,X.Puig,andJ.Ginebra,“ABayesiananalysisoffrequencycountdata,”JournalofStatisticalComputationandSimulation,2011.[46]H.Sichel,“Onadistributionlawforwordfrequencies,”JournaloftheAmericanStatisticalAssociation,1975.[47]D.Davis,F.Monrose,andM.K.Reiter,“OnUserChoiceinGraphicalPasswordSchemes,”inProceedingsofthe13thUSENIXSecuritySymposium,2004.[48]S.Wiedenbeck,J.Waters,J.-C.Birget,A.Brodskiy,andN.Memon,“PassPoints:designandlongitudinalevaluationofagraphicalpasswordsystem,”InternationalJournalofHuman-ComputerStudies,vol.63,pp.102–127,2005.[49]Y.Zhang,F.Monrose,andM.K.Reiter,“Thesecurityofmodernpasswordexpiration:analgorithmicframeworkandempiricalanalysis,”inCCS'10:Proceedingsofthe17thACMConferenceonComputerandCommunicationsSecurity.ACM,2010,pp.176–186.[50]C.Herley,P.vanOorschot,andA.S.Patrick,“Passwords:IfWe'reSoSmart,WhyAreWeStillUsingThem?”FC'09:The13thInternationalConferenceonFinancialCryptographyandDataSecurity,2009.
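The rule of thumb stated in the concluding remarks can be checked with a little arithmetic. The sketch below is illustrative only: the function names `uniform_success_rate` and `effective_bits` are assumptions made for this example, and converting a guess budget and success rate into bits as log2(guesses / success rate) is one natural reading of "equivalent to an n-bit random string", not necessarily the exact estimator used to produce the reported figures.

```python
import math


def uniform_success_rate(bits: int, guesses: int) -> float:
    """Chance of hitting a uniformly random `bits`-bit secret
    within `guesses` distinct guesses."""
    space = 2 ** bits
    return min(guesses, space) / space


def effective_bits(guesses: float, success_rate: float) -> float:
    """Bit length of the uniform distribution against which `guesses`
    guesses would succeed with probability `success_rate`.
    (Assumed conversion for illustration.)"""
    return math.log2(guesses / success_rate)


# Online attack: 10 guesses per account against 10-bit random strings.
online = uniform_success_rate(bits=10, guesses=10)
print(f"online success rate: {online:.2%}")  # 10/1024, about 1% of accounts

# The reverse direction: 10 guesses breaking 1% of accounts is about 10 bits.
print(f"online strength: {effective_bits(10, 0.01):.2f} bits")

# Offline attack: breaking half of all accounts protected by
# 20-bit random strings takes 2**19 guesses per account.
offline_guesses = 0.5 * 2 ** 20
print(f"offline guesses for 50%: {offline_guesses:,.0f}")
print(f"offline strength: {effective_bits(offline_guesses, 0.5):.1f} bits")
```

At roughly half a million guesses per account, the offline figure shows why increasing the cost of each hash evaluation cannot by itself protect the weaker half of accounts.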