most probable according to some statistical language model, whose job is to assig - PDF document

debby-jeon . @debby-jeon
Uploaded On 2016-03-11

Presentation Transcript

most probable according to some statistical language model, whose job is to assign some probability to any sequence of letters. According to a 1-gram model of English, the probability of a plaintext $p_1 \ldots p_n$ is given by:

$$P(p_1 \ldots p_n) = P(p_1) \cdot P(p_2) \cdot \ldots \cdot P(p_n)$$

That is, we obtain the probability of a sequence by multiplying together the probabilities of the individual letters that make it up. This model assigns a probability to any letter sequence, and the probabilities of all letter sequences sum to one. We collect letter probabilities (including space) from 50 million words of text available from the Linguistic Data Consortium (Graff and Finch, 1994). We also estimate 2- and 3-gram models using the same resources:

$$P(p_1 \ldots p_n) = P(p_1 \mid \text{START}) \cdot P(p_2 \mid p_1) \cdot P(p_3 \mid p_2) \cdot \ldots \cdot P(p_n \mid p_{n-1}) \cdot P(\text{END} \mid p_n)$$

$$P(p_1 \ldots p_n) = P(p_1 \mid \text{START}) \cdot P(p_2 \mid \text{START}\; p_1) \cdot P(p_3 \mid p_1 p_2) \cdot \ldots \cdot P(p_n \mid p_{n-2}\, p_{n-1}) \cdot P(\text{END} \mid p_{n-1}\, p_n)$$

Unlike the 1-gram model, the 2-gram model will assign a low probability to the sequence "ae" because the probability $P(e \mid a)$ is low. Of course, all these models are fairly weak, as Shannon (1949) already knew. When we stochastically generate text according to these models, we get, for example:

1-gram: ...thdodetusariicibtdegirntoihytrsen...
2-gram: ...itariarissoriorcupunondrkeuth...
3-gram: ...indthnowelfjusisionthadinatof...
4-gram: ...recebenceonbuttherservier...
5-gram: ...mrsearnedageimondtheperious...
6-gram: ...apartytopossibleuponrestof...
7-gram: ...tourgeneralthroughapprovethe...

We can further estimate the probability of a whole English sentence or phrase. For example, the probabilities of two plaintext phrases "het oxf" and "the fox" (which have the same letter frequency distribution) are shown below. The 1-gram model, which counts only the frequency of occurrence of each letter in the phrase, estimates the same probability for both phrases, since the same letters occur in both. The 2-gram and 3-gram models, on the other hand, take context into account; they distinguish better between the English and non-English phrases, and hence assign a higher probability to the English phrase "the fox".
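The 1-gram and 2-gram factorizations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the add-one smoothing, the boundary symbols `^` and `$`, and all function names are hypothetical and do not come from the paper, which estimates its models from 50 million words of text.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus space
START, END = "^", "$"  # hypothetical boundary symbols


def train_unigram(text):
    """Letter probabilities with add-one smoothing (an assumption of this sketch)."""
    counts = Counter(ch for ch in text if ch in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {ch: (counts[ch] + 1) / total for ch in ALPHABET}


def train_bigram(text):
    """Return prob(nxt, prev) = P(nxt | prev), again with add-one smoothing."""
    pair_counts, prev_counts = Counter(), Counter()
    symbols = [START] + [ch for ch in text if ch in ALPHABET] + [END]
    for prev, nxt in zip(symbols, symbols[1:]):
        pair_counts[prev, nxt] += 1
        prev_counts[prev] += 1
    successors = len(ALPHABET) + 1  # 27 characters plus END

    def prob(nxt, prev):
        return (pair_counts[prev, nxt] + 1) / (prev_counts[prev] + successors)

    return prob


def unigram_logprob(model, phrase):
    """log2 of P(p1) * P(p2) * ... * P(pn)."""
    return sum(math.log2(model[ch]) for ch in phrase)


def bigram_logprob(prob, phrase):
    """log2 of P(p1|START) * P(p2|p1) * ... * P(END|pn)."""
    symbols = [START] + list(phrase) + [END]
    return sum(math.log2(prob(nxt, prev)) for prev, nxt in zip(symbols, symbols[1:]))


# Toy training text (illustrative only).
corpus = "the quick brown fox jumps over the lazy dog and the fox runs to the den"
uni = train_unigram(corpus)
bi = train_bigram(corpus)

for phrase in ("het oxf", "the fox"):
    print(phrase, unigram_logprob(uni, phrase), bigram_logprob(bi, phrase))
```

Even on this tiny corpus the 1-gram scores of the two phrases coincide, because they contain the same letters, while the 2-gram score of "the fox" is higher.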
Model    P(het oxf)       P(the fox)
1-gram   1.83 × 10^-9     1.83 × 10^-9
2-gram   3.26 × 10^-11    1.18 × 10^-7
3-gram   1.89 × 10^-13    1.04 × 10^-6

Over a longer sequence $X$ of length $N$, we can also calculate $-\log_2(P(X))/N$, which (per Shannon) gives the compression rate permitted by the model, in bits per character. In our case, we get: [1]

1-gram: 4.19
2-gram: 3.51
3-gram: 2.93

3 Decipherment

Given a ciphertext $c_1 \ldots c_n$, we search for the key that yields the most probable plaintext $p_1 \ldots p_n$. There are 26! possible keys, too many to enumerate. However, we can still find the best one in a guaranteed fashion. We do this by taking our most-probable-plaintext problem and casting it as an integer programming problem. [2]

Here is a sample integer programming problem:

variables:   x, y
minimize:    2x + y
subject to:  x + y [?] 6.9
             y − x [?] 2.5
             y [?] 1.1

We require that x and y take on integer values. A solution can be obtained by typing this integer program into the publicly available lp_solve program, [...]

[1] Because spacing is fixed in our letter substitution ciphers, we normalize $P(X)$ by the sum of probabilities of all English strings that match the spacing pattern of $X$.
[2] For an overview of integer and linear programming, see for example (Schrijver, 1998).
[...] $P(p_2 \mid p_1) \cdot \ldots \cdot P(p_n \mid p_{n-1}) \cdot P(\text{END} \mid p_n)\,]$

(e) Minimize $-\log_2 P(p_1 \mid \text{START}) - \log_2 P(p_2 \mid p_1) - \ldots - \log_2 P(p_n \mid p_{n-1}) - \log_2 P(\text{END} \mid p_n)$

We can guarantee this last outcome if we construct our minimization function as a sum of $27 \cdot 27 \cdot (n-1)$ terms, each of which is a $\mathit{link}_{XYZ}$ variable multiplied by $-\log_2 P(Z \mid Y)$:

Minimize $\mathit{link}_{1aa} \cdot -\log_2 P(a \mid a) + \mathit{link}_{1ab} \cdot -\log_2 P(b \mid a) + \mathit{link}_{1ac} \cdot -\log_2 P(c \mid a) + \ldots + \mathit{link}_{5qu} \cdot -\log_2 P(u \mid q) + \ldots$

When we assign value 1 to link variables along some decipherment path, and 0 to all others, this function computes the negative log probability of that path.

We must still add a few more "subject to" constraints. We need to ensure that the chosen path imitates the repetition pattern of the ciphertext. While the bold path in Figure 1 represents the fine plaintext choice "decade", the dotted path represents the choice "ababab", which is not consistent with the repetition pattern of the cipher "QWBSQW". To make sure our substitutions obey a consistent key, we set up $27 \cdot 27 = 729$ new $\mathit{key}_{xy}$ variables to represent the choice of key. These new variables are also binary, taking on values 0 or 1. If variable $\mathit{key}_{aQ} = 1$, that means the key maps plaintext a to ciphertext Q. Clearly, not all assignments to these 729 variables represent valid keys, so we augment the "subject to" part of our integer program by requiring that for any letter x (where ␣ denotes the space character),

subject to:
$\mathit{key}_{xA} + \mathit{key}_{xB} + \ldots + \mathit{key}_{xZ} + \mathit{key}_{x␣} = 1$
$\mathit{key}_{Ax} + \mathit{key}_{Bx} + \ldots + \mathit{key}_{Zx} + \mathit{key}_{␣x} = 1$

That is, every plaintext letter must map to exactly one ciphertext letter, and every ciphertext letter must map to exactly one plaintext letter. We also add a constraint to ensure that the ciphertext space character maps to the plaintext space character:

subject to: $\mathit{key}_{␣␣} = 1$

Finally, we ensure that any chosen decipherment path of $\mathit{link}_{XYZ}$ variables is consistent with the chosen key. We know that for every node A along the decipherment path, exactly one active link has A as its destination. For all other nodes, zero active links lead in. Suppose node A represents the decipherment of ciphertext letter $c_i$ as plaintext letter $p_j$. For all such nodes, we stipulate that the sum of values for $\mathit{link}_{(i-1)\,x\,p_j}$ (for all x) equals the value of $\mathit{key}_{p_j c_i}$. In other words, whether a node lies along the chosen decipherment path or not, the chosen key must support that decision.

Figure 2 summarizes the integer program that we construct from a given ciphertext $c_1 \ldots c_n$. The computer code that transforms any given cipher into a corresponding integer program runs to about one page. Variations on the decipherment network yield 1-gram and 3-gram decipherment capabilities. Once an integer program is generated by machine, we ask the commercially available CPLEX software to solve it, and then we note which $\mathit{key}_{XY}$ variables are assigned value 1. Because CPLEX computes the optimal key, the method is not fast. For ciphers of length 32, the number of variables and constraints encoded in the integer program (IP), along with average running times, are shown below. It is possible to obtain less-than-optimal keys faster by interrupting the solver.
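The constraints described above can be checked on small examples without a solver. The sketch below is not the paper's integer-program generator; it only illustrates, with hypothetical helper names, what the repetition-pattern requirement and the row, column and space constraints on the key variables mean, and how the objective evaluates along one path. The uniform cost function at the end is a placeholder assumption.

```python
import math

PLAIN = "abcdefghijklmnopqrstuvwxyz "   # 27 plaintext symbols, including space
CIPHER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 27 ciphertext symbols, including space


def repetition_pattern(text):
    """Number each symbol by the order in which it first appears."""
    first_seen = {}
    return [first_seen.setdefault(ch, len(first_seen)) for ch in text]


def is_valid_key(key):
    """Check the 'subject to' constraints on binary key variables.

    key[(p, c)] == 1 means plaintext symbol p maps to ciphertext symbol c.
    """
    rows_ok = all(sum(key.get((p, c), 0) for c in CIPHER) == 1 for p in PLAIN)
    cols_ok = all(sum(key.get((p, c), 0) for p in PLAIN) == 1 for c in CIPHER)
    space_ok = key.get((" ", " "), 0) == 1
    return rows_ok and cols_ok and space_ok


def path_cost(plaintext, neg_log2_prob):
    """Objective value when exactly the link variables along `plaintext` are 1."""
    return sum(neg_log2_prob(prev, nxt) for prev, nxt in zip(plaintext, plaintext[1:]))


# "decade" matches the repetition pattern of QWBSQW; "ababab" does not.
print(repetition_pattern("QWBSQW"), repetition_pattern("decade"), repetition_pattern("ababab"))

# A trivially valid key: each plaintext letter maps to its uppercase form.
identity_key = {(p, p.upper()): 1 for p in PLAIN}
print(is_valid_key(identity_key))

# Placeholder cost: every transition equally likely (not a real language model).
uniform_cost = lambda prev, nxt: math.log2(27)
print(path_cost("decade", uniform_cost))
```

A real run would replace `uniform_cost` with $-\log_2 P(Z \mid Y)$ from a trained 2-gram model and hand the variables and constraints to an integer programming solver.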
Model    # of IP variables   # of IP constraints   Average running time
1-gram   1,755               1,083                 0.01 seconds
2-gram   27,700              2,054                 50 seconds
3-gram   211,600             27,326                450 seconds

4 Decipherment Experiments

We create 50 ciphers each of lengths 2, 4, 8, ..., 256. We solve these with 1-gram, 2-gram, and 3-gram language models. We record the average percentage of ciphertext tokens decoded incorrectly. 50% error means half of the ciphertext tokens are deciphered wrong, while 0% means perfect decipherment. Here [...] that such ciphers can be attacked with very limited knowledge of English (no words or grammar) and little custom programming.

The 1-gram model works badly in this scenario, which is consistent with Bauer's (2006) observation that for short texts, mechanical decryption on the basis of individual letter frequencies does not work. If we had infinite amounts of ciphertext and plaintext drawn from the same stochastic source, we would expect the plain and cipher frequencies to eventually line up, allowing us to read off a correct key from the frequency tables. The upper curve in Figure 3 shows that convergence to this end is slow.

5 Shannon Equivocation and Unicity Distance

Very short ciphers are hard to solve accurately. Shannon (1949) pinpointed an inherent difficulty with short ciphers, one that is independent of the solution method or language model used; the cipher itself may not contain enough information for its proper solution. For example, given a short cipher like XYYX, we can never be sure if the answer is peep, noon, anna, etc. Shannon defined a mathematical measure of our decipherment uncertainty, which he called equivocation (now called entropy).

Let $C$ be a cipher, $M$ be the plaintext message it encodes, and $K$ be the key by which the encoding takes place. Before even seeing $C$, we can compute our uncertainty about the key $K$ by noting that there are 26! equiprobable keys: [4]

$$H(K) = -(26!) \cdot (1/26!) \cdot \log_2(1/26!) = 88.4 \text{ bits}$$

That is, any secret key can be revealed in 89 bits. When we actually receive a cipher $C$, our uncertainty about the key and the plaintext message is reduced. Shannon described our uncertainty about the plaintext message, letting $m$ range over all decipherments:

$$H(M \mid C) = \text{equivocation of plaintext message} = -\sum_{m} P(m \mid C) \cdot \log_2 P(m \mid C)$$
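The two entropy expressions above are easy to evaluate numerically. The sketch below computes $H(K)$ exactly as $\log_2(26!)$ and then a message equivocation for the XYYX example. The three candidate scores are made-up numbers chosen only for illustration, and `entropy_bits` is a hypothetical helper name.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, after normalizing the inputs to sum to one."""
    probs = list(probs)
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


# Key entropy: 26! equiprobable keys, so H(K) reduces to log2(26!).
h_key = math.log2(math.factorial(26))
print(round(h_key, 1), math.ceil(h_key))

# Toy equivocation for the cipher XYYX. The scores are illustrative
# assumptions, not language-model probabilities from the paper.
candidates = {"peep": 0.5, "noon": 0.3, "anna": 0.2}
h_message = entropy_bits(candidates.values())
print(h_message)
```

With only three equally plausible readings the equivocation can be at most $\log_2 3$ bits; any preference the language model has for one reading pushes it lower.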
[4] (Shannon, 1948) The entropy associated with a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$ is given by $H = -\sum_{i=1}^{n} p_i \cdot \log_2(p_i)$.

$P(m \mid C)$ is the probability of plaintext $m$ (according to the language model) divided by the sum of probabilities of all plaintext messages that obey the repetition pattern of $C$. While integer programming gives us a method to find the most probable decipherment without enumerating all keys, we do not know of a similar method to compute a full equivocation without enumerating all keys. Therefore, we sample up to 100,000 plaintext messages in the neighborhood of the most probable decipherment [5] and compute $H(M \mid C)$ over that subset. [6]

Shannon also described $H(K \mid C)$, the equivocation of key. This uncertainty is typically larger than $H(M \mid C)$, because a given message $M$ may be derived from $C$ via more than one key, in case $C$ does not contain all 26 letters of the alphabet.

We compute $H(K \mid C)$ by letting $r(C)$ be the number of distinct letters in $C$, and letting $q(C)$ be $(26 - r(C))!$. Letting $i$ range over our sample of plaintext messages, we get:

$$\begin{aligned}
H(K \mid C) &= \text{equivocation of key} \\
&= -\sum_{i} q(C) \cdot (P(i)/q(C)) \cdot \log_2 (P(i)/q(C)) \\
&= -\sum_{i} P(i) \cdot \log_2 (P(i)/q(C)) \\
&= -\sum_{i} P(i) \cdot (\log_2 P(i) - \log_2 q(C)) \\
&= -\sum_{i} P(i) \cdot \log_2 P(i) + \sum_{i} P(i) \cdot \log_2 q(C) \\
&= H(M \mid C) + \log_2 q(C)
\end{aligned}$$

Shannon (1949) used analytic means to roughly sketch the curves for $H(K \mid C)$ and $H(M \mid C)$, which we reproduce in Figure 4. Shannon's curve is drawn for a human-level language model, and the y-axis is given in "decimal digits" instead of bits.

[5] The sampling used to compute $H(M \mid C)$ starts with the optimal key and expands out a frontier, by swapping letters in the key, and recursing to generate new keys (and corresponding plaintext message decipherments). The plaintext messages are remembered so that the frontier expands efficiently. The sampling stops if 100,000 different messages are found.
[6] Interestingly, as we grow our sample out from the most probable plaintext, we do not guarantee that any intermediate result is a lower bound on the equivocation. An example is provided by the growing sample (0.12, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01), whose entropy steadily increases. However, if we add a 14th item whose $P(m)$ is 0.12, the entropy suddenly decreases from 2.79 to 2.78.
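Both the footnote 6 example and the final identity $H(K \mid C) = H(M \mid C) + \log_2 q(C)$ can be checked numerically. The sketch below assumes that sample probabilities are renormalized over the sample before the entropy is taken, which is what reproduces the 2.79 and 2.78 figures; the function names are hypothetical, and the choice of 23 distinct cipher letters in the last check is arbitrary, picked only to keep $q(C) = 3! = 6$ small.

```python
import math


def entropy_bits(probs):
    """Entropy in bits of a sample, renormalized to sum to one."""
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


# Footnote 6: the sample grows, yet the entropy estimate can go down.
sample = [0.12] + [0.01] * 12
h_13 = entropy_bits(sample)           # 13 sampled messages
h_14 = entropy_bits(sample + [0.12])  # a 14th message with P(m) = 0.12
print(round(h_13, 2), round(h_14, 2))  # 2.79 2.78


def key_equivocation(message_probs, distinct_cipher_letters):
    """Last line of the derivation: H(M|C) + log2 q(C), q(C) = (26 - r(C))!."""
    q = math.factorial(26 - distinct_cipher_letters)
    return entropy_bits(message_probs) + math.log2(q)


def key_equivocation_direct(message_probs, q):
    """First line of the derivation: each message splits into q equiprobable keys."""
    total = sum(message_probs)
    key_probs = [p / total / q for p in message_probs for _ in range(q)]
    return -sum(p * math.log2(p) for p in key_probs)


# With r(C) = 23 distinct letters, q(C) = 3! = 6 keys per message.
print(key_equivocation(sample, 23), key_equivocation_direct(sample, 6))
```

The two printed key equivocations agree up to floating-point rounding, which is the content of the derivation.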
[Shannon] defines the unicity distance (U) as the cipher length at which we have virtually no more uncertainty about the plaintext. Using analytic means (and various approximations), he gives:

$$U = H(K)/(A - B)$$

where:
A = bits per character of a 0-gram model (4.7)
B = bits per character of the model used to decipher

For a human-level language model ($B \approx 1.2$), he concludes $U \approx 25$, which is confirmed by practice. For our language models, the formula gives:

U = 173 (1-gram)
U = 74 (2-gram)
U = 50 (3-gram)

These numbers are in the same ballpark as Bauer (2006), who gives 167, 74, and 59. We note that these predicted unicity distances are a bit too rosy, according to our empirical message equivocation curves. Our experience confirms this as well, as 1-gram frequency counts over a 173-letter cipher are generally insufficient to pin down a solution.

6 Conclusion

We provide a method for deciphering letter substitution ciphers with low-order models of English. This method, based on integer programming, requires very little coding and can perform an optimal search over the key space. We conclude by noting that English language models currently used in speech recognition (Chelba and Jelinek, 1999) and automated language translation (Brants et al., 2007) are much more powerful, employing, for example, 7-gram word models (not letter models) trained on trillions of words. Obtaining optimal keys according to such models will permit the automatic decipherment of shorter ciphers, but this requires more specialized search than what is provided by general integer programming solvers. Methods such as these should also be useful for natural language decipherment problems such as character code conversion, phonetic decipherment, and word substitution ciphers with applications in machine translation (Knight et al., 2006).

7 Acknowledgements

The authors wish to gratefully acknowledge Jonathan Graehl, for providing a proof to support the argument that taking a larger number of samples does not necessarily increase the equivocation. This research was supported by the Defense Advanced Research Projects Agency under SRI International's prime Contract Number NBCHD040058.

References

Friedrich L. Bauer. 2006. Decrypted Secrets: Methods and Maxims of Cryptology. Springer-Verlag.
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of EMNLP-CoNLL.
Ciprian Chelba and Frederick Jelinek. 1999. Structured language modeling for speech recognition. In Proceedings of NLDB: 4th International Conference on Applications of Natural Language to Information Systems.
Ravi Ganesan and Alan T. Sherman. 1993. Statistical techniques for language recognition: An introduction and guide for cryptanalysts. Cryptologia, 17(4):321–366.
David Graff and Rebecca Finch. 1994. Multilingual text resources at the linguistic data consortium. In Proceedings of the HLT Workshop on Human Language Technology.
Thomas Jakobsen. 1995. A fast method for cryptanalysis of substitution ciphers. Cryptologia, 19(3):265–274.
Kevin Knight, Anish Nair, Nishit Rathod, and Kenji Yamada. 2006. Unsupervised analysis for decipherment problems. In Proceedings of the COLING/ACL.
Edwin Olson. 2007. Robust dictionary attack of short simple substitution ciphers. Cryptologia, 31(4):332–342.
Shmuel Peleg and Azriel Rosenfeld. 1979. Breaking substitution ciphers using a relaxation algorithm. Comm. ACM, 22(11):598–605.
Alexander Schrijver. 1998. Theory of Linear and Integer Programming. John Wiley & Sons.
Claude E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656.
Claude E. Shannon. 1949. Communication theory of secrecy systems. Bell System Technical Journal, 28:656–715.
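As a worked check on Section 5, the unicity-distance formula $U = H(K)/(A - B)$ can be evaluated with the bits-per-character rates reported earlier in the text (4.19, 3.51 and 2.93). This is a minimal sketch; the constant and function names are hypothetical, and $A$ is taken as the rounded value 4.7 quoted in the text rather than recomputed.

```python
import math

H_KEY = math.log2(math.factorial(26))  # H(K), about 88.4 bits
A = 4.7  # bits per character of a 0-gram model, as quoted in the text


def unicity_distance(bits_per_char):
    """U = H(K) / (A - B) for a model that needs B bits per character."""
    return H_KEY / (A - bits_per_char)


reported_rates = {"1-gram": 4.19, "2-gram": 3.51, "3-gram": 2.93}
distances = {name: round(unicity_distance(b)) for name, b in reported_rates.items()}
print(distances)                     # {'1-gram': 173, '2-gram': 74, '3-gram': 50}
print(round(unicity_distance(1.2)))  # human-level model: 25
```

The rounded results match the values 173, 74 and 50 given in Section 5, and the human-level estimate of about 25.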
