Proceedings of the Conference on Empirical Methods in

2008 Association for Computational Linguistics. Attacking Decipherment Problems Optimally with Low-Order N-gram Models. Sujith Ravi and Kevin Knight, University of Southern California, Information Sciences Institute, Marina del Rey, California 90292.


Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 812–819, Honolulu, October 2008. © 2008 Association for Computational Linguistics

Attacking Decipherment Problems Optimally with Low-Order N-gram Models

Sujith Ravi and Kevin Knight
University of Southern California
Information Sciences Institute
Marina del Rey, California 90292
{sravi, knight}@isi.edu

Abstract

We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy varies as a function of cipher length and n-gram order. We also make an empirical investigation of Shannon's (1949) theory of uncertainty in decipherment.

1 Introduction

A number of papers have explored algorithms for automatically solving letter-substitution ciphers. Some use heuristic methods to search for the best deterministic key (Peleg and Rosenfeld, 1979; Ganesan and Sherman, 1993; Jakobsen, 1995; Olson, 2007), often using word dictionaries to guide that search. Others use expectation-maximization (EM) to search for the best probabilistic key using letter n-gram models (Knight et al., 2006). In this paper, we introduce an exact decipherment method based on integer programming. We carry out extensive decipherment experiments using letter n-gram models, and we find that our accuracy rates far exceed those of EM-based methods.

We also empirically explore the concepts in Shannon's (1949) paper on information theory as applied to cipher systems. We provide quantitative plots for uncertainty in decipherment, including the famous unicity distance, which estimates how long a cipher must be to virtually eliminate such uncertainty.

We find the ideas in Shannon's (1949) paper relevant to problems of statistical machine translation and transliteration. When first exposed to the idea of statistical machine translation, many people naturally ask: (1) how much data is needed to get a good result, and (2) can translation systems be trained without parallel data? These are tough questions by any stretch, and it is remarkable that Shannon was already in the 1940s tackling such questions in the realm of
code-breaking, creating analytic formulas to estimate answers.

Our novel contributions are as follows:

• We outline an exact letter-substitution decipherment method which:
  - guarantees that no key is overlooked, and
  - can be executed with standard integer programming solvers
• We present empirical results for decipherment which:
  - plot search-error-free decipherment results at various cipher lengths, and
  - demonstrate accuracy rates superior to EM-based methods
• We carry out empirical testing of Shannon's formulas for decipherment uncertainty

2 Language Models

We work on letter substitution ciphers with spaces. We look for the key (among 26! possible ones) that, when applied to the ciphertext, yields the most English-like result. We take "English-like" to mean most probable according to some statistical language model, whose job is to assign some probability to any sequence of letters. According to a 1-gram model of English, the probability of a plaintext $p_1 \ldots p_n$ is given by:

$$P(p_1 \ldots p_n) = P(p_1) \cdot P(p_2) \cdots P(p_n)$$

That is, we obtain the probability of a sequence by multiplying together the probabilities of the individual letters that make it up. This model assigns a probability to any letter sequence, and the probabilities of all letter sequences sum to one. We collect letter probabilities (including space) from 50 million words of text available from the Linguistic Data Consortium (Graff and Finch, 1994). We also estimate 2- and 3-gram models using the same resources:

$$P(p_1 \ldots p_n) = P(p_1 \mid \text{START}) \cdot P(p_2 \mid p_1) \cdot P(p_3 \mid p_2) \cdots P(p_n \mid p_{n-1}) \cdot P(\text{END} \mid p_n)$$

$$P(p_1 \ldots p_n) = P(p_1 \mid \text{START}) \cdot P(p_2 \mid \text{START}\ p_1) \cdot P(p_3 \mid p_1 p_2) \cdots P(p_n \mid p_{n-2} p_{n-1}) \cdot P(\text{END} \mid p_{n-1} p_n)$$

Unlike the 1-gram model, the 2-gram model will assign a low probability to the sequence "ae" because the probability $P(e \mid a)$ is low. Of course, all these models are fairly weak, as already known by (Shannon, 1949). When we stochastically generate text according to these models, we get, for example (inter-word spacing was lost in this transcript):

1-gram: ...thdodetusariicibtdegirntoihytrsen...
2-gram: ...itariarissoriorcupunondrkeuth...
3-gram: ...indthnowelfjusisionthadinatof...
4-gram: ...recebenceonbuttherservier...
5-gram: ...mrsearnedageimondtheperious...
6-gram: ...apartytopossibleuponrestof...
7-gram: ...tourgeneralthroughapprovethe...

We can further estimate the probability of a whole English sentence or phrase. For example, the probabilities of two plaintext phrases "het oxf" and "the fox" (which have the same letter frequency distribution) are shown below. The 1-gram model, which counts only the frequency of occurrence of each letter in the phrase, estimates the same probability for both phrases, since the same letters occur in both. On the other hand, the 2-gram and 3-gram models, which take context into account, are better able to distinguish between the English and non-English phrases, and hence assign a higher probability to the English phrase "the fox".

Model     P(het oxf)        P(the fox)
1-gram    1.83 × 10^-9      1.83 × 10^-9
2-gram    3.26 × 10^-11     1.18 × 10^-7
3-gram    1.89 × 10^-13     1.04 × 10^-6

Over a longer sequence X of length N, we can also calculate $-\log_2(P(X))/N$, which (per Shannon) gives the compression rate permitted by the model, in bits per character. In our case, we get:[1]

1-gram: 4.19
2-gram: 3.51
3-gram: 2.93

[1] Because spacing is fixed in our letter substitution ciphers, we normalize P(X) by the sum of probabilities of all English strings that match the spacing pattern of X.

3 Decipherment

Given a ciphertext $c_1 \ldots c_n$, we search for the key that yields the most probable plaintext $p_1 \ldots p_n$. There are 26! possible keys, too many to enumerate. However, we can still find the best one in a guaranteed fashion. We do this by taking our most-probable-plaintext problem and casting it as an integer programming problem.[2]

[2] For an overview of integer and linear programming, see for example (Schrijver, 1998).

Here is a sample integer programming problem:

variables: x, y
maximize: 2x + y
subject to: x + y < 6.9
            y − x < 2.5
            y > 1.1

We require that x and y take on integer values. A solution can be obtained by typing this integer program into the publicly available lp_solve program,
or the commercially available CPLEX program, which yields the result: x = 4, y = 2.

[Figure 1 (network diagram): columns 1 to 8 carry the ciphertext "_ Q W B S Q W _"; each column has one node per plaintext letter a to z plus space; three example links are labeled link-2de, link-5ad, and link-7e_.]

Figure 1: A decipherment network. The beginning of the ciphertext is shown at the top of the figure (underscores represent spaces). Any left-to-right path through the network constitutes a potential decipherment. The bold path corresponds to the decipherment "decade". The dotted path corresponds to the decipherment "ababab". Given a cipher length of n, the network has $27 \cdot 27 \cdot (n-1)$ links and $27^n$ paths. Each link corresponds to a named variable in our integer program. Three links are shown with their names in the figure.

Suppose we want to decipher with a 2-gram language model, i.e., we want to find the key that yields the plaintext of highest 2-gram probability. Given the ciphertext $c_1 \ldots c_n$, we create an integer programming problem as follows. First, we set up a network of possible decipherments (Figure 1). Each of the $27 \cdot 27 \cdot (n-1)$ links in the network is a binary variable in the integer program: it must be assigned a value of either 0 or 1. We name these variables $\mathrm{link}_{XYZ}$, where X indicates the column of the link's source, and Y and Z represent the rows of the link's source and destination (e.g. variables $\mathrm{link}_{1aa}$, $\mathrm{link}_{1ab}$, $\mathrm{link}_{5qu}$, ...).

Each distinct left-to-right path through the network corresponds to a different decipherment. For example, the bold path in Figure 1 corresponds to the decipherment "decade". Decipherment amounts to turning some links "on" (assigning value 1 to the link variable) and others "off" (assigning value 0). Not all assignments of 0's and 1's to link variables result in a coherent left-to-right path, so we must place some "subject to" constraints in our integer program.

We observe that a set of variables forms a path if, for every node in columns 2 through n−1 of the network, the following property holds: the sum of the values of the link variables entering the node equals the sum of the link variables leaving the node. For nodes along a chosen decipherment path, this sum will be 1, and for others, it will be 0.[3] Therefore, we create one "subject to" constraint for each node ("_" stands for space). For example, for the node in column 2, row e we have:

subject to:
$$\mathrm{link}_{1ae} + \mathrm{link}_{1be} + \mathrm{link}_{1ce} + \cdots + \mathrm{link}_{1\_e} = \mathrm{link}_{2ea} + \mathrm{link}_{2eb} + \mathrm{link}_{2ec} + \cdots + \mathrm{link}_{2e\_}$$

[3] Strictly speaking, this constraint over nodes still allows multiple decipherment paths to be active, but we can rely on the rest of our integer program to select only one.

Now we set up an expression for the "minimize" part of the integer program. Recall that we want to select the plaintext $p_1 \ldots p_n$ of highest probability. For the 2-gram language model, the following are equivalent:

(a) Maximize $P(p_1 \ldots p_n)$
(b) Maximize $\log_2 P(p_1 \ldots p_n)$
(c) Minimize $-\log_2 P(p_1 \ldots p_n)$
(d) Minimize $-\log_2 [P(p_1 \mid \text{START}) \cdot P(p_2 \mid p_1) \cdots P(p_n \mid p_{n-1}) \cdot P(\text{END} \mid p_n)]$
(e) Minimize $-\log_2 P(p_1 \mid \text{START}) - \log_2 P(p_2 \mid p_1) - \cdots - \log_2 P(p_n \mid p_{n-1}) - \log_2 P(\text{END} \mid p_n)$

We can guarantee this last outcome if we construct our minimization function as a sum of $27 \cdot 27 \cdot (n-1)$ terms, each of which is a $\mathrm{link}_{XYZ}$ variable multiplied by $-\log_2 P(Z \mid Y)$:

Minimize
$$\mathrm{link}_{1aa} \cdot (-\log_2 P(a \mid a)) + \mathrm{link}_{1ab} \cdot (-\log_2 P(b \mid a)) + \mathrm{link}_{1ac} \cdot (-\log_2 P(c \mid a)) + \cdots + \mathrm{link}_{5qu} \cdot (-\log_2 P(u \mid q)) + \cdots$$

When we assign value 1 to link variables along some decipherment path, and 0 to all others, this function computes the negative log probability of that path.

We must still add a few more "subject to" constraints. We need to ensure that the chosen path imitates the repetition pattern of the ciphertext. While the bold path in Figure 1 represents the fine plaintext choice "decade", the dotted path represents the choice "ababab", which is not consistent with the repetition pattern of the cipher "QWBSQW". To make sure our substitutions obey a consistent key, we set up $27 \cdot 27 = 729$ new $\mathrm{key}_{xy}$ variables to represent the choice of key. These new variables are also binary, taking on values 0 or 1. If variable $\mathrm{key}_{aQ} = 1$, that means the key maps plaintext a to ciphertext Q. Clearly, not all assignments to these 729 variables represent valid keys, so we augment the "subject to" part of our integer program by requiring that for any letter x,

subject to:
$$\mathrm{key}_{xA} + \mathrm{key}_{xB} + \cdots + \mathrm{key}_{xZ} + \mathrm{key}_{x\_} = 1$$
$$\mathrm{key}_{Ax} + \mathrm{key}_{Bx} + \cdots + \mathrm{key}_{Zx} + \mathrm{key}_{\_x} = 1$$

That is, every plaintext letter must map to exactly one ciphertext letter, and every ciphertext letter must map to exactly one plaintext letter. We also add a constraint to ensure that the ciphertext space character maps to the plaintext space character:

subject to:
$$\mathrm{key}_{\_\_} = 1$$

Finally, we ensure that any chosen decipherment path of $\mathrm{link}_{XYZ}$ variables is consistent with the chosen key. We know that for every node A along the decipherment path, exactly one active link has A as its destination. For all other nodes, zero active links lead in. Suppose node A represents the decipherment of ciphertext letter $c_i$ as plaintext letter $p_j$; for all such nodes, we stipulate that the sum of values for $\mathrm{link}_{(i-1) x p_j}$ (for all x) equals the value of $\mathrm{key}_{p_j c_i}$. In other words, whether a node lies along the chosen decipherment path or not, the chosen key must support that decision.

Figure 2 summarizes the integer program that we construct from a given ciphertext $c_1 \ldots c_n$. The computer code that transforms any given cipher into a corresponding integer program runs to about one page. Variations on the decipherment network yield 1-gram and 3-gram decipherment capabilities. Once an integer program is generated by machine, we ask the commercially-available CPLEX software to solve it, and then we note which $\mathrm{key}_{XY}$ variables are assigned value 1. Because CPLEX computes the optimal key, the method is not fast; for ciphers of length 32, the number of variables and constraints encoded in the integer program (IP) along with average running times are shown below. It is possible to obtain less-than-optimal keys faster by interrupting the solver.
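The construction in this section can be made concrete with a small sketch. The code below is not the authors' generator and does not call any solver; it only builds the 2-gram objective and "subject to" constraints as plain Python data for a toy problem, then checks a hand-built assignment against them. All names (`build_program`, `assignment`, `feasible`, `objective_value`), the three-letter alphabet, and the bigram probabilities are hypothetical. The toy omits the space symbol and the START/END terms, and it adds one constraint tying column 1 to the key, which is an assumption made here because the toy cipher has no leading space.

```python
import math
from itertools import product

def build_program(cipher, plain, symbols, cost):
    """Return (objective, constraints) for a toy 2-gram decipherment program.

    objective maps each link variable to its cost -log2 P(r|p).
    Each constraint is (coeffs, rhs): sum(coeffs[v] * value[v]) must equal rhs.
    """
    n = len(cipher)
    objective = {("link", i, p, r): cost[p, r]
                 for i in range(1, n) for p, r in product(plain, repeat=2)}
    cons = []
    for p in plain:      # each plaintext letter maps to exactly one cipher symbol
        cons.append(({("key", p, q): 1 for q in symbols}, 1))
    for q in symbols:    # each cipher symbol maps to exactly one plaintext letter
        cons.append(({("key", p, q): 1 for p in plain}, 1))
    for i in range(1, n - 1):   # columns 2..n-1: links in balance links out
        for r in plain:
            c = {("link", i, p, r): 1 for p in plain}
            c.update({("link", i + 1, r, p): -1 for p in plain})
            cons.append((c, 0))
    for i in range(1, n):       # links entering column i+1 agree with the key
        for p in plain:
            c = {("link", i, r, p): 1 for r in plain}
            c["key", p, cipher[i]] = -1
            cons.append((c, 0))
    for p in plain:             # assumption of this sketch: tie column 1 to the key
        c = {("link", 1, p, r): 1 for r in plain}
        c["key", p, cipher[0]] = -1
        cons.append((c, 0))
    return objective, cons

def assignment(plaintext, key):
    """Variables set to 1 when `plaintext` is chosen under `key` (plain -> cipher)."""
    on = {("key", p, q) for p, q in key.items()}
    on |= {("link", i + 1, a, b)
           for i, (a, b) in enumerate(zip(plaintext, plaintext[1:]))}
    return on

def feasible(cons, on):
    """True if the 0/1 assignment (set of variables equal to 1) meets every constraint."""
    return all(sum(k for v, k in c.items() if v in on) == rhs for c, rhs in cons)

def objective_value(objective, on):
    """Negative log2 probability of the chosen path."""
    return sum(w for v, w in objective.items() if v in on)

# Hypothetical toy model: P(r|p) = 0.2 if r == p else 0.4 over the alphabet "abc".
plain, symbols, cipher = "abc", "XYZ", "XYYX"
cost = {(p, r): -math.log2(0.2 if p == r else 0.4)
        for p, r in product(plain, repeat=2)}
objective, cons = build_program(cipher, plain, symbols, cost)
key = {"a": "X", "b": "Y", "c": "Z"}
good = assignment("abba", key)   # respects the repetition pattern of XYYX
bad = assignment("abab", key)    # a valid path, but inconsistent with the key
```

In this toy, `feasible(cons, good)` holds while `feasible(cons, bad)` does not, mirroring the "decade" versus "ababab" contrast of Figure 1. A real run would hand the same objective and constraints to an integer programming solver rather than checking candidate assignments by hand.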
Model     # of IP variables    # of IP constraints    Average running time
1-gram    1,755                1,083                  0.01 seconds
2-gram    27,700               2,054                  50 seconds
3-gram    211,600              27,326                 450 seconds

variables:
  $\mathrm{link}_{ipr}$ = 1 if the ith cipher letter is deciphered as plaintext letter p AND the (i+1)th cipher letter is deciphered as plaintext letter r; 0 otherwise
  $\mathrm{key}_{pq}$ = 1 if the decipherment key maps plaintext letter p to ciphertext letter q; 0 otherwise
minimize:
  $\sum_{i=1}^{n-1} \sum_{p,r} \mathrm{link}_{ipr} \cdot (-\log P(r \mid p))$   (2-gram probability of chosen plaintext)
subject to:
  for all p: $\sum_r \mathrm{key}_{pr} = 1$   (each plaintext letter maps to exactly one ciphertext letter)
  for all p: $\sum_r \mathrm{key}_{rp} = 1$   (each ciphertext letter maps to exactly one plaintext letter)
  $\mathrm{key}_{\_\_} = 1$   (cipher space character maps to plain space character)
  for (i = 1 ... n−2), for all r: $\sum_p \mathrm{link}_{ipr} = \sum_p \mathrm{link}_{(i+1)rp}$   (chosen links form a left-to-right path)
  for (i = 1 ... n−1), for all p: $\sum_r \mathrm{link}_{irp} = \mathrm{key}_{p c_{i+1}}$   (chosen links are consistent with chosen key)

Figure 2: Summary of how to build an integer program for any given ciphertext $c_1 \ldots c_n$. Solving the integer program will yield the decipherment of highest probability.

4 Decipherment Experiments

We create 50 ciphers each of lengths 2, 4, 8, ..., 256. We solve these with 1-gram, 2-gram, and 3-gram language models. We record the average percentage of ciphertext tokens decoded incorrectly. 50% error means half of the ciphertext tokens are deciphered wrong, while 0% means perfect decipherment. Here we illustrate some automatic decipherments with error rates:

42% error: the avelage ongrichman hal cy wiof a sevesonme qus antizexty that he buprk lathes we blung than soment - fotes mmasthes
11% error: the average englishman has so week a reference for antiality that he would rather be prong than recent - deter quarteur
2% error: the average englishman has so keep a reference for antiquity that he would rather be wrong than recent - peter mcarthur
0% error: the average englishman has so deep a reverence for antiquity that he would rather be wrong than recent - peter mcarthur

Figure 3 shows our automatic decipherment results. We note that the solution method is exact, not heuristic, so that decipherment error is not due to search error. Our use of global key constraints also leads to accuracy that is superior to (Knight et al., 2006). With a 2-gram model, their EM algorithm gives 10% error for a 414-letter cipher, while our method provides a solution with only 0.5% error. At shorter cipher lengths, we observe much higher improvements when using our method. For example, on a 52-letter textbook cipher, using a 2-gram model, the solution from our method resulted in 21% error as compared to 85% error given by the EM solution.

[Figure 3 (plot): decipherment error (0 to 100) against cipher length (2 to 1024, doubling), with one curve per n-gram model.]

Figure 3: Average decipherment error using integer programming vs. cipher length, for 1-gram, 2-gram and 3-gram models of English. Error bars indicate 95% confidence intervals.

We see that deciphering with 3-grams works well on ciphers of length 64 or more. This confirms that such ciphers can be attacked with very limited knowledge of English (no words or grammar) and little custom programming.

The 1-gram model works badly in this scenario, which is consistent with Bauer's (2006) observation that for short texts, mechanical decryption on the basis of individual letter frequencies does not work. If we had infinite amounts of ciphertext and plaintext drawn from the same stochastic source, we would expect the plain and cipher frequencies to eventually line up, allowing us to read off a correct key from the frequency tables. The upper curve in Figure 3 shows that convergence to this end is slow.

5 Shannon Equivocation and Unicity Distance

Very short ciphers are hard to solve accurately. Shannon (1949) pinpointed an inherent difficulty with short ciphers, one that is independent of the solution method or language model used; the cipher itself may not contain enough information for its proper solution. For example, given a short cipher like XYYX, we can never be sure if the answer is peep, noon, anna, etc. Shannon defined a mathematical measure of our decipherment uncertainty, which he called equivocation (now called entropy).

Let C be a cipher, M be the plaintext message it encodes, and K be the key by which the encoding takes place. Before even seeing C, we can compute our uncertainty about the key K by noting that there are 26! equiprobable keys:[4]

$$H(K) = -(26!) \cdot (1/26!) \cdot \log_2(1/26!) = 88.4 \text{ bits}$$

That is, any secret key can be revealed in 89 bits.

[4] (Shannon, 1948) The entropy associated with a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$ is given by $H = -\sum_{i=1}^{n} p_i \cdot \log_2(p_i)$.

When we actually receive a cipher C, our uncertainty about the key and the plaintext message is reduced. Shannon described our uncertainty about the plaintext message, letting m range over all decipherments:

$$H(M \mid C) = \text{equivocation of plaintext message} = -\sum_m P(m \mid C) \cdot \log_2 P(m \mid C)$$

$P(m \mid C)$ is the probability of plaintext m (according to the language model) divided by the sum of probabilities of all plaintext messages that obey the repetition pattern of C. While integer programming gives us a method to find the most probable decipherment without enumerating all keys, we do not know of a similar method to compute a full equivocation without enumerating all keys. Therefore, we sample up to 100,000 plaintext messages in the neighborhood of the most probable decipherment[5] and compute $H(M \mid C)$ over that subset.[6]

[5] The sampling used to compute $H(M \mid C)$ starts with the optimal key and expands out a frontier, by swapping letters in the key, and recursing to generate new keys (and corresponding plaintext message decipherments). The plaintext messages are remembered so that the frontier expands efficiently. The sampling stops if 100,000 different messages are found.

[6] Interestingly, as we grow our sample out from the most probable plaintext, we do not guarantee that any intermediate result is a lower bound on the equivocation. An example is provided by the growing sample (0.12, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01), whose entropy steadily increases. However, if we add a 14th item whose P(m) is 0.12, the entropy suddenly decreases from 2.79 to 2.78.

Shannon also described $H(K \mid C)$, the equivocation of key. This uncertainty is typically larger than $H(M \mid C)$, because a given message M may be derived from C via more than one key, in case C does not contain all 26 letters of the alphabet.

We compute $H(K \mid C)$ by letting r(C) be the number of distinct letters in C, and letting q(C) be $(26 - r(C))!$. Letting i range over our sample of plaintext messages, we get:

$$H(K \mid C) = \text{equivocation of key}$$
$$= -\sum_i q(C) \cdot (P(i)/q(C)) \cdot \log_2(P(i)/q(C))$$
$$= -\sum_i P(i) \cdot \log_2(P(i)/q(C))$$
$$= -\sum_i P(i) \cdot (\log_2 P(i) - \log_2 q(C))$$
$$= -\sum_i P(i) \cdot \log_2 P(i) + \sum_i P(i) \cdot \log_2 q(C)$$
$$= H(M \mid C) + \log_2 q(C)$$

Shannon (1949) used analytic means to roughly sketch the curves for $H(K \mid C)$ and $H(M \mid C)$, which we reproduce in Figure 4. Shannon's curve is drawn for a human-level language model, and the y-axis is given in "decimal digits" instead of bits.
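The entropy quantities in this section are easy to check numerically. The sketch below is a minimal illustration, not the authors' code: `entropy_bits` and `key_equivocation` are hypothetical helper names, and the sample weights are treated as unnormalized P(m) values that are renormalized over the sample, as the text describes for P(m|C). It recomputes H(K) for 26! equiprobable keys, replays the growing-sample example from footnote 6, and applies the identity H(K|C) = H(M|C) + log2 q(C).

```python
import math

def entropy_bits(weights):
    """Shannon entropy in bits of the distribution obtained by normalizing `weights`."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def key_equivocation(weights, distinct_letters):
    """H(K|C) = H(M|C) + log2 q(C), with q(C) = (26 - r(C))! as in the text."""
    q = math.factorial(26 - distinct_letters)
    return entropy_bits(weights) + math.log2(q)

# H(K): uncertainty over 26! equiprobable keys, before seeing any ciphertext.
key_entropy = math.log2(math.factorial(26))

# Footnote 6: a 13-item sample, then the same sample with a 14th item of weight 0.12.
sample = [0.12] + [0.01] * 12
before = entropy_bits(sample)
after = entropy_bits(sample + [0.12])

print(f"H(K) = {key_entropy:.1f} bits")
print(f"sample entropy: {before:.2f} -> {after:.2f} bits")
```

Adding the second heavy item lowers the entropy because it pulls probability mass away from the twelve small items, which is why an intermediate sample cannot be read as a lower bound on the full equivocation.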
[Figure 4 (plot): curves labeled Key Equivocation and Message Equivocation, with the Unicity Distance marked.]

Figure 4: Equivocation for simple substitution on English (Shannon, 1949).

[Figure 5 (plot): equivocation of key (bits, 0 to 90) against cipher length (0 to 260), with curves for the 1-gram, 2-gram and 3-gram models.]

Figure 5: Average key equivocation observed (bits) vs. cipher length (letters), for 1-gram, 2-gram and 3-gram models of English.

[Figure 6 (plot): equivocation of message (bits, 0 to 90) against cipher length (0 to 260), with curves for the 1-gram, 2-gram and 3-gram models.]

Figure 6: Average message equivocation observed (bits) vs. cipher length (letters), for 1-gram, 2-gram and 3-gram models of English.

For comparison, we plot in Figures 5 and 6 the average equivocations as we empirically observe them using our 1-, 2-, and 3-gram language models.

The shape of the key equivocation curve follows Shannon, except that it is curved from the start, rather than straight.

The message equivocation curve follows Shannon's prediction, rising then falling. Because very short ciphers have relatively few solutions (for example, a one-letter cipher has only 26), the overall uncertainty is not that great.[7] As the cipher gets longer, message equivocation rises. At some point, it then decreases, as the cipher begins to reveal its secret through patterns of repetition.

[7] Uncertainty is only loosely related to accuracy: even if we are quite certain about a solution, it may still be wrong.

Shannon's analytic model also predicts a sharp decline of message equivocation towards zero. He
defines the unicity distance (U) as the cipher length at which we have virtually no more uncertainty about the plaintext. Using analytic means (and various approximations), he gives:

$$U = H(K)/(A - B)$$

where:
A = bits per character of a 0-gram model (4.7)
B = bits per character of the model used to decipher

For a human-level language model ($B \approx 1.2$), he concludes $U \approx 25$, which is confirmed by practice. For our language models, the formula gives:

U = 173 (1-gram)
U = 74 (2-gram)
U = 50 (3-gram)

(For instance, the 1-gram value follows from $88.4 / (4.7 - 4.19) \approx 173$.) These numbers are in the same ballpark as Bauer (2006), who gives 167, 74, and 59. We note that these predicted unicity distances are a bit too rosy, according to our empirical message equivocation curves. Our experience confirms this as well, as 1-gram frequency counts over a 173-letter cipher are generally insufficient to pin down a solution.

6 Conclusion

We provide a method for deciphering letter substitution ciphers with low-order models of English. This method, based on integer programming, requires very little coding and can perform an optimal search over the key space. We conclude by noting that English language models currently used in speech recognition (Chelba and Jelinek, 1999) and automated language translation (Brants et al., 2007) are much more powerful, employing, for example, 7-gram word models (not letter models) trained on trillions of words. Obtaining optimal keys according to such models will permit the automatic decipherment of shorter ciphers, but this requires more specialized search than what is provided by general integer programming solvers. Methods such as these should also be useful for natural language decipherment problems such as character code conversion, phonetic decipherment, and word substitution ciphers with applications in machine translation (Knight et al., 2006).

7 Acknowledgements

The authors wish to gratefully acknowledge Jonathan Graehl, for providing a proof to support the argument that taking a larger number of samples does not necessarily increase the equivocation. This research was supported by the Defense Advanced Research Projects Agency under SRI International's prime Contract Number NBCHD040058.

References

Friedrich L. Bauer. 2006. Decrypted Secrets: Methods and Maxims of Cryptology. Springer-Verlag.

Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of EMNLP-CoNLL.

Ciprian Chelba and Frederick Jelinek. 1999. Structured language modeling for speech recognition. In Proceedings of NLDB: 4th International Conference on Applications of Natural Language to Information Systems.

Ravi Ganesan and Alan T. Sherman. 1993. Statistical techniques for language recognition: An introduction and guide for cryptanalysts. Cryptologia, 17(4):321–366.

David Graff and Rebecca Finch. 1994. Multilingual text resources at the linguistic data consortium. In Proceedings of the HLT Workshop on Human Language Technology.

Thomas Jakobsen. 1995. A fast method for cryptanalysis of substitution ciphers. Cryptologia, 19(3):265–274.

Kevin Knight, Anish Nair, Nishit Rathod, and Kenji Yamada. 2006. Unsupervised analysis for decipherment problems. In Proceedings of the COLING/ACL.

Edwin Olson. 2007. Robust dictionary attack of short simple substitution ciphers. Cryptologia, 31(4):332–342.

Shmuel Peleg and Azriel Rosenfeld. 1979. Breaking substitution ciphers using a relaxation algorithm. Comm. ACM, 22(11):598–605.

Alexander Schrijver. 1998. Theory of Linear and Integer Programming. John Wiley & Sons.

Claude E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656.

Claude E. Shannon. 1949. Communication theory of secrecy systems. Bell System Technical Journal, 28:656–715.