
[Figure 1. The tree representation of the regular expression ((a|b)*c(a|b)*c)*(a|b)* which matches all words in {a,b,c}* with an even number of occurrences of c]

THEO As a predicate, inductively on the structure of your data type. (Writes some formal definitions to the whiteboard: semantic brackets, Greek letters, languages as sets, etc.)

HAZEL (goes to the keyboard, sits down next to CODY.) Ok, this can be easily coded in Haskell, as a characteristic function. List comprehensions are fairly useful, as well. (Writes the following definition in her text editor.)

    accept :: Reg -> String -> Bool
    accept Eps       u = null u
    accept (Sym c)   u = u == [c]
    accept (Alt p q) u = accept p u || accept q u
    accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
    accept (Rep r)   u = or [and [accept r ui | ui <- ps] | ps <- parts u]

THEO Let me see. split produces all decompositions of a string into two factors, and parts stands for "partitions" and produces all decompositions of a string into an arbitrary number of factors.

CODY Wait! We need to be careful to avoid empty factors when defining parts. Otherwise there is an infinite number of possible decompositions.

HAZEL Right. But split must also produce empty parts and can be defined as follows. (Continues writing the Haskell program.)

    split :: [a] -> [([a], [a])]
    split []     = [([], [])]
    split (c:cs) = ([], c:cs) : [(c:s1, s2) | (s1, s2) <- split cs]

The function parts is a generalization of split to split words into any number of factors (not just two) except for empty ones.

CODY That's tricky. Let's use list comprehensions again. (Sits down on one of the empty chairs, grabs the keyboard and extends the program as follows:)

    parts :: [a] -> [[[a]]]
    parts []     = [[]]
    parts [c]    = [[[c]]]
    parts (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- parts cs]

We split a word with at least two characters recursively and either add the first character to the first factor or add it as a new factor.

THEO Why do you write [a] and not String?

HAZEL That's because we want to be more general. We can now work with arbitrary list types instead of strings only.

THEO That makes sense to me.

CODY Maybe it's good to have a separate name for these lists. I think Hazel used the term words; that's a good term. Let's stick to it.

THEO I want to check out your code. (Sits down as well. Now all three build a small crowd in front of the monitor.)

    ghci> parts "acc"
    [["acc"],["a","cc"],["ac","c"],["a","c","c"]]
    ghci> accept evencs "acc"
    True

THEO Aha. (Pauses to think for a moment.) Wait a second! The number of decompositions of a string of length n+1 is 2^n. Blindly checking all of them is not efficient. When you convert a regular expression into an equivalent finite-state automaton and use this automaton for matching, then, for a fixed regular expression, the run time of the matching algorithm is linear in the length of the string.

HAZEL Well, the program is not meant to be efficient. It's only a specification, albeit executable. We can write an efficient program later. What I am more interested in is whether we can make the specification a bit more interesting first. Can it be generalized, for instance?

THEO (staring out of the window.) We can add weights.

HAZEL Weights?

SCENE II. WEIGHTS

HAZEL, CODY, and THEO are still sitting around the laptop.

HAZEL What do you mean by weights?

THEO Remember what we did above? Given a regular expression, we assigned to a word a boolean value reflecting whether the word matches the given expression or not. Now, we produce more complex values: semiring elements.

HAZEL What's an example? Is this useful at all?

THEO A very simple example is to determine the length of a word or the number of occurrences of a given symbol in a word. A more complicated example would be to count the number of matchings of a word against a regular expression, or to determine a leftmost longest matching subword.

CODY That sounds interesting, but what was a semiring, again?

HAZEL If I remember correctly from my algebra course, a semiring is an algebraic structure with zero, one, addition, and multiplication that satisfies certain laws. (Adds a Haskell type class for semirings to the Haskell file.)

    class Semiring s where
      zero, one :: s
      (⊕), (⊗)  :: s -> s -> s

Here, zero is an identity for ⊕, one is an identity for ⊗, both composition operators are associative, and ⊕ is commutative, in addition.

THEO That's true, but, moreover, the usual distributivity laws hold and zero annihilates a semiring with respect to multiplication, which means that both zero ⊗ s and s ⊗ zero are zero for all s.

HAZEL These laws are not enforced by Haskell, so programmers need to ensure that they hold when defining instances of Semiring.

CODY Ok, fine. I guess what we need to do is to add weights to the symbols in our regular expressions.

THEO (sipping coffee) Right.

CODY So let's make a data type for weighted regular expressions.

THEO (interjects.) Cool, that's exactly the terminology we use in formal language theory.

[...]

    shift m (REP r) c = REP (shift (m || final r) r c)

We shift a mark into the inner expression if a previous character was marked or a final character in the expression is marked.

HAZEL Ok, let's define the helper functions empty and final. I guess this is pretty straightforward. (Types the definition of empty in her text editor.)

    empty :: REG -> Bool
    empty EPS       = True
    empty (SYM _ _) = False
    empty (ALT p q) = empty p || empty q
    empty (SEQ p q) = empty p && empty q
    empty (REP r)   = True

No surprises here. How about final? (Goes on typing.)

    final :: REG -> Bool
    final EPS       = False
    final (SYM b _) = b
    final (ALT p q) = final p || final q
    final (SEQ p q) = final p && empty q || final q
    final (REP r)   = final r

CODY (pointing to the screen) The case for sequences is wrong. It looks similar to the definition in shift, but you mixed up the variables p and q. (Takes the keyboard and wants to change the definition.)

HAZEL No, stop! This is correct.
final analyzes the regular expression in the other direction. A final character of the first part is also a final character of the whole sequence if the second part accepts the empty word. Of course, a final character in the second part is always a final character of the whole sequence, as well.

CODY Got it. Let's wrap all this up into an efficient function match for regular expression matching. (Continues typing.) The type of match is the same as the type of accept, our previously defined specification.

    match :: REG -> String -> Bool

If the given word is empty, we can check whether the expression matches the empty word using empty:

    match r [] = empty r

If the given word is a nonempty word c:cs we mark all symbols of the given expression which may be responsible for matching the first character c by calling shift True r c. Then we subsequently shift the other characters using shift False.

    match r (c:cs) = final (foldl (shift False) (shift True r c) cs)

THEO Why does the argument have to be False?

CODY Because, after having processed the first character, we only want to shift existing marks without adding new marks from the left. Finally, we check whether the expression contains a final character after processing the whole input word.

HAZEL That is a pretty concise implementation of regular expression matching! However, I'm not yet happy with the definition of shift and how it repeatedly calls the auxiliary functions empty and final which traverse their argument in addition to the traversal by shift. Look at the rule for sequences again! (Points at the shift rule for sequences on the screen.)

    shift m (SEQ p q) c = SEQ (shift m p c) (shift (m && empty p || final p) q c)

There are three calls which traverse p and one of them is a recursive call to shift. So, if p contains another sequence where the left part contains another sequence where the left part contains another sequence and so on, this may lead to quadratic run time in the size of the regular expression. We should come up with implementations of empty and final with constant run time.

    data REGw c s = REGw { emptyw :: s, finalw :: s, regw :: REw c s }

    data REw c s = EPSw
                 | SYMw (c -> s)
                 | ALTw (REGw c s) (REGw c s)
                 | SEQw (REGw c s) (REGw c s)
                 | REPw (REGw c s)

    epsw :: Semiring s => REGw c s
    epsw = REGw { emptyw = one, finalw = zero, regw = EPSw }

    symw :: Semiring s => (c -> s) -> REGw c s
    symw f = REGw { emptyw = zero, finalw = zero, regw = SYMw f }

    altw :: Semiring s => REGw c s -> REGw c s -> REGw c s
    altw p q = REGw { emptyw = emptyw p ⊕ emptyw q,
                      finalw = finalw p ⊕ finalw q,
                      regw   = ALTw p q }

    seqw :: Semiring s => REGw c s -> REGw c s -> REGw c s
    seqw p q = REGw { emptyw = emptyw p ⊗ emptyw q,
                      finalw = finalw p ⊗ emptyw q ⊕ finalw q,
                      regw   = SEQw p q }

    repw :: Semiring s => REGw c s -> REGw c s
    repw r = REGw { emptyw = one, finalw = finalw r, regw = REPw r }

    matchw :: Semiring s => REGw c s -> [c] -> s
    matchw r []     = emptyw r
    matchw r (c:cs) = finalw (foldl (shiftw zero . regw) (shiftw one (regw r) c) cs)

    shiftw :: Semiring s => s -> REw c s -> c -> REGw c s
    shiftw _ EPSw       _ = epsw
    shiftw m (SYMw f)   c = (symw f) { finalw = m ⊗ f c }
    shiftw m (ALTw p q) c = altw (shiftw m (regw p) c) (shiftw m (regw q) c)
    shiftw m (SEQw p q) c = seqw (shiftw m (regw p) c)
                                 (shiftw (m ⊗ emptyw p ⊕ finalw p) (regw q) c)
    shiftw m (REPw r)   c = repw (shiftw (m ⊕ finalw r) (regw r) c)

Figure 4. Efficient matching of weighted regular expressions

[...]

HAZEL That's not too fast, is it? Let's try our implementation. (Switches to GHCi and starts typing.)

    ghci> let a = symw ('a'==)
    ghci> let seqn n = foldr1 seqw . replicate n
    ghci> let re n = seqn n (altw a epsw) `seqw` seqn n a
    ghci> :set +s
    ghci> matchw (re 500) (replicate 500 'a')
    True
    (5.99 secs, 491976576 bytes)

CODY Good. We're faster than grep and we didn't even compile! But it's using a lot of memory. Let me see. (Writes a small program to match the standard input stream against the above expression and compiles it using GHC.) I'll pass the -s option to the run-time system so we can see both run time and memory usage without using the time command.

    bash> for i in `seq 1 500`; do echo -n a; done | \
    > ./re500 +RTS -s
    match
    ...
    1 MB total memory in use
    ...
    Total time 0.06s (0.21s elapsed)
    ...

Seems like we need a more serious competitor!

HAZEL I told you about Google's new library. They implemented an algorithm in C++ with worst-case performance similar to that of our algorithm. Do you know any C++?

CODY Gosh!

The light fades, the two keep typing, the only light emerges from the screen. After a few seconds, the light goes on again.

HAZEL Now it compiles!

CODY Puuh. This took forever: one hour.

HAZEL Let's see whether it works.

CODY C++ isn't Haskell.

They both smile.

HAZEL We wrote the program such that the whole string is matched, so we don't need to provide the start and end markers ^ and $.

    bash> time for i in `seq 1 500`; do echo -n a; done | \
    > ./re2 "(a?){500}a{500}"
    match

    real 0m0.092s
    user 0m0.076s
    sys  0m0.022s

Ah, that's pretty fast, too. Let's push it to the limit:

    bash> time for i in `seq 1 5000`; do echo -n a; done | \
    > ./re2 "(a?){5000}a{5000}"
    Error ... invalid repetition size: {5000}

CODY Google doesn't want us to check this example. But wait. (Furrows his brow.) Let's cheat:

    bash> time for i in `seq 1 5000`; do echo -n a; done | \
    > ./re2 "((a?){50}){100}(a{50}){100}"
    match

    real 0m4.919s
    user 0m4.505s
    sys  0m0.062s

HAZEL Nice trick! Let's try our program. Unfortunately, we have to recompile for n = 5000, because we cannot parse regular expressions from strings yet.

They recompile their program and run it on 5000 a's.

    bash> for i in `seq 1 5000`; do echo -n a; done | \
    > ./re5000 +RTS -s
    match
    ...
    3 MB total memory in use
    ...
    Total time 20.80s (21.19s elapsed)
    %GC time 83.4% (82.6% elapsed)
    ...

HAZEL The memory requirements are quite good but in total it's about five times slower than Google's library in this example.

CODY Yes, but look at the GC line! More than 80% of the run time is spent during garbage collection. That's certainly because we rebuild the marked regular expression in each step by shiftw.

HAZEL This seems inherent to our algorithm. It's written as a purely functional program and does not mutate one marked regular expression but computes new ones in each step. Unless we can somehow eliminate the data constructors of the regular expression, I don't see how we can improve on this.

CODY A new mark of a marked expression is computed in a tricky way from multiple marks of the old expression. I don't see how to eliminate the expression structure which guides the computation of the marks.

HAZEL Ok, how about trying another example? The Google library is based on simulating an automaton just like our algorithm. Our second example, which checks whether there are two a's with a specific distance, is a tough nut to crack for automata-based approaches, because the automaton is exponentially large.

THEO curiously enters the scene.

CODY Ok, can we generate an input string such that almost all states of this automaton are reached? Then, hopefully, caching strategies will not be successful.

THEO If we just generate a random string of a's and b's, then the probability that it matches quite early is fairly high. Note that the probability that it matches after n+2 positions is one fourth. We need to generate a string that does not match at all and is sufficiently random to generate an exponential number of different states. If we want to avoid two a's with n characters in between, we can generate a random string and additionally keep track of the n+1 previous characters. Whenever we are exactly n+1 steps after an a, we generate a b. Otherwise, we randomly generate either an a or a b. Maybe we should ...

THEO's voice fades out. Apparently, he immerses himself in some problem. CODY and HAZEL stare at him for a few seconds, then turn to the laptop and write a program genrnd which produces random strings of a's and b's. They turn to THEO.

CODY Theo, we're done!

THEO Ohhh, sorry! (Looks at the screen.)

CODY We can call genrnd with two parameters like this:

    bash> ./genrnd 5 6
    bbbaaaaabbbbbbbabaabbbbbbbaabbbbbbabbaabbb

The result is a string of a's and b's such that there are no two a's with 5 characters in between. The total number of generated characters is the product of the incremented arguments, i.e., in this case (5+1)·(6+1) = 42.

THEO Ok. So if we want to check our regular expression for n = 20 we need to use a string with length greater than 2^20 ≈ 10^6. Let's generate around 2 million characters.

HAZEL Ok, let's check out the Google program.

    bash> time ./genrnd 20 100000 | ./re2 ".*a.{20}a.*"

While the program is running CODY is looking at a process monitor and sees that Google's program uses around 5 MB of memory.

    no match

    real 0m4.430s
    user 0m4.514s
    sys  0m0.025s

Let's see whether we can beat this. First, we need to compile a corresponding program that uses our algorithm.

They write a Haskell program dist20 which matches the standard input stream against the regular expression .*a.{20}a.*. Then they run it.

    bash> ./genrnd 20 100000 | ./dist20 +RTS -s
    no match
    ...
    2 MB total memory in use
    ...
    Total time 3.10s (3.17s elapsed)
    %GC time 5.8% (6.3% elapsed)
    ...

HAZEL Wow! This time we are faster than Google. And our program uses only little memory.

CODY Yes, and in this example, the time for garbage collection is only about 5%. I guess that's because the regular expression is much smaller now, so fewer constructors become garbage.

THEO This is quite pleasing. We have not invested any thought in efficiency, at least w.r.t. constant factors, but, still, our small Haskell program can compete with Google's library.

HAZEL What other libraries are there for regular expression matching? Obviously, we cannot use a library that performs backtracking, because it would run forever on our first benchmark. Also, we cannot use a library that
constructs a complete automaton in advance, because it would eat all our memory in the second benchmark. What does the standard C library do?

CODY No idea.

Just as above, the light fades out, the screen being the only source of light. CODY and HAZEL keep working, THEO falls asleep on his chair. After a while, the sun rises. CODY and HAZEL look tired, they wake up THEO.

CODY (addressing THEO) We wrote a program that uses the standard C library regex for regular expression matching and checked it with the previous examples. It's interesting: the performance differs hugely on different computers. It seems that different operating systems come with different implementations of the regex library. On this laptop, an Intel MacBook running OS X 10.5, the regex library outperforms Google's library in the first benchmark and the Haskell program in the second benchmark, both by a factor between two and three, but not more. We tried it on some other systems, but the library was slower there. Also, when not using the option RE2::Latin1 in the re2 program it runs in UTF-8 mode and is more than three times slower in the second benchmark.

THEO Aha.

ACT III

SCENE I. INFINITE REGULAR EXPRESSIONS

HAZEL sitting at her desk. THEO and CODY at the coffee machine, eating a sandwich.

CODY The benchmarks are quite encouraging and I like how elegant the implementation is.

THEO I like our work as well, although it is always difficult to work with practitioners. (Rolls his eyes.) It is a pity that the approach only works for regular languages.

CODY I think this is not true. Haskell is a lazy language. So I think there is no reason why we should not be able to work with non-regular languages.

THEO How is this possible? (Starts eating much faster.)

CODY Well, I think we could define an infinite regular expression for a given context-free language. There is no reason why our algorithm should evaluate unused parts of regular expressions. Hence, context-free languages should work as well.

THEO That's interesting. (Finishes his sandwich.) Let's go to Hazel and discuss it with her.

THEO jumps up and rushes to HAZEL. CODY is puzzled and follows, eating while walking.

THEO (addressing HAZEL) Cody told me that it would also be possible to match context-free languages with our Haskell program. Is that possible?

HAZEL It might be. Let's check how we could define a regular expression for any number of a's followed
by the same number of b's ({a^n b^n | n ≥ 0}).

CODY Instead of using repetitions like in a*b*, we have to use recursion to define an infinite regular expression. Let's try.

    ghci> let a = symw ('a'==)
    ghci> let b = symw ('b'==)
    ghci> let anbn = epsw `altw` seqw a (anbn `seqw` b)
    ghci> matchw anbn ""

The program doesn't terminate.

    ^C

THEO It doesn't work. That's what I thought!

HAZEL You shouldn't be so pessimistic. Let's find out why the program evaluates the infinite regular expression.

CODY I think the problem is the computation of finalw. It traverses the whole regular expression while searching for marks it can propagate further on. Is this really necessary?

HAZEL You mean there are parts of the regular expression which do not contain any marks. Traversing these parts is often superfluous because nothing is changed anyway, but our algorithm currently evaluates the whole regular expression even if there are no marks.

CODY We could add a flag at the root of each subexpression indicating that the respective subexpression does not contain any mark at all. This could also improve the performance in the finite case, since subexpressions without marks can be shared instead of copied by the shiftw function.

THEO I'd prefer to use the term weights when talking about the semiring implementation. When you say marks you mean weights that are non-zero.

CODY Right. Let me change the implementation.

CODY leaves.

SCENE II. LAZINESS

THEO and HAZEL still at the desk. CODY returns.

CODY (smiling) Hi guys, it works. I had to make some modifications in the code, but it's still a slick program. You can check out the new version now.

HAZEL What did you do?
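As a baseline for the modifications Cody mentions, the unmodified weighted matcher of Figure 4 can be collected into one self-contained module. This is a sketch under assumptions that are not part of Figure 4: the Bool instance of Semiring and the fixity declarations for ⊕ and ⊗ are additions, and seqn and re are lifted from the earlier GHCi session into top-level definitions. In this version shiftw rebuilds the entire expression for every input character, which is the whole-expression traversal Cody wants to avoid.

```haskell
-- Sketch: the Figure 4 matcher as one module.
-- Assumptions: the Bool instance and the fixity declarations below.

-- Semiring class as introduced in Scene II; the laws are not checked by Haskell.
class Semiring s where
  zero, one :: s
  (⊕), (⊗)  :: s -> s -> s

-- Assumed fixities so that ⊗ binds tighter than ⊕.
infixl 6 ⊕
infixl 7 ⊗

-- Assumed instance: Bool with (||) and (&&) gives plain yes/no matching.
instance Semiring Bool where
  zero = False
  one  = True
  (⊕)  = (||)
  (⊗)  = (&&)

-- Every node caches its emptyw and finalw weights.
data REGw c s = REGw { emptyw :: s, finalw :: s, regw :: REw c s }

data REw c s
  = EPSw
  | SYMw (c -> s)
  | ALTw (REGw c s) (REGw c s)
  | SEQw (REGw c s) (REGw c s)
  | REPw (REGw c s)

-- Smart constructors compute the cached weights from the children.
epsw :: Semiring s => REGw c s
epsw = REGw { emptyw = one, finalw = zero, regw = EPSw }

symw :: Semiring s => (c -> s) -> REGw c s
symw f = REGw { emptyw = zero, finalw = zero, regw = SYMw f }

altw :: Semiring s => REGw c s -> REGw c s -> REGw c s
altw p q = REGw { emptyw = emptyw p ⊕ emptyw q
                , finalw = finalw p ⊕ finalw q
                , regw   = ALTw p q }

seqw :: Semiring s => REGw c s -> REGw c s -> REGw c s
seqw p q = REGw { emptyw = emptyw p ⊗ emptyw q
                , finalw = finalw p ⊗ emptyw q ⊕ finalw q
                , regw   = SEQw p q }

repw :: Semiring s => REGw c s -> REGw c s
repw r = REGw { emptyw = one, finalw = finalw r, regw = REPw r }

-- The first character is shifted in with weight one, the others with zero.
matchw :: Semiring s => REGw c s -> [c] -> s
matchw r []     = emptyw r
matchw r (c:cs) = finalw (foldl (shiftw zero . regw) (shiftw one (regw r) c) cs)

-- Rebuilds the whole expression for one input character.
shiftw :: Semiring s => s -> REw c s -> c -> REGw c s
shiftw _ EPSw       _ = epsw
shiftw m (SYMw f)   c = (symw f) { finalw = m ⊗ f c }
shiftw m (ALTw p q) c = altw (shiftw m (regw p) c) (shiftw m (regw q) c)
shiftw m (SEQw p q) c = seqw (shiftw m (regw p) c)
                             (shiftw (m ⊗ emptyw p ⊕ finalw p) (regw q) c)
shiftw m (REPw r)   c = repw (shiftw (m ⊕ finalw r) (regw r) c)

-- Benchmark expression from the GHCi session: (a?)^n a^n,
-- which accepts exactly the words of n to 2n a's.
seqn :: Semiring s => Int -> REGw c s -> REGw c s
seqn n = foldr1 seqw . replicate n

re :: Int -> REGw Char Bool
re n = seqn n (altw a epsw) `seqw` seqn n a
  where a = symw ('a' ==)
```

Loaded into GHCi, matchw (re 5) (replicate 7 'a') evaluates to True, while 4 or 11 a's are rejected.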
[...]

    ghci> let bcs n = foldr1 seq (bs n ++ cs n)
    ghci> let a = symw ('a'==)
    ghci> let abc n = a `seq` alt (bcs n) (abc (n+1))
    ghci> let anbncn = epsw `alt` abc 1

THEO Fairly complicated! Can you check it?

CODY enters some examples.

    ghci> matchw anbncn ""
    True
    ghci> matchw anbncn "abc"
    True
    ghci> matchw anbncn "aabbcc"
    True
    ghci> matchw anbncn "aabbbcc"
    False

THEO Great, it works. Impressive!

SCENE III. REVIEW

The three at the coffee machine.

HAZEL Good that you told us about Glushkov's construction.

THEO We've worked for quite a long time on the regular expression problem now, but did we get somewhere?

HAZEL Well, we have a cute program, elegant, efficient, concise, solving a relevant problem. What else do you want?

CODY What are we gonna do with it? Is it something people might be interested in?

THEO I find it interesting, but that doesn't count. Why don't we ask external reviewers? Isn't there a conference deadline coming up for nice programs (smiling)?

CODY and HAZEL (together) ICFP.

THEO ICFP?

CODY Yes, they collect functional pearls: elegant, instructive, and fun essays on functional programming.

THEO But how do we make our story a fun essay?

The three turn to the audience, bright smiles on their faces!

EPILOGUE

Regular expressions were introduced by Stephen C. Kleene in his 1956 paper [Kleene 1956], where he was interested in characterizing the behavior of McCulloch-Pitts nerve (neural) nets and finite automata, see also the seminal paper [Rabin and Scott 1959] by Michael O. Rabin and Dana Scott. Victor M. Glushkov's paper from 1960, [Glushkov 1960], is another early paper where regular expressions are translated into finite-state automata, but there are many more, such as the paper by Robert McNaughton and H. Yamada, [McNaughton and Yamada 1960]. Ken Thompson's paper from 1968 is the first to describe regular expression matching [Thompson 1968].

The idea of introducing weights into finite automata goes back to a paper by Marcel P. Schützenberger, [Schützenberger 1961]; weighted regular expressions came up later. A good reference for the weighted setting is the Handbook of Weighted Automata [Droste et al. 2009]; one of the papers that is concerned with several weighted automata constructions is [Allauzen and Mohri 2006]. The paper [Caron and Flouret 2003] is one of the papers that focuses on Glushkov's construction in the weighted setting.

What we nowadays call Greibach normal form is defined in Sheila A. Greibach's 1965 paper [Greibach 1965].

Haskell is a lazy, purely functional programming language. A historical overview is presented in [Hudak et al. 2007]. There are several implementations of regular expressions in Haskell [HaskellWiki]. Some of these are bindings to existing C libraries, others are implementations of common algorithms in Haskell. In comparison with these implementations our approach is much more concise and elegant, but can still compete with regard to efficiency.

The experiments were carried out using GHC version 6.10.4 with -O2 optimizations. The Google library can be found at http://code.google.com/p/re2/, the accompanying blog post at http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html.

REFERENCES

C. Allauzen and M. Mohri. A unified construction of the Glushkov, follow, and Antimirov automata. In R. Kralovic and P. Urzyczyn, editors, Mathematical Foundations of Computer Science 2006 (MFCS 2006), Stará Lesná, Slovakia, volume 4162 of Lecture Notes in Computer Science, pages 110–121. Springer, 2006.

P. Caron and M. Flouret. From Glushkov WFAs to rational expressions. In Z. Ésik and Z. Fülöp, editors, Developments in Language Theory, 7th International Conference (DLT 2003), Szeged, Hungary, volume 2710 of Lecture Notes in Computer Science, pages 183–193. Springer, 2003.

M. Droste, W. Kuich, and H. Vogler. Handbook of Weighted Automata. Springer, New York, 2009.

V. M. Glushkov. On a synthesis algorithm for abstract automata. Ukr. Matem. Zhurnal, 12(2):147–156, 1960.

S. A. Greibach. A new normal-form theorem for context-free phrase structure grammars. J. ACM, 12(1):42–52, 1965.

HaskellWiki. Haskell – regular expressions. http://www.haskell.org/haskellwiki/Regular_expressions.

P. Hudak, J. Hughes, S. L. Peyton-Jones, and P. Wadler. A history of Haskell: being lazy with class. In Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III), San Diego, California, pages 1–55. ACM, 2007.

S. Kleene. Representation of events in nerve nets and finite automata. In C. Shannon and J. McCarthy, editors, Automata Studies, pages 3–42. Princeton University Press, Princeton, N.J., 1956.

R. McNaughton and H. Yamada. Regular expressions and state graphs for automata. IEEE Transactions on Electronic Computers, 9(1):39–47, 1960.

M. O. Rabin and D. Scott. Finite automata and their decision problems. IBM Journal of Research and Development, 3(2):114–125, 1959.

M. P. Schützenberger. On the definition of a family of automata. Information and Control, 4(2–3):245–270, 1961.

K. Thompson. Programming techniques: Regular expression search algorithm. Commun. ACM, 11(6):419–422, 1968.