Theconceptoflongestcommonprexescanbegeneralizedforsets:Denition1.7:F Theconceptoflongestcommonprexescanbegeneralizedforsets:Denition1.7:F

phoebe-click
Uploaded On 2015-08-25

The concept of longest common prefixes can be generalized for sets:

Definition 1.7: For a string S and a string set R, define

    lcp(S, R) = max{ lcp(S, T) | T ∈ R }
    Σlcp(R) = Σ_{T ∈ R} lcp(T, R \ {T})

In the algorithm for inserting S into R (Algorithm 1.3), there are two while loops. The first loop follows existing edges as long as possible, and the second loop then creates the necessary new nodes and edges. The number of rounds in the first loop is exactly lcp(S, R). Thus the number of new nodes and edges added is |S| − lcp(S, R).

Based on this observation, we will next derive an expression for the size of trie(R). This is not directly based on the value Σlcp(R); we need a more refined measure.

Theorem 1.10: The number of nodes in trie(R) is exactly ||R|| − L(R) + 1, where ||R|| is the total length of the strings in R.

Proof. Consider the construction of trie(R) by inserting the strings one by one in the lexicographical order. Initially, the trie has just one node, the root. As observed earlier, the number of nodes added when inserting S_i is |S_i| − LCP_R[i]. Summing up, we get the result.

The value Σlcp(R) is perhaps a conceptually simpler measure than L(R). The following result shows that it is asymptotically equivalent.

Lemma 1.11: L(R) ≤ Σlcp(R) ≤ 2·L(R).

The proof is left as an exercise.

We will later see that the array LCP_R is useful as an actual data structure.

Example 1.12: Path compacted tries for R = {pot$, potato$, pottery$, tattoo$, tempo$}.
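Theorem 1.10 and Lemma 1.11 can be checked numerically on the string set of Example 1.12. The following Python sketch is only an illustration: the function names are hypothetical, the trie is represented as nested dictionaries rather than by Algorithm 1.3, and L(R) is assumed to be the sum of the array LCP_R, where LCP_R[i] is the lcp of the i-th string in lexicographical order with its predecessor (this is how the proof of Theorem 1.10 uses it).

```python
def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def lcp_with_set(s: str, strings) -> int:
    """lcp(S, R): the maximum lcp of S with any string in R (0 if R is empty)."""
    return max((lcp(s, t) for t in strings), default=0)


def lcp_array(strings):
    """Assumed definition of LCP_R: lcp of each sorted string with its predecessor."""
    ordered = sorted(strings)
    return [0] + [lcp(ordered[i - 1], ordered[i]) for i in range(1, len(ordered))]


def trie_node_count(strings) -> int:
    """Build a trie as nested dicts and count its nodes, including the root."""
    root = {}
    count = 1  # the root
    for s in strings:
        node = root
        for ch in s:
            if ch not in node:  # no existing edge: create a new node
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


R = ["pot$", "potato$", "pottery$", "tattoo$", "tempo$"]
total_length = sum(len(s) for s in R)  # ||R||
L = sum(lcp_array(R))                  # L(R)
sum_lcp = sum(lcp_with_set(s, [t for t in R if t != s]) for s in R)  # Σlcp(R)

print(trie_node_count(R), total_length - L + 1)  # both sides of Theorem 1.10
print(L, sum_lcp, 2 * L)                         # the three values in Lemma 1.11
```

For this set, ||R|| = 32 and LCP_R = [0, 3, 3, 0, 1], so the theorem predicts 32 − 7 + 1 = 26 nodes, which is what the direct construction counts.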
In a compact trie, the edge labels are factors of the input strings. If the input strings are stored separately, the edge labels can be represented in constant space using pointers to the strings.

The time complexity of the basic operations on the compact trie is the same as for the trie (and depends on the implementation of the child operation in the same way), but prefix and range queries are faster on the compact trie (exercise).

Ternary Trie

The binary tree implementation of a trie supports ordered alphabets, but awkwardly. A ternary trie is a simpler data structure based on symbol comparisons.

A ternary trie is like a binary search tree except:

- Each internal node has three children: smaller, equal and larger.
- The branching is based on a single symbol at a given position. The position is zero (first symbol) at the root and increases along the middle branches.

The ternary trie has variants similar to the σ-ary trie:

- A basic ternary trie is a full representation of the strings.
- Compact ternary tries reduce space by compacting branchless path segments.

A ternary trie is balanced if each left and right subtree contains at most half of the strings in its parent tree. The balance can be maintained by rotations similarly to binary search trees.
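The structure of a basic (uncompacted, unbalanced) ternary trie can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the lecture: the names Node, insert and contains are hypothetical, strings are assumed to be non-empty, and the end of a stored string is marked with a flag instead of a terminator symbol. No rotations are performed, so the trie is only as balanced as the insertion order makes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    symbol: str                       # symbol compared at this node
    smaller: Optional["Node"] = None  # strings whose symbol at this position is smaller
    equal: Optional["Node"] = None    # same symbol: continue with the next position
    larger: Optional["Node"] = None   # strings whose symbol at this position is larger
    is_end: bool = False              # a stored string ends at this node


def insert(node: Optional[Node], s: str, pos: int = 0) -> Node:
    """Insert the non-empty string s and return the root of the updated subtree."""
    ch = s[pos]
    if node is None:
        node = Node(ch)
    if ch < node.symbol:
        node.smaller = insert(node.smaller, s, pos)    # side branch: same position
    elif ch > node.symbol:
        node.larger = insert(node.larger, s, pos)      # side branch: same position
    elif pos + 1 < len(s):
        node.equal = insert(node.equal, s, pos + 1)    # middle branch: position advances
    else:
        node.is_end = True
    return node


def contains(node: Optional[Node], s: str) -> bool:
    """Return True if s was inserted into the trie rooted at node."""
    pos = 0
    while node is not None and s:
        ch = s[pos]
        if ch < node.symbol:
            node = node.smaller
        elif ch > node.symbol:
            node = node.larger
        elif pos + 1 < len(s):
            node = node.equal
            pos += 1
        else:
            return node.is_end
    return False


root = None
for word in ["pot", "potato", "pottery", "tattoo", "tempo"]:
    root = insert(root, word)

print(contains(root, "pottery"), contains(root, "po"))
```

Only the middle (equal) branch advances the position, which is why a lookup for S follows at most |S| middle edges; the number of side edges depends on how well balanced the trie is.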
We can also get reasonably close to balance by inserting the strings in the tree in a random order.

In a balanced ternary trie, each step down either moves the position forward (middle branch) or halves the number of strings remaining in the subtree (side branch). Thus, in a balanced ternary trie storing n strings, any downward traversal following a string S passes at most |S| middle edges and at most log n side edges.

Thus the time complexity of insertion, deletion, lookup and lcp query is O(|S| + log n).

In comparison based tries, where the child function is implemented using binary search trees, the time complexities could be O(|S| log σ), a multiplicative factor O(log σ) instead of an additive factor O(log n).

Prefix and range queries behave similarly (exercise).

The following theorem shows that we cannot achieve O(n log n) symbol comparisons for any set of strings (when σ = n^o(1)).

Theorem 1.14: Let A be an algorithm that sorts a set of objects using only comparisons between the objects. Let R = {S_1, S_2, ..., S_n} be a set of n strings over an ordered alphabet of size σ. Sorting R using A requires Ω(n log n log_σ n) symbol comparisons on average, where the average is taken over the initial orders of R.

If σ is considered to be a constant, the lower bound is Ω(n (log n)^2).

Note that the theorem holds for any comparison based sorting algorithm A and any string set R. Only the initial order is random rather than "any". Any order could be the correct order, in which case an algorithm that first checks if the order is correct would need to do only O(n + L(R)) symbol comparisons.

An intuitive explanation for this result is that the comparisons made by a sorting algorithm are not random. In the later stages, the algorithm tends to compare strings that are close to each other in lexicographical order and thus are likely to have long common prefixes.

The preceding lower bound does not hold for algorithms specialized for sorting strings.

Theorem 1.15: Let R = {S_1, S_2, ..., S_n} be a set of n strings. Sorting R into the lexicographical order by any algorithm based on symbol comparisons requires Ω(L(R) + n log n) symbol comparisons.

Proof. If we are given the strings in the correct order and the job is to verify that this is indeed so, we need at least L(R) symbol comparisons. No sorting algorithm could possibly do its job with fewer symbol comparisons. This gives a lower bound Ω(L(R)).

On the other hand, the general sorting lower bound Ω(n log n) must hold here too.

The result follows from combining the two lower bounds.

Note that the expected value of L(R) for a random set of n strings is O(n log_σ n). The lower bound then becomes Ω(n log n).

We will next see that there are algorithms that match this lower bound. Such algorithms can sort a random set of strings in O(n log n) time.

In the normal, binary quicksort, we would have two subsets R≤ and R≥, both of which may contain elements that are equal to the pivot.

- Binary quicksort is slightly faster in practice for sorting sets.
- Ternary quicksort can be faster for sorting multisets with many duplicate keys (exercise).

The time complexity of both the binary and the ternary quicksort depends on the selection of the pivot (exercise). In the following, we assume an optimal pivot selection giving O(n log n) worst case time complexity.

Example 1.18: A possible partitioning, when ℓ = 2.

    Input:              al p habet, al i gnment, al l ocate, al g orithm, al t ernative, al i as, al t ernate, al l
    Symbol < pivot l:   al i gnment, al g orithm, al i as
    Symbol = pivot l:   al l ocate, al l
    Symbol > pivot l:   al p habet, al t ernative, al t ernate

Theorem 1.19: String quicksort sorts a set R of n strings in O(L(R) + n log n) time.

Thus string quicksort is an optimal symbol comparison based algorithm. String quicksort is also fast in practice.

Radix Sort

The Ω(n log n) sorting lower bound does not apply to algorithms that use stronger operations than comparisons. A basic example is counting sort for sorting integers.

Algorithm 1.20: CountingSort(R)
Input: (Multi)set R = {k_1, k_2, ..., k_n} of integers from the range [0..σ).
Output: R in nondecreasing order in array J[0..n).

    for i ← 0 to σ − 1 do C[i] ← 0
    for i ← 1 to n do C[k_i] ← C[k_i] + 1
    sum ← 0
    for i ← 0 to σ − 1 do              // cumulative sums
        tmp ← C[i]; C[i] ← sum; sum ← sum + tmp
    for i ← 1 to n do                  // distribute
        J[C[k_i]] ← k_i; C[k_i] ← C[k_i] + 1
    return J

The time complexity is O(n + σ). Counting sort is a stable sorting algorithm, i.e., the relative order of equal elements stays the same.

The LSD radix sort algorithm is very simple.

Algorithm 1.21: LSDRadixSort(R)
Input: (Multi)set R = {S_1, S_2, ..., S_n} of strings of length m over the alphabet [0..σ).
Output: R in ascending lexicographical order.

    for ℓ ← m − 1 to 0 do CountingSort(R, ℓ)
    return R

CountingSort(R, ℓ) sorts the strings in R by the symbols at position ℓ using counting sort (with k_i replaced by S_i[ℓ]). Each of the m passes takes O(n + σ) time and ||R|| = mn, so the time complexity is O(||R|| + mσ). The stability of counting sort is essential.

Example 1.22: R = {cat, him, ham, bat}.

    Input:                  cat, him, ham, bat
    After pass on ℓ = 2:    hi m, ha m, ca t, ba t
    After pass on ℓ = 1:    h a m, c a t, b a t, h i m
    After pass on ℓ = 0:    b at, c at, h am, h im

MSD radix sort resembles string quicksort but partitions the strings into σ parts instead of three parts.

Example 1.24: MSD radix sort partitioning.

    Input:   al p habet, al i gnment, al l ocate, al g orithm, al t ernative, al i as, al t ernate, al l
    g:       al g orithm
    i:       al i gnment, al i as
    l:       al l ocate, al l
    p:       al p habet
    t:       al t ernative, al t ernate

Theorem 1.26: MSD radix sort sorts a set R of n strings over the alphabet [0..σ) in O(L(R) + n log σ) time.

Proof. Consider a call processing a subset of size k ≥ σ:

- The time excluding the recursive calls but including the call to counting sort is O(k + σ) = O(k). The k symbols accessed here will not be accessed again.
- At most lcp(S, R \ {S}) + 1 symbols in S will be accessed by the algorithm. Thus the total time spent in calls of this kind is O(L(R) + n).

This still leaves the time spent in the calls for subsets of size k < σ, which are handled by string quicksort. No string is included in two such calls. Therefore, the total time over all calls is O(L(R) + n log σ).

- There exists a more complicated variant of MSD radix sort with time complexity O(L(R) + n + σ).
- Ω(L(R) + n) is a lower bound for any algorithm that must access symbols one at a time.
- In practice, MSD radix sort is very fast, but it is sensitive to implementation details.
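The radix sorting ideas above can be sketched in Python as follows. This is an illustration under several assumptions, not the lecture's implementation: the function names are hypothetical, symbols are assumed to be characters with code points below sigma (256 by default), and the MSD variant is deliberately simplified. It groups strings into dictionary buckets and orders the bucket keys with a comparison sort, instead of using counting sort over σ buckets with a string quicksort fallback for small subsets, so it does not achieve the bound of Theorem 1.26.

```python
def counting_sort_by_position(strings, pos, sigma=256):
    """Stable counting sort of the strings by the symbol at index pos."""
    counts = [0] * sigma
    for s in strings:
        counts[ord(s[pos])] += 1
    total = 0
    for c in range(sigma):  # cumulative sums: counts[c] becomes the start of bucket c
        counts[c], total = total, total + counts[c]
    out = [None] * len(strings)
    for s in strings:       # distribute, keeping equal symbols in input order
        c = ord(s[pos])
        out[counts[c]] = s
        counts[c] += 1
    return out


def lsd_radix_sort(strings, sigma=256):
    """Sort strings of equal length m, from the last position to the first."""
    if not strings:
        return []
    m = len(strings[0])
    result = list(strings)
    for pos in range(m - 1, -1, -1):
        result = counting_sort_by_position(result, pos, sigma)
    return result


def msd_radix_sort(strings, pos=0):
    """Simplified MSD radix sort: bucket by the symbol at pos, then recurse."""
    if len(strings) <= 1:
        return list(strings)
    result = [s for s in strings if len(s) == pos]  # strings ending here come first
    buckets = {}
    for s in strings:
        if len(s) > pos:
            buckets.setdefault(s[pos], []).append(s)
    for symbol in sorted(buckets):  # simplification: comparison sort of bucket keys
        result.extend(msd_radix_sort(buckets[symbol], pos + 1))
    return result


print(lsd_radix_sort(["cat", "him", "ham", "bat"]))
words = ["alphabet", "alignment", "allocate", "algorithm",
         "alternative", "alias", "alternate", "all"]
print(msd_radix_sort(words))
```

The first call reproduces Example 1.22, and a single pass of counting_sort_by_position at position 2 gives its first intermediate order. Replacing the stable distribution step with an unstable one would break lsd_radix_sort, since each pass relies on the order established by the passes after it in the string.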