MSc Design and Analysis of Parallel Algorithms Supplementary Note: Analysing Parallel Algorithms

Uploaded by ellena-manuel on 2014-12-15

We then consider the complications introduced by parallelism and look at some proposed parallel frameworks.

Analysing Sequential Algorithms

The design and analysis of sequential algorithms is a well-developed field with a large …

Analysing Parallel Algorithms

The sequential world benefits from a single universal abstract machine model (the RAM), which accurately (enough) characterizes all sequential computers, and from a simple criterion of "better" for algorithm comparison ("less is better", usually of run time, and occasionally of memory space). Thinking parallel, we immediately encounter two complications.

Firstly, and fundamentally, there is no commonly agreed model of parallel computation. The diversity of proposed and implemented parallel architectures is such that it is not clear that such a model will ever emerge. Worse than this, the variations in architecture capabilities and associated costs mean that no such model can emerge, unless we are prepared to forgo certain tricks or shortcuts exploitable on one machine but not another. An algorithm designed in some abstract model of parallelism may have asymptotically different performance on two different architectures (rather than just the varying constant factors of different sequential machines).

Secondly, our notion of "better", even in the context of a single architecture, must surely take into account the number of processors involved, as well as the run time. The trade-offs here will need careful consideration.

In this course we will not attempt to unify the irretrievably diverse. Thus we will have a small number of machine models and will design algorithms for our chosen problems for some or all of these. However, in doing so we still hope to emphasize common principles of design which transcend the differences in architecture. Equally, in some instances, we will exploit particular features of one model where that leads to a novel or particularly effective algorithm. Similarly, we will investigate notions of "better" as they have been traditionally defined in the context of each model.

We will continue to employ the notation of asymptotic analysis, but note that we must be particularly wary of constant factors in the parallel case: a "constant factor" discrepancy of 32 in an asymptotically optimal algorithm on a 64-processor machine is a serious matter.
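As a rough worked illustration of that last point, write $T_s$ for the sequential run time and $T_p$ for the parallel run time, and assume (for the sake of the example only) that the ideal on $p = 64$ processors is a 64-fold reduction of $T_s$. A hidden constant factor of 32 then turns the hoped-for 64-fold gain into a mere factor of 2:

$$T_p = 32 \cdot \frac{T_s}{64} = \frac{T_s}{2}, \qquad \frac{T_s}{T_p} = \frac{64}{32} = 2.$$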
… of such steps required, usually expressed as a function of the problem size $n$ and of $p$ (which may itself be expressed as a function of $n$).

For example, consider the problem of summing an array of $n$ integers. With the CRCW-Associative PRAM we have a simple $n$-processor, single time step (or asymptotically $\Theta(1)$ time) algorithm: each processor writes a distinct array element to the "sum" location and the clash resolution mechanism (with $+$ as the associative operator) does the rest. By contrast, in the EREW variant an obvious approach is to use $n/2$ processors to add distinct pairs in the first step, then $n/4$ of these processors to add distinct pairs of results in the second step, and so on. This process continues for $\Theta(\log n)$ steps until the final two sub-totals are summed into the intended sum location by a single processor.

As well as absolute speed, a significant focus of interest concerns the design of "cost-efficient" or "cost-optimal" PRAM algorithms.

Definition 1. The cost of a parallel algorithm is the product of its run time $T_p$ and the number of processors used, $p$. A parallel algorithm is cost optimal when its cost matches the run time $T_s$ of the best known sequential algorithm for the same problem.

The speedup $S = T_s/T_p$ offered by a parallel algorithm is simply the ratio of the run time of the best known sequential algorithm to that of the parallel algorithm. Its efficiency $E = S/p$ is the ratio of the speedup to the number of processors used (so a cost optimal parallel algorithm has speedup $p$ and efficiency $1$, or $\Theta(p)$ and $\Theta(1)$ asymptotically).

For example, the sequential run time of (comparison-based) sorting is known to be $\Theta(n \log n)$. A cost optimal parallel sorting algorithm might use $O(n)$ processors for $O(\log n)$ time, or $O(n/\log n)$ processors for $O(\log^2 n)$ time. On the other hand, an $O(n^2)$-processor, constant-time sorting algorithm would be faster than both of these (given enough processors) but not cost optimal.

The significance of cost optimality is that it implies good scalability down to smaller sized machines. It is not difficult to see that a PRAM algorithm for, say, $n^2$ processors can be emulated on $n$ processors with a corresponding slow-down of a factor of $n$ (each abstract time step is emulated by $n$ real time steps in which each processor plays the role of $n$ imaginary processors). This …
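The EREW summation and the cost measures defined above can be checked with a small sequential simulation. This is a minimal sketch under stated assumptions: `erew_sum` and `metrics` are hypothetical helper names, each loop round stands in for one synchronous PRAM step, the sequential baseline is taken to be the $n - 1$ additions of a plain loop, and the processor count $p$ is a parameter, so each processor first adds up its own block of about $n/p$ items before the pairwise tree phase (with $p = n/2$ this reduces to the pairwise scheme described above).

```python
import math


def erew_sum(values, p):
    """Simulate summing `values` with p processors on an EREW PRAM.

    Returns (total, parallel_steps). Phase 1: processor k sums its own
    contiguous block sequentially. Phase 2: the p partial sums are combined
    pairwise, halving their number each step, so no memory cell is read or
    written by two processors in the same step.
    """
    n = len(values)
    block = math.ceil(n / p)
    # Phase 1 runs on all processors at once, so it takes as many steps as
    # one block needs additions (block - 1).
    partials = [sum(values[k * block:(k + 1) * block]) for k in range(p)]
    steps = block - 1
    # Phase 2: one pairwise round per parallel step.
    while len(partials) > 1:
        merged = [partials[i] + partials[i + 1]
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:  # an unpaired partial sum waits a round
            merged.append(partials[-1])
        partials = merged
        steps += 1
    return partials[0], steps


def metrics(t_seq, t_par, p):
    """Cost p*T_p, speedup S = T_s/T_p and efficiency E = S/p."""
    cost = p * t_par
    speedup = t_seq / t_par
    return cost, speedup, speedup / p


if __name__ == "__main__":
    n = 2 ** 16
    data = list(range(n))
    t_seq = n - 1  # additions done by a plain sequential loop
    for p in (n // 2, n // int(math.log2(n))):
        total, steps = erew_sum(data, p)
        assert total == sum(data)
        cost, speedup, efficiency = metrics(t_seq, steps, p)
        print(f"p={p}: {steps} steps, cost={cost}, "
              f"speedup={speedup:.1f}, efficiency={efficiency:.3f}")
```

For $n = 2^{16}$, $p = n/2$ takes 16 steps at a cost of 524288, while $p = n/16$ (that is, $n/\log_2 n$) takes 27 steps at a cost of 110592, a little over a fifth as much for less than twice the time.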
… (though not necessarily how to express it as an algorithm). With a little more thought we can adapt our algorithm to produce an asymptotically optimal variant. The trick (which will be applicable in many situations) is to have a smaller number of processors each do some of the work sequentially and optimally, improving the cost-efficiency to the extent that we can hide a less efficient second phase in the $O(\cdot)$ notation. In this case, Brent tells us that we should work with $n/\log n$ processors. If each of these sums $\log n$ items sequentially (in $\Theta(\log n)$ time), and then co-operates in the original parallel summation approach (but now with fewer items and steps), then we still have a $\Theta(\log n)$ time algorithm, but one which is now cost optimal: its cost is $(n/\log n) \cdot \Theta(\log n) = \Theta(n)$, matching the $\Theta(n)$ sequential run time.

Strictly speaking, neither round-robin scheduling nor Brent's theorem applies to CRCW-Associative PRAM algorithms, since breaking the work of what was a single step across several steps can change the program's behaviour (for example, think about our single-step summation algorithm). However, the techniques can be adapted to apply even to this most powerful model, with only a small constant-factor increase in time (and so no change asymptotically).

The choice of PRAM variant can have an impact on the run time achievable for many problems. For example, the following CRCW-Associative(+) algorithm allows constant-time comparison-based sorting of $n$ items with $n^2$ processors. This is not possible in any non-concurrent-write variant (and could be argued to call into question the practicality of this model).

    for i = 0 to n-1 do in parallel
      for j = 0 to n-1 do in parallel
        if (A[i] > A[j]) or (A[i] = A[j] and i > j) then
          wins[i] = 1;   /* exploiting concurrent writes: + sums all n writes to wins[i] */
        else
          wins[i] = 0;
    for i = 0 to n-1 do in parallel
      A[wins[i]] = A[i];   /* writes to distinct locations; all reads precede the writes */

Notice that the second clause in the conditional breaks ties between duplicated values, ensuring that each entry has a distinct number of wins.
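The two parallel steps of this sort can be simulated sequentially to check the tie-breaking argument. A minimal sketch, assuming Python as the illustration language: `crcw_rank_sort` is a hypothetical name, the nested loops stand in for the $n^2$ processors of the first step, and the `+=` on `wins[i]` models the CRCW-Associative(+) rule that combines every concurrent write to the same cell by addition. Writing the results into a fresh list models the synchronous PRAM convention that all reads in a step complete before any writes.

```python
def crcw_rank_sort(a):
    """Sequentially simulate the constant-time CRCW-Associative(+) sort.

    Step 1 (n*n processors): processor (i, j) writes 1 to wins[i] if A[i]
    beats A[j] and 0 otherwise; the '+' combining rule sums these writes,
    so wins[i] becomes the number of elements that A[i] beats.
    Step 2 (n processors): processor i writes A[i] to position wins[i].
    """
    n = len(a)
    wins = [0] * n
    for i in range(n):
        for j in range(n):
            # Ties between equal values are broken by index, so the n
            # values of wins are exactly 0, 1, ..., n-1 in some order.
            beats = a[i] > a[j] or (a[i] == a[j] and i > j)
            wins[i] += int(beats)
    # Step 2: every target index is distinct, so no write conflicts occur.
    result = [None] * n
    for i in range(n):
        result[wins[i]] = a[i]
    return result


print(crcw_rank_sort([3, 1, 2, 3, 0]))  # [0, 1, 2, 3, 3]
```

Because the tie-break favours the larger index, equal values keep their original relative order, so the simulated sort is also stable.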