/
Multilev el ILU with reorderings for diagonal dominance ousef Saad No em er   Abstract Multilev el ILU with reorderings for diagonal dominance ousef Saad No em er   Abstract

Multilev el ILU with reorderings for diagonal dominance ousef Saad No em er Abstract - PDF document

mitsue-stanley
mitsue-stanley . @mitsue-stanley
Follow
531 views
Uploaded On 2014-12-14

Multilev el ILU with reorderings for diagonal dominance ousef Saad No em er Abstract - PPT Presentation

The nonsymmetric erm utation exploits greedy strategy to put large en tries of the matrix in the diagonal of the upp er leading submatrix The metho can regarded as complete piv oting ersion of the incomplete LU factorization This leads to an eectiv ID: 23692

The nonsymmetric erm utation

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "Multilev el ILU with reorderings for dia..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

MultilevelILUwithreorderingsfordiagonaldominance¤YousefSaadyNovember16,2004AbstractThispaperpresentsapreconditioningmethodbasedoncombiningtwo-sidedpermutationswithamultilevelapproach.Thenonsymmetricpermutationexploitsagreedystrategytoputlargeentriesofthematrixinthediagonaloftheupperleadingsubmatrix.ThemethodcanberegardedasacompletepivotingversionoftheincompleteLUfactorization.Thisleadstoane®ectiveincompletefactorizationpreconditionerforgeneralnonsymmetric,irregularlystructured,sparselinearsystems.Keywords:IncompleteLUfactorization,ILUT,multilevelILUpreconditioner,Krylovsubspacemethods,multi-elimination,completepivotingAMSsubjectclassi¯cations:65F10,65N06.1IntroductionInrecentyears,preconditionedKrylovsubspacemethodshavemadegoodprogresstowardtheiracceptanceasgeneralpurposetechniquesforsolvingirregularlystructuredsparselinearsystems.Onemaydistinguishbetweentwocategoriesoflinearsystemsthataretackledbysuchmethods.Firstistheclassoflinearsystemswhichoriginatefromthediscretizationofpartialdi®erentialequations.Initially,preconditionedKrylovmethodswereonlyattemptedonproblemsofelliptictype,forwhichagoodbodyoftheoryexisted.Becauseoftheirsuccess,thesemethodshaveincreasinglybeenemployedonsystemsarisingfromvarious°owproblems,StokesandNavierStokesequations,Maxwell'sequations,Helmholtzequations,andothertypesofmodelswhichleadtoinde¯nitelinearsystems.ThesecondcategoryoflinearsystemsincludesproblemswhicharisefromapplicationswhicharenotgovernedbyPartialDi®erentialEquations.Anarchetypeofthisclassofproblemsisonethatoriginatedfrompowernetworks,see,e.g.[21,18],andwhichcanbeconsideredinsomewaystobeattheoriginofirregularlystructuredsparsematrixproblemsasweknowthemtoday.Nowadays,theseproblemsareconsideredsmallandareoftenbesthandledbysparsedirectsolvers.However,therearesomewhatrelatedproblemsincircuitsimulation[13,38],whichcausedi±cultiestolinearsolvers.Onemayaskwhetherthedistinctionbetweenthesetwoclassesofproblemsisimportantforagivensystemsolver.Fordirectsolvers,theoriginoftheproblemhasnodirectimpactonthesolutionalgorithm,exceptthatits`geometric'naturecan¤ThisworkwassupportedbyNSFgrantsACI-0305120andINT-0003274,andbytheMinnesotaSupercomputerInstitute.yDepartmentofComputerScienceandEngineering,UniversityofMinnesota,4-192EE/CSBuilding,200UnionStreetS.E.,Minneapolis,MN55455.E-mail:saad@cs.umn.edu.1 in°uencecomplexity.Thegeometryoftheunderlyinggraphmaymattermorethanwhetherornottheproblemislikelytobehighlyinde¯nite,asitmayhaveamajorimpacton¯ll-in.Ingeneral,however,theprevailingconsensussofarhasbeenthatiterativemethodsarenotassuccessfulforthissecondclassofmethodsastheyhavebeenforthePDE-basedproblems.Asthesizesofthelinearsystemsarebecominglargerandmorechallenging,thisissuehasresurfacedrecentlyandafewmethodshavebeendevelopedtoincreasetherangeofapplicabilityofiterativemethods.TheworkofOlschowkaandNeumaier[33]inparticular,showedthatitispossibletoessentiallyperform`static'pivoting,priortoGaussianelimination.Otherwisestated,itispossibletopermutetherowsofamatrixinsuchawaythatinmanycasespivotingbecomesunnecessaryduringtheeliminationprocess.OlschowkaandNeumaierindicatedthatthiscouldbehelpfulinanincompleteLUfactorizationaswell.TheworkbyDu®andKosterdescribedinthearticles[23,24],followedthisideafurtherbydevelopinge±cientcodes,andshowingthatthemethodcanbequitee®ectiveinsomecases.Otherarticleshavesincetestedtheideaonvariousproblems,andreportedexcellentsuccess,seeforexample[3,38].Thispaperproposesareorderingmethodthatissimilarinspirittotheonedescribedin[23,24].Whilebothmethodsarebasedonimprovingdiagonaldominancetheonepresentedinthispaperisessentially`dynamic'insteadof`static'.Itproceedsbylevelswherebyeachlevelcorrespondstoablockwhichhasbeenextractedfromtheoriginalmatrixbypermutingitsrowsandcolumnsinanonsymmetricway.2ThequestforrobustILUsSeveraldistinctpathshavebeentakentoimprovetherobustnessofpreconditionersforirregularlystructuredlinearsystems.The¯rst1istoaddpartialpivotingtotheincompleteLUfactorizationwiththreshold(ILUT),see[35].ThestandardILUTtechniqueoftenfailsforirregularlystructuredmatrices.PartialcolumnpivotingcanbeaddedtoILUTandthisyieldstheILUTPalgorithm,[35].Pivotingdoeshelpbutthereisnoguaranteethatafactorizationwillevenbeproducedinsomecases.Oneofthemostcommoncasesoffailureresultsfromtheappearanceofzerorowsduringfactorizationcausedbythecombinationofpermutationsanddropping.However,ILUTPcanworkratherwellifsubstantial¯ll-inisallowed{butthecostofthefactorizationmaybecomeprohibitive.Thesecondclassofmethods,whichbecamepopularinrecentyears,avoidsILUfactorizationsaltogetherandutilizesinsteadoneofseveralmethodstocomputeanapproximationtotheinverseofAdirectly,see,e.g.,[5,7,8,15,27,29].Initially,these\approximateinversemethods"havebeenprimarilydevelopedfortheirparallelism.However,theyareoftenviewedasrobustalternativestoILUfactorizationpreconditioners,seee.g.,commentsin[4,6].Thoughthefactorsproducedbythesemethodsdonotsu®erfromtheinstabilitywhichplaguesincompletefactorizations,theytendtorequiremorememoryandtocostmoretobuild.Athirdapproachistouseaformofpreorderingwhosegoalistomovelargeentriesofthematrixtothediagonal.Inthismethod,thematrixisorderedononeside(e.g.,rowsonly).ThenpivotingisobviatedduringGaussianelimination[24,23].ThecombinationofILU-typepreconditionerswithsuchareorderingstrategyhasbeenshowntobequitee®ective[3,24,23,38].1Theorderisnotchronological.2 ThelastclassofmethodswewillmentionhasbeendevelopedfollowingtheworkbyBollhÄofer[9]whichaddressestheproblemofrigorousdroppinginILU-typemethods.ThesemethodsarefoundedontheintimaterelationshipsbetweenILUfactorizationsandapproximateinversetech-niques[10].SmalltermsareusuallydroppedinILUeitherbyconsideringtheirlocation(positionaldropping)ortheirmagnitude(thresholddropping)insomerelativesense.The¯rstapproachworkswellforregularlystructuredproblemsarisingfromPDEsbutitisnotdesignedforthemoregeneralcase.ThesecondisgenerallyapplicablebutthegreedycriteriononwhichitisbasedfocussesonmakingLUclosetoAinsteadofmakingthepreconditionedmatrixclosetotheidentity.Itisoftenobservedthattheresultingfactorsareveryill-conditionedandthisleadstopoorperformance.Abetterstrategywasde¯nedin[9]whichexploitsconditionnumberestimatorsfordroppingtermswhoseremovalisleastlikelytobedetrimentaltothepreconditioner.ThisstrategywascombinedwithaCroutimplementationofILUin[31].Incontrastwithsimilarexistingstrategiesthemethoddiscussedinthispaperdoesnotattemptto¯llthewholediagonaloftheoriginalmatrixwithlargeentries.Instead,amoreprogressive,multilevel,approachistakenwherebytheunknownsassociatedwiththeupperleadingsubmatrixareeliminatedandtheprocessisrepeatedonthereducedsystem.AnalternativeviewoftheprocessisthatofablockcompletepivotingILU.Viewedfromthisangle,theprocessconsistsof¯ndingtwopermutationsPandQsuchthatPAQT=µBFEC¶inwhichtheBmatrixhaslargediagonalentries.TheunknownsassociatedwiththeBblockcannowbe\safely"eliminatedandtheprocessrepeatedonthereducedsystem.Preconditioningtechniquesdevelopedinthispaperobtainapproximatefactorizationsderivedfromthisstrategy.ThemethodcanalsoberegardedasanonsymmetricversionoftheAlgebraicRecursiveMultilevelSolver,whichusessymmetricpermutations(i.e.,P=Q).Numericalexperimentsindicateamarkedimprovementinrobustnessoverusingsymmetricpermutations.Notethatintheidealcase,BwouldbeofthesamesizeasA.Onecanthink,forexample,ofanalgorithmthatwouldyieldaBmatrixthatisdiagonalandcontainsmostofthelargeentriesofthematrixA.However,forcingBtobediagonalcouldseverelylimititssizeingeneral,resultinginanexpensiveoption.Wenowbrie°ydiscusspracticalissuessurroundingpivotingforincompletefactorization.Whenarowversionofdelayed-updateGaussianelimination(thesocalledIKJversion,see[26,p.99])isused,itisrelativelyeasytoincorporatecolumnpivoting.Sincethepivotingislocal,i.e.,itisoblivioustotherowsbelowtheworking(k-th)row,droppingmayleadtoundesiredresultsinlaterstepsduringtheILUfactorization.Forexample,wealreadymentionedthecommonoccurrenceofzerorowsduringtheprocess,especiallywhenlowlevelsof¯llareused.Whenitworks,theresultingLUfactorizationiscapableofsolvingmoreproblemsingeneral,butthelossofstructureduetopivotingresultsinamuchmoreexpensivealgorithm,especiallyintermsofmemoryusage.Incorporatingpartialpivotinginthemorecommonrank-oneupdatevariant(so-calledKIJvariant)orotherformsofincompleteLUfactorizationsdoesnotseemtohavebeenundertaken.However,themoreinterestingquestiononemightaskiswhetherornotaformofcompletepivotingispracticalinthecontextofILU.Searchingforthelargestentryintheworkingsubmatrixateach3 stepseemstobeexceedinglyexpensiveandimpractical.Ontheotherhand,ifthiswerepossible,itwouldpotentiallyenabletodropmoreterms.Thisisbecausetheresultingfactorsaremorelikelytobestable(thenormsoftheirinversesarenottoolarge)andthiswillcauseamildere®ectonthetermsdropped,seeforexample[9,11].Asimpli¯edargumentofthisfactisasfollows.LetA=LU+EwhereErepresentsthematrixofthetermsthathavebeendroppedfromthefactorization.Thenthesplit-preconditionedmatrixissuchthat,L¡1AU¡1=I+L¡1EU¡1:ThisshowsthatthelargerkL¡1kandkU¡1k,thesmallerthedroppedtermsmustbeinordertoobtainapreconditionedmatrixthatisnottoofarfromtheidentity.Tocopewiththeissueofcost,themethodpresentedinthispaperessentiallyresortstoablockpivotingapproachthatperformsonlyanapproximateversionofcompletepivoting.3MultilevelILUpreconditionersAcommonmethodusedtoobtainapreconditionerisviaablockIncompleteLUfactorization,whichinvolvesanapproximateGaussianeliminationprocessbasedonseparatingtheoriginalunknownsintoa\coarse"anda\¯ne"set.ThesetermsareborrowedfromtheAMGliterature.Intheclassofpreconditionersdevelopedin[14,37,1,36]theideaofindependentsetsor\group-independent"sets[25,32,30,17,34]isexploitedtode¯nethispartition.BlockIndependentsetorderingspermutetheoriginallinearsystemAx=bintotheformµBFEC¶µuy¶=µfg¶(1)inwhichthesubmatrixBisblockdiagonal.TheILUM[34]factorizationutilizesstandardinde-pendentsetorderingswhichresultinamatrixBthatisdiagonal.Inthissituation,itiseasytoeliminatetheuvariabletoobtainasystemwithonlytheyvariable.Thecoe±cientmatrixforthis`reducedsystem'istheSchurcomplementS=C¡EB¡1Fwhichisstillsparse,ingeneral2.Thisideawasusedrecursively,applyingdroppingtoStolimit¯ll-in,andreorderingtheresultingreducedsystemintotheaboveformviaindependentsets.Thisisrepeatedforafewlevelsuntilthesystemissmallenough,oramaximumnumberoflevelsisreached.Thenthesystemissolvedbysomeother,standard,meanssuchasadirectsolveroranILUT-GMREScombination.3.1Two-sidedpermutationsThealgorithmsdescribedinthispaperdonotrequireBtohaveanyparticularstructure.Instead,therowsandcolumnsofAarepermuted,usingtwodi®erentpermutationsP(rows)andQ(columns)whichwilltransformAintotheformPAQT=µBFEC¶(2)inwhichthematrixBhasthepropertyofbeingasdiagonallydominantaspossible,inawaythatisyettobespeci¯ed.2Forexample,whenBisdiagonal,EhasnomorethannzEnonzeroelementsperrow,andFhasnomorethannzFnonzeroelementsperrow,thenEB¡1FwillhavenomorethannzE£nzFnonzeroentriesperrow.4 Atthel-thleveloftheprocedure,thecoe±cientmatrixisreorderedasin(2)andthefollowingblockfactorizationiscomputed`approximately'(subscriptscorrespondingtolevelnumbersareintroduced):PlAlQT=µBlFlElCl¶¼µLl0ElU¡1lI¶£µUlL¡1lFl0Al+1¶;(3)whereBl¼LlUl(4)Al+1¼Cl¡(ElU¡1l)(L¡1lFl):(5)Whendescribedwiththehelpofrecursivity,theprocedureconsistsessentiallyofthreesteps.First,anorderingofthematrixintheform(2)isobtainedandapplied;Then,anILUfactorizationBl¼LlUlforBliscomputed.Finally,approximationstothematricesL¡1lFl,ElU¡1l,andthenAl+1areobtained.TheprocessisrepeatedrecursivelyonthematrixAl+1untilaselectednumberoflevelsisreached.Atthelastlevel,asimpleILUTfactorization,typicallywithpivotingandveryhigh¯ll-inlevelscanbeapplied.TheAi'swillnolongerremainsparseingeneral,sinceBisnolongerassumedtobediagonalorblock-diagonal.SparsityisregainedbydroppingsmalltermsinvariouswaysintheconstructionofLl;Ul,L¡1lFl;ElU¡1l,andAl+1.NotethatthematricesElU¡1l;L¡1lFlortheirapproximationsGl¼ElU¡1l;Wl¼L¡1lFl(6)whichareusedtoobtaintheSchurcomplementvia(5)neednotbesaved.TheyarecomputedonlyforthepurposeofobtaininganapproximationtoAl+1.Oncethisisaccomplished,theyarediscardedtosavestorage.SubsequentoperationswithL¡1lFlandElU¡1lareperformedusingUl;LlandtheblocksElandFl.Therationaleoftheorderingsthatwillbeproposednextisthatitiscriticaltohaveanaccurateandwell-conditionedBblockinthemultilevelfactorizationdiscussedabove.Alternatively,wecanalsothinkintermsofcompletepivoting.AssumeforamomentthatBisa1£1block.Then,clearlyonecanthinkofselectingthepermutationsPandQinsuchawayastomovethebestpossiblepivotfromthematrixtolocation(1;1).The¯rstrowcanthenbeeliminatedandtheprocesscouldthenberepeatedfortheremaining(n¡1)£(n¡1)matrix.Unfortunately,thisprocedureistoocostlybecauseofthesearchrequiredto¯ndthebestpivotateachstep.Amoree®ectivealternativeistoselectthebestkpivotsatthesametime,soBnowbecomesablockofsizek.ThenextsectiondiscussesafewheuristicswhichusediagonaldominanceasacriterionforselectingtheBblock.3.2TheblockcompletepivotingviewpointWebeginbyintroducingthenotationandterminologyusedindescribingthealgorithms.Apairofpermutations(P;Q)iscompletelyde¯nedfromthecorrespondingtwopermutationsfp1;p2;:::;pngfq1;q2;:::;qngofthesetf1;2;:::;ng.Sinceweusepartialpermutations(whichde¯netheblockB),wede¯neamatchingsetMasasetofnMpairs(pi;qi)wherenM·nand1·pi;qi·n;fori=1;:::;nMandpi6=pj;fori6=jqi6=qj;fori6=j:(7)5 NotethatthecasewhennM=ncorrespondspreciselytoa(full)permutationpair(P;Q).Apartialmatchingsetcanbecompletedintoacompletematchingsetbysimplyscanningthesetofpi'sandaddingentriesfromthesetf1;2;:::;ngwhicharemissing,andthenrepeatingtheprocessfortheqi's.Thefunctionreturningthenumberofnonzeroentriesinaroworcolumnisdenotedbynz(:),soforexamplenz(ai;:)isthenumberofnonzeroelementsinthei-throwofAandnz(a:;j)isthenumberofnonzeroelementsinthej-thcolumnofA.Forlackofabetternamewewillrefertotheclassofreorderingalgorithmstobedescribedinthissectionas`ddPQ'orderings(P,Qstandforthepermutationsusedand`dd'indicatesthattheidealpermutationwouldyieldadiagonallydominantBblock).Theseareheuristicmethodswhichconsistoftwostages.Ina¯rststage,candidateentriesaijare`preselected'aspossiblediagonalentries.Thisprocedurewillalsoorderthecandidateentriesthatareselectedbyusingacertainmeasurewhichwillattempttouseacriterionofdiagonaldominanceandthenumberofnonzeroentries.ThesecondstagescansthesecandidateentriesinordergivenbythepreselectionstageanddecidesonacceptingorrejectingtheentryintotheMset.Thesimplestpreselectionalgorithmisbasedonusingtheratiosjaijjkai;:k1:Acriterionusingratiosofthistypeforthediagonalentries(i=j)wasalsousedinanearlyversionofARMS[36]for¯lteringequationsthatareleastdiagonaldominantinasymmetricpermutationsetting.Letj(i)beacolumnindexofthelargest(inabsolutevalue)entryofrowi(jai;j(i)j¸jaijjforallj).Theratiojai;j(i)jkai;:k1(8)canbeviewedasameasureofrowwisediagonaldominanceofrowiiftheterm(i;j(i))ispermutedasadiagonalentry.Theselectionofai;j(i)asadiagonaltermwouldresultina(weakly)diagonallydominantrowiftheaboveratiois¸1=2.Inpractice,wecanscreenoutallentriesforwhichtheseratiosarenogreaterthanacertainthreshold¿.Inasecondstep,thepreselectionalgorithmwillrankallthepairs(i;j(i))inpreparationforthenextphaseofmatching,wherethepermutationsPandQareactuallybuilt.Therankingwillsimplyyieldagoodorderinwhichtoscanthelistofpairswhendeterminingwhichonesarevalidorbesttokeep.Thesimplestpossibilityistosorttheselectedpairs(i;j(i)bydecreasingvalueoftheratios(8).Thiswouldfavordiagonallydominantrowstobescanned¯rst.However,thisrankingdoesnottakesparsityintoaccount.Forexample,adenserowmaybeselectedbecauseofahighdiagonaldominanceratio,andthiswillyieldafullSchurcomplementmatrix.Toremedythis,wemultiplytheaboveratiosbyaquantitythatwillencouragesparserrowstobeselected¯rst.Severalsuchmeasuresweretestedandtheyleadtosimilarresults.Thefollowingmeasurejai;j(i)jkai;:k1£1nz(ai;:)(9)giveshigherprioritytorowswithgooddiagonaldominancecharacteristicswhichareatthesametimesparse.Otherpossibilitiestestedincorporatedthevaluenz(a:;j(i))orboththevaluesnz(a:;j(i))andnz(ai;:).Thefollowingalgorithmusestheratios(9).6 Algorithm3.1Preselection1.SetD=;2.Fori=1;:::;ndo3.Computeti=kai;:k1,andj(i)=argmaxkjaikj4.if(jai;j(i)j�¿ti)add(i;j(i))toD5.EndDo6.Foreach(i;j(i))2DDo:7.Computewi=jai;j(i)jkai;:k1£1nz(ai;:)8.EndDo9.SortDbydecreasingorderoftheweightwi:D=f(i1;j1);(i2;j2);:::;(inD;jnD)gwith:wi1¸wi2¸:::¸winD.Inordertoavoidsituationsinwhichthepreselectionalgorithmreturnswithanemptyset,wescalethethreshold¿relativetothelargestoftheseratios.So,givenaninputparameter¿0,theactualthreshold¿usedinthealgorithmis¿=¿0£maxi·jai;j(i)jkai;:k1¸:(10)Thisensurethattheworstrowsonarelativebasiswillberejected.Theparameter¿0shouldbebetween0and1andtypicalvaluesareintherange0.1{0.3.If¿0isstrictlylessthanone,thenatleastonerowwillbepreselected.ThealgorithmbeginsbycomputingtherequirednormsandtheninLine4,itpre-screenstherowsbykeepingonlythosethatdoshowenoughdiagonaldominance(relativetootherrows).Thisinitialrejectionofpoorcandidaterowswillreducethecostofthenextphasewhichinvolvesasort(Line9.)Asdiscussedabove,inordertoreduce¯ll-in,thediagonaldominanceratiosaremodi¯ed.Theyaredividedbythenumberofnonzeroelementsinrowi(seeLine7).ThelaststageofthealgorithmsortstheentriesinDindescendingorderoftheresultingratios.Theabovealgorithmselectsgoodcandidatesfordiagonalentriesaccordingtoasimplediagonaldominancecriterion.TocompletetheprocesswenowneedtoselectpairstokeepinM.Wecould,forexample,scanallpairsreturnedbyAlgorithm3.1andkeepallthepairsaslongasthattheydonotviolatetherequirementinthede¯nition(7).Clearly,thelistshouldbetraversedinthesameorderinwhichitisprovidedbyAlgorithm3.1.Therefore,thesimpleststrategyisagreedytechniquewhichscansthesetDandacceptsanypair(pi;qi)suchthatneitherpinorqihavealreadybeenincludedinapairofM.Algorithm3.2Greedymatchingsetselection0.DeterminetheorderedsetDbyapreselectionalgorithm.Setcount=0.1.SetM=;;SetP(i)=Q(i)=¡1fori=1;:::;n2.Fork=1;:::;nDDo:3.Get(ik;jk)thek-thpairinD4.If(Q(jk)==¡1):5.Add(ik;jk)toM:(a)count=count+16.(b)setP(ik)=count;Q(jk)=count;7 7.EndIf8.EndForNotethatnormallywewouldalsorequirethatP(ik)==¡1inLine4,butthisisunnecessarysincethereisonlyone(ik;jk)selectedforeachrowandthereforeP(ik)canbeassignedapositivevalueonlyonce.Atthecompletionofthealgorithm,thematchingsetisthesetofallpairs(i;j)suchthatP(i)andQ(j)arebothnon-negative.NotealsothatP(:)representsanincompletereversepermutationarray.Ifcompleteditwouldsimplybethe`old-to-new'usualmappingfortherowpermutations.Similarly,theQarrayrepresentsanincomplete`old-to-new'mappingforthecolumns.Theonlyremainingstepnowistocompletethesetwopermutations.Thisisachievedbyasimplegreedyproceduresuchasthefollowingone,appliedheretoP.1.Fori=1:nDo:2.if(P(i)0)3.count=count+14.P(i)=count;5.EndIf6.EndForNotethat,initially,countmusthavethevaluereturnedbythematchingalgorithm.ThesameprocedureasaboveisappliedtocompletethepermutationQ.Wewillrefertothisprocedureasthe`greedycompletion'algorithm.ItshouldbementionedthatthissimpleprocedurewillallowzeroentriestobeintroducedinthediagonaloftheCblock.Thisisnotaproblem,fortworeasons.First,itistheSchurcomplementmatrixthatcountsandwhenitisbuilt,¯ll-insarelikelytobeintroducedintoitsdiagonal.Second,thisblockwillbefurtherpermutedanyway,eitherbytherecursiveapplicationoftheprocesstothenewlevelcreated,or,ifthelastlevelisreached,bycolumnpivotingintheILUTPprocedure,orthefullGaussianelimination,whensolvingthesystem.ThisprocessisillustratedinFigure1.Inthe8£8matrixshownontheleftofthe¯gure,thecircledentriesaretheonespreselectedbythealgorithm,withthenumbersinthecirclesindicatingtheranksattributedbythesortingstepinLine9ofthealgorithm.Theentriesshowninshadedsquaresaretheothernonzeroelementsofthematrix.Observethat,byitsnature,thealgorithmclearlyselectsatmostoneentryperrow.However,theremaybemorethanoneselectedentrypercolumn.Ifthegreedymatchingalgorithmisapplied,itwillendupaccepting5pairs,andthestatusofthearraysP;Qattheendisasfollows:P5-132-141-1Q5-1-1132-14Thesearethencompletedby`greedycompletion'intoP56327418Q56713284Oncethepermutationisfound,thematrixisreorderedintotheform(2).Thethickdashedlinesinthematrixontherightsideofthe¯gurecorrespondtothepartitioningintheright-handsideof(2).NotethatthereisnoparticularstructureassociatedwiththeblockB.Inthenextblock-eliminationstep,equationsfrom1to5(¯rstlevel)ofthepermutedmatrixareeliminatedand8 21235478642837461512123547864214576831Figure1:An8£8matrixafterthepreselectionalgorithm(leftside)andtheresultofthenonsym-metricpermutationobtainedfromthegreedymatchingprocedure.Circlednumbersareselectedentriesandthenumbersaretheranksattributedbythepreselectionalgorithm(right).droppingisappliedtotheresulting3£3Schurcomplementmatrix.Thismeansthatwecomputethefactorization(3).ThisentailscomputingtheILUfactorizationofB(equation(4)),andthencomputinganapproximationtotheSchurcomplement,asshownin(5).Ifdesired,theprocesscannowberepeatedonthenextlevel,i.e.,ontheSchurcomplementmatrixAl+1.Theouter(Gaussian)eliminationprocedurestopswhenevereitheramaximumnumberoflevelsisexceededortheSchurcomplementbecomessmallenough.Assumethatfortheaboveexample,itisdecidedthatonlyonelevelisperformed.Thenitonlyremainstosolvethislast(3£3)linearsystem.Inourcode,thisisdonewithapivotingILUTPproceduretypicallywithverylittledropping(asdictatedbyaparameter).Forallpurposes,onecanthinkofthisasanaccuratesolvewithGaussianeliminationwithcolumnpivoting.ILUTPisselectedforthecaseswhenthelastblockisnotsmallenough,inordertoprovidemoreoptionstotheprocedure.NotethatonecaneasilyobtainasingularSchurcomplementmatrixSafterdroppingisapplied.Ifnodroppingisappliedthenitiswell-knownthatSwillbenonsingularaslongastheoriginalmatrixandBarebothnonsingular.AnimportantpointhereisthatthepermutationsP;QthataresoughtatagivenleveldonothavetotakeintoconsiderationthequalityoftheresultingSchurcomplementAl+1.Thismatrixwillbedealtwithatthenextlevel.Ofcourse,iftheoriginalmatrixisstronglydiagonallydominant,thentheprocesswillretainmanyrowsinthe¯rstoneortwolevels,sothealgorithmwillrequireveryfewlevels.Ingeneral,thealgorithmrequiresasmallnumberoflevels,unless¿0isclosetoone.IfBweretobeadiagonalmatrix,thentheeliminationprocesscouldbeviewedasamulti-eliminationalgorithmwithcompletepivoting.Insteadofselectingonepivotatatime,thealgorithmselectsthelargestpossiblenumberofpivotsallowedbyagreedyapproach,inwhichtheselectioncriterionattemptstocompromisebetweenadiagonaldominancerequirementandthenumberofnonzeroentriesintherows.9 NotethatonealternativetoAlgorithm3.2,istousetheMC21procedurewhichperformsamaximaltransversalforbipartitematching,see,e.g.,[23,21,20,19].Thisalgorithm,whosegoalisto¯ndamaximalsetofmatchingpairs,isnotexpensiveandiseasytoimplement.Onemajordi®erencewithAlgorithm3.2isthatMC21doesnottakeintoaccountnumericalvalues,oranyweightsassignedtoentries.Itonly¯ndsarow(orcolumn)permutationPsuchthatPAhasnonzeroentriesonthediagonal.AnotherobservationisthattheprocedurewillfailwhenaroworcolumnoftheSchurcom-plementiszero.Itturnsoutthatthisrarelyhappensinpractice.RecallthatifBisnonsingular,andAisnonsingularthenSisnonsingular,sothesituationwillnothappeninthisidealcase.TheonlyriskistomakeSsingularbecauseofdropping,butthisisrarewithstandardthreshold-baseddroppingprocedures.3.3DiagonalScalingScalingtherowandcolumnsofageneralsparsematrixpriortoattemptingtocomputeitsILUfactorizationiscriticaltothesuccessofthepreconditioner.Thee®ectofrowandcolumnscalingonthequalityofthepreconditionerisnottoowellunderstood,however.Whenscaling,e.g.,therowsofanon-diagonallydominantmatrixby,say,theirin¯nitynorms,thee®ectmaynotnecessarilybebene¯cialtotheaccelerator.Thismaybeduetothefactthatscalinga®ectsthedegreeofnon-normality.Diagonally,ornearlydiagonallydominantmatricesdonotseemtobeasmucha®ectedbyrow/columnscaling.OurresearchcodeARMS-Co®erssomeoptionsforscalingthematricesobtainedateachlevelandthelastlevel.Ifrequested,rowsoftheinter-levelmatricesAiare¯rstscaledbytheir1-norms.Then,ifrequested,columnsoftheresultingmatrixarealsoscaledbytheir1-norms.Inparticularvectorsmustbeallocatedtosavethescalingfactorsfortheleftandrightscaling,ateachlevel.Thoughthisstrategyworkswellinmanycases,experimentsrevealthatitisnotalwaysthebest.Aswasalreadymentioned,wehavefoundthattheuseofscalingmayyieldapoorerpreconditionerthanwithoutanyscalingatall(seethenumericaltestssection),thoughthissituationissomewhatuncommon.Therearealsocaseswhenitisbesttostartscalingthecolumns¯rstandthentherows,orscalingrowsonlyorcolumnsonly.Theissueofscalinginthecontextofpreconditionersisatopicofgreatimportance,andonethathasnotbeengivenenoughattentionintheliterature.Scalingwillberevisitedinthenumericalexperimentssection.4MatchingheuristicsWenowexamineafewalternativestothegreedyapproachofAlgorithm3.2.ItmaybedesirableforexampletoseektoenforcesomeformofdiagonaldominanceintheBpartof(1).HavingaBmatrixinaformthatleadstoeasysolutionsmayalsobeappealing.WehavetestedagoodnumberofstrategiesbutshowonlythreehereinadditiontothegreedytechniqueofAlgorithm3.2.Itisworthitto¯rstexaminecarefullytheproblemofenforcingdiagonaldominanceinB.Thestrategyofalgorithm3.2doesnotguaranteetoproduceaBmatrixthatisdiagonallydominant,ingeneral.Theidealsituationwouldbeto¯ndpermutationsP;QsothattheBblockin(2)isexactlydiagonallydominant,aswellasoflargesize.Notethatwearerestrictingthediagonaldominance10 requirementtotherowsofBnotthoseofA,i.e.,entriesinFdonotmatter.Wecanstatethattheproblemisto¯ndamatchingset,satisfying(7),suchthatthecorrespondingBmatrix,whichisde¯nedasfap(i);q(j)gi;j=1;:::;nMisdiagonallydominant.Ascanbeeasilygraspedthisisadi±cultproblemsinceitsrequirementisaglobalone,becausethediagonaldominancerequirementinvolvesthematchingsetasawhole.OnepossibilityforobtainingadiagonallydominantBissimplynottoacceptanyrowsthatarenotdiagonallydominantasrowsofAinthepreselectionalgorithm.Taking¿=0:5inAlgorithm3.1forcescandidaterowsscannedinLine4ofthealgorithmtobediagonallydominant.SincethecorrespondingBrowshavefewer(non-diagonal)entries,thanthesamerowofA,thenafortiorithecorrespondingrowinBwillbediagonallydominant.However,thisdoesnotworkingeneral,simplybecauseinmanycases,Amaynotevenhaveonesinglerowthatisdiagonallydominant.Thesu±cientconditionusedinthisstrategyistoorestrictive.Itshouldbementionedthatinourcodes,thescalar¿isdeterminedonarelativebasisbyaformulasuchas(10).EnforcingdiagonaldominanceinBiseasilyachievedbyrestrictingBtobetriangular.Thisstrategyispresented¯rstinspiteofitsmainlimitation,whichisthatittendstoproduceaBblockthatissmall.Itwillalsoserveasanintroductiontotheothermatchingstrategies,whichrelaxthetriangularityrestriction.4.1MatchingforatriangulardiagonallydominantBToobtainalowertriangularmatrix,itsu±cestomarkallcolumnindicesofaselectedrowisothattheywillbecomeineligibleforinclusioninthelatersteps.Inaddition,beforeadding(i;j(i))tothesetM,weneedtoverifythattherowisdiagonallydominant.Thisgivesthefollowingmodi¯cationofAlgorithm3.2.Algorithm4.1ReorderingforatriangularBblock1.SetM=;;SetP(i)=Q(i)=¡1fori=1;:::;n;Setcount=0;2.Fori=1;:::;nDDo:3.If(Q(j(i))!=¡1)Continue(skiptonexti)4.ComputetB=Pk2SB(i)jaikj,whereSB(i)=fkjk2adj(A;i);Q(k)¸0g,5.IftB·jai;j(i)jthen6.add(i;j(i))toM:(a)count=count+1;(b)P(i)=Q(j(i))=count;7.SetQ(k)=¡2forallk'sinadj(A;i)withQ(k)==¡1.8.Endif9.EndDo10.CompletematchingsetMarbitrarily.ThesetSB(i)introducedinLine4,representsthesetofcolumnindicesthathavealreadybeenacceptedintoBpriortostepi.ThesewillrepresenttheindicesofthenonzeroentriesofrowP(i)ofthestrictlowertriangularmatrixafterpermutation.Thus,step4computesthesumoftheabsolutevaluesofallentriesinthestrictlowerpartofthematrixthatwouldbecomeBafterreordering.Thenextlinetestsifthecorrespondingrowis(weakly)diagonallydominant.Ifitis,thenthe11 candidatepair(i;j(i))isaccepted(Line6)andallcolumnindicesintherowwhicharenotyetassignedareassignedthevalue¡2,sothatthesecolumnswillnotbeselectedinfuturesteps.ThisensuresthatBwillbelowertriangular.Anegativevaluethatisdi®erentfrom¡1isusedtoallowpropercompletionbythegreedycompletionalgorithm.4.2MatchingforanaugmentedtriangularBForcingBtobetriangularmayberestrictive.ItleadstoaBblockthatistypicallymuchsmallerthanwiththegreedyapproachofAlgorithm3.2,resultinginahighernumberoflevelsandamoreexpensivefactorizationintheend.ThealgorithmdescribednextcanbeviewedasalessrestrictiveversionofAlgorithm4.1wherebysomeentriesareallowedintheuppertriangularpartofBsolongasdiagonaldominanceispreserved.Ateachi-thstep,themodi¯edalgorithmbeginsbycomputingthequantityt=jai;j(i)j¡Xk2SB(i)jaikj:RecallthatSB(i)issimplythesetofindicesthatcorrespondtothelowertriangularpartofrowP(i)ofthematrixB.Similarly,wedenotebySF(i)thesetofcolumnindicesthathavebeenrejectedpriortostepi.WesetnB(i)=jSB(i)jand,similarly,nF(i)=jSF(i)j.Thus,thereareatmostnz(ai;:)¡nB(i)¡nF(i)entrieswhichhavenotyetbeentaggedasmembersofBorF.IftisnegativethenthisrowcannotbediagonallydominantsincethelowerpartofBisalreadynotdiagonallydominant.Inthiscasethealgorithmwillskiptothenextcandidatepair.If,ontheotherhand,tispositive,thenwemayacceptentriestotherightofthediagonalentryofB.Inthiscase,thealgorithmwillproceedtoacceptentriesthatarenotlargerthantnz(ai;:)¡nB(i)¡nF(i))wherenB(i)isthenumberofentriesoftherowbeingexamined,thatarealreadyinB,and,similarly,nF(i)isthenumberofentriesthatarealreadyinF.Allothercolumnentriesarerejected,i.e.,theyaretaggedaspartofF.Algorithm4.2AugmentedtriangularB1.SetM=;;SetP(i)=Q(i)=¡1fori=1;:::;n;Setcount=0.2.Fori=1;:::;nDDo:3.If(Q(j(i))!=¡1)Continue(skiptonexti)4.LetSB(i)=fkjk2adj(A;i);Q(k)¸0g,5.SF(i)=fkjk2adj(A;i);Q(k)==¡2g:6.ComputetB=Pk2SB(i)jaikj,nB=jSB(i)j,andnF=jSF(i)j.7.IftB·jai;j(i)jthen8.add(i;j(i))toM:(a)count=count+1;(b)P(i)=Q(j(i))=count;9.Compute°=(jai;j(i)j¡tB)=(nz(ai;:)¡nB¡nF)10.Foreachkinadj(A;i)withQ(k)==¡1Do:11.Ifjaikj�°setQ(k)=¡212.EndDo12 13.Endif14.EndDo15.CompletematchingsetMarbitrarily.TheresultingrowsofBwillberowdiagonallydominant.Proposition4.1WhenthepermutationsP;QresultfromAlgorithm4.2thenthematrixBin(2)isrow-wiseweaklydiagonallydominant.Proof.RecallthatSB(i)isthesubsetofculumnindicesthathavebeenacceptedintoBpriortostepiandSF(i)isthesetofrejectedcolumnspriortostepi.De¯nealsoS0B(i)tobethesetofallculumnindicesthathavebeenacceptedintoBatthecompletionofstepi,andSa(i)tobethesetofentriesthathavebeenacceptedatstepi.SoS0B(i)=SB(i)[Sa(i).Letna=jSa(i)jandnotena+nB(i)+nF(i)·nz(ai;:),i.e.,na·nz(ai;:)¡nB(i)¡nF(i).Then,jai;j(i)j¡Xk2S0B(i)jaikj=jai;j(i)j¡Xk2SB(i)jaikj¡Xk2Sa(i)jaikj=t¡Xk2Sajaikj¸t¡na£tnz(ai;:)¡nB(i)¡nF(i)¸0:Avariationonthisprocedureconsistsessentiallyofadjustingtheaverage°computedinLine9asnewentriesareaccepted.ThesumtBaswellasthenumbernBarechangedeachtimeanewentryisacceptedintoB.Thisallowstoacceptmoreentries.Analgorithmofthistypeisdescribednext.MatchingbasedondynamicaveragesThistechniqueassignsa`dynamic'measureofdiagonaldominanceofarowwhenconsideredasarowofBatagivenstep.Eachstepofthealgorithmexaminestheselectedpairsi;j(i)fromD.Itbeginsbycomputing½i=jai;j(i)jwhichistheentryoftherowwithlargestmagnitude.TherowisthenscannedtodeterminehowmuchroomisleftforacceptingnewentriesintoB.Thisisdoneasfollows.Priortostepithealgorithmhasacceptedanumberofentriesalready.Thesewillshowupinthelowertriangularpartafterreordering.Theindicator½iwillbeadjustedbysubtractingfromittheabsolutevaluesofalltheseentries.ItwillbeupdatedsimilarlyeverytimeacolumnisacceptedintotheBpart.SomecolumnshavealsobeenrejectedandthesewillshowupintheFpartafterreordering.ThecolumnsthatareleftarepotentialcandidatesforbeingacceptedintheBpart.Thealgorithmwillrejectcolumnsamongthesebasedonacriterionsimilartotheoneinthepreviousalgorithm.Whenarowisscanned,anentryjaikjistestedagainstthecurrentvalueof½idividedbythenumberofentriesthatmaystillpotentiallybecandidatesforbeingacceptedinB.Ifitisless,thenitisacceptedintoBand½iisdecreasedaccordingly.Ifthetestisnotsatis¯edthecolumnkis13 markedasnotaccepted.ThecounterofthenumberofentriesintherowthatarenotinBisalsodecremented.Algorithm4.3AugmentedtriangularBwithdynamicaverages1.SetM=;;SetP(i)=Q(i)=¡1fori=1;:::;n;Setcount=0.2.Fori=1;:::;nDDo:3.If(Q(j(i))!=¡1)Continue(skiptonexti)(pair(i;j(i)noteligible)4.LetSB(i)=fkjk2adj(A;i);Q(k)¸0g,5.SF(i)=fkjk2adj(A;i);Q(k)==¡2g:6.½=jai;j(i)j¡Pk2SB(i)jaikj7.nz=jadj(A;i)j¡jSB(i)j¡jSF(i)j.8.If(½0)Continue(skiptonexti)(rejectpair(i;j(i)))9.add(i;j(i))toM:(a)count=count+1;(b)P(i)=Q(j(i))=count;10.Foreach(j2Adj(i)suchthatQ(j)==¡1)Do:11.If(nz¤jai;jj�½)then12.Q(j)=¡2(Rejectj)13.Else(Donotrejectj)14.½=½¡jai;jj(Update½)15.EndIf16.nz=nz¡1(Updatenz)17.EndDo18.EndDo19.CompletematchingsetMarbitrarily.Wenowbrie°ydiscussthemainstepsofthealgorithm.Thevariable½canbeviewedasadiagonaldominanceindicatorfortherunningrow,asarowofB.Aslongasitisnonnegativethenthecurrentrowremainsdiagonallydominant.Lines4to7computethevalueof½atthestart,whichisthemaxvaluejai;j(i)jminustheabsolutevaluesoftheentriesai;jforwhichthecolumnjisalreadyarowofB.Thenumberofcolumnsthatarestill'undecided',i.e.,thatareneitherinBnorinF,isalsocomputed(Line7)anditiscallednz.Ifthecomputedinitialvalueof½isnegativethenclearlyrowimustberejectedasisdoneinLine8.Otherwisethepair(i;j(i))isaddedtoMbyupdatingP,Q,andthecountercountasisdoneinLine9.TheloopinLines10-17,scansrowiasecondtime.Eachcolumnjthatisstillundecided(Q(j)==¡1),isexamined.Ifjaijj�½=nzthenthecolumnistaggedasrejected(Q(j)==¡2).Otherwiseitisaccepted(sofar),andtheB-diagonaldominanceindicator½isupdatedaccordingly.Ineithercase,thenumberofundecidedcolumnsisreducedbyone.ItisclearthatacceptanceintoBisonlytentativesinceacolumnthatisacceptedatthisstepmayberejectedlater.Ascanbeeasilyestablished,thealgorithmproducesamatrixBthatisrow(weakly)diagonallydominant.Proposition4.2WhenthepermutationsP;QresultfromAlgorithm4.3thenthematrixBin(2)isrow-wiseweaklydiagonallydominant.Proof.Throughoutstepiofthealgorithm,thevariable½representstheabsolutevalueofthedi®erencebetweenai;j(i)andtheentriesaijthatareacceptedintotheBpart,i.e.,thoseforwhich14 Q(j)willwinduppositiveafterstepi¡1.AfterLine8,theB-partofrowiisdiagionallydominantsince½¸0.Whenacolumnisnottaggedwith¡2,thenitcanpotentiallybecomepartofB.Considernowtherowatthecompletionofthenextloop(Lines10-17).Ateachstepwhenacolumnisnotrejected(Lines13-15),anew½iscomputed,½new=½¡jaijjandthenewtotalpotentialentriesintheBpartoftherowisalsoupdatedasnznew=nz¡1.Wehave½new=½¡jaijj¸nzjaijj¡jaijj=nznewjaijjwhichshowsthat½willremainnon-negativethroughoutthestep.Intheendtherewillbefewerentriesthatare¯nallyaccepted(assomecolumnswillberejectedinlatersteps).ThereforetheresultingrowofBwillremaindiagonallydominant.5CostanalysisofmultilevelnonsymmetricorderingsThe¯rstconcernwhenconsideringtheuseofthealgorithmsdescribedaboveislikelytobeinregardstoitscost,speci¯callythecostofobtainingthepairofpermutations.Obtainingthepermutationsmaybeexpensivewithapoorimplementationorapoorchoiceofparameters.Considerthepreselectionalgorithm¯rstandletnnzlbethenumberofnonzeroelementsinAl.Ateachleveltherearennzlpotentialcandidatestobecomediagonalentries.Theparameter¿allowstoeliminatealargefractionofthemwiththegoalofreducingthecost.Thealgorithmstartsbyobtainingthelargestentryineachrowatthecostofnnzl.Inthesecondphaseofthealgorithm,weightfactorsarecomputedforeachofthenDentriesofDatthecostofO(nD).Finally,theentriesinDaresortedaccordingtotheseweights,atthecostofO(nDlognD).SincewedonothaveanideaofthesizeofnD,weneedtomakeanassumptionwhichwillenableustoobtainanestimate.WewillassumethatfromoneleveltothenextthesizenlofthematrixAlisreducedbyaconstantratioK.Inotherwords,referringtothesplitting(3),weassumethatsize(Al)=size(Al+1)¼K�1:Onaveragethisisafairlyrepresentativemodel.ItisclearthatKwilldependon¿aswellasonthestructureofthematrix,itsnumberofnonzeroelementsperrow,etc..AteachlevelthesizeofDisafractionofnl,thematrixsizeatlevell.Sincewelimitthe¯ll-inineachrow,(seetheparametersusedintheexperiments)thenumbernnzlofnonzeroentriesinAlisamultipleofnl(inworsesituationsitcanpossiblybeashighasnllognlbutthisdoesnota®ecttheendresult).WiththisthecostateachlevelbecomesO(nnzl)+O(nDlognD)·¯nl+°nllognl»°nllognl:IfwehaveplevelsthetotalcostwillbeoftheformT(n)=°hnlogn+nKlognK+nK2lognK2+¢¢¢+nKplognKpi=°pXi=0hnKilognKii15 =°pXi=0·nlognKi¡inlogKKi¸=°"K¡K¡pK¡1nlogn¡nlog(K)pXi=0iKi#:Acloselookatthesecondterminthe¯nalexpressionrevealsthatitisofordern,sincethesumofi=KiisboundedbyaconstantwhichdependsonlyonK(andnotp).Intheend,T(n)¼°KK¡1nlogn:Ascanbeseen,whenKisclosetoone,thealgorithmwilldopoorly,re°ectingthefactthatthediagonaldominancecriteriondoesnotrejectenoughrowsateachlevel.Theanalysisindicatesthattheoverallcostforobtainingthepermutationisdominatedbythesortingstepinthepreselectionphase.Infact,itisclearthatanexactsortisnotnecessaryandcouldeasilybereplacedbyaninexactandlessexpensivesort.Forexample,itispossibletoreducethecostofthesortingstepbygroupingtheratiosintoasmallnumberofnearbyrepresentativesandresortingtoabucketsort.ThiswouldleadtoacostofO(n).Thecostofthematchingalgorithm,forexample,Algorithm4.2,iseasiertoestimateusingasimilarmodel.Onceagain,themainassumptionisthatnnzlisasmallmultiple,say®,ofnl-duetothedroppingstrategy.SoonaverageeachrowofAlhas®nonzeroelements.Inthiscasethecostofthei-thstepofAlgorithm4.2isO(nD)£®,withnD¼nl=K.Addingthesecostsforeachlevel,willyieldn®·1K+1K2+¢¢¢+1Kp¸¼n®K¡1acostwhichislinearinn,andwhichshowsasimilardegradationasforthepreselectionalgorithm,whenKisclosetoone.6NumericaltestsWehaveimplementedaversionofthealgorithmdescribedinearliersectionsbyusingtheARMSframework[36].WerefertotheresultingalgorithmasARMS-C(Cforcompletepivoting).ThegoaloftheexperimentsinthissectionistoillustratetheperformanceoftheARMS-Cstrategyaswellastogiveanideaonitscostandthee®ectofsomeparameters.TheexperimentsinSection6.2and6.3wereperformedononeprocessorofanIBM/SPcomputer.Theprocessorisa375MHzPower3(Winterhawk)nodewith4GBofmemory.AllotherexperimentswereperformedonaLinuxplatformwithtwo1.7XeonGHzprocessors,a256KBcache,and1GBofmainmemory.6.1ComparingmatchingstrategiesWebeginthissectionwithacomparisonofafewmatchingstrategies.Inthefollowingexperiments,weuseonlyonepreselectiontechnique,namelytheonedescribedbyAlgorithm3.1.WerunanumberofstrategiesandattempttosolveasequenceoflinearsystemsfromtheHarwell-Boeingcollection[22].ThelinearsystemsareobtainedfromallthosematricesfromtheRUAcollectionwhosesizeis500orhigher.Thereare58matricesthatsatisfythiscriterionandtheirnamesare16 BP1000822BP1200822BP1400822BP1600822BP200822BP400822BP600822BP800822FS5411541FS5412541FS5413541FS5414541FS6801680FS6802680FS6803680FS7601760FS7602760FS7603760GEMAT114929GEMAT124929GRE11071107GRE512512JPWH991991LNS39373937LNS511511LNSP39373937LNSP511511MAHINDAS1258HBMCFE765NNC13741374NNC666666ORANI6782529ORSIRR11030ORSIRR2886ORSREG12205PDE9511961PORES21224PORES3532PSMIGR13140PSMIGR23140PSMIGR33140SAYLR31000SAYLR43564SHERMAN11000SHERMAN21080SHERMAN35005SHERMAN41104SHERMAN53312HBSHL0663SHL200663SHL400663STEAM2600WATT11856WATT21856WEST0655655WEST0989989WEST15051505WEST20212021Table1:The58matricesfromtheHarwell-Boeingcollectionwithsizes(showntotheirright)¸500listedinTable1alongwiththeirsizes.Ifone(orseveral)right-handsidesis(are)providedthenthesystemwiththis(the¯rst)righ-handsideissolved.Otherwiseanarti¯cialright-handsideistaken,whichisformedbytakingb=A¤e,whereeisthevectorofallones.Theiterationisstoppedwhenevertheresidualnormhasbeenreducedby8ordersofmagnitude.Thisdoesnotnecessarilymeanthatthesolutionthatiscomputedisaccurate,becausethematricesinvolvedmaybehighlyill-conditioned.Sinceright-preconditioningisused,thetestsareontheactualresidualsnotthepreconditionedones.Inspiteoftheirrathersmallsizes,someofthematricesinthislistcausedi±cultiestoiterativesolvers.Itisforexamplenoteasyto¯ndaniterativeprocedurewhichsolvesallthesystemswiththesameinputparameters.Mostofthematricesareirregularlystructured,andsomehaveaveryirregularpattern.ThenextplotsshowacomparisonoftheperformanceofGMRESpreconditionedwithARMS-Cwithvariousmatchingstrategies.Wecomparefourmethods:²ThegreedystrategyofAlgorithm3.2,²ThetriangularBalgorithmdescribedbyAlgorithm4.1,²TheaugmentedtriangularBtechniqueasdescribedbyAlgorithm4.2²ThealgorithnmofdynamicaveragesdescribedbyAlgorithm4.3.TheparametersusedfortheexperimentarelistedbelowDroptoleranceFill-maxnlevmaxtolDDLU-BGWSLU-SLU-BGWSLU-S1000.10.0010.010.0010.011010105Inthetable,nlevmaxisthemaximumnumberoflevelsallowed,tolDDisthetolerance¿0usedinthepreselectionalgorithm,see(10).Thenext4parametersde¯nethedroppingfactorsforthefactorizationalgorithms.ThedroptoleranceforLU-BistheoneusedintheILUTfactorization[35]oftheBblock,whiletheonedenotedbyGWisusedwhencomputingthematricesGland17 Wlde¯nedby(6)ateachlevel.ThedroptoleranceusedtocomputeSfromGlandWlisgivennext.Finally,thedroptolerancefortheILUfactorizationatthelastlevelisgiven.Theincompletefactorizationwithcolumnpivoting(ILUTP)isusedforthis.Thenextsetofcolumnsde¯netheupperlimitfor¯ll-insforeachofthesecomputationsinthesameorder.Notethatthesenumbersarethemultiplesoftheaveragenumberofnonzeroelements,i.e.,atthel-thleveltheactualamountofnonzeroentriesallowedineachrowisfillmax¤nnz(Al)=n.Anotherparameternotshownhereisthecut-o®dimensionforthelastlevelwhichissetto100.ThismeansthatnofurtherlevelsaresoughtwhenthedimensionoftheSchurcomplementis·100orwhenthemaximumnumberoflevelsallowedisreached.WeranARMS-Cwithdi®erentmatchingstrategieswiththeexactsameparametersgivenaboveforall58linearsystems.20304050607080363840424446485052545658Systems solved out of 58IterationsNumber of systems solved out of 58 versus iterationsAlg. 3.2 8010012014016018020040424446485052545658Systems solved out of 58IterationsNumber of systems solved out of 58 versus iterationsAlg. 3.2 Figure2:Experimentwith58HBmatricesshowingthenumberofconvergedsystemsamong58,versusiterationcount.Theincompletefactorizationswerecomputableinallcases.However,theiterationphasewasnotsuccessfulinallcases.Aniterationwaslabeledasafailurewhentheresidualnormwasnotreducedby8ordersofmagnitudeinfewerthan200GMRESsteps.WeusedGMRESwitharestartof100.Thiswasdoneinordertomeasurethequalityofthepreconditioneraloneindistinctionfromthatoftheaccelerator.Theresultsaredividedintwoplotsforbetterclarity.TheleftsideofFigure2showsthenumberofsuccessfulsolvesrequiringiterationcountsof80orlessandtherightsidethoserequiringmorethan80steps.Forbettervisibility,thecurveshavebeenslightlyshiftedinordertoavoidsuperposition.AscanbeobservedthestrategyofAlgorithm4.3handlesharderproblemsquitewell(rightsideof¯gure)whileAlgorithm4.1yieldsfasterconvergenceoneasierproblems(leftsideof¯gure).AstandardpreconditionerwhichisoftenusedforcomparisonpurposesisILUTP(ILUwithThresholdandPivoting[35]).Whenagoodamountof¯ll-inisallowedthepreconditionercanbefairlyrobust.However,itdoesnotperformwellformatriceswithhighlyirregularpatternsandwhennotenough¯ll-inisallowed.Forcomparisonpurposes,weranILUTP-GMRESundersimilarconditionsonthe58problems.Speci¯cally,weselectedadroptoleranceof0:01anda¯ll-inlimitof3¤nnz(A)=n,whichmeansthateachrowofthefactorsLandUhaveatmost3timesthenumber18 AverageAverageSuccessfulMethodFill-factIter.countIterationsILUTP2.7112.1243Alg.3.21.6120.5843Alg.4.11.8715.0051Alg.4.21.6317.2352Alg.4.31.6522.1154Table2:Somestatisticsonthetestswiththe58HBmatricesofnonzeroelementsperrowofA.Togiveanideaontheoverallperformance,weshowinTable2theaverage¯ll-factorandaverageiterationcountforeachsuccessfulcaseforallmethods,includingILUTP.The¯llfactorre°ectsthememoryrequirementofthepreconditionerandisde¯nedastheratioofthenumberofnonzeroelementsoftheresultingpreconditioneroverthatoftheoriginalmatrix.ItisinterestingtonotethatILUTPrequiredfarmore¯ll-intosolvethesamenumberofproblems(43)astheworstperformeramongtheARMS-Calgorithms.Theaverageiterationcountsarenotindicativeofrobustness.Sincethe¯ll-inisratherhigh,theiterationcountsfortheeasyproblemsisusuallyaverysmallnumberforthe(fewer)successfulrunsofILUTP,whichleadstoalowaverageamongthe43successfulcases.ThisisincontrastwiththeARMS-Cstrategieswhichsolvemoreoftheharderproblemsbut,ascanbeexpected,atthecostofalargernumberofGMRESstepsonaverage.ILUTPwasabletosolvesomeoftheproblemsforwhichitfailedintheabovetest.However,thiswasachievedatthecostofmuchmorememory.6.2E®ectofthe¯lteringparameter¿0Wevarytheparameter¿0usedinthepreselectionalgorithmandcomparetheiterationnumberandthetimestocomputethefactorizationforasystemassociatedwiththematrixtwotonefromtheUniversityofFloridasparsematrixcollection[16].Thismatrixisknowntocausedi±cultiestostandardILUTandILUTPpreconditioners.TheparametersusedforthetestaregiveninthefollowingtableDroptoleranceFill-maxnlevmaxtolDDLU-BGWSLU-SLU-BGWSLU-S200varies0.0010.00010.0010.011010105Algorithm4.3wasusedtoobtaintheP;Qpermutations.ForthismatrixtheARMS-Cprocedurerequiresunusuallysmallvaluesforthedroptolerancesbeforethepreconditionerstartsyieldingaconvergingiteration.Ontheotherhand,andsomewhatsurprisingly,theresulting¯ll-factorisnottoolarge,andconvergencecanbeobtainedfor¯ll-factorsbelow2.Thisphenomenonisnotobservedtoooftenandislikelyrelatedtothestructureoftheproblem.ILUTPdidnotyieldconvergenceforthisproblem,evenforafairlylarge¯ll-factor.As¿0increasesfrom0.0to0.4onecanobserveasteadyimprovementintheconvergenceofthealgorithmwhilethe¯ll-factorremainsalmostthesame.Thereforeabetterqualityfactorizationisbeingcomputedwithaboutthesameamountof¯ll.Thentheiterationcountseesasizable19 ¿0¯lflevitersetupseciterssec0.001.7591815.96e+016.84e+010.101.7990746.05e+016.24e+010.201.7597875.90e+017.45e+010.301.78117717.01e+016.19e+010.401.80145478.70e+014.29e+010.501.89193341.18e+023.60e+010.601.91200281.46e+023.41e+010.701.95200271.64e+024.14e+010.801.95200291.72e+025.45e+010.901.91200352.34e+029.73e+01Table3:PerformanceofARMS-Cforvariousvaluesof¿0fortheTwotonematrixdecrease,reachingthesmallestnumberof27when¿0=0:7.Atthispointthemaximumnumberoflevels,whichhasbeensetto200,isreachedandthe¯ll-factorincreases.Eventually,theiterationstartstodeteriorateslowlyreaching38when¿0=0:9.Forasmallvalueof¿0moreentriesareinitiallypreselected.Forexamplewhen¿0=0allthelargestentriesineachrowarecandidates.Theyarethensortedusingameasurewhichmixesdiagonaldominancewiththenumberofnonzeroentries.Theresultingalgorithmismoretolerantofnondiagonaldominance.Selectingalarger¿0setsastricterconditionforacceptinganentryasacandidate.Excludingmorerowsinthiswayresultsinmorerobust,oftenfaster,convergence,butthisincreasesthenumberoflevelsascanbeseeninthetable.Infactthenumberoflevelsshowninthetablefor¿0¸0:7isthelimitallowed.Onemaywonderwhytheiterationtimeisnotproportionaltotheamountof¯llofthefactorizationandtheiterationcount.Forexamplewhenbetween¿0=0:8and¿0=0:9,theiterationtimejumps78%whiletheiterationcountincreasesonly20%.ThisislikelyduetothefactthatthelastSchurcomplementismuchlargerforthelattercase.6.3E®ectofthenumberoflevelsOneimportantconsiderationwhenusingARMS-C,isthenumberoflevels.Atoneextremewhenonlyonelevelisused,ARMS-CwillyieldtheILUTPfactorization.Indeed,theprocedureconsidersthematrixatthetopleveltobetheSchurcomplementmatrixandusesILUTPtofactorit.LargeSchurcomplementsarelikelytoleadtoexpensivefactorizations.ThecurrentcodeusesasaparameterthemaximumnumberoflevelsandthesmallestallowableSchurcomplementwhichisusuallysetto100,i.e.,assoonasamatrixAl+1in(3)reachesasizeof100orsmallertheprocedurestopsgeneratingmorelevels.Inthenextexperimentweexplorethiswithagoalofdemonstratingthebene¯ciale®ectofamultilevelapproachincontrastwithonethatisbasedononeorveryfewlevels.Thematrixusedinthistestarisesfromthesimulationofa2-dimensional°owinadrivencavityandisavailablefromthematrixmarket3underthenameE40R0100inthecollectionDRIVCAV.Itssizeisn=17;922andithasnnz=567;467nonzeroentries.TheReynoldsnumberfortheexampleis100.Theparametersusedfortheexperimentarelistedbelow3http://math.nist.gov/MatrixMarket/20 0510152025300102030405060708090100iterationsNumber of levelsGMRES Iters.GMRES iterations and fill-factors vs. number of levels0510152025300246810fillfactor00.10.20.30.40.50.60.701020304050607080iterationstauNumber of levels524949454035343938405675110181200GMRES iterations and fillfactors vs. tau00.10.20.30.40.50.60.701020304050607080iterationstauGMRES Iters.FillFactorNumber of levels524949454035343938405675110181200GMRES iterations and fillfactors vs. tau00.10.20.30.40.50.60.701020304050607080iterationstauGMRES Iters.FillFactorNumber of levels524949454035343938405675110181200GMRES iterations and fillfactors vs. tau00.10.20.30.40.50.60.701020304050607080iterationstauGMRES Iters.FillFactorNumber of levels524949454035343938405675110181200GMRES iterations and fillfactors vs. tau00.10.20.30.40.50.60.701020304050607080iterationstauGMRES Iters.FillFactorNumber of levels524949454035343938405675110181200GMRES iterations and fillfactors vs. tau00.050.10.150.20.250.30.350.40.450.50.550.60.650.711.21.41.61.822.22.4fillfactorFigure3:Iterationand¯ll-factorforthedrivencavityproblem,versusthenumberoflevels(left-side)and¿0parameter(right-side).Therightplotusesaslightlydi®erentsetofparameters.DroptoleranceFill-maxnlevmaxtolDDLU-BGWSLU-SLU-BGWSLU-Svaries0.20.010.050.050.016666Thestoppingcriterionisasbefore,namelythenormoftheresidualshouldbereducedby8ordersofmagnitude.Thecasenlevmax=0correspondstoILUTP-withthedroptolerance0.01andamax¯ll-inof6£nz=n.Figure3showsthenumberofGMRESiterationsaswellastheresulting¯ll-factorwhenthenumberoflevelsvariesfomzeroto30.Notethatforthiscaseonly27levelswererequiredbeforetheSchurcomplementmatrixreachedthesmallestallowablesizeof100,sothecasenlevmax=30wasneverreached.Theplotshowsafairlyregulardecreaseinthe¯ll-factorasthenumberoflevelsincreases.Atthesametime,theiterationcountincreasesandthenlevelso®ataround75.Adi®erentperspectiveonthisexperimentwouldbetovarytheparameter¿0aswasdoneintheprevioussectionandletthenumberoflevelsbeaslargeasrequired.Withthesetofparametersusedinthepreviousexperiment,themaximumnumberoflevelsforthematrixE40R0100was27.Inordertoobtainlargernumbersoflevels,wetightenedthedroptoleranceslightlybyhalvingthetwodroptolerancevaluesof0.05usedfortheG;WmatricesandtheSchurcomplementto0.025.Theparameter¿0isvariedfrom0.0to0.7withincrementsof0.05.TheplottotherightofFigure3showstheresult.Essentially,the¯ll-factorremainsinarathernarrowband[1.932.23],whiletheiterationcountchangesbutnotsubstantially.Theadditionalaxisshowsthenumberoflevelsrequiredforeachcase.Thesenumbersaremuchbiggerthantheonesobtainedwithadroptoleranceof0.05.ThislastexperimentaswellastheoneinSection6.2andothersnotreportedhere,suggestthatitisbestnottolimitthenumberoflevelstoatoosmallnumber.LimitingthenumberoflevelsinthiswaymayleadtoalargelastSchurcomplementmatrixwhichisthenpoorlyhandledbytheILUTPapproach.21 6.4SolutionofKKTproblemsandacomparisonTheARMS-Cprocedurecanbeappliedtosolvesaddle-pointproblemssuchasthosearisingfromtheincompressibleNavier-StokesequationsorfromKarush-Kuhn-Tucker(KKT)optimalityconditionsinnonlinearprogramming.In[28]HawsandMeyerfocusonKKTproblemsandcompareafewmethodsforsolvingthem.Suchproblemsaresymmetricandhighlyinde¯nitebutoneofthemethodstestedin[28]istoignoresymmetryanduseILUTasapreconditioneronareorderedsystem.ThereorderingusedisMC64,theone-sidedreorderingdevelopedin[23,24,33]whosegoalistoputlargeentriesonthediagonal.Twosetsofproblemsaretestedin[28].Weonlyconsiderthe¯rstsetcorrespondingtotheresultsreportedinTableIofthepaper.MatrixnnnznnzprelevItersSetupsec.Itersec.sti®4849641318733324490.090.39sti®53341017725634136361120.486.73mass0484965681861799270.070.04mass05334102410122590265410.351.53mass06337942572203020096200.470.73Table4:Resultsforsolving¯veKKTproblemswithARMS-C.MatrixnnzpreIters.Setupsec.Iter.secsti®477118280.1290.786sti®53414171350.57515.48mass046887530.1260.186mass05323299320.7053.902mass06394821300.8064.038Table5:Resultsforsolvingthe¯veKKTsystemsofTable4withILUT-MC64,asreportedin[28].NotethatthetimesarenotdirectlycomparablewiththoseofTable4.Table4showsthesamestatisticsasthoseofTableIof[28]plusthenumberoflevels(`lev')requiredforeachcaseforasimilarapproachwhichignoressymmetryandsolvestheproblemsbyanARMS-C/GRMREScombination.Inthetablenandnnzarethesizeandnumberofnonzeroentriesofthematrix,nnzpreisthenumberofnonzeroentriesrequiredforthepreconditioner,andtherestofthedataisself-explanatory.Theparametersusedfortheexperimentarelistedbelow.DroptoleranceFill-maxnlevmaxtolDDLU-BGWSLU-SLU-BGWSLU-S1000.250.050.010.010.014444Therighthandside,initialguess,andconvergencecriterionareidenticalwiththoseof[28].Aswithearliertestswedonotattemptinanywaytotunetheparametersforeachseparatecase.Thesewereselectedtoyieldacomparablenumberofnonzeroelementsforthepreconditionersasin[28].Infactall¯veproblemstestedherearerelativelyeasytosolvewiththeARMS-Cprocedureinthesensethatalmostanyparameterselectionwouldyieldconvergence.Thisisnotthecaseforsomematrices(see,e.g.,Section6.2).22 AsacomparisonwereproduceinTable5thestatisticsfortheMC64-ILUTcombinationfrom[28].Ascanbeseenforsimilar¯ll-in,thenumberofiterationsobtainedineachcasearecomparable.Notethattheacceleratorusedin[28]isBCGSTABinsteadofGMRES.Executiontimesarenoteasilycomparableastheydependonmanyfactorsinadditiontotherawspeedoftheprocessorsused.Theprocessorusedin[28]isanAMDDuron800Mhzprocessor,whichisslowerthantheXeon1.7Ghzonwhichthisexperimenthasbeenconducted.Ontheotherhandthecodesin[28]wereoptimizedwiththeuseofATLASBLAS.Inaddition,thealgorithmsin[28]arecodedinFORTRANwhereastheARMS-CcodeiscodedmostlyinC.TheARMS-Ccodeshavenotbeenoptimizedinanyway.Allthatcanbesaidisthatbytakingintoaccountthespeedsoftheprocessors,thetimesobtainedintheseexperimentsarewithinacomparablerange.NotethattheMC-64/ILUTprocedurewasnotthebestoverallmethodreportedin[28].However,thepointofthisexperimentisnottoshowthatthemethoddiscussedinthispapercancompetewithspecializedsolverstailoredtoagivenapplicationorstructure.Itisonlytoshowthatthemethodcanperformreasonablywellasageneral-purposesolver,inthatitcanhandleKKTproblemsaseasilyasaproblemarisingfromthediscretizationofPDEs.MatrixordernonzerosApplication(Origin)barrier2-9115,6253,897,557SemiconductorDevice(UFL,Schenkset)matrix9103,4302,121,550SemiconductorDevice(UFL,Schenkset)matrix-new3125,3292,678,750SemiconductorDevice(UFL,Schenkset)ohne2181,34311,063,545SemiconductorDevice(UFL,Schenkset)para-4153,2265,326,228SemiconductorDevice(UFL,Schenkset)cir2a482,9693,912,413circuitsimulationscircuit170998958936Digitalcircuitsimulation(UFL,Hammset)circuit480209307604Circuitsimulation(UFL,Bomhofset)wang3.rua26064177168Devicesimulation(UFL,Wangset)wang4.rua26068177196Devicesimulation(UFL,Wangset)Table6:Generalinformationon10testmatrices6.5TestswithlargermatricesApartfromthetestinSection6.2withthematrixtwotone,thelinearsystemsoftheearliernumericaltestsareallrelativelysmall.Inthissectionwereportonsometestsperformedwithlarger,morerealistic,matricesarisingfromapplicationswhichareknowntocausedi±cultiestoiterativesolvers.Weselected10matrices,mostofwhichhavealreadybeenusedastestproblemsintheliterature.The¯rst5matricesweretakenfrom[38].Theyarethelargestlinearsystemsinsizetakenfrom[38],ifthematrixpre2thereinisignored.Forthesystempre2nomethodamongthosereportedin[38]workedandasimilarconclusionwasreachedwithARMS-C.These5matrices,whichoriginatefromdevicesimulation,areavailablefromtheSchenksetsoftheUniversityofFloridacollection[16].Alongwiththesematrices,wealsoselectedthetwolargestmatricesfromthepaper[2],namelythematrixcirc2aandthematrixscircuit.ThematrixscircuitcanalsobeobtainedfromthesetHammoftheUniversityofFloridacollection,whilethelargermatrixcirc2awassenttousbyAchimBasermann[2].Finally,weaddedthe2largestmatrices(wang1,23 wang2)fromtheWangsetoftheUniversityofFloridacollection,andthelargestmatrix(circuit4)fromthesetBomhoffofthesamecollection.Theseexamplesareallfromcircuitsimulation.Somegeneralinformationonall10matricesisshowninTable6.WetestedARMS-C,basedontheddPQorderinggivenbyAlgorithm4.3onall10matriceswiththefollowingparametersettings:DroptoleranceFill-maxnlevmaxtolDDLU-BGWSLU-SLU-BGWSLU-S400.10.010.010.011.e-0533320TheparametersfortheILUTPpreconditionerforthelastlevelessentiallyleadtoadirectsolve.Anotherparameternotshownhereisthecut-o®dimensionforthelastlevelwhichissetto300.NofurtherlevelsaresoughtwhenthedimensionoftheSchurcomplementis·300orwhenthemaximumnumberoflevelsallowed(40)isreached.WerantwotestswiththeGMRESaccelerator,onewitharestartdimensionof60,theotherwitharestartdimensionof100.NotethatmemoryisallocateddynamicallyforthebasisvectorsrequiredbyGMRES(k).Inbothcases,theiterationwasstoppedwhenevereitheramaximumnumberofiterationsof300wasreachedortheinitialresidualwasreducedby10¡8.Thetimesareinsecondsonthe1.7GhzXeonprocessormentionedearlier.TheiterationtimesarerepeatedforthecasewhenGMRES(60)andGMRES(100)takethesamenumberofiterations(thesecorrespondtotheexactsameoperationsandshouldbethesameunderidealconditions).Ofallthematricestested,onlyscircuitfailedtoreducetheresidualby8ordersofmagnitudein300stepswiththesetofparametersselectedabove.In300stepstheresidualnormwasreducedby¼1:8£10¡07byGMRES(100)andby3:13£10¡7byGMRES(60).Thisparticularmatrixisagoodrepresentativeofthosesituationsmentionedearlierinwhichrow/columnscalingdoesnotnecessarilyhelp.Iftheexactsameparametersareused,butnoroworcolumnscalingisperformed(setbyparametersinARMS-C)then,asisshowninTable8,GMRES(100)doessucceedinreducingtheresidualnormby8ordersofmagnitudein297steps,whileGMRES(60)alsoachievesthesameresultin400steps.Table8alsoshowsanadditionaltestforthesamematrixwhennoscalingisused.Herethedroptoleranceparameterslabeled\LU-B",\GW",and\S",whichweresetto0.01FillSet-upGMRES(60)GMRES(100)MatrixFactorLevelsTimeIts.TimeIts.Timebarrier2-90.6254.01e+001133.29e+01933.02e+01matrix-new30.8987.53e+00401.02e+01401.00e+01matrix91.7795.53e+001604.94e+01822.70e+01ohne20.6264.34e+01996.35e+01805.43e+01para-40.6255.70e+00491.94e+01491.93e+01wang32.3348.90e-01452.09e+00451.95e+00wang41.8645.10e-01311.25e+00311.20e+00scircuit0.9041.86e+00Fail7.08e+01Fail8.80e+01circuit40.7521.60e+001991.69e+01961.07e+01circ2a0.7632.19e+02181.08e+01181.03e+01Table7:Solutionofthe10systemsinTable6withGMRES(60)andGMRES(100)preconditionedwithARMS-C.24 FillSet-upGMRES(60)GMRES(100)FactorLevelsTimeIts.TimeIts.TimeSameparameters0.8941.81e+004009.13e+012978.79e+01Drop-tol=0.0011.0051.89e+00982.23e+01822.27e+01Table8:Solutionofthesystemscircuitwhennoscalingisused,usingtwodi®erentsetsofparameters.thisexperiment,arechangedto0.001.Theresultingconvergenceismuchfaster.Itshouldbenotedthatconvergenceisalsoachievedwiththesevaluesofthedroptoleranceforthesituationwhenrow/columnscalingisused,althoughitismuchslower.Observethatthe¯ll-factors,thememoryrequiredforthepreconditionernormalizedbythatrequiredfortheoriginalmatrixA,aregenerallyquitemodest.Itislessthanonein7ofthe10cases.Notseeninthereportedresultsisapeculiarbehaviorofthesolverwiththematrixcircuit4aswellaswiththematrixcirc2a.Incontrastwithmostothersystems,resultswithcircuit4wereexceedinglysensitivetoparametersandeventotheinitialguess(whichisalwaysarandomvector).Regardingthematrixcirc2a,itcanbeobservedinthetablethatthetimetocomputethepreconditionerappearstobeabnormallyhigh,relativetotheothertestcases.Thismatrixisthelargestinsizeamongthe10tested,butitisnotthelargestintermsofthenumberofnonzeroentries.The¯rstreactiontothismightbethatthealgorithmshavesomehiddencostassociatedwiththesizenratherthanthenonzeroentries.Atestwiththematrixpre2mentionedearlierdisprovedthis.Thismatrixisofsizen=659;033andhasnz=5;959;282nonzeroentries.Yet,thesameparametersproducedthefactorizationinalmostanorderofmagnitudelesstime(24:2secondstobeexact)onthesameplatform.The¯ll-factorwas0.99whichwasevenhigherthanthatforcirc2a.Weexaminedthesizesandnumbersofnonzeroentriesofthematricesobtainedateachlevel.ThemethodstoppedprematurelywithaCblockofsizezeroyieldingthepreviouslevel(3rdoneinthiscase)asthelastlevel.Thislevelhadasizeof4,679whichissomewhatlarge.Indeed,no¯ll-minimizingorderingisperformedpriortofactoringthelastSchurcomplement,becauseinmostcasesthelastSchurcomplementisasmallmatrixofsizeafewhundredsatmost.However,furtherexaminationrevealedthat,quiteremarkably,thislastSchurcomplementisadiagonalmatrix.Thisinfactexplainswhythealgorithmstoppedprematurely.Intheend,wereachedtheonlyplausibleexplanationthatsomeintermediatematricesthatweregeneratedwerequitedense,beforetheyweresparsi¯edbydropping.Selectingadi®erent¯lterparameter¿0canforcealargernumberoflevelsandreducetheset-upsubstantiallyinthiscase.Forexample,with¿0=0:7,18levelswereused,witha¯llfactorof0.77,quiteclosetothepreviouscase.Theset-uptimewasreducedto43seconds.ThenumberofGMRESstepswasalsoreducedfrom18to12(buttheiterationtimeincreasedslightlyto12.1secondsduetothelargernumberoflevels).6.6AvailablesoftwareandothertestsThealgorithmsdescribedinthispaperarepartofarecentpackage,calledILUPACK,developedprimarilybyMatthiasBollhÄofer[11].ThepackageisavailablefromBollhÄofer'sweb-site,see[12].Ito®ersalargeselectionofpreconditioners,manyofwhichutilizetheinverse-baseddroppingtechnique25 mentionedintheintroduction,seealso[9].Alongwithsoftware,thesiteo®ersalsoarepositoryofstatisticsonasizablenumberofnumericalexperimentswithvariousmethods.TheseexperimentsshowinparticularanumberofcomparisonswiththesimilarMC-64strategy.7ConclusionWeconcludewithafewremarksandobservationsregardingtheperformanceofARMS-Candonissuesremainingtobeexamined.Our¯rstobservationisthatthemethodseemstoperformwellasageneralpurposealternativetoadirectsolver.Thoughthealgorithmsdonotcompetewiththerobustnessofdirectsolvers,theycanoftentackleproblemsthatwereinaccessibletoiterativesolverssofar,atasmallfractionofthememoryandcomputationalcostsrequiredbydirectsolvers.NotealsothatthemethodsperforminasimilarwayforproblemswithaverypoorstructureasforsystemswhicharisefromdiscretizedPDEs.Thereare,however,anumberofquestionsthatremaintobeansweredandissuestobeexaminedforabetterunderstandingofthegeneralapproachproposedinthispaper.Oneoftheprimaryconcernswithastrategysuchastheonedescribedinthispaperisthelargenumberoftunableparametersavailable.Withsuchvarietyitmaybeindeeddi±cultto¯ndthebest¯tforagivenlinearsystem.However,inasoftwarepackage,onemaysimply¯xmostoftheseparameterswithoutcausingseriousharmtoperformance.Theprimarycandidatesforthis`hard-wiring'arethemaximumnumberoflevelsallowed(alongwiththeminimumsizeoftheSchurcomplementbeforeswitchingtoILUTP),andthediagonaldominance¯ltrationparameter¿0.Indeed,whenthebettermatchingalgorithmsareused(Alg.4.3orAlg.4.2)performancedoesnotvarytoomuchwiththeseparametersformostproblems.Theotherparametersleftarethosethatcontrol¯ll-in,i.e.,theaccuracyofthefactorization.Althoughourresearchcodeprovidesseparate¯ll-inparametersforvariouspartsofthefactorizationatagivenlevel,thesecanbetakentobethesameaswasdoneinmostofourexperiments.Thedroptoleranceand¯ll-inparameterscanbe¯xedtoprovideaveryaccuratefactorizationforthelastlevelSchurcomplementmatrix.Intheendallthatremainstoselectarethedroptolerancesandthemax¯ll-inallowedfortheinterlevelmatrices(thoseforLU-BandGWinthevarioustables).Thiswouldnotbetoodi®erentfromtheparametersrequiredbyILUT[35].Weobservedthat¯ll-indoesnotnecessarilyimprovethequalityofthefactorization.Infactthecontraryisoftenseen.WhilethesameobservationcanbemadeforILUT,wenotethatitisnotascommon.Thisbehaviorunderscoresthedi±cultyinpredictingthee®ectofdroppingsmalltermsinthefactors,ontheconditioningofthepreconditionedsystem.Thedroppingstrategieswhichweretestedutilizeasimplethresholdcriterionbutitisclearthatonebasedonestimatingtheconditionnumberofthefactors[9]wouldbeidealinthissituation.AdiagonaldominancecriterionwasselectedasameansofbuildingagoodBblock,becauseitiseasytoapplyandensureslargediagonalentriesinarelativesensetothe1-normoftherow.However,onecanaskifitispossibleto¯ndamoreelaboratecriterion,forexampleonethattakesintoaccountthe\stability"oftheresultingfactors.Arelatedsubquestionistheproblemofselectingtheparameter¿.Thechoicemadeinthispaperusesarelativemeasurewhichkeepsthebestrowsaccordingtoarulebasedonaparameterinputbytheuser.Thereremainsto¯ndarigorousway26 toselecttheparameter¿.Anotherimportantissuethatdeservesfurtherinvestigationisthatofscaling.Aswasreported,theperformanceofthepreconditionersissigni¯cantlya®ectedbyscaling.Yet,itstillremainsdi±culttodevisescalingstrategieswhosee®ectsonthepreconditionerarewellunderstoodandpredictable.Finally,ourcurrentcodesdonotattempttoreduce¯ll-inbyothermeansthandropping.Asimplestrategywhichcanbeemployedistouse¯ll-reducing(symmetric)orderingsontheBmatricesaftertheyhavebeengenerated.Acknowledgements.ThecodeARMS-CdiscussedintheexperimentshasbeenadaptedfromARMS[36],aCcodewhichwasdevelopedjointlywithBrianSuchomel.IamgratefultoMatthiasBollhÄoferfornumerousinsightfuldiscussionsonthetopicofthispaper.Thenumericalexperimentswouldnothavebeenpossiblewithouttheavailabilityofasizablecollectionoftestmatrices.IwouldliketothankJohnHawsforprovidingthematricesusedinSection6.4,AchimBasermannforsendingushiscircuitsimulationmatrices,andOlafSchenkformakingavailablethelargedeviceandcircuitmatricesdiscussedinSection6.5.SpecialthankstoTimDavisformakingtheseandotherlinearsystemsavailableinhisexcellentrepositoryoftestmatrices.Finally,Iamgratefultotherefereesfortheirnumeroussuggestionswhichhelpedimprovethequalityofthispaper.References[1]R.E.BankandC.Wagner.MultilevelILUdecomposition.NumerischeMathematik,82(4):543{576,1999.[2]A.Basermann,U.Jaekel,andK.Hachiya.Preconditioningparallelsparseiterativesolversforcircuitsimulation.InSIAMconferenceonAppliedLinearAlgebra,July15-19,2003,Williams-burg,VA,pagesCP{2,2004.[3]M.Benzi,J.C.Haws,andTºuma.Preconditioninghighlyinde¯niteandnonsymmtricmatrices.SIAMJournalonScienti¯cComputing,22(4):1333{1353,2000.[4]M.Benzi,R.Kouhia,andTºuma.Stabilizedandblockapproximateinversepreconditionersforproblemsinsolidandstructuralmechanics.Comput.MethodsAppl.Mech.Engrg.,190:6533{6554,2001.[5]M.Benzi,C.D.Meyer,andTºuma.Asparseapproximateinversepreconditionerfortheconjugategradientmethod.SIAMJournalonScienti¯cComputing,17:1135{1149,1996.[6]M.BenziandTºuma.Orderingsforfactorizedsparseapproximateinversepreconditioners.SIAMJournalonScienti¯cComputing,21:1851{1868,2000.[7]M.BenziandM.Tºuma.Asparseapproximateinversepreconditionerfornonsymmetriclinearsystems.SIAMJournalonScienti¯cComputing,19(3):968{994,1998.27 [8]M.BenziandM.Tºuma.Arobustincompletefactorizationpreconditionerforpositivede¯nitematrices.Numer.Lin.Alg.w.Appl.,10:385{400,2003.[9]M.BollhÄofer.ArobustILUwithpivotingbasedonmonitoringthegrowthoftheinversefactors.LinearAlgebraanditsApplications,338(1-3):201{213,2001.[10]M.BollhÄoferandY.Saad.OntherelationsbetweenILUsandfactoredapproximateinverses.SIAMJournalonMatrixAnalysisandApplications,24:219{237,2002.[11]M.BollhÄoferandY.Saad.Multilevelpreconditionersconstructedfrominverse{basedILUs.TechnicalReportumsi-2004-75,MinnesotaSupercomputerInstitute,UniversityofMinnesota,Minneapolis,MN,2004.submitted.[12]MatthiasBollhÄoferandYousefSaad.ILUPACK-preconditioningsoftwarepackage,releasev1.0,may14,2004.Availableonlineathttp://www.tu-berlin.de/ilupack/.[13]C.W.BomhofandH.A.VanderVorst.Aparallellinearsystemsolverforcircuitsimulationproblems.NumericalLinearAlgebrawithApplications,7:649{665,2000.[14]E.F.F.BottaandF.W.Wubs.MatrixRenumberingILU:ane®ectivealgebraicmultilevelILU.SIAMJournalonMatrixAnalysisandApplications,20:1007{1026,1999.[15]E.ChowandY.Saad.Approximateinversepreconditionersviasparse-sparseiterations.SIAMJournalonScienti¯cComputing,19:995{1023,1998.[16]T.Davis.UniversityofFloridasparsematrixcollection,http://www.cise.u°.edu/research/sparse,June1997.[17]T.A.Davis.AparallelalgorithmforsparseunsymmetricLUfactorizations.PhDthesis,UniversityofIllinoisatUrbanaChampaign,Urbana,IL.,1989.[18]I.S.Du®.Asurveyofsparsematrixresearch.InProceedingsoftheIEEE,65,pages500{535,NewYork,1977.PrenticeHall.[19]I.S.Du®.Algorithm575:Permutationsforazero-freediagonal.ACMTransactionsonMathematicalSoftware,7:387{390,1981.[20]I.S.Du®.Onalgorithmsforobtainingamaximaltransversal.ACMTransactionsonMathe-maticalSoftware,7:315{330,1981.[21]I.S.Du®,A.M.Erisman,andJ.K.Reid.DirectMethodsforSparseMatrices.ClarendonPress,Oxford,1986.[22]I.S.Du®,R.G.Grimes,andJ.G.Lewis.User'sguidefortheHarwell-Boeingsparsematrixcollection.TechnicalReportTR/PA/92/86,CERFACS,Toulouse,France,1992.[23]I.S.Du®andJ.Koster.Thedesignanduseofalgorithmsforpermutinglargeentriestothediagonalofsparsematrices.SIAMJournalonMatrixAnalysisandApplications,20:889{901,1999.28 [24]I.S.Du®andJ.Koster.Onalgorithmsforpermutinglargeentriestothediagonalofasparsematrix.SIAMJournalonMatrixAnalysisandApplications,22(4):973{996,2001.[25]J.A.GeorgeandJ.W-H.Liu.ComputerSolutionofLargeSparsePositiveDe¯niteSystems.Prentice-Hall,EnglewoodCli®s,NJ,1981.[26]G.H.GolubandC.VanLoan.MatrixComputations,3rdedn.TheJohnHopkinsUniversityPress,Baltimore,1996.[27]M.J.GroteandT.Huckle.Parallelpreconditioningswithsparseapproximateinverses.SIAMJournalonScienti¯cComputing,18:838{853,1997.[28]J.C.HawsandC.D.Meyer.PreconditioningKKTsystems.TechnicalReportM&CT-TECH-02-002,Boeing,2002.[29]L.Yu.KolotilinaandA.Yu.Yeremin.Onafamilyoftwo-levelpreconditioningsofthein-completeblockfactorizationtype.SovietJournalofNumericalAnalysisandMathematicalModeling,1:293{320,1986.[30]R.Leuze.IndependentsetorderingsforparallelmatrixfactorizationsbyGaussianelimination.ParallelComputing,10:177{191,1989.[31]N.Li,Y.Saad,andE.Chow.CroutversionsofILUforgeneralsparsematrices.SIAMJournalonScienti¯cComputing,25(2):716{728,2003.[32]M.Luby.Asimpleparallelalgorithmforthemaximumindependentsetproblem.SIAMJ.onComputing,14(4):1036{1053,1986.[33]M.OlschowkaandA.Neumaier.Anewpivotingstrategyforgaussianelimination.LinearAlgebraanditsApplications,240:131{151,1996.[34]Y.Saad.ILUM:amulti-eliminationILUpreconditionerforgeneralsparsematrices.SIAMJournalonScienti¯cComputing,17(4):830{847,1996.[35]Y.Saad.IterativeMethodsforSparseLinearSystems,2ndedition.SIAM,Philadelpha,PA,2003.[36]Y.SaadandB.Suchomel.ARMS:Analgebraicrecursivemultilevelsolverforgeneralsparselinearsystems.NumericalLinearAlgebrawithApplications,9,2002.[37]Y.SaadandJ.Zhang.BILUTM:Adomain-basedmulti-levelblockILUTpreconditionerforgeneralsparsematrices.SIAMJournalonMatrixAnalysisandApplications,21:279{299,2000.[38]O.Schenk,S.RÄollin,andA.Gupta.Thee®ectsofnonsymmetricmatrixpermutationsandscalingsinsemiconductordeviceandcircuitsimulation.IEEEtrans.onComputerAideddesignofIntegratedCircuitsandsystems,23(3):400{411,2004.29