/
Notes on Cholesky Factorization Robert A Notes on Cholesky Factorization Robert A

Notes on Cholesky Factorization Robert A - PDF document

lindy-dunigan
lindy-dunigan . @lindy-dunigan
Follow
475 views
Uploaded On 2014-12-12

Notes on Cholesky Factorization Robert A - PPT Presentation

van de Geijn Department of Computer Science Institute for Computational Engineering and Sciences The University of Texas at Austin Austin TX 78712 rvdgcsutexasedu March 11 2011 1 De64257nition and Existence The Cholesky factorization is only de6 ID: 22866

van Geijn Department

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "Notes on Cholesky Factorization Robert A" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

forj=1:n j;j:=p j;jfori=j+1:n i;j:= i;j= j;jendforfork=j+1:nfori=k:n i;k:= i;k� i;j k;jendforendforendfor forj=1:n j;j:=p j;j9�=�; j+1:n;j:= j+1:n;j= j;j9������=������; j+1:n;j+1:n:= j+1:n;j+1:n�tril( j+1:n;j Tj+1:n;j)endforFigure1:FormulationsoftheCholeskyfactorizationthatexposeindicesusingMatlab-likenotation. partthatisthenoverwrittenwiththeresult.Inthisdiscussion,wewillassumethatthelowertriangularpartofAisstoredandoverwritten.2ApplicationTheCholeskyfactorizationisusedtosolvethelinearsystemAx=ywhenAisSPD:SubstitutingthefactorsintotheequationyieldsLLTx=y.Lettingz=LTx,Ax=L(LTx)| {z }z=Lz=y:Thus,zcanbecomputedbysolvingthetriangularsystemofequationsLz=yandsub-sequentlythedesiredsolutionxcanbecomputedbysolvingthetriangularlinearsystemLTx=z.3AnAlgorithm(Variant3)ThemostcommonalgorithmforcomputingA:=Chol(A)canbederivedasfollows:Con-siderA=LLT.PartitionA= 11 ? a21 A22!andL= 11 0 l21 L22!: (1) 2 Lemma8.LetA2RmmbeSPDandl21=a21=p 11.ThenA22�l21lT21isSPD.Proof:SinceAissymmetricsoareA22andA22�l21lT21.Letx16=0beanarbitraryvectoroflengthn�1.De nex= 0 x1!where0=�aT21x1= 11.Then,sincex6=0,0xTAx= 0 x1!T 11 aT21 a21 A22! 0 x1!= 0 x1!T 110+aT21x1 a210+A22x1!= 1120+0aT21x1+xT1a210+xT1A22x1= 11aT21x1 11xT1a21 11�xT1a21 11aT21x1�xT1a21aT21x1 11+xT1A22x1=xT1(A22�a21aT21 11)x1=xT1(A22�l21lT21)x1:WeconcludethatA22�l21lT21isSPD.Proof:ofCholeskyFactorizationTheoremProofbyinduction. Basecase:n=1.Clearlytheresultistruefora11matrixA= 11:Inthiscase,thefactthatAisSPDmeansthat 11isrealandpositiveandaCholeskyfactoristhengivenby11=p 11,withuniquenessifweinsistthat11ispositive. Inductivestep:AssumetheresultistrueforSPDmatrixA2R(n�1)(n�1).WewillshowthatitholdsforA2Rnn.LetA2RnnbeSPD.PartitionAandLasin(4)andlet11=p 11(whichiswell-de nedbyLemma4),l21=a21=11,andL22=Chol(A22�l21lT21)(whichexistsasaconsequenceoftheInductiveHypothesisandLemma4).ThenListhedesiredCholeskyfactorofA. Bytheprincipleofmathematicalinduction,thetheoremholds.5BlockedAlgorithm(Variant3)Inordertoattainhighperformance,thecomputationiscastintermsofmatrix-matrixmultiplicationbyso-calledblockedalgorithms.FortheCholeskyfactorizationablockedversionofthealgorithmcanbederivedbypartitioningA! A11 ? A21 A22!andL! L11 0 L21 L22!; 4 donedonedonepartiallyupdatedBeginningofiteration ATLABL?ABR## Repartition A00??aT10 11?A20a21A22## UPD.UPD.UPD.Update p 11a21 11A22�a21aT21## @@RdonedonedonepartiallyupdatedEndofiteration ATLABL?ABRFigure3:Left:ProgressionofpicturesthatexplainCholeskyfactorizationalgorithm.Right:Samepictures,annotatedwithlabelsandupdates. Beginningofiteration: Atsomestageofthealgorithm(Topoftheloop),thecomputationhasmovedthroughthematrixtothepointindicatedbythethicklines.Noticethatwehave nishedwiththepartsofthematrixthatareinthetop-left,top-right(whichisnottobetouched),andbottom-leftquadrants.Thebottom-rightquadranthasbeenupdatedtothepointwhereweonlyneedtoperformaCholeskyfactorizationofit. Repartition: Wenowrepartitionthebottom-rightsubmatrixtoexpose 11,a21,andA22. 6 Remark11.ClearlyFig.4doesnotpresentthealgorithmasconciselyasthealgorithmsgiveninFigs.1and2.However,itdoescapturetoalargedegreetheverbaldescriptionofthealgorithmmentionedaboveandtherefore,inouropinion,reducesboththee ortrequiredtointerpretthealgorithmandtheneedforadditionalexplanations.ThenotationalsomirrorsthatusedfortheproofinSection4.Remark12.ThenotationinFigs.3and4allowsthecontentsofmatrixAatthebeginningoftheiterationtobeformallystated:A= ATL ? ABL ABR!= LTL ? LBL ^ABR�tril(LBLLTBL)!;whereLTL=Chol(^ATL),LBL=^ABLL�TTL,and^ATL;^ABLand^ABRdenotetheoriginalcontentsofthequadrantsATL;ABLandABR,respectively.Exercise13.ImplementtheCholeskyfactorizationwithM-script.7CostThecostoftheCholeskyfactorizationofA2Rmmcanbeanalyzedasfollows:InFig.4(left)duringthekthiteration(startingkatzero)A00iskk.Thus,theoperationsinthatiterationcost  11:=p 11:negligiblewhenkislarge.  a21:=a21= 11:approximately(m�k�1) ops.  A22:=A22�tril(a21aT21):approximately(m�k�1)2 ops.(Arank-1updateofallofA22wouldhavecost2(m�k�1)2 ops.ApproximatelyhalftheentriesofA22areupdated.)Thus,thetotalcostin opsisgivenbyCChol(m)m�1Xk=0(m�k�1)2| {z }(DuetoupdateofA22)+m�1Xk=0(m�k�1)| {z }(Duetoupdateofa21)=m�1Xj=0j2+m�1Xj=0j1 3m3+1 2m21 3m3whichallowsustostatethat(obvious)mostcomputationisintheupdateofA22. 8 Algorithm:A:=Chol unb(A) PartitionA! ATL ? ABL ABR!whereATLis00 whilem(ATL)m(A)do Repartition ATL ? ABL ABR!!0B@A00 ? ? aT10 11 ? A20 a21 A221CAwhere 11is11 Variant1 (BorderedAlgorithm)aT10:=aT10tril(A00)�T 11:= 11�aT10a10 11:=p 11 Variant2 (Left-lookingAlgorithm) 11:= 11�aT10a10 11:=p 11a21:=a21�A20a10a21:=a21= 11 Variant3 (Right-lookingAlgorithm) 11:=p 11a21:=a21= 11A22:=A22�tril(a21aT21) Continuewith ATL ? ABL ABR! 0B@A00 ? ? aT10 11 ? A20 a21 A221CA endwhile Algorithm:A:=Chol blk(A) PartitionA! ATL ? ABL ABR!whereATLis00 whilem(ATL)m(A)do DetermineblocksizebRepartition ATL ? ABL ABR!!0B@A00 ? ? A10 A11 ? A20 A21 A221CAwhereA11isbb Variant1 (BorderedAlgorithm)A10:=A10tril(A00)�TA11:=A11�A10AT10A11:=Chol(A11) Variant2 (Left-lookingAlgorithm)A11:=A11�A10AT10A11:=Chol(A11)A21:=A21�A20AT10A21:=A21tril(A11)�T Variant3 (Right-lookingAlgorithm)A11:=Chol(A11)A21:=A21tril(A11)�TA22:=A22�tril(A21AT21) Continuewith ATL ? ABL ABR! 0B@A00 ? ? A10 A11 ? A20 A21 A221CA endwhile Figure5:MultiplevariantsforcomputingtheCholeskyfactorization. 10 fromwhichweconcludethatA00=L00LT00 ? ? aT10=lT10LT00 11=lT10l10+211 ? A20=L20LT00 a21=L20l10+11l21 A22=L20LT20+l21lT21+L22LT22:Now,letusassumethatinpreviousiterationsoftheloopthecomputationhasproceededsothatAcontains0B@L00 0 0 lT10 11 0 L20 a21 A221CA:ThepurposeofthecurrentiterationistomakeitsothatAcontains0B@L00 0 0 lT10 11 0 L20 l21 A221CA:Thefollowingalgorithmaccomplishesthis: 1. PartitionA!0B@A00 ? ? aT10 11 ? A20 a21 A221CAandassumethatduetocomputationfrompreviousiterationsitcontains0B@L00 0 0 lT10 11 0 L20 a21 A221CA: 2. Overwrite 11:=11=p 11�lT10l10. 3. Overwritea21:=a21�L20l10. 4. Overwritea21:=l21=a21=11.Thisjusti estheunblockedVariant2inFigure5.Ablockedalgorithmcanbesimilarlyderived.PartitionA=0B@A00 ? ? A10 A11 ? A20 A21 A221CAandL=0B@L00 0 0 L10 L11 0 L20 L21 L221CA (4) 12 3. OverwriteA21:=A21=A21�L20LT10. 4. OverwriteA21:=L21=A21L�T11.Thisjusti estheblockedVariant2inFigure5.9CodingthealgorithmsAspartoftheFLAMEprojectatTheUniversityofTexasatAustin,UniversidadJaumeI,Spain,andRWTHAachenUniversity,Germany,wehavedevelopedAPIsforcodingthesealgorithmsthatmakeitsothatthecodecloselyresemblesthealgorithms.Thedemonstratehowthesecodesarewritten,wecreatedavideotitledHigh-PerformanceImplementationofCholeskyFactorizationthatcanbefoundat http://www.cs.utexas.edu/users/flame/Movies.html#Chol.Exercise14.ImplementallthealgorithmsinFigure5inM-script(thescriptinglanguageofMatlabandOctave).Exercise15.ImplementallthealgorithmsinFigure5usingtheFLAME/CAPImentionedinthevideo.10PerformanceInFigure6weshowtheperformanceattainedonamulticorearchitecture.Allexperimentswereperformedusingdouble-precision oating-pointarithmeticonaDell/PowerEdge-1435poweredbyaIntelXeonX5355(2.666GHz)processor.Thisparticulararchitecturehasfourcores,eachofwhichhasaclockrateofabout2.7GHzandeachcorecanperformfour oatingpointoperationsperclockcycle.Thus,thepeakofthismachineis4cores4FLOPS cycle2:7109cycles core43109FLOPS=43GFLOPS:Whatthegraphshowsistheproblemsizeforwhichperformancewasmeasuredalongthex-axisandtherateatwhichthealgorithmcomputedalongthey-axis.Theratewascomputedby,foragiventime,measuringthetimerequiredtocompletethecomputation,t(n)(inseconds),andthendividingthenumberof oatingpointoperationsbythistime.Thisgivestherate,in oatingpointoperationspersecond(FLOPS),atwhichthecomputationexecuted.Sincethisisanumbermeasuredinbillions,wethendividethisnumberbyabilliontogettherateinGFLOPS(GigaFLOPS,orbillionsof oatingpointoperationspersecond):rateinGFLOPS=n3 t(n)109: 14 algorithmsrelatedtotheinversionofasymmetricpositivede nitematrix.ACMTransactionsonMathematicalSoftware,35(1). 2. Twopapersthatshowhowmatrix-matrixoperationscanachievenear-peakperfor-mance: KazushigeGotoandRobertA.vandeGeijn.Anatomyofhigh-performancematrixmultiplication.ACMTrans.Math.Soft.,34(3:Article12,25pages),May2008.and KazushigeGotoandRobertvandeGeijn.High-performanceimplementa-tionofthelevel-3BLAS.ACMTrans.Math.Softw.,35(1):1{14,2008. 3. Abookthatshowshowtosystematicallyderiveallloop-basedalgorithmsforabroadrangeoflinearalgebraoperations,includingCholeskyfactorization: RobertA.vandeGeijnandEnriqueS.Quintana-Ort.TheScienceofPro-grammingMatrixComputations.http://www.lulu.com/content/1911788,2008. 4. TheUser'sGuideforthelibflamelibrary,amodernreplacementfortheLAPACKlibrarythatiscodedusingtheAPIsmentionedinthemovie: FieldG.VanZee.libflame:TheCompleteReference.www.lulu.com,2009. 5. AnoverviewoftheFLAMEproject: FieldG.VanZee,ErnieChan,RobertvandeGeijn,EnriqueS.Quintana-Ort,andGregorioQuintana-Ort.Introducing:Thelib amelibraryfordensematrixcomputations.IEEEComputationinScience&Engineering,11(6):56{62,2009. 16