The Humble Gaussian Distribution
David J.C. MacKay
Cavendish Laboratory, Cambridge CB3 0HE, United Kingdom
June 11, 2006, Draft 1.0
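Preamble to 'The Humble Gaussian Distribution'. David MacKay

Gaussian Quiz

[Figure: belief network $H_1$ over $y_1$, $y_2$, $y_3$.]

1. Assuming that the variables $y_1$, $y_2$, $y_3$ in this belief network have a joint Gaussian distribution, which of the following matrices could be the covariance matrix?

$$
A = \begin{bmatrix} 9 & 3 & 1 \\ 3 & 9 & 3 \\ 1 & 3 & 9 \end{bmatrix} \quad
B = \begin{bmatrix} 8 & -3 & 1 \\ -3 & 9 & -3 \\ 1 & -3 & 8 \end{bmatrix} \quad
C = \begin{bmatrix} 9 & 3 & 0 \\ 3 & 9 & 3 \\ 0 & 3 & 9 \end{bmatrix} \quad
D = \begin{bmatrix} 9 & -3 & 0 \\ -3 & 10 & -3 \\ 0 & -3 & 9 \end{bmatrix}
$$

2. Which of the matrices could be the inverse covariance matrix?

[Figure: belief network $H_2$ over $y_1$, $y_2$, $y_3$.]

3. Which of the matrices could be the covariance matrix of the second graphical model?

4. Which of the matrices could be the inverse covariance matrix of the second graphical model?

5. Let three variables $y_1$, $y_2$, $y_3$ have covariance matrix $K_{(3)}$ and inverse covariance matrix $K^{-1}_{(3)}$.

$$
K_{(3)} = \begin{bmatrix} 1 & 0.5 & 0 \\ 0.5 & 1 & 0.5 \\ 0 & 0.5 & 1 \end{bmatrix} \qquad
K^{-1}_{(3)} = \begin{bmatrix} 1.5 & -1 & 0.5 \\ -1 & 2 & -1 \\ 0.5 & -1 & 1.5 \end{bmatrix}
$$

Now focus on the variables $y_1$ and $y_2$. Which statements about their covariance matrix $K_{(2)}$ and inverse covariance matrix $K^{-1}_{(2)}$ are true?

$$
\text{(A)} \quad K_{(2)} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \qquad
\text{(B)} \quad K^{-1}_{(2)} = \begin{bmatrix} 1.5 & -1 \\ -1 & 2 \end{bmatrix}
$$

Abstract

These are elementary notes on Gaussian distributions, aimed at people who are about to learn about Gaussian processes. I emphasize the following points.

- What happens to a covariance matrix and inverse covariance matrix when we omit a variable.
- What it means to have zeros in a covariance matrix.
- What it means to have zeros in an inverse covariance matrix.
- How probabilistic models expressed in terms of 'energies' relate to Gaussians.
- Why eigenvectors and eigenvalues don't have any fundamental status.

1 Introduction

Let's chat about a Gaussian distribution with zero mean, such as

$$ P(y) = \frac{1}{Z} e^{-\frac{1}{2} y^{\mathsf T} A y}, \tag{1} $$

where $A = K^{-1}$ is the inverse of the covariance matrix, $K$. I'm going to emphasize dimensions throughout this note, because I think dimension-consciousness enhances understanding. [1]

Footnote 1: It's conventional to write the diagonal elements in $K$ as $\sigma_i^2$ and the off-diagonal elements as $\sigma_{ij}$. For example

$$ K = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{bmatrix} \tag{2} $$

A confusing convention, since it implies that $\sigma_{ij}$ has different dimensions from $\sigma_i$, even if all axes $i$, $j$ have the same dimensions! Another way of writing an off-diagonal coefficient is

$$ K_{ij} = \rho_{ij} \sigma_i \sigma_j, \tag{3} $$

where $\rho_{ij}$ is the correlation coefficient between $i$ and $j$. This is a better notation since it's dimensionally consistent in the way it uses the letter $\sigma$. But I will stick with the notation $K_{ij}$.

I'll write

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} \tag{4} $$

The definition of the covariance matrix is

$$ K_{ij} = \langle y_i y_j \rangle \tag{5} $$

so the dimensions of the element $K_{ij}$ are (dimensions of $y_i$) times (dimensions of $y_j$).

1.1 Examples

Let's work through a few graphical models.

[Figure: graphical models $H_1$ (Example 1) and $H_2$ (Example 2) over $y_1$, $y_2$, $y_3$.]

1.1.1 Example 1

Maybe $y_2$ is the temperature outside some buildings (or rather, the deviation of the outside temperature from its mean), $y_1$ is the temperature deviation inside building 1, and $y_3$ is the temperature deviation inside building 3. This graphical model says that if you know the outside temperature $y_2$ then $y_1$ and $y_3$ are independent. Let's consider this generative model:

$$ y_2 = \nu_2 \tag{6} $$
$$ y_1 = w_1 y_2 + \nu_1 \tag{7} $$
$$ y_3 = w_3 y_2 + \nu_3, \tag{8} $$

where $\{\nu_i\}$ are independent normal variables with variances $\{\sigma_i^2\}$. Then we can write down the entries in the covariance matrix, starting with the diagonal entries

$$ K_{11} = \langle y_1 y_1 \rangle = \langle (w_1 \nu_2 + \nu_1)(w_1 \nu_2 + \nu_1) \rangle = w_1^2 \langle \nu_2^2 \rangle + 2 w_1 \langle \nu_1 \nu_2 \rangle + \langle \nu_1^2 \rangle = w_1^2 \sigma_2^2 + \sigma_1^2 \tag{9} $$
$$ K_{22} = \sigma_2^2 \tag{10} $$
$$ K_{33} = w_3^2 \sigma_2^2 + \sigma_3^2 \tag{11} $$

So we can fill in this much:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & & \\ & \sigma_2^2 & \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{12} $$

The off-diagonal terms are

$$ K_{12} = \langle y_1 y_2 \rangle = \langle (w_1 \nu_2 + \nu_1)(\nu_2) \rangle = w_1 \sigma_2^2 \tag{13} $$

(and similarly for $K_{23}$) and

$$ K_{13} = \langle y_1 y_3 \rangle = \langle (w_1 \nu_2 + \nu_1)(w_3 \nu_2 + \nu_3) \rangle = w_1 w_3 \sigma_2^2 \tag{14} $$

So the covariance matrix is:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 & w_1 w_3 \sigma_2^2 \\ & \sigma_2^2 & w_3 \sigma_2^2 \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{15} $$

(where the remaining blank elements can be filled in by symmetry).

Now let's think about the inverse covariance matrix. One way to get to it is to write down the joint distribution.

$$ P(y_1, y_2, y_3 | H_1) = P(y_2) P(y_1 | y_2) P(y_3 | y_2) \tag{16} $$
$$ = \frac{1}{Z_2} \exp\left( -\frac{y_2^2}{2 \sigma_2^2} \right) \frac{1}{Z_1} \exp\left( -\frac{(y_1 - w_1 y_2)^2}{2 \sigma_1^2} \right) \frac{1}{Z_3} \exp\left( -\frac{(y_3 - w_3 y_2)^2}{2 \sigma_3^2} \right) \tag{17} $$

We can now collect all the terms in $y_i y_j$.

$$ P(y_1, y_2, y_3) = \frac{1}{Z'} \exp\left( -\frac{y_2^2}{2 \sigma_2^2} - \frac{(y_1 - w_1 y_2)^2}{2 \sigma_1^2} - \frac{(y_3 - w_3 y_2)^2}{2 \sigma_3^2} \right) $$
$$ = \frac{1}{Z'} \exp\left( -y_2^2 \left[ \frac{1}{2 \sigma_2^2} + \frac{w_1^2}{2 \sigma_1^2} + \frac{w_3^2}{2 \sigma_3^2} \right] - y_1^2 \frac{1}{2 \sigma_1^2} + 2 y_1 y_2 \frac{w_1}{2 \sigma_1^2} - y_3^2 \frac{1}{2 \sigma_3^2} + 2 y_3 y_2 \frac{w_3}{2 \sigma_3^2} \right) $$
$$ = \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\ -\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\ 0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right) $$

So the inverse covariance matrix is

$$ K^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\ -\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\ 0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2} \end{bmatrix} $$

The first thing I'd like you to notice here is the zeroes: $[K^{-1}]_{13} = 0$. The meaning of a zero in an inverse covariance matrix (at location $i, j$) is: conditional on all the other variables, these two variables $i$ and $j$ are independent.

Next, notice that whereas $y_1$ and $y_2$ were positively correlated (assuming $w_1 > 0$), the coefficient $[K^{-1}]_{12}$ is negative. It's common that a covariance matrix $K$ in which all the elements are non-negative has an inverse that includes some negative elements. So positive off-diagonal terms in the covariance matrix always describe positive correlation; but the off-diagonal terms in the inverse covariance matrix can't be interpreted that way. The sign of an element $(i, j)$ in the inverse covariance matrix does not tell you about the correlation between those two variables. For example, remember: there is a zero at $[K^{-1}]_{13}$. But that doesn't mean that variables $y_1$ and $y_3$ are uncorrelated. Thanks to their parent $y_2$, they are correlated, with covariance $w_1 w_3 \sigma_2^2$.

The off-diagonal entry $[K^{-1}]_{ij}$ in an inverse covariance matrix indicates how $y_i$ and $y_j$ are correlated if we condition on all the other variables apart from those two: if $[K^{-1}]_{ij} < 0$, they are positively correlated, conditioned on the others; if $[K^{-1}]_{ij} > 0$, they are negatively correlated.

The inverse covariance matrix is great for reading out properties of conditional distributions in which we condition on all the variables except one. For example, look at $[K^{-1}]_{11} = \frac{1}{\sigma_1^2}$; if we know $y_2$ and $y_3$, then the probability distribution of $y_1$ is Gaussian with variance $1/[K^{-1}]_{11}$. That one was easy.

Look at $[K^{-1}]_{22} = \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right]$. If we know $y_1$ and $y_3$, then the probability distribution of $y_2$ is Gaussian with variance

$$ \frac{1}{[K^{-1}]_{22}} = \frac{1}{\frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2}}. \tag{18} $$

That's not so obvious, but it's familiar if you've applied Bayes' theorem to Gaussians: when we do inference of a parent like $y_2$ given its children, the inverse-variances of the prior and the likelihoods add. Here, the parent variable's inverse variance (also known as its precision) is the sum of the precision contributed by the prior, $\frac{1}{\sigma_2^2}$, the precision contributed by the measurement of $y_1$, $\frac{w_1^2}{\sigma_1^2}$, and the precision contributed by the measurement of $y_3$, $\frac{w_3^2}{\sigma_3^2}$.

The off-diagonal entries in $K^{-1}$ tell us how the mean of [the conditional distribution of one variable given the others] depends on [the others]. Let's take variable $y_3$ conditioned on the other two, for example.

$$ P(y_3 | y_1, y_2, H_1) \propto P(y_1, y_2, y_3 | H_1) \propto \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\ -\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\ 0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right) $$

Here the terms $y_1$, $y_2$ are fixed and known and uninteresting; what matters is everything that multiplies the interesting term $y_3$. The entries of the central matrix that multiply only $y_1$ and $y_2$ (its upper-left $2 \times 2$ block) aren't achieving anything. We can just ignore them (and redefine the constant of proportionality):

$$ P(y_3 | y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -\frac{w_3}{\sigma_3^2} \\ 0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right) $$
$$ P(y_3 | y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} [y_3]^2 - [y_3] \left[ 0 \, y_1 - \frac{w_3}{\sigma_3^2} y_2 \right] \right) $$

We obtain the mean by completing the square. [2]

Footnote 2: 'Completing the square' is $\frac{1}{2} a y^2 - b y = \frac{1}{2} a (y - b/a)^2 + \text{constant}$.

$$ P(y_3 | y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} \left[ y_3 - \frac{0 \, y_1 + \frac{w_3}{\sigma_3^2} y_2}{\frac{1}{\sigma_3^2}} \right]^2 \right) $$

In this case, this all collapses down, of course, to

$$ P(y_3 | y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} [y_3 - w_3 y_2]^2 \right), \tag{19} $$

as defined in the original generative model (8). In general, the off-diagonal coefficients of $K^{-1}$ tell us the sensitivity of [the mean of the conditional distribution] to the other variables.

$$ \mu_{y_3 | y_1, y_2} = \frac{-[K^{-1}]_{13} y_1 - [K^{-1}]_{23} y_2}{[K^{-1}]_{33}} \tag{20} $$

So the conditional mean of $y_3$ is a linear function of the known variables, and the off-diagonal entries in $K^{-1}$ tell us the coefficients in that linear function.

1.2 Example 2

Here's another example, where two parents have one child. For example, the price of electricity $y_2$ from a power station might depend on the price of gas, $y_1$, and the price of carbon emission rights, $y_3$.

[Figure: graphical model $H_2$ (Example 2) over $y_1$, $y_2$, $y_3$.]

$$ y_2 = w_1 y_1 + w_3 y_3 + \nu_2 \tag{21} $$

Note that the units in which gas price, electricity price, and carbon price are measured are all different (pounds per cubic metre, pennies per kWh, and euros per tonne, for example). So $y_1$, $y_2$, and $y_3$ have different dimensions from each other. Most people who do data modelling treat their data as 'just numbers', but I think it is a useful discipline to keep track of dimensions and to carry out only dimensionally valid operations. [Dimensionally valid operations satisfy the two rules of dimensions: (1) only add, subtract and compare quantities that have like dimensions; (2) arguments of all functions like exp, log, sin must be dimensionless. Rule 2 is really just a special case of rule 1, since $\exp(x) = 1 + x + x^2/2 + \ldots$, so to satisfy rule 1, the dimensions of $x$ must be the same as the dimensions of 1.]

What is the covariance matrix? Here we assume that the parent variables $y_1$ and $y_3$ are uncorrelated. The covariance matrix is

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & w_1 \sigma_1^2 & 0 \\ w_1 \sigma_1^2 & \sigma_2^2 + w_1^2 \sigma_1^2 + w_3^2 \sigma_3^2 & w_3 \sigma_3^2 \\ 0 & w_3 \sigma_3^2 & \sigma_3^2 \end{bmatrix} \tag{22} $$

Notice the zero correlation between the uncorrelated variables (1, 3). What do you think the (1, 3) entry in the inverse covariance matrix will be? Let's work it out in the same way as before. The joint distribution is

$$ P(y_1, y_2, y_3 | H_2) = P(y_1) P(y_3) P(y_2 | y_1, y_3) \tag{23} $$
$$ = \frac{1}{Z_1} \exp\left( -\frac{y_1^2}{2 \sigma_1^2} \right) \frac{1}{Z_3} \exp\left( -\frac{y_3^2}{2 \sigma_3^2} \right) \frac{1}{Z_2} \exp\left( -\frac{(y_2 - w_1 y_1 - w_3 y_3)^2}{2 \sigma_2^2} \right) \tag{24} $$

We collect all the terms in $y_i y_j$.

$$ P(y_1, y_2, y_3) = \frac{1}{Z'} \exp\left( -\frac{y_1^2}{2 \sigma_1^2} - \frac{y_3^2}{2 \sigma_3^2} - \frac{(y_2 - w_1 y_1 - w_3 y_3)^2}{2 \sigma_2^2} \right) $$
$$ = \frac{1}{Z'} \exp\left( -y_1^2 \left[ \frac{1}{2 \sigma_1^2} + \frac{w_1^2}{2 \sigma_2^2} \right] - y_2^2 \frac{1}{2 \sigma_2^2} + 2 y_1 y_2 \frac{w_1}{2 \sigma_2^2} - y_3^2 \left[ \frac{1}{2 \sigma_3^2} + \frac{w_3^2}{2 \sigma_2^2} \right] + 2 y_3 y_2 \frac{w_3}{2 \sigma_2^2} - 2 y_3 y_1 \frac{w_1 w_3}{2 \sigma_2^2} \right) $$
$$ = \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} \left[ \frac{1}{\sigma_1^2} + \frac{w_1^2}{\sigma_2^2} \right] & -\frac{w_1}{\sigma_2^2} & +\frac{w_1 w_3}{\sigma_2^2} \\ -\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} \\ +\frac{w_1 w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \left[ \frac{1}{\sigma_3^2} + \frac{w_3^2}{\sigma_2^2} \right] \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right) $$

So the inverse covariance matrix is

$$ K^{-1} = \begin{bmatrix} \left[ \frac{1}{\sigma_1^2} + \frac{w_1^2}{\sigma_2^2} \right] & -\frac{w_1}{\sigma_2^2} & +\frac{w_1 w_3}{\sigma_2^2} \\ -\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} \\ +\frac{w_1 w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \left[ \frac{1}{\sigma_3^2} + \frac{w_3^2}{\sigma_2^2} \right] \end{bmatrix} \tag{25} $$

Notice (assuming $w_1 > 0$ and $w_3 > 0$) that the off-diagonal term connecting a parent and a child, $[K^{-1}]_{12}$, is negative and the off-diagonal term connecting the two parents, $[K^{-1}]_{13}$, is positive. This positive term indicates that, conditional on all the other variables (i.e., $y_2$), the two parents $y_1$ and $y_3$ are anticorrelated. That's 'explaining away'. Once you know the price of electricity was average, for example, you can deduce that if gas was more expensive than normal, carbon probably was less expensive than normal.

2 Omission of one variable

Consider example 1.

[Figure: graphical model $H_1$ (Example 1) over $y_1$, $y_2$, $y_3$.]

The covariance matrix of all three variables is:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 & w_1 w_3 \sigma_2^2 \\ & \sigma_2^2 & w_3 \sigma_2^2 \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{26} $$

If we decide we want to talk about the joint distribution of just $y_1$ and $y_2$, the covariance matrix is simply the sub-matrix:

$$ K_2 = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 \\ & \sigma_2^2 \end{bmatrix} \tag{27} $$

This follows from the definition of the covariance,

$$ K_{ij} = \langle y_i y_j \rangle. \tag{28} $$

The inverse covariance matrix, on the other hand, does not change in such a simple way. The $3 \times 3$ inverse covariance matrix was:

$$ K^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\ -\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\ 0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2} \end{bmatrix} $$

When we work out the $2 \times 2$ inverse covariance matrix, all the terms that originated from the child $y_3$ are lost. So we have

$$ K_2^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} \\ -\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} \right] \end{bmatrix} $$

Specifically, notice that $[K_2^{-1}]_{22}$ is different from the (2, 2) entry in the three by three $K^{-1}$. We conclude:

Leaving out a variable leaves $K$ unchanged but changes $K^{-1}$.

This conclusion is important for understanding the answer to the question, 'When working with Gaussian processes, why not parameterize the inverse covariance instead of the covariance function?' The answer is: you can't write down the inverse covariance associated with two points! The inverse covariance depends capriciously on what the other variables are.

3 Energy models

Sometimes people express probabilistic models in terms of energy functions that are minimized in the most probable configuration. For example, in regression with cubic splines, a regularizer is defined which describes the energy that a steel ruler would have if bent into the shape of the curve. Such models usually have the form:

$$ P(y) = \frac{1}{Z} e^{-E(y)/T}, \tag{29} $$

and in simple cases, the energy $E(y)$ may be a quadratic function of $y$, such as

$$ E(y) = -\sum_{i<j} J_{ij} y_i y_j + \sum_i a_i y_i^2 \tag{30} $$

If so, then the distribution is a Gaussian (just like (1)), and the 'couplings' $J_{ij}$ are minus the off-diagonal coefficients in the inverse covariance matrix (scaled by the temperature $T$).

As a simple example, consider a set of three masses coupled by springs, and subjected to thermal perturbations.

[Figure: Three masses, four springs. Masses $y_1$, $y_2$, $y_3$ in a row, joined by springs $k_{01}$, $k_{12}$, $k_{23}$, $k_{34}$.]

The equilibrium positions are $(y_1, y_2, y_3) = (0, 0, 0)$, and the spring constants are $k_{ij}$. The extension of the second spring is $y_2 - y_1$. The energy of this system is

$$ E(y) = \frac{1}{2} k_{01} y_1^2 + \frac{1}{2} k_{12} (y_2 - y_1)^2 + \frac{1}{2} k_{23} (y_3 - y_2)^2 + \frac{1}{2} k_{34} y_3^2 $$
$$ = \frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} k_{01} + k_{12} & -k_{12} & 0 \\ -k_{12} & k_{12} + k_{23} & -k_{23} \\ 0 & -k_{23} & k_{23} + k_{34} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} $$

So at temperature $T$, the probability distribution of the displacements is Gaussian with inverse covariance matrix

$$ \frac{1}{T} \begin{bmatrix} k_{01} + k_{12} & -k_{12} & 0 \\ -k_{12} & k_{12} + k_{23} & -k_{23} \\ 0 & -k_{23} & k_{23} + k_{34} \end{bmatrix} \tag{31} $$

Notice that there are 0 entries between displacements $y_1$ and $y_3$, the two masses that are not directly coupled by a spring.

[Figure 1. Five masses, six springs. Masses $y_1, \ldots, y_5$ in a row, joined by identical springs $k$.]

So inverse covariance matrices are sometimes very sparse. If we have five masses in a row connected by identical springs $k$, for example, then

$$ K^{-1} = \frac{k}{T} \begin{bmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{bmatrix}. \tag{32} $$

But this sparsity doesn't carry over to the covariance matrix, which is

$$ K = \frac{T}{k} \begin{bmatrix} 0.83 & 0.67 & 0.50 & 0.33 & 0.17 \\ 0.67 & 1.33 & 1.00 & 0.67 & 0.33 \\ 0.50 & 1.00 & 1.50 & 1.00 & 0.50 \\ 0.33 & 0.67 & 1.00 & 1.33 & 0.67 \\ 0.17 & 0.33 & 0.50 & 0.67 & 0.83 \end{bmatrix}. \tag{33} $$

4 Eigenvectors and eigenvalues are meaningless

There seems to be a knee-jerk reaction when people see a square matrix: 'what are its eigenvectors?' But here, where we are discussing quadratic forms, eigenvectors and eigenvalues have no fundamental status. They are dimensionally invalid objects. Any algorithm that features eigenvectors either didn't need to do so, or shouldn't have done so. (I think the whole idea of principal component analysis is misguided, for example.)

Hang on, you say, what about the three masses example? Don't those three masses have meaningful normal modes? Yes, they do, but those modes are not the eigenvectors of the spring matrix (31). Remember, I didn't tell you what the masses of the masses were!

I'm not saying that eigenvectors are never meaningful. What I'm saying is, in the context of quadratic forms

$$ \frac{1}{2} y^{\mathsf T} A y, \tag{34} $$

eigenvectors are meaningless and arbitrary.

Consider a covariance matrix describing the correlation between something's mass $y_1$ and its length $y_2$.

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} \tag{35} $$

The dimensions of $K_{11}$ are mass-squared. $K_{11}$ might be measured in kg$^2$, for example. The dimensions of $K_{12} \equiv \langle y_1 y_2 \rangle$ are mass times length. $K_{12}$ might be measured in kg m, for example.

Here's an example, which might describe the correlation between weight and height of some animals in a survey.

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} 10000\ \text{kg}^2 & 70\ \text{kg m} \\ 70\ \text{kg m} & 1\ \text{m}^2 \end{bmatrix} \tag{36} $$

[Figure 2. Dataset with its 'eigenvectors' (axes: length / m, mass / kg). As the text explains, the eigenvectors of covariance matrices are meaningless and arbitrary.]

The knee-jerk reaction is "let's find the principal components of our data", which means "ignore those silly dimensional units, and just find the eigenvectors of $\begin{bmatrix} 10000 & 70 \\ 70 & 1 \end{bmatrix}$". But let's consider what this means. An eigenvector is a vector satisfying

$$ \begin{bmatrix} 10000\ \text{kg}^2 & 70\ \text{kg m} \\ 70\ \text{kg m} & 1\ \text{m}^2 \end{bmatrix} e = \lambda e. \tag{37} $$

By asking for an eigenvector, we are imagining that two equations are true; first, the top row:

$$ 10000\ \text{kg}^2\, e_1 + 70\ \text{kg m}\, e_2 = \lambda e_1, \tag{38} $$

and, second, the bottom row:

$$ 70\ \text{kg m}\, e_1 + 1\ \text{m}^2\, e_2 = \lambda e_2. \tag{39} $$

These expressions violate the rules of dimensions. Try all you like, but you won't be able to find dimensions for $e_1$, $e_2$, and $\lambda$ such that rule 1 is satisfied.

No, no, the matlab lover says, I leave out the dimensions, and I get:

    [e,v] = eig(s)
    e =
       0.0070002  -0.9999755
      -0.9999755  -0.0070002
    v =
       5.0998e-01   0.0000e+00
       0.0000e+00   1.0000e+04

I notice that the eigenvectors are $(0.007, -0.9999)$ and $(0.9999, 0.007)$, which are almost aligned with the coordinate axes. Very interesting! I also notice that the eigenvalues are $10^4$ and $0.5$. What an interestingly large eigenvalue ratio! Wow, that means that there is one very big principal component, and the second one is much smaller. Ooh, how interesting.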
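The Matlab lover's numbers can be reproduced without any library. What follows is a minimal Python sketch, assuming the units have simply been stripped from (36); `eig_sym_2x2` is a hypothetical helper name, and it uses the closed-form solution for a symmetric 2×2 matrix rather than whatever `eig` does internally. Eigenvector signs are arbitrary, so they may differ from the Matlab output.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], smallest eigenvalue first.

    Assumes b != 0. Each eigenvector is returned with unit Euclidean length.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) satisfies the top row: a*x + b*y = lam*x with x = b.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


# Covariance (36) with the units ignored: mass in kg, length in m.
pairs = eig_sym_2x2(10000.0, 70.0, 1.0)
(lam_small, vec_small), (lam_large, vec_large) = pairs

for lam, vec in pairs:
    print(f"eigenvalue {lam:.5g}, eigenvector ({vec[0]:.7f}, {vec[1]:.7f})")
```

The helper takes bare numbers, which is exactly the step that throws the dimensions away.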
But this is nonsense. If we change the units in which we measure length from m to cm then the covariance matrix can be written:

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} 10000\ \text{kg}^2 & 7000\ \text{kg cm} \\ 7000\ \text{kg cm} & 10000\ \text{cm}^2 \end{bmatrix} \tag{40} $$

This is exactly the same covariance matrix of exactly the same data. But the eigenvectors and eigenvalues are now:

    e =
      -0.70711   0.70711
       0.70711   0.70711
    v =
        3000       0
           0   17000

Figure 2 illustrates this situation. On the left, a data set of masses and lengths measured in metres. The arrows show the 'eigenvectors'. (The arrows don't look 'orthogonal' in this plot because a step of one unit on the x-axis happens to cover less paper than a step of one unit on the y-axis.) On the right, exactly the same data set but with lengths measured in centimetres. The arrows show the 'eigenvectors'.

In conclusion, eigenvectors of the matrix in a quadratic form are not fundamentally meaningful. [Properties of that matrix that are meaningful include its determinant.]

4.1 Aside

This complaint about eigenvectors comes hand in hand with another complaint, about 'steepest descent'. A steepest descent algorithm is dimensionally invalid. A step in a parameter space does not have the same dimensions as a gradient. To turn a gradient into a sensible step direction, you need a metric. The metric defines how 'big' a step is (in rather the same way that when gnuplot plotted the data above, it chose a vertical scale and a horizontal scale). Once you know how big alternative steps are, it becomes meaningful to take the step that is 'steepest' (that is, it's the direction with the biggest change in function value per unit 'distance' moved). Without a metric, steepest descent algorithms are not covariant. That is, the algorithm would behave differently if you just changed the units in which one parameter is measured.

Appendix: Answers to quiz

For the first four, you can quickly guess the answers based on whether the (1, 3) entries are zero or not. For a careful answer you should also check that the matrices really are positive definite (they are) and that they are realisable by the respective graphical models (which isn't guaranteed by the preceding constraints).

1. A and B
2. C and D
3. C and D
4. A and B
5. A is true, B is false.
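Answer 5 can be checked by direct computation. The sketch below is a minimal Python illustration, not part of the original notes: the helper names `inverse` and `submatrix` are hypothetical, and exact rational arithmetic from the standard `fractions` module is an assumption made here so that the comparison is free of rounding error. It inverts the quiz's three-variable covariance matrix, takes the sub-matrix for $y_1$ and $y_2$, and inverts that.

```python
from fractions import Fraction


def inverse(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    # Augment each row of m with the matching row of the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def submatrix(m, keep):
    """Keep only the rows and columns listed in `keep` (0-based indices)."""
    return [[m[i][j] for j in keep] for i in keep]


half = Fraction(1, 2)
K3 = [[1, half, 0], [half, 1, half], [0, half, 1]]   # quiz question 5
K3_inv = inverse(K3)             # [[3/2, -1, 1/2], [-1, 2, -1], [1/2, -1, 3/2]]

K2 = submatrix(K3, [0, 1])       # statement (A): just the sub-matrix of K3
K2_inv = inverse(K2)             # [[4/3, -2/3], [-2/3, 4/3]]

# Statement (B) proposes the sub-matrix of K3_inv, which is not K2_inv.
claimed_B = submatrix(K3_inv, [0, 1])
print(K2_inv == claimed_B)
```

The marginal covariance is read straight off the larger matrix, while the marginal inverse covariance has to be recomputed, which is the point of Section 2.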