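Preamble to 'The Humble Gaussian Distribution'. David MacKay

Gaussian Quiz

[Figure: belief network H1 over y1, y2, y3, with arrows from y2 to y1 and from y2 to y3.]

1. Assuming that the variables $y_1$, $y_2$, $y_3$ in this belief network have a joint Gaussian distribution, which of the following matrices could be the covariance matrix?

$$
A = \begin{bmatrix} 9 & 3 & 1 \\ 3 & 9 & 3 \\ 1 & 3 & 9 \end{bmatrix} \quad
B = \begin{bmatrix} 8 & -3 & 1 \\ -3 & 9 & -3 \\ 1 & -3 & 8 \end{bmatrix} \quad
C = \begin{bmatrix} 9 & 3 & 0 \\ 3 & 9 & 3 \\ 0 & 3 & 9 \end{bmatrix} \quad
D = \begin{bmatrix} 9 & -3 & 0 \\ -3 & 10 & -3 \\ 0 & -3 & 9 \end{bmatrix}
$$

2. Which of the matrices could be the inverse covariance matrix?

[Figure: belief network H2 over y1, y2, y3, with arrows from y1 to y2 and from y3 to y2.]

3. Which of the matrices could be the covariance matrix of the second graphical model?

4. Which of the matrices could be the inverse covariance matrix of the second graphical model?

5. Let three variables $y_1$, $y_2$, $y_3$ have covariance matrix $K_{(3)}$ and inverse covariance matrix $K_{(3)}^{-1}$.

$$
K_{(3)} = \begin{bmatrix} 1 & .5 & 0 \\ .5 & 1 & .5 \\ 0 & .5 & 1 \end{bmatrix} \qquad
K_{(3)}^{-1} = \begin{bmatrix} 1.5 & -1 & .5 \\ -1 & 2 & -1 \\ .5 & -1 & 1.5 \end{bmatrix}
$$

Now focus on the variables $y_1$ and $y_2$. Which statements about their covariance matrix $K_{(2)}$ and inverse covariance matrix $K_{(2)}^{-1}$ are true?

$$
\text{(A)} \quad K_{(2)} = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix} \qquad\qquad
\text{(B)} \quad K_{(2)}^{-1} = \begin{bmatrix} 1.5 & -1 \\ -1 & 2 \end{bmatrix}
$$


The Humble Gaussian Distribution

David J.C. MacKay
Cavendish Laboratory
Cambridge CB3 0HE
United Kingdom

June 11, 2006 - Draft 1.0

Abstract

These are elementary notes on Gaussian distributions, aimed at people who are about to learn about Gaussian processes. I emphasize the following points.

- What happens to a covariance matrix and inverse covariance matrix when we omit a variable.
- What it means to have zeros in a covariance matrix.
- What it means to have zeros in an inverse covariance matrix.
- How probabilistic models expressed in terms of 'energies' relate to Gaussians.
- Why eigenvectors and eigenvalues don't have any fundamental status.

1 Introduction

Let's chat about a Gaussian distribution with zero mean, such as

$$ P(\mathbf{y}) = \frac{1}{Z} e^{-\frac{1}{2} \mathbf{y}^{\mathsf T} A \mathbf{y}}, \tag{1} $$

where $A = K^{-1}$ is the inverse of the covariance matrix, $K$. I'm going to emphasize dimensions throughout this note, because I think dimension-consciousness enhances understanding. [Footnote 1] I'll write

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} \tag{4} $$

[Footnote 1: It's conventional to write the diagonal elements in $K$ as $\sigma_i^2$ and the off-diagonal elements as $\sigma_{ij}$. For example

$$ K = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{bmatrix} \tag{2} $$

A confusing convention, since it implies that $\sigma_{ij}$ has different dimensions from $\sigma_i$, even if all axes $i$, $j$ have the same dimensions! Another way of writing an off-diagonal coefficient is

$$ K_{ij} = \rho_{ij} \sigma_i \sigma_j, \tag{3} $$

where $\rho_{ij}$ is the correlation coefficient between $i$ and $j$. This is a better notation since it's dimensionally consistent in the way it uses the letter $\sigma$. But I will stick with the notation $K_{ij}$.]

The definition of the covariance matrix is

$$ K_{ij} = \langle y_i y_j \rangle \tag{5} $$

so the dimensions of the element $K_{ij}$ are (dimensions of $y_i$) times (dimensions of $y_j$).

1.1 Examples

Let's work through a few graphical models.

[Figure: two belief networks. Example 1 (H1): y2 is the parent of y1 and y3. Example 2 (H2): y1 and y3 are the parents of y2.]

1.1.1 Example 1

Maybe $y_2$ is the temperature outside some buildings (or rather, the deviation of the outside temperature from its mean), and $y_1$ is the temperature deviation inside building 1, and $y_3$ is the temperature deviation inside building 3. This graphical model says that if you know the outside temperature $y_2$ then $y_1$ and $y_3$ are independent. Let's consider this generative model:

$$ y_2 = \nu_2 \tag{6} $$
$$ y_1 = w_1 y_2 + \nu_1 \tag{7} $$
$$ y_3 = w_3 y_2 + \nu_3, \tag{8} $$

where $\{\nu_i\}$ are independent normal variables with variances $\{\sigma_i^2\}$. Then we can write down the entries in the covariance matrix, starting with the diagonal entries

$$ K_{11} = \langle y_1 y_1 \rangle = \langle (w_1 \nu_2 + \nu_1)(w_1 \nu_2 + \nu_1) \rangle = w_1^2 \langle \nu_2^2 \rangle + 2 w_1 \langle \nu_1 \nu_2 \rangle + \langle \nu_1^2 \rangle = w_1^2 \sigma_2^2 + \sigma_1^2 \tag{9} $$
$$ K_{22} = \sigma_2^2 \tag{10} $$
$$ K_{33} = w_3^2 \sigma_2^2 + \sigma_3^2 \tag{11} $$

(The cross term in (9) vanishes because $\nu_1$ and $\nu_2$ are independent with zero mean.) So we can fill in this much:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & & \\ & \sigma_2^2 & \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{12} $$

The off-diagonal terms are

$$ K_{12} = \langle y_1 y_2 \rangle = \langle (w_1 \nu_2 + \nu_1)(\nu_2) \rangle = w_1 \sigma_2^2 \tag{13} $$

(and similarly for $K_{23}$) and

$$ K_{13} = \langle y_1 y_3 \rangle = \langle (w_1 \nu_2 + \nu_1)(w_3 \nu_2 + \nu_3) \rangle = w_1 w_3 \sigma_2^2 \tag{14} $$

So the covariance matrix is:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 & w_1 w_3 \sigma_2^2 \\ & \sigma_2^2 & w_3 \sigma_2^2 \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{15} $$

(where the remaining blank elements can be filled in by symmetry).

Now let's think about the inverse covariance matrix. One way to get to it is to write down the joint distribution.

$$ P(y_1, y_2, y_3 \mid H_1) = P(y_2) P(y_1 \mid y_2) P(y_3 \mid y_2) \tag{16} $$
$$ = \frac{1}{Z_2} \exp\left( -\frac{y_2^2}{2 \sigma_2^2} \right) \frac{1}{Z_1} \exp\left( -\frac{(y_1 - w_1 y_2)^2}{2 \sigma_1^2} \right) \frac{1}{Z_3} \exp\left( -\frac{(y_3 - w_3 y_2)^2}{2 \sigma_3^2} \right) \tag{17} $$

We can now collect all the terms in $y_i y_j$.

$$
\begin{aligned}
P(y_1, y_2, y_3) &= \frac{1}{Z'} \exp\left( -\frac{y_2^2}{2 \sigma_2^2} - \frac{(y_1 - w_1 y_2)^2}{2 \sigma_1^2} - \frac{(y_3 - w_3 y_2)^2}{2 \sigma_3^2} \right) \\
&= \frac{1}{Z'} \exp\left( -y_2^2 \left[ \frac{1}{2 \sigma_2^2} + \frac{w_1^2}{2 \sigma_1^2} + \frac{w_3^2}{2 \sigma_3^2} \right] - y_1^2 \frac{1}{2 \sigma_1^2} + 2 y_1 y_2 \frac{w_1}{2 \sigma_1^2} - y_3^2 \frac{1}{2 \sigma_3^2} + 2 y_3 y_2 \frac{w_3}{2 \sigma_3^2} \right) \\
&= \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\
-\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right)
\end{aligned}
$$

So the inverse covariance matrix is

$$
K^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\
-\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{bmatrix}
$$

The first thing I'd like you to notice here is the zeroes. $[K^{-1}]_{13} = 0$.

The meaning of a zero in an inverse covariance matrix (at location $i, j$) is: conditional on all the other variables, these two variables $i$ and $j$ are independent.

Next, notice that whereas $y_1$ and $y_2$ were positively correlated (assuming $w_1 > 0$), the coefficient $[K^{-1}]_{12}$ is negative. It's common that a covariance matrix $K$ in which all the elements are non-negative has an inverse that includes some negative elements. So positive off-diagonal terms in the covariance matrix always describe positive correlation; but the off-diagonal terms in the inverse covariance matrix can't be interpreted that way. The sign of an element $(i, j)$ in the inverse covariance matrix does not tell you about the correlation between those two variables. For example, remember: there is a zero at $[K^{-1}]_{13}$. But that doesn't mean that variables $y_1$ and $y_3$ are uncorrelated. Thanks to their parent $y_2$, they are correlated, with covariance $w_1 w_3 \sigma_2^2$.

The off-diagonal entry $[K^{-1}]_{ij}$ in an inverse covariance matrix indicates how $y_i$ and $y_j$ are correlated if we condition on all the other variables apart from those two: if $[K^{-1}]_{ij} < 0$, they are positively correlated, conditioned on the others; if $[K^{-1}]_{ij} > 0$, they are negatively correlated.

The inverse covariance matrix is great for reading out properties of conditional distributions in which we condition on all the variables except one. For example, look at $[K^{-1}]_{11} = \frac{1}{\sigma_1^2}$; if we know $y_2$ and $y_3$, then the probability distribution of $y_1$ is Gaussian with variance $1 / [K^{-1}]_{11}$. That one was easy.

Look at $[K^{-1}]_{22} = \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right]$. If we know $y_1$ and $y_3$, then the probability distribution of $y_2$ is Gaussian with variance

$$ \frac{1}{[K^{-1}]_{22}} = \frac{1}{\frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2}}. \tag{18} $$

That's not so obvious, but it's familiar if you've applied Bayes' theorem to Gaussians: when we do inference of a parent like $y_2$ given its children, the inverse-variances of the prior and the likelihoods add. Here, the parent variable's inverse variance (also known as its precision) is the sum of the precision contributed by the prior, $\frac{1}{\sigma_2^2}$, the precision contributed by the measurement of $y_1$, $\frac{w_1^2}{\sigma_1^2}$, and the precision contributed by the measurement of $y_3$, $\frac{w_3^2}{\sigma_3^2}$.

The off-diagonal entries in $K^{-1}$ tell us how the mean of [the conditional distribution of one variable given the others] depends on [the others]. Let's take variable $y_3$ conditioned on the other two, for example.

$$
P(y_3 \mid y_1, y_2, H_1) \propto P(y_1, y_2, y_3 \mid H_1) \propto \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0 \\
-\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right] & -\frac{w_3}{\sigma_3^2} \\
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right)
$$

Let's highlight in Blue the terms $y_1$, $y_2$ that are fixed and known and uninteresting, and highlight in Green everything that is multiplying the interesting term $y_3$.

$$
P(y_3 \mid y_1, y_2, H_1) \propto \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} \textcolor{blue}{y_1} & \textcolor{blue}{y_2} & y_3 \end{bmatrix}
\begin{bmatrix}
\textcolor{blue}{\frac{1}{\sigma_1^2}} & \textcolor{blue}{-\frac{w_1}{\sigma_1^2}} & \textcolor{green}{0} \\
\textcolor{blue}{-\frac{w_1}{\sigma_1^2}} & \textcolor{blue}{\left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \frac{w_3^2}{\sigma_3^2} \right]} & \textcolor{green}{-\frac{w_3}{\sigma_3^2}} \\
\textcolor{green}{0} & \textcolor{green}{-\frac{w_3}{\sigma_3^2}} & \textcolor{green}{\frac{1}{\sigma_3^2}}
\end{bmatrix}
\begin{bmatrix} \textcolor{blue}{y_1} \\ \textcolor{blue}{y_2} \\ y_3 \end{bmatrix} \right)
$$

All those blue multipliers in the central matrix aren't achieving anything. We can just ignore them (and redefine the constant of proportionality). For the benefit of anyone with a colour-blind printer, here it is again:

$$
P(y_3 \mid y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & -\frac{w_3}{\sigma_3^2} \\
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right)
$$

$$
P(y_3 \mid y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} [y_3]^2 - [y_3] \left[ 0 \, y_1 - \frac{w_3}{\sigma_3^2} y_2 \right] \right)
$$

We obtain the mean by completing the square. [Footnote 2]

$$
P(y_3 \mid y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} \left[ y_3 - \frac{0 \, y_1 + \frac{w_3}{\sigma_3^2} y_2}{\frac{1}{\sigma_3^2}} \right]^2 \right)
$$

[Footnote 2: 'Completing the square' is $\frac{1}{2} a y^2 - b y = \frac{1}{2} a (y - b/a)^2 + \text{constant}$.]

In this case, this all collapses down, of course, to

$$ P(y_3 \mid y_1, y_2, H_1) \propto \exp\left( -\frac{1}{2} \frac{1}{\sigma_3^2} [y_3 - w_3 y_2]^2 \right), \tag{19} $$

as defined in the original generative model (8).

In general, the off-diagonal coefficients of $K^{-1}$ tell us the sensitivity of [the mean of the conditional distribution] to the other variables.

$$ \mu_{y_3 \mid y_1, y_2} = \frac{-K^{-1}_{13} y_1 - K^{-1}_{23} y_2}{K^{-1}_{33}} \tag{20} $$

So the conditional mean of $y_3$ is a linear function of the known variables, and the off-diagonal entries in $K^{-1}$ tell us the coefficients in that linear function.

1.2 Example 2

Here's another example, where two parents have one child. For example, the price of electricity $y_2$ from a power station might depend on the price of gas, $y_1$, and the price of carbon emission rights, $y_3$.

[Figure: belief network H2 (Example 2), in which y1 and y3 are the parents of y2.]

$$ y_2 = w_1 y_1 + w_3 y_3 + \nu_2 \tag{21} $$

Note that the units in which gas price, electricity price, and carbon price are measured are all different (pounds per cubic metre, pennies per kWh, and euros per tonne, for example). So $y_1$, $y_2$, and $y_3$ have different dimensions from each other. Most people who do data modelling treat their data as 'just numbers', but I think it is a useful discipline to keep track of dimensions and to carry out only dimensionally valid operations. [Dimensionally valid operations satisfy the two rules of dimensions: (1) only add, subtract and compare quantities that have like dimensions; (2) arguments of all functions like exp, log, sin must be dimensionless. Rule 2 is really just a special case of rule 1, since $\exp(x) = 1 + x + x^2/2 + \ldots$, so to satisfy rule 1, the dimensions of $x$ must be the same as the dimensions of 1.]

What is the covariance matrix? Here we assume that the parent variables $y_1$ and $y_3$ are uncorrelated. The covariance matrix is

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & w_1 \sigma_1^2 & 0 \\ & \sigma_2^2 + w_1^2 \sigma_1^2 + w_3^2 \sigma_3^2 & w_3 \sigma_3^2 \\ & & \sigma_3^2 \end{bmatrix} \tag{22} $$

Notice the zero correlation between the uncorrelated variables (1, 3).

What do you think the (1, 3) entry in the inverse covariance matrix will be? Let's work it out in the same way as before. The joint distribution is

$$ P(y_1, y_2, y_3 \mid H_2) = P(y_1) P(y_3) P(y_2 \mid y_1, y_3) \tag{23} $$
$$ = \frac{1}{Z_1} \exp\left( -\frac{y_1^2}{2 \sigma_1^2} \right) \frac{1}{Z_3} \exp\left( -\frac{y_3^2}{2 \sigma_3^2} \right) \frac{1}{Z_2} \exp\left( -\frac{(y_2 - w_1 y_1 - w_3 y_3)^2}{2 \sigma_2^2} \right) \tag{24} $$

We collect all the terms in $y_i y_j$.

$$
\begin{aligned}
P(y_1, y_2, y_3) &= \frac{1}{Z'} \exp\left( -\frac{y_1^2}{2 \sigma_1^2} - \frac{y_3^2}{2 \sigma_3^2} - \frac{(y_2 - w_1 y_1 - w_3 y_3)^2}{2 \sigma_2^2} \right) \\
&= \frac{1}{Z'} \exp\left( -y_1^2 \left[ \frac{1}{2 \sigma_1^2} + \frac{w_1^2}{2 \sigma_2^2} \right] - y_2^2 \frac{1}{2 \sigma_2^2} + 2 y_1 y_2 \frac{w_1}{2 \sigma_2^2} - y_3^2 \left[ \frac{1}{2 \sigma_3^2} + \frac{w_3^2}{2 \sigma_2^2} \right] + 2 y_3 y_2 \frac{w_3}{2 \sigma_2^2} - 2 y_3 y_1 \frac{w_1 w_3}{2 \sigma_2^2} \right) \\
&= \frac{1}{Z'} \exp\left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix}
\left[ \frac{1}{\sigma_1^2} + \frac{w_1^2}{\sigma_2^2} \right] & -\frac{w_1}{\sigma_2^2} & +\frac{w_1 w_3}{\sigma_2^2} \\
-\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} \\
+\frac{w_1 w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \left[ \frac{1}{\sigma_3^2} + \frac{w_3^2}{\sigma_2^2} \right]
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \right)
\end{aligned}
$$

So the inverse covariance matrix is

$$
K^{-1} = \begin{bmatrix}
\left[ \frac{1}{\sigma_1^2} + \frac{w_1^2}{\sigma_2^2} \right] & -\frac{w_1}{\sigma_2^2} & +\frac{w_1 w_3}{\sigma_2^2} \\
-\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} \\
+\frac{w_1 w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \left[ \frac{1}{\sigma_3^2} + \frac{w_3^2}{\sigma_2^2} \right]
\end{bmatrix} \tag{25}
$$

Notice (assuming $w_1 > 0$ and $w_3 > 0$) that the off-diagonal term connecting a parent and a child, $[K^{-1}]_{12}$, is negative and the off-diagonal term connecting the two parents, $[K^{-1}]_{13}$, is positive. This positive term indicates that, conditional on all the other variables (i.e., $y_2$), the two parents $y_1$ and $y_3$ are anticorrelated. That's 'explaining away'. Once you know the price of electricity was average, for example, you can deduce that if gas was more expensive than normal, carbon probably was less expensive than normal.

2 Omission of one variable

Consider example 1.

[Figure: belief network H1 (Example 1), in which y2 is the parent of y1 and y3.]

The covariance matrix of all three variables is:

$$ K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 & w_1 w_3 \sigma_2^2 \\ & \sigma_2^2 & w_3 \sigma_2^2 \\ & & w_3^2 \sigma_2^2 + \sigma_3^2 \end{bmatrix} \tag{26} $$

If we decide we want to talk about the joint distribution of just $y_1$ and $y_2$, the covariance matrix is simply the sub-matrix:

$$ K_2 = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} w_1^2 \sigma_2^2 + \sigma_1^2 & w_1 \sigma_2^2 \\ & \sigma_2^2 \end{bmatrix} \tag{27} $$

This follows from the definition of the covariance,

$$ K_{ij} = \langle y_i y_j \rangle. \tag{28} $$

The inverse covariance matrix, on the other hand, does not change in such a simple way. The 3 × 3 inverse covariance matrix was:

$$
K^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & \textcolor{blue}{0} \\
-\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} + \textcolor{blue}{\frac{w_3^2}{\sigma_3^2}} \right] & \textcolor{blue}{-\frac{w_3}{\sigma_3^2}} \\
\textcolor{blue}{0} & \textcolor{blue}{-\frac{w_3}{\sigma_3^2}} & \textcolor{blue}{\frac{1}{\sigma_3^2}}
\end{bmatrix}
$$

When we work out the 2 × 2 inverse covariance matrix, all the Blue terms that originated from the child $y_3$ are lost. So we have

$$
K_2^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} \\
-\frac{w_1}{\sigma_1^2} & \left[ \frac{1}{\sigma_2^2} + \frac{w_1^2}{\sigma_1^2} \right]
\end{bmatrix}
$$

Specifically, notice that $[K_2^{-1}]_{22}$ is different from the (2, 2) entry in the three by three $K^{-1}$. We conclude:

Leaving out a variable leaves $K$ unchanged but changes $K^{-1}$.

This conclusion is important for understanding the answer to the question, 'When working with Gaussian processes, why not parameterize the inverse covariance instead of the covariance function?' The answer is: you can't write down the inverse covariance associated with two points! The inverse covariance depends capriciously on what the other variables are.

3 Energy models

Sometimes people express probabilistic models in terms of energy functions that are minimized in the most probable configuration. For example, in regression with cubic splines, a regularizer is defined which describes the energy that a steel ruler would have if bent into the shape of the curve. Such models usually have the form:

$$ P(\mathbf{y}) = \frac{1}{Z} e^{-\frac{E(\mathbf{y})}{T}}, \tag{29} $$

and in simple cases, the energy $E(\mathbf{y})$ may be a quadratic function of $\mathbf{y}$, such as

$$ E(\mathbf{y}) = -\sum_{i<j} J_{ij} y_i y_j + \sum_i a_i y_i^2 \tag{30} $$

If so, then the distribution is a Gaussian (just like (1)), and the 'couplings' $J_{ij}$ are minus the coefficients in the inverse covariance matrix (in units where $T = 1$).

As a simple example, consider a set of three masses coupled by springs, and subjected to thermal perturbations.

[Figure: three masses with displacements y1, y2, y3 in a row, joined to each other and to two fixed ends by four springs with constants k01, k12, k23, k34.]

The equilibrium positions are $(y_1, y_2, y_3) = (0, 0, 0)$, and the spring constants are $k_{ij}$. The extension of the second spring is $y_2 - y_1$. The energy of this system is

$$
\begin{aligned}
E(\mathbf{y}) &= \frac{1}{2} k_{01} y_1^2 + \frac{1}{2} k_{12} (y_2 - y_1)^2 + \frac{1}{2} k_{23} (y_3 - y_2)^2 + \frac{1}{2} k_{34} y_3^2 \\
&= \frac{1}{2} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix}
k_{01} + k_{12} & -k_{12} & 0 \\
-k_{12} & k_{12} + k_{23} & -k_{23} \\
0 & -k_{23} & k_{23} + k_{34}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
\end{aligned}
$$

So at temperature $T$, the probability distribution of the displacements is Gaussian with inverse covariance matrix

$$
\frac{1}{T} \begin{bmatrix}
k_{01} + k_{12} & -k_{12} & 0 \\
-k_{12} & k_{12} + k_{23} & -k_{23} \\
0 & -k_{23} & k_{23} + k_{34}
\end{bmatrix} \tag{31}
$$

Notice that there are 0 entries between displacements $y_1$ and $y_3$, the two masses that are not directly coupled by a spring.

[Figure 1. Five masses, six springs: displacements y1 to y5 in a row, joined by six identical springs of constant k.]

So inverse covariance matrices are sometimes very sparse. If we have five masses in a row connected by identical springs $k$, for example, then

$$
K^{-1} = \frac{k}{T} \begin{bmatrix}
2 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & -1 & 2
\end{bmatrix}. \tag{32}
$$

But this sparsity doesn't carry over to the covariance matrix, which is

$$
K = \frac{T}{k} \begin{bmatrix}
0.83 & 0.67 & 0.50 & 0.33 & 0.17 \\
0.67 & 1.33 & 1.00 & 0.67 & 0.33 \\
0.50 & 1.00 & 1.50 & 1.00 & 0.50 \\
0.33 & 0.67 & 1.00 & 1.33 & 0.67 \\
0.17 & 0.33 & 0.50 & 0.67 & 0.83
\end{bmatrix}. \tag{33}
$$

4 Eigenvectors and eigenvalues are meaningless

There seems to be a knee-jerk reaction when people see a square matrix: 'what are its eigenvectors?' But here, where we are discussing quadratic forms, eigenvectors and eigenvalues have no fundamental status. They are dimensionally invalid objects. Any algorithm that features eigenvectors either didn't need to do so, or shouldn't have done so. (I think the whole idea of principal component analysis is misguided, for example.)

Hang on, you say, what about the three masses example? Don't those three masses have meaningful normal modes? Yes, they do, but those modes are not the eigenvectors of the spring matrix (31). Remember, I didn't tell you what the masses of the masses were!

I'm not saying that eigenvectors are never meaningful. What I'm saying is, in the context of quadratic forms

$$ \frac{1}{2} \mathbf{y}^{\mathsf T} A \mathbf{y}, \tag{34} $$

eigenvectors are meaningless and arbitrary.

Consider a covariance matrix describing the correlation between something's mass $y_1$ and its length $y_2$.

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} \tag{35} $$

The dimensions of $K_{11}$ are mass-squared. $K_{11}$ might be measured in kg², for example. The dimensions of $K_{12} \equiv \langle y_1 y_2 \rangle$ are mass times length. $K_{12}$ might be measured in kg m, for example. Here's an example, which might describe the correlation between weight and height of some animals in a survey.

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} 10000 \, \text{kg}^2 & 70 \, \text{kg m} \\ 70 \, \text{kg m} & 1 \, \text{m}^2 \end{bmatrix} \tag{36} $$

[Figure 2. Dataset with its 'eigenvectors'. Axes: length / m (horizontal) and mass / kg (vertical). As the text explains, the eigenvectors of covariance matrices are meaningless and arbitrary.]

The knee-jerk reaction is "let's find the principal components of our data", which means "ignore those silly dimensional units, and just find the eigenvectors of $\begin{bmatrix} 10000 & 70 \\ 70 & 1 \end{bmatrix}$". But let's consider what this means. An eigenvector is a vector satisfying

$$ \begin{bmatrix} 10000 \, \text{kg}^2 & 70 \, \text{kg m} \\ 70 \, \text{kg m} & 1 \, \text{m}^2 \end{bmatrix} \mathbf{e} = \lambda \mathbf{e}. \tag{37} $$

By asking for an eigenvector, we are imagining that two equations are true: first, the top row:

$$ 10000 \, \text{kg}^2 \, e_1 + 70 \, \text{kg m} \, e_2 = \lambda e_1, \tag{38} $$

and, second, the bottom row:

$$ 70 \, \text{kg m} \, e_1 + 1 \, \text{m}^2 \, e_2 = \lambda e_2. \tag{39} $$

These expressions violate the rules of dimensions. Try all you like, but you won't be able to find dimensions for $e_1$, $e_2$, and $\lambda$ such that rule 1 is satisfied.

No, no, the matlab lover says, I leave out the dimensions, and I get:

    > [e,v] = eig(s)
    e =
       0.0070002  -0.9999755
      -0.9999755  -0.0070002
    v =
       5.0998e-01   0.0000e+00
       0.0000e+00   1.0000e+04

I notice that the eigenvectors are (0.007, −0.9999) and (−0.9999, −0.007), which are almost aligned with the coordinate axes. Very interesting! I also notice that the eigenvalues are $10^4$ and 0.5. What an interestingly large eigenvalue ratio! Wow, that means that there is one very big principal component, and the second one is much smaller. Ooh, how interesting.

But this is nonsense. If we change the units in which we measure length from m to cm then the covariance matrix can be written:

$$ K = \begin{bmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{bmatrix} = \begin{bmatrix} 10000 \, \text{kg}^2 & 7000 \, \text{kg cm} \\ 7000 \, \text{kg cm} & 10000 \, \text{cm}^2 \end{bmatrix} \tag{40} $$

This is exactly the same covariance matrix of exactly the same data. But the eigenvectors and eigenvalues are now:

    e =
      -0.70711   0.70711
       0.70711   0.70711
    v =
        3000       0
           0   17000

Figure 2 illustrates this situation. On the left, a data set of masses and lengths measured in metres. The arrows show the 'eigenvectors'. (The arrows don't look 'orthogonal' in this plot because a step of one unit on the x-axis happens to cover less paper than a step of one unit on the y-axis.) On the right, exactly the same data set but with lengths measured in centimetres. The arrows show the 'eigenvectors'.

In conclusion, eigenvectors of the matrix in a quadratic form are not fundamentally meaningful. [Properties of that matrix that are meaningful include its determinant.]

4.1 Aside

This complaint about eigenvectors comes hand in hand with another complaint, about 'steepest descent'. A steepest descent algorithm is dimensionally invalid. A step in a parameter space does not have the same dimensions as a gradient. To turn a gradient into a sensible step direction, you need a metric. The metric defines how 'big' a step is (in rather the same way that when gnuplot plotted the data above, it chose a vertical scale and a horizontal scale). Once you know how big alternative steps are, it becomes meaningful to take the step that is 'steepest' (that is, it's the direction with the biggest change in function value per unit 'distance' moved).

Without a metric, steepest descent algorithms are not covariant. That is, the algorithm would behave differently if you just changed the units in which one parameter is measured.

Appendix: Answers to quiz

For the first four, you can quickly guess the answers based on whether the (1, 3) entries are zero or not. For a careful answer you should also check that the matrices really are positive definite (they are) and that they are realisable by the respective graphical models (which isn't guaranteed by the preceding constraints).

1. A and B
2. C and D
3. C and D
4. A and B
5. A is true, B is false.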
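The answer to quiz question 5, and the contrast between the sparse inverse covariance (32) and the dense covariance (33), can be checked by direct computation. What follows is a minimal sketch in plain Python using exact fractions. The helper name `inverse`, the variable names, and the choice of units in which k/T = 1 are assumptions made for illustration; they are not part of the note.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination.

    Illustrative helper (not from the note); raises StopIteration
    if the matrix is singular.
    """
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is non-zero.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row, then clear this column from every other row.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Quiz question 5: covariance of (y1, y2, y3) and its inverse.
half = Fraction(1, 2)
K3 = [[1, half, 0],
      [half, 1, half],
      [0, half, 1]]
K3_inv = inverse(K3)

# Omitting y3: the covariance of (y1, y2) is just the top-left block of K3.
K2 = [row[:2] for row in K3[:2]]
K2_inv = inverse(K2)

# The top-left block of the 3x3 inverse is statement (B) in the quiz.
block_of_K3_inv = [row[:2] for row in K3_inv[:2]]

# Five masses joined by identical springs, in units where k/T = 1 (eq. 32).
n = 5
chain_precision = [[2 if i == j else -1 if abs(i - j) == 1 else 0
                    for j in range(n)] for i in range(n)]
chain_cov = inverse(chain_precision)

print("K2 inverse:", K2_inv)                 # entries 4/3 and -2/3
print("block of K3 inverse:", block_of_K3_inv)  # entries 3/2, -1, 2
print("chain covariance:")
for row in chain_cov:
    print(["%.2f" % float(x) for x in row])
```

The 2 × 2 inverse has entries 4/3 and −2/3, not the block taken from the 3 × 3 inverse, which is why statement (B) is false; and every entry of the chain covariance is non-zero even though the inverse covariance is tridiagonal.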
