Probabilistic Partial Canonical Correlation Analysis

Yusuke Mukuta  MUKUTA@MI.T.U-TOKYO.AC.JP
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Tatsuya Harada  HARADA@MI.T.U-TOKYO.AC.JP
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

Partial canonical correlation analysis (partial CCA) is a statistical method that estimates a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. Partial CCA is known to be closely related to a causality measure between two time series. However, partial CCA requires the inverses of covariance matrices, so the calculation is not stable. This is particularly the case for high-dimensional data or small sample sizes. Additionally, we cannot estimate the optimal dimension of the subspace in the model. In this paper, we have addressed these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model. Our numerical experiments demonstrated that our methods can stably estimate the model parameters, even in high dimensions or when there are a small number of samples.

1. Introduction

Partial canonical correlation analysis (partial CCA) was proposed by Rao (1969). It is a statistical method used to estimate a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. This is calculated using a CCA of the residuals of a linear regression on the third variable. This method is a generalized version of the partial correlation coefficient for multidimensional data. We define the variables \{y_{1n}\}_{n=1}^{N} \in R^{d_1} and \{y_{2n}\}_{n=1}^{N} \in R^{d_2}, the third variable \{x_n\}_{n=1}^{N} \in R^{d_x}, and the dimension of the subspace d_z. Then the partial CCA is calculated using the generalized eigenvalue problem

\Sigma_{12|x} \Sigma_{22|x}^{-1} \Sigma_{21|x} u_1 = \rho^2 \Sigma_{11|x} u_1,
\Sigma_{21|x} \Sigma_{11|x}^{-1} \Sigma_{12|x} u_2 = \rho^2 \Sigma_{22|x} u_2,   (1)

where \Sigma_{m_1 m_2 | x} = \Sigma_{m_1 m_2} - \Sigma_{m_1 x} \Sigma_{xx}^{-1} \Sigma_{x m_2}, and \Sigma_{ab} is a sample covariance matrix. Partial CCA has various applications in areas such as social science (Kowalski et al., 2003), and can be used as a causality measure.

Causality measures are indices that measure the influence of one time series on another. Transfer entropy (Schreiber, 2000) is a measure based on information theory. It measures the magnitude of a change to the conditional distribution of y given x, and is calculated using

T_{x \to y} = \iiint p(y_t, y_{t-1}^{(l)}, x_{t-1}^{(k)}) \log_2 \frac{p(y_t \mid y_{t-1}^{(l)}, x_{t-1}^{(k)})}{p(y_t \mid y_{t-1}^{(l)})} \, dy_t \, dy_{t-1}^{(l)} \, dx_{t-1}^{(k)},   (2)

where k and l denote the embedding dimensions, y_{t-1}^{(l)} = [y_{t-1}^T, y_{t-2}^T, \dots, y_{t-l}^T]^T, and x_{t-1}^{(k)} = [x_{t-1}^T, x_{t-2}^T, \dots, x_{t-k}^T]^T. Shibuya et al. (2009) showed that when we assume that the variables are normally distributed and estimate the model parameters using maximum likelihood estimation, transfer entropy is equivalent to Granger causality (Granger, 1969). Granger causality is based on changes to the estimation error of an autoregressive model. Shibuya et al. (2011) showed that we can use the partial canonical correlations \rho_\lambda, calculated using partial CCA on y_t and x_{t-1}^{(k)} while eliminating the effect of y_{t-1}^{(l)}. Then, the transfer entropy can be calculated using

T_{x \to y} = \frac{1}{2} \sum_{\lambda=1}^{\min(d, k d_x)} \log_2 \frac{1}{1 - \rho_\lambda^2}.

Transfer entropy has many applications such as brain analysis (Chávez et al., 2003), medical science (Verdes, 2005), cognitive development modelling (Sumioka et al., 2008), and detecting motion in a movie (Yamashita et al., 2012).
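For reference, the following is a minimal sketch (ours, not from the paper) of the classical partial CCA computation in Equation (1) and of the resulting causality index: it regresses the third variable out of both views, forms the conditional covariances, solves the generalized eigenvalue problem, and plugs the partial canonical correlations into the transfer entropy formula above. NumPy/SciPy are assumed, the function names are our own, and the small ridge term `reg` is an illustrative stabilizer rather than part of the original method.

```python
# Minimal sketch of classical partial CCA (Eq. 1) and the causality index
# T_{x->y} = 1/2 * sum_d log2(1 / (1 - rho_d^2)).  NumPy/SciPy assumed.
import numpy as np
from scipy.linalg import eigh

def partial_cca(Y1, Y2, X, dz, reg=1e-8):
    """Y1: (N, d1), Y2: (N, d2), X: (N, dx). Returns the top-dz partial canonical correlations."""
    N = Y1.shape[0]
    Y1c, Y2c, Xc = (A - A.mean(axis=0) for A in (Y1, Y2, X))
    # CCA of the residuals of a linear regression on the third variable X.
    B1 = np.linalg.lstsq(Xc, Y1c, rcond=None)[0]
    B2 = np.linalg.lstsq(Xc, Y2c, rcond=None)[0]
    R1, R2 = Y1c - Xc @ B1, Y2c - Xc @ B2
    # Conditional covariance blocks Sigma_{m1 m2 | x}; `reg` is a small ridge
    # added here only to keep the sketch numerically stable.
    S11 = R1.T @ R1 / N + reg * np.eye(R1.shape[1])
    S22 = R2.T @ R2 / N + reg * np.eye(R2.shape[1])
    S12 = R1.T @ R2 / N
    # Generalized eigenproblem  S12 S22^{-1} S21 u = rho^2 S11 u.
    M = S12 @ np.linalg.solve(S22, S12.T)
    rho2 = eigh(M, S11, eigvals_only=True)[::-1][:dz]
    return np.sqrt(np.clip(rho2, 0.0, 1.0 - 1e-12))

def transfer_entropy_index(rho):
    # Causality index from partial canonical correlations (Shibuya et al., 2011).
    return 0.5 * np.sum(np.log2(1.0 / (1.0 - rho ** 2)))
```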
However, partial CCA requires the inverses of sample covariance matrices, so the calculation is unstable when the variables are highly correlated, the dimension of the data is large, or there are not enough data. Yamashita et al. (2012) regularized the covariance matrices to solve this problem, but the appropriate optimization of the plural regularization parameters has not been determined. Additionally, we cannot estimate the proper dimension of the subspace of the model.

We have addressed these problems by proposing a probabilistic interpretation of partial CCA, and by deriving a variational Bayesian estimation algorithm for the model parameters based on this probabilistic interpretation. Our experiments show that the proposed methods can more accurately estimate the subspace dimension, and can more stably estimate the model parameters on both synthetic and real data, even in high dimensions or when there are few samples.

2. Canonical Correlation Analysis and its Extension

In this section, we review canonical correlation analysis, which is a statistical method similar to partial CCA, and we also consider it from a probabilistic perspective. Canonical correlation analysis (CCA) was proposed by Hotelling (1936). It is a method for finding statistical dependencies between two data sources. Given variables \{y_{1n}\}_{n=1}^{N} \in R^{d_1} and \{y_{2n}\}_{n=1}^{N} \in R^{d_2}, and the dimension of the subspace d_z \le \min(d_1, d_2), the CCA can be calculated using the generalized eigenvalue problem

\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} u_1 = \rho^2 \Sigma_{11} u_1,
\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} u_2 = \rho^2 \Sigma_{22} u_2,   (3)

where \Sigma_{m_1 m_2} represents the sample covariance matrix between y_{m_1} and y_{m_2}. The projection is a d_z \times d_i (i = 1, 2) matrix whose d-th row is the eigenvector corresponding to the d-th largest eigenvalue, and each eigenvalue equals the correlation in the corresponding dimension. Numerous studies have extended CCA, including a nonlinear extension using kernels (Lai & Fyfe, 2000; Melzer et al., 2001), online inference of the model parameters (Vía et al., 2007; Yger et al., 2012), and sparse variants (Hardoon & Shawe-Taylor, 2009).

Bach and Jordan (2005) gave a probabilistic interpretation of CCA, such that the maximum likelihood estimates of the model parameters can be derived from the CCA. Given this probabilistic interpretation, we can extend CCA to probabilistic models. Figure 1 shows a graphical model of the interpretation, where \{z_n\}_{n=1}^{N} \in R^{d_z} are the latent variables.

[Figure 1. Graphical model for probabilistic CCA.]

The generative model is

z_n \sim N(0, I_{d_z}),
y_{mn} \sim N(W_m z_n + \mu_m, \Psi_m),   (4)

where N(\mu, \Sigma) denotes the multivariate normal distribution with mean \mu and covariance \Sigma, and I_d denotes the d-dimensional identity matrix. W_m \in R^{d_m \times d_z} and \Psi_m \in R^{d_m \times d_m} are the model parameters that we must estimate. We define U_{m,d_z} as the matrix whose d-th column equals the d-th eigenvector, and P_{d_z} \in R^{d_z \times d_z} as the diagonal matrix whose d-th element equals the d-th eigenvalue of Equation (3). Then, the maximum likelihood solution is

W_m = \Sigma_{mm} U_{m,d_z} M_m,
\Psi_m = \Sigma_{mm} - W_m W_m^T,
\mu_m = \bar{y}_m,   (5)

where M_m \in R^{d_z \times d_z} are arbitrary matrices such that M_1 M_2^T = P_{d_z} and the spectral norms of M_m are smaller than one, and \bar{y}_m is the sample mean \frac{1}{N}\sum_{n=1}^{N} y_{mn}.
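To make Equation (5) concrete, the sketch below (ours, not from the paper) computes one valid maximum likelihood solution from the classical CCA quantities, using the particular choice M_1 = M_2 = P_{d_z}^{1/2}, which satisfies M_1 M_2^T = P_{d_z} and has spectral norm below one whenever all canonical correlations are below one; NumPy/SciPy are assumed.

```python
# Sketch: maximum likelihood parameters of probabilistic CCA (Eq. 5),
# using the particular choice M_1 = M_2 = P_dz^{1/2}.
import numpy as np
from scipy.linalg import eigh

def probabilistic_cca_ml(Y1, Y2, dz):
    """Y1: (N, d1), Y2: (N, d2). Returns (W1, Psi1, mu1) and (W2, Psi2, mu2)."""
    N = Y1.shape[0]
    mu1, mu2 = Y1.mean(axis=0), Y2.mean(axis=0)
    Y1c, Y2c = Y1 - mu1, Y2 - mu2
    S11, S22 = Y1c.T @ Y1c / N, Y2c.T @ Y2c / N
    S12 = Y1c.T @ Y2c / N
    # Generalized eigenproblem of Eq. (3); eigh returns S11-orthonormal eigenvectors.
    rho2, U1 = eigh(S12 @ np.linalg.solve(S22, S12.T), S11)
    order = np.argsort(rho2)[::-1][:dz]
    rho = np.sqrt(np.clip(rho2[order], 1e-12, 1.0 - 1e-12))   # canonical correlations
    U1 = U1[:, order]
    U2 = np.linalg.solve(S22, S12.T @ U1) / rho               # paired directions for view 2
    M = np.diag(np.sqrt(rho))                                  # M_m = P_dz^{1/2}
    W1, W2 = S11 @ U1 @ M, S22 @ U2 @ M
    Psi1, Psi2 = S11 - W1 @ W1.T, S22 - W2 @ W2.T
    return (W1, Psi1, mu1), (W2, Psi2, mu2)
```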
There are some extensions of this probabilistic model. They include a robust estimation method that assumes a Student-t distribution for the noise (Archambeau et al., 2006), and a nonlinear extension that uses a Gaussian process latent variable model (Leen & Fyfe, 2006; Ek et al., 2008).

Bayesian CCA (Klami & Kaski, 2007; Wang, 2007) assumes that the model parameters are also random variables. Wang (2007) used a Wishart prior for the precision matrices of the noise and an ARD prior (Neal, 1995) for each column of the projection matrices, and derived a variational Bayesian estimation algorithm for the posterior distribution of the parameters. Virtanen et al. (2011) reduced the number of model parameters by assuming that the noise was isotropic and by introducing non-shared latent variables. Klami et al. (2013) derived an algorithm that simultaneously infers the projection matrices for the shared and non-shared variables. Damianou et al. (2012) studied a Bayesian extension of a Gaussian process latent variable model. Fujiwara et al. (2009) used Bayesian CCA to estimate image bases from fMRI data.

3. Probabilistic Interpretation of Partial CCA

In this section, we propose a generative model whose maximum likelihood parameters are estimated using partial CCA. We also derive an expectation-maximization (EM) algorithm that estimates the model parameters and latent variables.

3.1. Generative Model

We consider a generative model that combines regressions on the variables whose effects we want to eliminate with shared latent variables, as shown in Figure 2.

[Figure 2. Graphical model for probabilistic partial CCA.]

The model is defined as

z_n \sim N(0, I_{d_z}),
y_{mn} \sim N(W_{mx} x_n + W_{mz} z_n + \mu_m, \Psi_m).   (6)

We will show that the maximum likelihood solution \arg\max_{W_x, W_z, \Psi, \mu} \log p(y \mid x; W_x, W_z, \Psi, \mu) can be calculated using partial CCA. To this end, we show that the proposed model can be reduced to the generative model of probabilistic CCA, Equation (4). When we define the log likelihood L and

C = \begin{pmatrix} \Psi_1 & 0 \\ 0 & \Psi_2 \end{pmatrix} + \begin{pmatrix} W_{1z} \\ W_{2z} \end{pmatrix} \begin{pmatrix} W_{1z} \\ W_{2z} \end{pmatrix}^T,

it holds that

\frac{\partial L}{\partial \mu} = \sum_{n=1}^{N} C^{-1} \left( \begin{pmatrix} y_{1n} \\ y_{2n} \end{pmatrix} - \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} - \begin{pmatrix} W_{1x} \\ W_{2x} \end{pmatrix} x_n \right).

Because C is positive definite, the likelihood is maximized when \mu is such that the partial derivative equals zero. Therefore,

\mu_m = \bar{y}_m - W_{mx} \bar{x}.   (7)

We denote each datum minus the sample mean as \tilde{y}_{1n} = y_{1n} - \bar{y}_1, and substitute Equation (7). Then,

\frac{\partial L}{\partial W_x} = \sum_{n=1}^{N} C^{-1} \left( \begin{pmatrix} \tilde{y}_{1n} \\ \tilde{y}_{2n} \end{pmatrix} \tilde{x}_n^T - \begin{pmatrix} W_{1x} \\ W_{2x} \end{pmatrix} \tilde{x}_n \tilde{x}_n^T \right).

We can also show that if the data space is spanned by the samples, L is a negative definite quadratic form of W_x. So L is maximized when W_x is such that the partial derivative is zero. Therefore,

W_{mx} = \Sigma_{mx} \Sigma_{xx}^{-1}.   (8)

When we substitute this into Equation (6), the model is equivalent to the probabilistic CCA model with input variables y'_{mn} = \tilde{y}_{mn} - \Sigma_{mx} \Sigma_{xx}^{-1} \tilde{x}_n. Because the covariance matrices of these data are

\frac{1}{N} \sum_{n=1}^{N} y'_{m_1 n} (y'_{m_2 n})^T = \Sigma_{m_1 m_2} - \Sigma_{m_1 x} \Sigma_{xx}^{-1} \Sigma_{x m_2} = \Sigma_{m_1 m_2 | x},   (9)

the parameter estimation is reduced to partial CCA. To summarize, the maximum likelihood solution of the proposed model can be written as

W_{mx} = \Sigma_{mx} \Sigma_{xx}^{-1},
W_{mz} = \Sigma_{mm|x} U_{m,d_z} M_m,
\Psi_m = \Sigma_{mm|x} - W_{mz} W_{mz}^T,
\mu_m = \bar{y}_m - W_{mx} \bar{x},   (10)

where U_{m,d_z} denotes the matrix whose d-th column equals the d-th eigenvector, P_{d_z} denotes the diagonal matrix whose d-th element equals the d-th canonical correlation of Equation (1), and M_m are arbitrary matrices that satisfy M_1 M_2^T = P_{d_z} and have spectral norms smaller than one. From this point, we assume that the samples have zero mean and we do not infer a sample mean.

3.2. EM Parameter Estimation

As with CCA, we can estimate the latent variables using the EM algorithm without integrating them out. In this case, z_n follows a normal distribution and the update rule at iteration t is

\Sigma^{(z)} = (I + (W^{(z)})^T \Psi^{-1} W^{(z)})^{-1},
\langle Z \rangle = \Sigma^{(z)} (W^{(z)})^T \Psi^{-1} (Y - W^{(x)} X),
W_{m,t+1} = \begin{pmatrix} Y_m X^T & Y_m \langle Z \rangle^T \end{pmatrix} \begin{pmatrix} X X^T & X \langle Z \rangle^T \\ \langle Z \rangle X^T & \langle Z Z^T \rangle \end{pmatrix}^{-1},   (11)
\Psi_{m,t+1} = \frac{1}{N} \left( Y_m Y_m^T - W_{m,t+1} \begin{pmatrix} X \\ \langle Z \rangle \end{pmatrix} Y_m^T \right),

where \Psi is the block-diagonal matrix with \Psi_m on its diagonal, W^{(x)} and W^{(z)} are the matrices obtained by stacking W_{mx} and W_{mz} over the views, a subscript m denotes the block corresponding to view m, Y_m is the matrix that has y_{mn} in its columns, Y = [Y_1; Y_2], X and Z are the matrices that have x_n and z_n in their columns, and \langle \cdot \rangle denotes the expectation of a random variable under the current posterior.
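The update rule in Equation (11) can be written compactly with samples stored in columns and both views stacked. The sketch below is our own illustration of one such EM iteration (NumPy assumed); it does not enforce the per-view block-diagonal structure of \Psi, which the final comment notes.

```python
# Sketch: one EM iteration for probabilistic partial CCA (Eq. 11).
# Y: (d1+d2, N) stacked, centered views; X: (dx, N); Wx: (d, dx); Wz: (d, dz);
# Psi: (d, d) noise covariance (block-diagonal over views in the actual model).
import numpy as np

def em_step(Y, X, Wx, Wz, Psi):
    d, N = Y.shape
    dz, dx = Wz.shape[1], X.shape[0]
    Psi_inv = np.linalg.inv(Psi)
    # E-step: posterior covariance and means of the latent variables.
    Sz = np.linalg.inv(np.eye(dz) + Wz.T @ Psi_inv @ Wz)      # Sigma^(z)
    Ez = Sz @ Wz.T @ Psi_inv @ (Y - Wx @ X)                   # <Z>, shape (dz, N)
    EZZ = N * Sz + Ez @ Ez.T                                  # <Z Z^T>
    # M-step: joint regression of Y on the covariate X and on <Z>.
    G = np.block([[X @ X.T, X @ Ez.T],
                  [Ez @ X.T, EZZ]])
    W = np.hstack([Y @ X.T, Y @ Ez.T]) @ np.linalg.inv(G)     # [Wx_new  Wz_new]
    Wx_new, Wz_new = W[:, :dx], W[:, dx:]
    # Noise update; the per-view block-diagonal structure of Psi would be
    # restored by zeroing the off-diagonal blocks between the two views.
    XZ = np.vstack([X, Ez])
    Psi_new = (Y @ Y.T - W @ XZ @ Y.T) / N
    return Wx_new, Wz_new, Psi_new, Ez
```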
4. Bayesian Partial CCA

To address the previously mentioned weakness of partial CCA, we propose a hierarchical Bayesian approach to the probabilistic partial CCA proposed in the previous section.

4.1. Model that Directly Uses Probabilistic Partial CCA

In this section, we follow Wang's approach (2007) and consider the generative model shown in Figure 3, which treats the model parameters proposed in the previous section as random variables.

[Figure 3. Graphical model for BPCCA.]

We use an ARD prior (Neal, 1995) for each column of the projection matrices, and an inverse Wishart prior for the covariance matrices of the noise. The generative model is

\alpha_{mk} \sim \mathrm{Gamma}(a_0, b_0),
W_{m,:,k} \sim N(0, \alpha_{mk}^{-1} I_{d_m}),
\Psi_m \sim \mathrm{IW}(\nu_{m0}, K_{m0}),
z_n \sim N(0, I_{d_z}),
y_{mn} \sim N(W_{mx} x_n + W_{mz} z_n, \Psi_m),   (12)

where the prior for the third variable p(x) does not affect the inference when p(x_n) > 0 for each sample, because we consider the conditional distribution given x_n. Here \mathrm{Gamma}(a, b) is the Gamma distribution with shape parameter a and scale parameter b, and \mathrm{IW}(\nu, K) is the inverse Wishart distribution. W_m = [W_{mx}\ W_{mz}], and W_{m,:,k} is the k-th column of W_m. The hyperparameters a_0, b_0, \nu_{m0}, K_{m0} should be small so that the priors are broad, but from the definition of the Wishart distribution, \nu_{m0} > d_m - 1. In our experiments, we set a_0 = b_0 = 10^{-14}, \nu_{m0} = d_m, and K_{m0} = 10^{-14} I_{d_m}. The ARD prior drives unnecessary components to zero, so we can estimate the dimensions of the latent variables by choosing a sufficiently large d_z, or by first choosing a small d_z and then gradually increasing it according to the output projection matrices. We refer to this model as Bayesian PCCA (BPCCA).

Next, we propose a variational Bayesian inference algorithm. The full posterior p(Z, \theta \mid X, Y) is approximated as

q(Z, \theta) = q(Z) \prod_{m=1}^{2} \left( q(\alpha_m) \, q(\Psi_m) \prod_{j=1}^{d_m} q(w_{mj}) \right),   (13)

where w_{mj} is the j-th row of W_m. We apply standard cyclical updates to the separate terms of q. When the factorized distribution q has the form q(\theta) = \prod_i q(\theta_i), the update rule is

q(\theta_i) \propto \exp \langle \log p(X, Y, Z, \theta) \rangle_{Z, \theta_k: k \neq i},
q(Z) \propto \exp \langle \log p(X, Y, Z, \theta) \rangle_{\theta}.   (14)

Because p(X) is independent of the other variables, it follows that

q(\theta_i) \propto \exp \langle \log p(Y, Z, \theta \mid X) \rangle_{Z, \theta_k: k \neq i},
q(Z) \propto \exp \langle \log p(Y, Z, \theta \mid X) \rangle_{\theta},   (15)

where \langle \cdot \rangle with subscripts denotes the expectation with respect to the approximate posterior distribution of the corresponding variables. The approximate posterior distribution has the form

q(z_n) = N(\mu_{z_n}, \Sigma_{z_n}),
q(\Psi_m) = \mathrm{IW}(\nu_m, K_m),
q(w_{mj}) = N(\mu_{mj}, \Sigma_{mj}),
q(\alpha_m) = \prod_k \mathrm{Gamma}(a_m, b_{mk}).   (16)

Furthermore, the parameters are updated as

\Sigma_{z_n} = \left( I + \sum_m \langle W_{mz}^T \Psi_m^{-1} W_{mz} \rangle \right)^{-1},
\mu_{z_n} = \Sigma_{z_n} \sum_m \left( \langle W_{mz}^T \rangle \langle \Psi_m^{-1} \rangle y_{mn} - \langle W_{mz}^T \Psi_m^{-1} W_{mx} \rangle x_n \right),
K_m = K_{m0} + Y_m Y_m^T + \left\langle W_m \begin{pmatrix} X X^T & X Z^T \\ Z X^T & Z Z^T \end{pmatrix} W_m^T \right\rangle - Y_m \begin{pmatrix} X^T & \langle Z^T \rangle \end{pmatrix} \langle W_m^T \rangle - \langle W_m \rangle \begin{pmatrix} X \\ \langle Z \rangle \end{pmatrix} Y_m^T,
\nu_m = \nu_{m0} + N,
\Sigma_{mj} = \left( \mathrm{diag}\langle \alpha_m \rangle + \langle (\Psi_m^{-1})_{j,j} \rangle \begin{pmatrix} X X^T & X \langle Z \rangle^T \\ \langle Z \rangle X^T & \langle Z Z^T \rangle \end{pmatrix} \right)^{-1},
\mu_{mj} = \left( \langle (\Psi_m^{-1})_{j,:} \rangle Y_m \begin{pmatrix} X^T & \langle Z^T \rangle \end{pmatrix} - \sum_{l \neq j} \langle (\Psi_m^{-1})_{j,l} \rangle \langle w_{ml} \rangle \begin{pmatrix} X X^T & X \langle Z \rangle^T \\ \langle Z \rangle X^T & \langle Z Z^T \rangle \end{pmatrix} \right) \Sigma_{mj},
a_m = a_0 + d_m / 2,
b_{mk} = b_0 + \langle W_{m,:,k}^T W_{m,:,k} \rangle / 2,   (17)

where \mathrm{diag}\langle \alpha_m \rangle is the diagonal matrix with k-th element \langle \alpha_{mk} \rangle.
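To make the prior structure of Equation (12) concrete, here is a small ancestral sampling sketch (ours, not from the paper). Moderate hyperparameters are used rather than the 10^{-14} values above, since such a nearly improper prior is not meaningful to simulate naively; scipy.stats.invwishart is assumed to be available.

```python
# Sketch: ancestral sampling from the BPCCA generative model (Eq. 12).
import numpy as np
from scipy.stats import invwishart

def sample_bpcca(N, d1, d2, dx, dz, a0=2.0, b0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((dx, N))          # third variable; its prior is irrelevant here
    Z = rng.standard_normal((dz, N))          # shared latent variables
    views = []
    for dm in (d1, d2):
        alpha = rng.gamma(a0, scale=b0, size=dx + dz)               # ARD precision per column
        W = rng.standard_normal((dm, dx + dz)) / np.sqrt(alpha)     # W_{:,k} ~ N(0, alpha_k^{-1} I)
        Psi = invwishart.rvs(df=dm + 2, scale=np.eye(dm))           # noise covariance (df > dm - 1)
        Wx, Wz = W[:, :dx], W[:, dx:]
        noise = rng.multivariate_normal(np.zeros(dm), Psi, size=N).T
        views.append(Wx @ X + Wz @ Z + noise)
    return X, views[0], views[1]
```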
[Figure 4. Graphical model for GSPCCA.]

4.2. Model with Isotropic Noise

The model proposed in the previous subsection requires a large number of calculations to infer the noise precision matrices. Additionally, the prior distribution has a large influence when there are a small number of samples, because \nu_{m0} > d_m - 1. Therefore, following the approach used by Klami et al. (2013), we propose a model that uses isotropic noise and non-shared latent variables. The generative model is

z_n \sim N(0, I_{d_z}),
z_{mn} \sim N(0, I_{d_{z_m}}),
y_{mn} \sim N(W_{mx} x_n + A_m z_n + B_m z_{mn}, \tau_m^{-1} I_{d_m}).   (18)

When the z_{mn} are integrated out, this model is equivalent to the model proposed in the previous subsection with \Psi_m = B_m B_m^T + \tau_m^{-1} I_{d_m}. So we can consider this model as equivalent to imposing a low-rank assumption on the covariance matrices. To simultaneously estimate A and B, we write

\tilde{W}_z = \begin{pmatrix} A^{(1)} & B^{(1)} & 0 \\ A^{(2)} & 0 & B^{(2)} \end{pmatrix}, \qquad \tilde{W} = \begin{pmatrix} W_x & \tilde{W}_z \end{pmatrix},

and consider the model

\alpha_{mk} \sim \mathrm{Gamma}(a_0, b_0),
\tilde{W}_{m,:,k} \sim N(0, \alpha_{mk}^{-1} I_{d_m}),
\tau_m \sim \mathrm{Gamma}(a_0, b_0),
z_n \sim N(0, I_{d_z + d_{z_1} + d_{z_2}}),
y_{mn} \sim N(\tilde{W}_{mx} x_n + \tilde{W}_{mz} z_n, \tau_m^{-1} I_{d_m}),   (19)

as shown in Figure 4. This representation reduces the number of model parameters. We refer to this model as group sparse PCCA (GSPCCA). This model also requires small hyperparameters; we have used a_0 = b_0 = 10^{-14} in our experiments. Additionally, we choose the approximate posterior

q(Z, \theta) = q(Z) \prod_m \left( q(\tilde{W}_m) \, q(\alpha_m) \, q(\tau_m) \right),   (20)

with the form

q(Z) = \prod_n N(\mu_{z_n}, \Sigma_z),
q(\tilde{W}_m) = \prod_d N(\tilde{W}_{m,d,:}; \mu_{W_{m,d}}, \Sigma_{W_m}),
q(\alpha_m) = \prod_k \mathrm{Gamma}(a_m, b_{mk}),
q(\tau_m) = \mathrm{Gamma}(a_{\tau_m}, b_{\tau_m}).   (21)

The parameters are updated as

\Sigma_{W_m} = \left( \mathrm{diag}\langle \alpha_m \rangle + \langle \tau_m \rangle \begin{pmatrix} X X^T & X \langle Z \rangle^T \\ \langle Z \rangle X^T & \langle Z Z^T \rangle \end{pmatrix} \right)^{-1},
\mu_{W_m} = \langle \tau_m \rangle Y_m \begin{pmatrix} X^T & \langle Z^T \rangle \end{pmatrix} \Sigma_{W_m},
\Sigma_z = \left( I + \sum_m \langle \tau_m \rangle \langle \tilde{W}_{mz}^T \tilde{W}_{mz} \rangle \right)^{-1},
\langle Z \rangle = \Sigma_z \sum_m \langle \tau_m \rangle \left( \langle \tilde{W}_{mz}^T \rangle Y_m - \langle \tilde{W}_{mz}^T \tilde{W}_{mx} \rangle X \right),
a_m = a_0 + d_m / 2,
b_{mk} = b_0 + \langle (\tilde{W}_m^T \tilde{W}_m)_{k,k} \rangle / 2,
a_{\tau_m} = a_0 + N d_m / 2,
b_{\tau_m} = b_0 + \frac{1}{2} \left( \mathrm{Tr}\left( Y_m Y_m^T - 2 Y_m \begin{pmatrix} X^T & \langle Z^T \rangle \end{pmatrix} \langle \tilde{W}_m^T \rangle \right) + \mathrm{Tr}\left( \langle \tilde{W}_m^T \tilde{W}_m \rangle \begin{pmatrix} X X^T & X \langle Z \rangle^T \\ \langle Z \rangle X^T & \langle Z Z^T \rangle \end{pmatrix} \right) \right).   (22)

4.3. Optimization of the Linear Transformation of the Latent Variables

The maximum likelihood solution of probabilistic partial CCA has the same degrees of freedom as a linear transformation of the latent variables. In the Bayesian model, we optimize this transformation in each iteration to obtain an approximate distribution that is closer to the prior distribution. We expect that this speeds up the convergence and that the latent variables become more independent. The function to be maximized is similar to that in (Virtanen et al., 2011), and is defined as

L(R) = -\frac{1}{2} \mathrm{Tr}\left( R^{-1} \langle Z Z^T \rangle R^{-T} \right) + (d_1 + d_2 - N) \log |R| - \frac{1}{2} \sum_{m=1}^{2} d_m \sum_{k=1}^{d_z} \log \left( r_k^T \langle \tilde{W}_{mz}^T \tilde{W}_{mz} \rangle r_k \right),   (23)

where r_k denotes the k-th column of R. To solve this, we use the L-BFGS method (Liu & Nocedal, 1989) initialized with the identity matrix. Using the optimal R, the approximate distributions are transformed into

\langle Z \rangle \leftarrow R^{-1} \langle Z \rangle, \quad \Sigma_z \leftarrow R^{-1} \Sigma_z R^{-T}, \quad \tilde{W}_{mz} \leftarrow \tilde{W}_{mz} R, \quad \Sigma_{W_{mz}} \leftarrow R^T \Sigma_{W_{mz}} R.   (24)
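As a rough illustration of this step, the following sketch (ours, under the reconstruction of Equation (23) above) optimizes R with SciPy's L-BFGS-B using a numerical gradient; EZZ stands for \langle Z Z^T \rangle and WtW[m] for \langle \tilde{W}_{mz}^T \tilde{W}_{mz} \rangle, both taken from the current variational posterior.

```python
# Sketch: optimize the rotation R of Eq. (23) with L-BFGS (identity start);
# the transform of Eq. (24) would then be applied to the posterior statistics.
import numpy as np
from scipy.optimize import minimize

def optimize_rotation(EZZ, WtW, dims, N):
    """EZZ: (q, q); WtW: list of (q, q), one per view; dims = (d1, d2)."""
    q = EZZ.shape[0]

    def neg_L(r):
        R = r.reshape(q, q)
        Rinv = np.linalg.inv(R)
        _, logdet = np.linalg.slogdet(R)            # log |det R|
        val = -0.5 * np.trace(Rinv @ EZZ @ Rinv.T) + (sum(dims) - N) * logdet
        for dm, S in zip(dims, WtW):
            # r_k^T <W_mz^T W_mz> r_k for every column r_k of R
            quad = np.einsum('ik,ij,jk->k', R, S, R)
            val -= 0.5 * dm * np.sum(np.log(np.maximum(quad, 1e-12)))
        return -val                                  # minimize the negative objective

    res = minimize(neg_L, np.eye(q).ravel(), method='L-BFGS-B')
    return res.x.reshape(q, q)
```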
5. Experiments

We have applied our methods to synthetic and real data, to verify that they can be used with a small number of samples or high-dimensional data. We compared the stability of the model selection and of the causality measures.

5.1. Model Selection

We first investigated the estimates of W_x and d_z using synthetic data. We did not consider W_z because the maximum likelihood solution for W_z is not unique. We compared our methods (BPCCA, GSPCCA) with model selection techniques using the Bayesian information criterion (BIC) and five-fold cross validation (CV). In our methods, we considered that a component k of the solution was active when \langle \alpha_{mk} \rangle \le 50, and let the estimate of d_z be the number of components k that are active for each view. We set d_1 = 5, d_2 = 4, d_x = 3, and d_z = 2 for low-dimensional data, and d_1 = 50, d_2 = 50, d_x = 5, and d_z = 5 for high-dimensional data. In each setting, we generated N = 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 samples from a generative model. Each column of the projection matrix was sampled from a normal distribution with zero mean and unit variance, and the noise covariance matrices were I_{d_m} + \sum_{\ell=1}^{\lfloor d_m / 2 \rfloor} u_\ell u_\ell^T for u_\ell \sim N(0, I_{d_m}). \tilde{W}_z had five columns for low-dimensional data, and 10 columns for high-dimensional data. We conducted 50 experiments for each parameter setting. For the Bayesian methods, we determined that the method had converged if the relative change in the variational lower bound was below 10^{-4}. As the algorithm converges to a local maximum, we initialized the model by randomly sampling the latent variables from the prior, ran the algorithm 10 times, and chose the solution with the best variational lower bound. For each method, we calculated the mean of the relative error of W_x using \mathrm{Tr}((W_x - \hat{W}_x)^T (W_x - \hat{W}_x)) / \mathrm{Tr}(W_x^T W_x), where \hat{W}_x is an estimate of W_x. We also recorded the accuracy rate of the estimated d_z.

[Figure 5. Comparison of the W_x estimation error and the model accuracy for (a) high-dimensional data and (b) low-dimensional data. The left panels show the relative estimation error of W_x; the right panels show the accuracy of d_z.]

The results are presented in Figure 5. In the right subfigure of Figure 5(a), the CV result is hidden because it has been overwritten by the BIC result. Because we cannot stably calculate the BIC and CV of the non-Bayesian methods when D = 50 and N = 25, 50, and the BIC and CV of BPCCA cannot be calculated when D = 50 and N = 25, we have not included these results. These plots show that the existing model selection methods perform poorly and that their accuracy decreases to zero in high dimensions. Conversely, the two Bayesian methods are very accurate, even for high-dimensional data. BPCCA's performance degrades when D = 50 and N = 50, but GSPCCA's performance degrades more gradually. The estimate of W_x follows a similar trend. These results demonstrate that our methods perform model selection and parameter estimation more accurately than the non-Bayesian methods, and that GSPCCA is the best method.

Next, we compared the model-selection performance of GSPCCA by varying the number of columns of \tilde{W}_z over 6, 7, 8, 9, and 10 for low-dimensional data, and over 12, 14, 16, 18, and 20 for high-dimensional data, with N = 25, 50, 200, 800. The performance was measured using the mean of the number of active components divided by the true d_z.

[Figure 6. Comparison of the estimates of d_z (output d_z divided by the true d_z, as a function of the number of columns of W_z). The left panel shows the performance on low-dimensional data; the right panel corresponds to high-dimensional data.]

The results are shown in Figure 6. In high dimensions, the performance is almost one for all the parameters. In low dimensions, if N = 25 the performance decreases gradually. However, this effect can be ignored because the true d_z is two. These results indicate that the number of columns in \tilde{W}_z has little effect on the performance, provided it is sufficiently large.

5.2. Causality Measure with Synthetic Data

To evaluate the stability of the causality calculations for a small sample of high-dimensional data, we generated a time series using the following linear model:

x_t = 0.5 x_{t-1} + \epsilon_{t,x},
y_{2,t} = 0.5 y_{2,t-1} + W x_{t-1} + \epsilon_{t,y_2},
y_t = \begin{pmatrix} y_{2,t}^T & y_{2,t}^T \end{pmatrix}^T + \epsilon_{t,y},   (25)

where the first two columns of W are sampled from N(0, 0.5 I_{20}) and the other columns are zero. \epsilon_{t,x} and \epsilon_{t,y_2} denote Gaussian noise with zero mean and unit variance. \epsilon_{t,y} is 0 when r = 0, and is Gaussian noise with zero mean and variance r I_{40} otherwise. The true causality direction is x \to y. The first and second halves of y are strongly correlated, and this correlation is strong when r is small. The optimal dimension of the latent variables is two. Using this model, we set the embedding dimension to 1, r to 0, 0.01, 0.1, and the sample size to N = 25, 50, 100, 200, 400, for each parameter.
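For concreteness, a generator for the synthetic series in Equation (25) might look as follows (our sketch, not from the paper; the paper does not state the dimension of x explicitly, and dx = 20 is assumed here so that min(d, k d_x) = 20 in the PCCA measure below).

```python
# Sketch: generate the synthetic time series of Eq. (25).
import numpy as np

def generate_series(N, r, dx=20, dy2=20, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((dy2, dx))
    W[:, :2] = rng.normal(0.0, np.sqrt(0.5), size=(dy2, 2))  # first two columns ~ N(0, 0.5)
    x = np.zeros((N, dx))
    y2 = np.zeros((N, dy2))
    y = np.zeros((N, 2 * dy2))
    for t in range(1, N):
        x[t] = 0.5 * x[t - 1] + rng.standard_normal(dx)
        y2[t] = 0.5 * y2[t - 1] + W @ x[t - 1] + rng.standard_normal(dy2)
        eps_y = rng.normal(0.0, np.sqrt(r), size=2 * dy2) if r > 0 else 0.0
        y[t] = np.concatenate([y2[t], y2[t]]) + eps_y
    return x, y   # true causal direction: x -> y
```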
We expect that the causality measures derived from PCCA and probabilistic PCCA are equivalent, so we compared PCCA with GSPCCA (the best performing method). We used \sum_{d=1}^{20} \frac{1}{2} \log_2 \frac{1}{1 - \rho_d^2} as the causality measure for PCCA. For GSPCCA, we let \rho_k be the correlation between \langle Y_1(k,:) \mid Y \rangle and \langle Y_1(k,:) \mid X_1 \rangle, and used \sum_k \frac{1}{2} \log_2 \frac{1}{1 - \rho_k^2} as the causality measure, where the summation is over the active components.

[Figure 7. Comparison of the stability of a causality measure for (a) r = 0, (b) r = 0.01, and (c) r = 0.1. Panel (a) and the left panels of (b) and (c) show the performance of GSPCCA (ours); the right panels of (b) and (c) show the performance of PCCA. The blue line shows the estimated causality measure for the true direction; the green line shows the estimate for the reverse direction.]

Figure 7 shows the results. We have not included results where the solution could not be stably evaluated. The causality measure using PCCA diverged when N was below 200, irrespective of r. This measure also increased in the direction y \to x, so it is unreliable when N is small. In contrast, the measure using GSPCCA was zero in the y \to x direction when N was larger than 100, because the Bayesian model makes directions that have a negligible influence converge to zero. This behavior helps eliminate false causality relations, but the model may overlook true causality relations when the influence is small. In such cases, we could detect small influences by modifying the hyperparameters of the ARD prior. This measure tended to diverge when r was less than 0.01 and N = 50, or r = 0.1 and N = 25. However, even then the measure in the x \to y direction was larger than that in the y \to x direction. The Bayesian model also becomes unstable when there is an insufficient number of samples.

5.3. Causality Measure with Real Data

Next, we applied GSPCCA and PCCA to meteorological data, using the Global Summary of the Day (GSOD) provided by the National Climatic Data Center (NCDC) on its website. For this experiment, we used data from the USA between December 24, 2008 and February 28, 2009. Figure 8 shows the observed jet stream during that same period.

[Figure 8. Weather information flow map for the USA on (a) 19/1/2009 and (b) 19/2/2009 (source: The California Regional Weather Server, San Francisco University).]

We selected seven types of variables that did not have a substantial amount of missing data: mean temperature, mean dew point, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature, and minimum temperature. Therefore, the time series has seven dimensions. The length was 66. We randomly chose 224 targets based on distance, after excluding targets with many missing values, and applied a zero-order hold to the missing values. We set the embedding dimensions to 2, 3, and 4 and used the same causality measure as in the synthetic data experiments. Figure 9 shows our results, plotting the largest 50 index values. Because the causality measure that used PCCA with an embedding dimension of four diverged for some pairs, we have included all the index values that diverged. When the embedding dimension was two, GSPCCA and PCCA had a similar tendency to show a strong information flow from west to east over the eastern region, and from north to south in the central region. This is consistent with Figure 8. When the embedding dimension was four, the arrows drawn using PCCA were scattered over the mainland, although the index values using GSPCCA had a similar tendency to those with an embedding dimension of two. This result implies that PCCA overfits the data when the embedding dimension is high. Next, we calculated the average arrow length using the Hubeny formula¹. It was 1.0×10³, 1.1×10³, and 1.1×10³ km/day for GSPCCA, and 9.8×10², 1.2×10³, and 1.7×10³ km/day for PCCA. This shows that the causality measure using GSPCCA was more stable and closer to the actual air current, which was approximately 8.6×10² km/day (Shibuya et al., 2011), even when the embedding dimension was high. Because the true embedding dimension is unknown, GSPCCA is a more reliable method.
¹ http://www.kashmir3d.com/kash/manual-e/std_siki.htm

[Figure 9. Weather information flow map of the USA (2008/12/24–2009/02/28). Maps on the left were calculated using GSPCCA (ours) and maps on the right using PCCA. (a) Embedding dimension = 2: average arrow length 1.0×10³ km/day (GSPCCA), 9.8×10² km/day (PCCA). (b) Embedding dimension = 3: 1.1×10³ km/day (GSPCCA), 1.2×10³ km/day (PCCA). (c) Embedding dimension = 4: 1.1×10³ km/day (GSPCCA), 1.7×10³ km/day (PCCA).]

6. Conclusion

We proposed a probabilistic interpretation of partial CCA. We also presented a Bayesian extension and an inference algorithm based on the probabilistic interpretation. Our experiments have demonstrated that the proposed methods are more appropriate for model selection and for estimating causal relations from time series than existing methods when there are a small number of samples or in high dimensions. We expect that PCCA and causality measures will be extensively applied to many areas using our methods. Our Bayesian partial CCA method can be extended to a robust estimation method using a Student-t distribution for the noise (Archambeau et al., 2006), or to an inference method using the online variational Bayes technique (Hoffman et al., 2013). Additionally, by considering the projection matrices as random variables, we can construct a more complex model that allows the causal relation to change over time.

References

Archambeau, Cédric, Delannay, Nicolas, and Verleysen, Michel. Robust probabilistic projections. In ICML, pp. 33–40, 2006.

Bach, Francis R. and Jordan, Michael I. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.

Chávez, Mario, Martinerie, Jacques, and Le Van Quyen, Michel. Statistical assessment of nonlinear causality: application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2):113–128, 2003.

Damianou, Andreas, Ek, Carl, Titsias, Michalis K., and Lawrence, Neil D. Manifold relevance determination. In ICML, pp. 145–152, 2012.

Ek, Carl Henrik, Rihan, Jon, Torr, Philip H. S., Rogez, Grégory, and Lawrence, Neil D. Ambiguity modeling in latent spaces. In MLMI, pp. 62–73, 2008.

Fujiwara, Yusuke, Miyawaki, Yoichi, and Kamitani, Yukiyasu. Estimating image bases for visual image reconstruction from human brain activity. In NIPS, pp. 576–584, 2009.

Granger, Clive W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37(3):424–438, 1969.

Hardoon, David R. and Shawe-Taylor, John. Sparse canonical correlation analysis. stat, 1050:19, 2009.

Hoffman, M., Blei, D., Wang, Chong, and Paisley, John. Stochastic variational inference. JMLR, 14:1303–1347, 2013.

Hotelling, Harold. Relations between two sets of variates. Biometrika, 28(3/4):321–377, 1936.

Klami, Arto and Kaski, Samuel. Local dependent components. In ICML, pp. 425–432, 2007.

Klami, Arto, Virtanen, Seppo, and Kaski, Samuel. Bayesian canonical correlation analysis. JMLR, 14:965–1003, 2013.

Kowalski, J., Tu, X. M., Jia, G., Perlis, M., Frank, E., Crits-Christoph, P., and Kupfer, D. J. Generalized covariance-adjusted canonical correlation analysis with application to psychiatry. Statistics in Medicine, 22(4):595–610, 2003.

Lai, Pei Ling and Fyfe, Colin. Kernel and nonlinear canonical correlation analysis. IJNS, 10(05):365–377, 2000.

Leen, Gayle and Fyfe, Colin. A Gaussian process latent variable model formulation of canonical correlation analysis. In ESANN, pp. 413–418, 2006.

Liu, Dong C. and Nocedal, Jorge. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.

Melzer, Thomas, Reiter, Michael, and Bischof, Horst. Kernel canonical correlation analysis. In ICANN, pp. 353–360, 2001.

Neal, Radford M. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.
Rao, B. Raja. Partial canonical correlations. Trabajos de estadistica y de investigación operativa, 20(2):211–219, 1969.

Schreiber, Thomas. Measuring information transfer. Physical Review Letters, 85(2):461, 2000.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causality quantification and its applications: structuring and modeling of multivariate time series. In KDD, pp. 787–796, 2009.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Reliable index for measuring information flow. Physical Review E, 84(6):061109, 2011.

Sumioka, Hidenobu, Yoshikawa, Yuichiro, and Asada, Minoru. Development of joint attention related actions based on reproducing interaction contingency. In ICDL, pp. 256–261, 2008.

Verdes, P. F. Assessing causality from multivariate time series. Physical Review E, 72(2):026222.1–026222.9, 2005.

Vía, Javier, Santamaría, Ignacio, and Pérez, Jesús. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Networks, 20(1):139–152, 2007.

Virtanen, Seppo, Klami, Arto, and Kaski, Samuel. Bayesian CCA via group sparsity. In ICML, pp. 457–464, 2011.

Wang, Chong. Variational Bayesian approach to canonical correlation analysis. IEEE Transactions on Neural Networks, 18(3):905–910, 2007.

Yamashita, Yuya, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causal flow. IEEE Transactions on Multimedia, 3(3):619–629, 2012.

Yger, Florian, Berar, Maxime, Gasso, Gilles, and Rakotomamonjy, Alain. Adaptive canonical correlation analysis based on matrix manifolds. In ICML, pp. 1071–1078, 2012.
