Probabilistic Partial Canonical Correlation Analysis

Yusuke Mukuta (MUKUTA@MI.T.U-TOKYO.AC.JP)
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Tatsuya Harada (HARADA@MI.T.U-TOKYO.AC.JP)
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

Partial canonical correlation analysis (partial CCA) is a statistical method that estimates a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. Partial CCA is known to be closely related to a causality measure between two time series. However, partial CCA requires the inverses of covariance matrices, so the calculation is not stable. This is particularly the case for high-dimensional data or small sample sizes. Additionally, we cannot estimate the optimal dimension of the subspace in the model. In this paper, we have addressed these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model. Our numerical experiments demonstrated that our methods can stably estimate the model parameters, even in high dimensions or when there are a small number of samples.

1. Introduction

Partial canonical correlation analysis (partial CCA) was proposed by Rao (1969). It is a statistical method used to estimate a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. This is calculated using a CCA of the residuals of a linear regression on the third variable. This method is a generalized version of the partial correlation coefficient for multidimensional data. We define the variables $\{y^1_n\}_{n=1}^N \in \mathbb{R}^{d_1}$ and $\{y^2_n\}_{n=1}^N \in \mathbb{R}^{d_2}$, the third variable $\{x_n\}_{n=1}^N \in \mathbb{R}^{d_x}$, and the dimension of the subspace $d_z$. Then the partial CCA is calculated using the generalized eigenvalue problem

$$\Sigma_{12|x}\Sigma_{22|x}^{-1}\Sigma_{21|x}u_1 = \rho^2\Sigma_{11|x}u_1, \qquad \Sigma_{21|x}\Sigma_{11|x}^{-1}\Sigma_{12|x}u_2 = \rho^2\Sigma_{22|x}u_2, \tag{1}$$

where $\Sigma_{m_1 m_2|x} = \Sigma_{m_1 m_2} - \Sigma_{m_1 x}\Sigma_{xx}^{-1}\Sigma_{x m_2}$, and $\Sigma_{ab}$ is a sample covariance matrix. Partial CCA has various applications in areas such as social science (Kowalski et al., 2003), and can be used as a causality measure.

Causality measures are indices that measure the influence of one time series on another. Transfer entropy (Schreiber, 2000) is a measure based on information theory. It measures the magnitude of a change to the conditional distribution of $y$ given $x$, and is calculated using

$$T_{x\to y} = \iiint p(y_t, y^{(l)}_{t-1}, x^{(k)}_{t-1}) \log_2 \frac{p(y_t \mid y^{(l)}_{t-1}, x^{(k)}_{t-1})}{p(y_t \mid y^{(l)}_{t-1})}\, dy_t\, dy^{(l)}_{t-1}\, dx^{(k)}_{t-1}, \tag{2}$$

where $k$ and $l$ denote the embedding dimensions, $y^{(l)}_{t-1} = (y_{t-1}^T\; y_{t-2}^T \cdots y_{t-l+1}^T)^T$, and $x^{(k)}_{t-1} = (x_{t-1}^T\; x_{t-2}^T \cdots x_{t-k+1}^T)^T$. Shibuya et al. (2009) showed that when we assume that the variables are normally distributed and estimate the model parameters using maximum likelihood estimation, transfer entropy is equivalent to Granger causality (Granger, 1969). Granger causality is based on changes to the estimation error of an autoregressive model. Shibuya et al. (2011) showed that we can use the partial canonical correlations, $\rho_i$, calculated using partial CCA on $y_t$ and $x^{(k)}_{t-1}$ while eliminating the effect of $y^{(l)}_{t-1}$. Then, the transfer entropy can be calculated using $T_{x\to y} = \frac{1}{2}\sum_{i=1}^{\min(d_y,\, k d_x)} \log_2 \frac{1}{1-\rho_i^2}$. Transfer entropy has many applications such as brain analysis (Chávez et al., 2003), medical science (Verdes, 2005), cognitive development modelling (Sumioka et al., 2008), and detecting motion in a movie (Yamashita et al., 2012).

However, partial CCA requires the inverses of sample covariance matrices, so the calculation is unstable when the variables are highly correlated, the dimension of the data is large, or there are not enough data. Yamashita et al. (2012) regularized the covariance matrix to solve this problem, but an appropriate way to optimize the multiple regularization parameters has not been determined. Additionally, we cannot estimate the proper dimension of the subspace of the model.

We have addressed these problems by proposing a probabilistic interpretation of partial CCA, and by deriving a variational Bayesian estimation algorithm for the model parameters based on this probabilistic interpretation. Our experiments show that the proposed methods can more accurately estimate the subspace dimension, and can more stably estimate the model parameters on both synthetic and real data, even in high dimensions or when there are few samples.

2. Canonical Correlation Analysis and its Extension

In this section, we review canonical correlation analysis, which is a statistical method similar to partial CCA. We also consider it from a probabilistic perspective.

Canonical correlation analysis (CCA) was proposed by Hotelling (1936). It is a method for finding statistical dependencies between two data sources. Given variables $\{y^1_n\}_{n=1}^N \in \mathbb{R}^{d_1}$ and $\{y^2_n\}_{n=1}^N \in \mathbb{R}^{d_2}$, and the dimension of the subspace $d_z \le \min(d_1, d_2)$, the CCA can be calculated using the generalized eigenvalue problem

$$\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}u_1 = \rho^2\Sigma_{11}u_1, \qquad \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}u_2 = \rho^2\Sigma_{22}u_2, \tag{3}$$

where $\Sigma_{m_1 m_2}$ represents a sample covariance matrix between $y^{m_1}$ and $y^{m_2}$. The projection is a $d_z \times d_i$ ($i = 1, 2$) matrix whose $d$-th row is the eigenvector corresponding to the $d$-th largest eigenvalue. Each eigenvalue $\rho^2$ is the squared correlation in the corresponding dimension.

Numerous studies have extended CCA, including a nonlinear extension using kernels (Lai & Fyfe, 2000; Melzer et al., 2001), online inferences of the model parameters (Vía et al., 2007; Yger et al., 2012), and sparse variants (Hardoon & Shawe-Taylor, 2009).

Bach and Jordan (2005) gave a probabilistic interpretation of CCA, such that the maximum likelihood estimates of the model parameters can be derived from the CCA. Given this probabilistic interpretation, we can extend CCA to probabilistic models. Figure 1 shows a graphical model of the interpretation, where $\{z_n\}_{n=1}^N \in \mathbb{R}^{d_z}$ are the latent variables.

Figure 1. Graphical model for probabilistic CCA.

The generative model is

$$z_n \sim N(0, I_{d_z}), \qquad y^m_n \sim N(W^m z_n + \mu^m, \Psi^m), \tag{4}$$

where $N(\mu, \Sigma)$ denotes the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$, and $I_d$ denotes the $d$-dimensional identity matrix. $W^m \in \mathbb{R}^{d_m \times d_z}$ and $\Psi^m \in \mathbb{R}^{d_m \times d_m}$ are the model parameters that we must estimate. We define the $U_{m d_z}$ matrices as having their $d$-th column equal to the $d$-th eigenvector, and $P_{d_z} \in \mathbb{R}^{d_z \times d_z}$ as a diagonal matrix with $d$-th element equal to the $d$-th canonical correlation of Equation (3). Then, the maximum likelihood solution is

$$W^m = \Sigma_{mm}U_{m d_z}M_m, \qquad \Psi^m = \Sigma_{mm} - W^m (W^m)^T, \qquad \mu^m = \bar{y}^m, \tag{5}$$

where $M_m \in \mathbb{R}^{d_z \times d_z}$ are arbitrary matrices such that $M_1 M_2^T = P_{d_z}$ and the spectral norms of $M_m$ are smaller than one. $\bar{y}^m$ is the sample mean $\frac{1}{N}\sum_{n=1}^N y^m_n$.

There are some extensions of this probabilistic model. They include a robust estimation method that assumes a Student distribution for the noise (Archambeau et al., 2006), and a nonlinear extension that uses a Gaussian process latent variable model (Leen & Fyfe, 2006; Ek et al., 2008).

Bayesian CCA (Klami & Kaski, 2007; Wang, 2007) assumes that the model parameters are also random variables. Wang used a Wishart prior for the precision matrices of the noise and an ARD prior (Neal, 1995) for each column of the projection matrices, and derived a variational Bayesian estimation algorithm for the posterior distribution of the parameters. Virtanen et al. (2011) reduced the number of model parameters by assuming that the noise was isotropic and by introducing non-shared latent variables. Klami et al. (2013) derived an algorithm that simultaneously inferred the projection matrices for the shared and non-shared variables. Damianou et al. (2012) studied a Bayesian extension of a Gaussian process latent variable model. Fujiwara et al. (2009) used Bayesian CCA to estimate image bases from fMRI data.

3. Probabilistic Interpretation of Partial CCA

In this section, we propose a generative model whose maximum likelihood parameters can be estimated using partial CCA. We also derive an expectation-maximization (EM) algorithm that estimates the model parameters and latent variables.

3.1. Generative Model

We consider a generative model that combines a regression on the variables whose effects we want to eliminate with shared latent variables, as shown in Figure 2.

Figure 2. Graphical model for probabilistic partial CCA.

The model is defined as

$$z_n \sim N(0, I_{d_z}), \qquad y^m_n \sim N(W^m_x x_n + W^m_z z_n + \mu^m, \Psi^m). \tag{6}$$

We will show that the maximum likelihood solution $\arg\max_{W_x, W_z, \mu, \Psi} \log p(y \mid x; W_x, W_z, \mu, \Psi)$ can be calculated using partial CCA. To this end, we show that the proposed model can be reduced to the generative model of probabilistic CCA in Equation (4). When we define the log likelihood $L$ and

$$C = \begin{pmatrix}\Psi^1 & 0\\ 0 & \Psi^2\end{pmatrix} + \begin{pmatrix}W^1_z\\ W^2_z\end{pmatrix}\begin{pmatrix}W^{1T}_z & W^{2T}_z\end{pmatrix},$$

it holds that

$$\frac{\partial L}{\partial \mu} = -\sum_{n=1}^N C^{-1}\left(\begin{pmatrix}\mu^1\\ \mu^2\end{pmatrix} - \begin{pmatrix}y^1_n\\ y^2_n\end{pmatrix} + \begin{pmatrix}W^1_x\\ W^2_x\end{pmatrix}x_n\right).$$

Because $C$ is positive definite, the likelihood is maximized when $\mu$ is such that the partial derivative equals zero. Therefore,

$$\mu^m = \bar{y}^m - W^m_x \bar{x}. \tag{7}$$

We denote each datum minus the sample mean as $\tilde{y}^1_n = y^1_n - \bar{y}^1$ (and likewise $\tilde{y}^2_n$ and $\tilde{x}_n$), and substitute Equation (7). Then,

$$\frac{\partial L}{\partial W_x} = \sum_{n=1}^N C^{-1}\left(\begin{pmatrix}\tilde{y}^1_n\\ \tilde{y}^2_n\end{pmatrix}\tilde{x}_n^T - \begin{pmatrix}W^1_x\\ W^2_x\end{pmatrix}\tilde{x}_n\tilde{x}_n^T\right).$$

We can also show that if the data space is spanned by the samples, $L$ is a negative definite quadratic form of $W_x$. So $L$ is maximized when $W_x$ is such that the partial derivative is zero. Therefore,

$$W^m_x = \Sigma_{mx}\Sigma_{xx}^{-1}. \tag{8}$$

When we substitute this into Equation (6), the model is equivalent to the probabilistic CCA model with input variables $y'^m_n = \tilde{y}^m_n - \Sigma_{mx}\Sigma_{xx}^{-1}\tilde{x}_n$. Because the covariance matrices of these data are

$$\frac{1}{N}\sum_{n=1}^N y'^{m_1}_n (y'^{m_2}_n)^T = \Sigma_{m_1 m_2} - \Sigma_{m_1 x}\Sigma_{xx}^{-1}\Sigma_{x m_2} = \Sigma_{m_1 m_2|x}, \tag{9}$$

the parameter estimation is reduced to partial CCA. To summarize, the maximum likelihood solution of the proposed model can be written as

$$W^m_x = \Sigma_{mx}\Sigma_{xx}^{-1}, \qquad W^m_z = \Sigma_{mm|x}U_{m d_z}M_m, \qquad \Psi^m = \Sigma_{mm|x} - W^m_z (W^m_z)^T, \qquad \mu^m = \bar{y}^m - W^m_x\bar{x}, \tag{10}$$

where $U_{m d_z}$ denotes matrices that have their $d$-th column equal to the $d$-th eigenvector, $P_{d_z}$ denotes the diagonal matrix with its $d$-th element equal to the $d$-th canonical correlation of Equation (1), and $M_m$ are arbitrary matrices that satisfy $M_1 M_2^T = P_{d_z}$ and have spectral norms smaller than one. From this point, we assume that samples have zero mean and we do not infer a sample mean.

3.2. EM Parameter Estimation

As with CCA, we can estimate the latent variables using the EM algorithm without integrating them out. In this case, $z_n$ follows a normal distribution and the update rule at iteration $t$ is

$$\Sigma_z = \left(I + (W^t_z)^T(\Psi^t)^{-1}W^t_z\right)^{-1}, \qquad \langle Z\rangle = \Sigma_z (W^t_z)^T(\Psi^t)^{-1}(Y - W^t_x X),$$

$$W^m_{t+1} = Y^m\begin{pmatrix}X\\ \langle Z\rangle\end{pmatrix}^T\begin{pmatrix}XX^T & X\langle Z\rangle^T\\ \langle Z\rangle X^T & \langle ZZ^T\rangle\end{pmatrix}^{-1}, \tag{11}$$

$$\Psi^m_{t+1} = \frac{1}{N}\left(YY^T - W_{t+1}\begin{pmatrix}X\\ \langle Z\rangle\end{pmatrix}Y^T\right)_{mm},$$

where $W^m = (W^m_x\; W^m_z)$, $\Psi$ is the block-diagonal matrix with $\Psi^m$ on its diagonal, $W_x$ and $W_z$ are the matrices that stack $W^m_x$ and $W^m_z$ over the views, $A_{mm}$ is the diagonal block of a matrix $A$ corresponding to view $m$, $Y^m$ is the matrix that has $y^m_n$ as its $n$-th column, and $Y = \begin{pmatrix}Y^1\\ Y^2\end{pmatrix}$. Additionally, $X$ and $Z$ are matrices with $x_n$ and $z_n$ as their $n$-th columns, and $\langle\cdot\rangle$ denotes the expectation of a random variable.
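The reduction in Section 3.1 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes samples are stored as columns, that the partial covariance matrices are well conditioned, and it solves Equation (1) by Cholesky whitening followed by a singular value decomposition rather than by calling a generalized eigenvalue solver. The function names `center`, `partial_cov`, and `partial_cca` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def center(A):
    """Subtract the per-variable sample mean; samples are stored as columns."""
    return A - A.mean(axis=1, keepdims=True)

def partial_cov(A, B, X):
    """Sample partial covariance S_ab|x = S_ab - S_ax S_xx^{-1} S_xb (centered inputs)."""
    n = X.shape[1]
    S_ab, S_ax = A @ B.T / n, A @ X.T / n
    S_xx, S_xb = X @ X.T / n, X @ B.T / n
    return S_ab - S_ax @ np.linalg.solve(S_xx, S_xb)

def partial_cca(Y1, Y2, X):
    """Partial canonical correlations of Y1 and Y2 given X, largest first."""
    Y1, Y2, X = center(Y1), center(Y2), center(X)
    S11 = partial_cov(Y1, Y1, X)
    S22 = partial_cov(Y2, Y2, X)
    S12 = partial_cov(Y1, Y2, X)
    # Whiten both views with Cholesky factors S11 = L1 L1^T, S22 = L2 L2^T.
    # The singular values of L1^{-1} S12 L2^{-T} are the rho of Equation (1).
    L1 = np.linalg.cholesky(S11)
    L2 = np.linalg.cholesky(S22)
    K = np.linalg.solve(L1, S12)       # L1^{-1} S12
    K = np.linalg.solve(L2, K.T).T     # (L2^{-1} K^T)^T = K L2^{-T}
    return np.linalg.svd(K, compute_uv=False)

# Synthetic data following Equation (6) with two shared latent dimensions.
rng = np.random.default_rng(0)
N = 500
X = rng.standard_normal((3, N))
Z = rng.standard_normal((2, N))
Y1 = (rng.standard_normal((5, 3)) @ X + rng.standard_normal((5, 2)) @ Z
      + 0.5 * rng.standard_normal((5, N)))
Y2 = (rng.standard_normal((4, 3)) @ X + rng.standard_normal((4, 2)) @ Z
      + 0.5 * rng.standard_normal((4, N)))

rho = partial_cca(Y1, Y2, X)
print("partial canonical correlations:", rho)
```

Because every step inverts or factorizes a sample covariance matrix, this sketch fails or becomes numerically unreliable when $N$ is small relative to the dimensions, which is the instability that motivates the Bayesian treatment in the next section.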
Figure 3. Graphical model for BPCCA.

4. Bayesian Partial CCA

To address the previously mentioned weakness of partial CCA, we propose a hierarchical Bayesian approach to the probabilistic partial CCA proposed in the previous section.

4.1. Model that Directly uses Probabilistic Partial CCA

In this section, we follow Wang's approach (2007) and consider the generative model shown in Figure 3. It treats the model parameters proposed in the previous section as random variables. We use an ARD prior (Neal, 1995) for each column of the projection matrices, and an inverse Wishart prior for the covariance matrices of the noise. The generative model is

$$\alpha^m_k \sim \mathrm{Gamma}(a_0, b_0), \qquad W^m_{:,k} \sim N(0, (\alpha^m_k)^{-1}I_{d_m}), \qquad \Psi^m \sim IW(\nu^m_0, K^m_0),$$

$$z_n \sim N(0, I_{d_z}), \qquad y^m_n \sim N(W^m_x x_n + W^m_z z_n, \Psi^m), \tag{12}$$

where the prior for the third variable $p(x)$ does not affect the inference when $p(x_n) \neq 0$ for each sample, because we consider the conditional distribution given $x_n$. Here $\mathrm{Gamma}(a, b)$ is the Gamma distribution with shape parameter $a$ and scale parameter $b$, and $IW(\nu, K)$ is the inverse Wishart distribution. $W^m = (W^m_x\; W^m_z)$, and $W^m_{:,k}$ is the $k$-th column of $W^m$. The hyperparameters $a_0, b_0, \nu^m_0, K^m_0$ should be small so that the priors are broad, but from the definition of the Wishart distribution, $\nu^m_0 > d_m - 1$. In our experiments, we set $a_0, b_0 = 10^{-14}$, $\nu^m_0 = d_m$, $K^m_0 = 10^{-14}I_{d_m}$. The ARD prior drives unnecessary components to zero, so we can estimate the dimensions of the latent variables by choosing a sufficiently large $d_z$, or by first choosing a small $d_z$ and then gradually increasing it according to the output projection matrices. We refer to this model as Bayesian PCCA (BPCCA).

Next, we propose a variational Bayesian inference algorithm. The full posterior $p(Z, \Theta \mid X, Y)$, where $\Theta$ denotes the set of model parameters, is approximated as

$$q(Z, \Theta) = q(Z)\prod_{m=1}^{2}\left(q(\Psi^m)\,q(\alpha^m)\prod_{j=1}^{d_m}q(w^m_j)\right), \tag{13}$$

where $w^m_j$ is the $j$-th row of $W^m$. We apply standard cyclical updates to the separate terms of $q$. When the factorized distribution $q$ has the form $q(\Theta) = \prod_i q(\theta_i)$, the update rule is

$$q(\theta_i) \propto \exp\langle\log p(X, Y, Z, \Theta)\rangle_{Z,\,\theta_{k \neq i}}, \qquad q(Z) \propto \exp\langle\log p(X, Y, Z, \Theta)\rangle_{\Theta}. \tag{14}$$

Because $p(X)$ is independent of the other variables, it follows that

$$q(\theta_i) \propto \exp\langle\log p(Y, Z, \Theta \mid X)\rangle_{Z,\,\theta_{k \neq i}}, \qquad q(Z) \propto \exp\langle\log p(Y, Z, \Theta \mid X)\rangle_{\Theta}, \tag{15}$$

where $\langle\cdot\rangle$ with subscripts denotes the expectation with respect to the approximate posterior distribution of the corresponding variables. The approximate posterior distribution has the form

$$q(z_n) = N(\mu_{z_n}, \Sigma_{z_n}), \qquad q(\Psi^m) = IW(\nu^m, K^m), \qquad q(w^m_j) = N(\mu_{w^m_j}, \Sigma_{w^m_j}), \qquad q(\alpha^m) = \prod_k \mathrm{Gamma}(a^m, b^m_k). \tag{16}$$

Furthermore, writing $G = \begin{pmatrix}XX^T & X\langle Z\rangle^T\\ \langle Z\rangle X^T & \langle ZZ^T\rangle\end{pmatrix}$ for brevity, the parameters are updated as

$$\Sigma_{z_n} = \left(I + \sum_m\langle(W^m_z)^T(\Psi^m)^{-1}W^m_z\rangle\right)^{-1},$$

$$\mu_{z_n} = \Sigma_{z_n}\sum_m\left(\langle(W^m_z)^T\rangle\langle(\Psi^m)^{-1}\rangle y^m_n - \langle(W^m_z)^T(\Psi^m)^{-1}W^m_x\rangle x_n\right),$$

$$K^m = K^m_0 + Y^m(Y^m)^T + \left\langle W^m\begin{pmatrix}XX^T & XZ^T\\ ZX^T & ZZ^T\end{pmatrix}(W^m)^T\right\rangle - Y^m\begin{pmatrix}X^T & \langle Z^T\rangle\end{pmatrix}\langle(W^m)^T\rangle - \langle W^m\rangle\begin{pmatrix}X\\ \langle Z\rangle\end{pmatrix}(Y^m)^T,$$

$$\nu^m = \nu^m_0 + N,$$

$$\Sigma_{w^m_j} = \left(\mathrm{diag}\langle\alpha^m\rangle + \langle(\Psi^m)^{-1}_{j,j}\rangle G\right)^{-1}, \tag{17}$$

$$\mu_{w^m_j} = \left(\langle(\Psi^m)^{-1}_{j,:}\rangle Y^m\begin{pmatrix}X^T & \langle Z\rangle^T\end{pmatrix} - \sum_{l \neq j}\langle(\Psi^m)^{-1}_{j,l}\rangle\langle W^m_{l,:}\rangle G\right)\Sigma_{w^m_j},$$

$$a^m = a_0 + d_m/2, \qquad b^m_k = b_0 + \langle\|W^m_{:,k}\|^2\rangle/2,$$

where $\mathrm{diag}\langle\alpha^m\rangle$ is the diagonal matrix with $k$-th element $\langle\alpha^m_k\rangle$.

Figure 4. Graphical model for GSPCCA.

4.2. Model with Isotropic Noise

The model proposed in the previous subsection requires a large number of calculations to infer noise precision matrices. Additionally, the prior distribution has a large influence when there are a small number of samples, because $\nu^m_0 > d_m - 1$. Therefore, following the approach used by Klami et al. (2013), we propose a model that uses isotropic noise and non-shared latent variables. The generative model is

$$z_n \sim N(0, I_{d_z}), \qquad z^m_n \sim N(0, I_{d_{z_m}}), \qquad y^m_n \sim N\left(W^m_x x_n + A^m z_n + B^m z^m_n,\; (\tau^m)^{-1}I_{d_m}\right). \tag{18}$$

When the $z^m_n$ are integrated out, this model is equivalent to the model proposed in the previous subsection with $\Psi^m = B^m(B^m)^T + (\tau^m)^{-1}I_{d_m}$. So we can consider this model as equivalent to imposing a low-rank assumption on the covariance matrices. To simultaneously estimate $A$ and $B$, we write

$$W_z = \begin{pmatrix}A^{(1)} & B^{(1)} & 0\\ A^{(2)} & 0 & B^{(2)}\end{pmatrix}, \qquad W = (W_x\; W_z),$$

and consider the model

$$\alpha^m_k \sim \mathrm{Gamma}(a_0, b_0), \qquad W^m_{:,k} \sim N(0, (\alpha^m_k)^{-1}I_{d_m}), \qquad \tau^m \sim \mathrm{Gamma}(a_0, b_0),$$

$$z_n \sim N(0, I_{(d_z + d_{z_1} + d_{z_2})}), \qquad y^m_n \sim N\left(W^m_x x_n + W^m_z z_n,\; (\tau^m)^{-1}I_{d_m}\right), \tag{19}$$

as shown in Figure 4. This representation reduces the number of model parameters. We refer to this model as group sparse PCCA (GSPCCA). This model also requires small hyperparameters. We have used $a_0, b_0 = 10^{-14}$ in our experiments. Additionally, we choose the approximate posterior

$$q(Z, \Theta) = q(Z)\prod_m\left(q(W^m)\,q(\alpha^m)\,q(\tau^m)\right), \tag{20}$$

with the form

$$q(Z) = \prod_n N(\mu_{z_n}, \Sigma_z), \qquad q(W^m) = \prod_d N(\mu_{W^m_{d,:}}, \Sigma_{W^m}), \qquad q(\alpha^m) = \prod_k \mathrm{Gamma}(a^m_\alpha, b^m_{\alpha,k}), \qquad q(\tau^m) = \mathrm{Gamma}(a^m_\tau, b^m_\tau). \tag{21}$$

With $G = \begin{pmatrix}XX^T & X\langle Z\rangle^T\\ \langle Z\rangle X^T & \langle ZZ^T\rangle\end{pmatrix}$ as before, the parameters are updated as

$$\Sigma_{W^m} = \left(\mathrm{diag}\langle\alpha^m\rangle + \langle\tau^m\rangle G\right)^{-1}, \qquad \mu_{W^m} = \langle\tau^m\rangle Y^m\begin{pmatrix}X^T & \langle Z^T\rangle\end{pmatrix}\Sigma_{W^m},$$

$$\Sigma_z = \left(I + \sum_m\langle\tau^m\rangle\langle(W^m_z)^T W^m_z\rangle\right)^{-1}, \qquad \langle Z\rangle = \Sigma_z\left(\sum_m\langle\tau^m\rangle\left(\langle(W^m_z)^T\rangle Y^m - \langle(W^m_z)^T W^m_x\rangle X\right)\right),$$

$$a^m_\alpha = a_0 + d_m/2, \qquad b^m_{\alpha,k} = b_0 + \langle(W^m)^T W^m\rangle_{k,k}/2, \qquad a^m_\tau = a_0 + N d_m/2, \tag{22}$$

$$b^m_\tau = b_0 + \frac{1}{2}\left(\mathrm{Tr}\left(Y^m(Y^m)^T - 2Y^m\begin{pmatrix}X^T & \langle Z^T\rangle\end{pmatrix}\langle(W^m)^T\rangle\right) + \mathrm{Tr}\left(\langle(W^m)^T W^m\rangle G\right)\right).$$

4.3. Optimization of the Linear Transformation of the Latent Variables

The maximum likelihood solution of probabilistic partial CCA is determined only up to a linear transformation of the latent variables. In the Bayesian model, we optimize this transformation in each iteration to obtain an approximate distribution that is closer to the prior distribution. We expect that this speeds up the convergence and that the latent variables are more independent. The function to be maximized is similar to that in (Virtanen et al., 2011), and is defined as

$$L(R) = -\frac{\mathrm{Tr}(R^{-1}\langle ZZ^T\rangle R^{-T})}{2} + (d_1 + d_2 - N)\log|R| - \frac{1}{2}\sum_{m=1}^{2}d_m\sum_{k=1}^{d_z}\log\left(r_k^T\langle(W^m_z)^T W^m_z\rangle r_k\right). \tag{23}$$

To solve this, we use the L-BFGS method (Liu & Nocedal, 1989) initialized with the identity matrix. Using the optimal $R$, the approximate distributions are transformed into

$$\langle Z\rangle \leftarrow R^{-1}\langle Z\rangle, \qquad \Sigma_Z \leftarrow R^{-1}\Sigma_Z R^{-T}, \qquad \mu_{W^m_z} \leftarrow \mu_{W^m_z}R, \qquad \Sigma_{W^m_z} \leftarrow R^T\Sigma_{W^m_z}R. \tag{24}$$

Figure 5. Comparison of the $W_x$ estimation error and the model accuracy for (a) high-dimensional data and (b) low-dimensional data, plotted against $N$ for CV, BIC, GSPCCA (ours), and BPCCA (ours). The left panel shows the relative estimation error of $W_x$. The right panel shows the accuracy of $d_z$.

5. Experiments

We have applied our methods to synthetic and real data, to verify that they can be used with a small number of samples or high-dimensional data. We compared the stability of the model selection and the causality measures.

5.1. Model Selection

We first investigated the estimates of $W_x$ and $d_z$ using synthetic data. We did not consider $W_z$ because the maximum likelihood solution of $W_z$ is not unique. We compared our methods (BPCCA, GSPCCA) with the model selection techniques using the Bayesian information criterion (BIC) and five-fold cross validation (CV). In our methods, we considered that a component $k$ of the solution was active when $\langle\alpha^m_k\rangle < 50$, and estimated $d_z$ as the number of components $k$ that are active for each view. We set $d_1 = 5$, $d_2 = 4$, $d_x = 3$, and $d_z = 2$ for low-dimensional data, and $d_1 = 50$, $d_2 = 50$, $d_x = 5$, and $d_z = 5$ for high-dimensional data. In each setting, we generated $N = 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000$ samples from a generative model. Each column of the projection matrix was sampled from a Normal distribution with zero mean and unit variance, and the noise covariance matrices were $I_{d_z} + \sum_{c=1}^{\lfloor d_z/2\rfloor}uu^T$, for $u \sim N(0, I_{d_z})$. $W_z$ had five columns for low-dimensional data, and 10 columns for high-dimensional data. We conducted 50 experiments for each parameter. For the Bayesian methods, we determined that the method had converged if the relative change in the variational lower bound was below $10^{-4}$. As it converges to a local maximum, we initialized the model by randomly sampling the latent variables from the prior, and ran the algorithm 10 times, choosing the solution with the best variational lower bound. For each method, we calculated the mean of the relative error of $W_x$ using

$$\frac{\mathrm{Tr}\left((W_x - \hat{W}_x)^T(W_x - \hat{W}_x)\right)}{\mathrm{Tr}(W_x^T W_x)},$$

where $\hat{W}_x$ is an estimate of $W_x$. We also recorded the accuracy rate of the estimated $d_z$.

Figure 6. Comparison of the estimates of $d_z$: the output $d_z$ divided by the true $d_z$, plotted against the number of columns of $W_z$ for $N = 25, 50, 100, 200$. The left panel (a) shows the performance on low-dimensional data. The right panel (b) corresponds to high-dimensional data.

The results are presented in Figure 5. In the right subfigure of Figure 5(a), the CV result is hidden because it has been overwritten by the BIC result. Because we cannot stably calculate the BIC and CV of non-Bayesian methods when $D = 50$ and $N = 25, 50$, and the BIC and CV of BPCCA cannot be calculated when $D = 50$ and $N = 25$, we have not included these results. These plots show that existing model selection methods perform poorly and that the accuracy decreases to zero in high dimensions. Conversely, the two Bayesian methods are very accurate, even for high-dimensional data. BPCCA's performance degrades when $D = 50$ and $N = 50$, but GSPCCA's performance degrades more gradually. The estimate of $W_x$ follows a similar trend. These results demonstrate that our methods perform model selection and parameter estimation more accurately than non-Bayesian methods, and that GSPCCA is the best method.

Next, we compared the model-selection performance of GSPCCA by varying the number of columns of $W_z$ to 6, 7, 8, 9, and 10 for low-dimensional data, and 12, 14, 16, 18, 20 for high-dimensional data, with $N = 25, 50, 200, 800$. The performance was measured using the mean of the number of active components divided by the true $d_z$. The results are shown in Figure 6. In high dimensions, the performance is almost one for all the parameters. In low dimensions, if $N = 25$ the performance decreases gradually. However, this effect can be ignored because the true $d_z$ is two. These results indicate that the number of columns in $W_z$ has little effect on the performance, if it is sufficiently large.

5.2. Causality Measure with Synthetic Data

To evaluate the stability of the causality calculations for a small sample of high-dimensional data, we generated a time series using the following linear model:

$$x_t = 0.5x_{t-1} + \epsilon_{t,x}, \qquad y_{2,t} = 0.5y_{2,t-1} + Wx_{t-1} + \epsilon_{t,y_2}, \qquad y_t = (y_{2,t}^T\; y_{2,t}^T)^T + \epsilon_{t,y}, \tag{25}$$

where the first two columns of $W$ are sampled from $N(0, 0.5I_{20})$ and the other columns are zero. $\epsilon_{t,x}$ and $\epsilon_{t,y_2}$ denote Gaussian noise with zero mean and unit variance. $\epsilon_{t,y}$ is 0 when $r = 0$, and is Gaussian noise with zero mean and variance $rI_{40}$ otherwise. The true causality direction is $x \to y$. The first and second halves of $y$ are strongly correlated. This correlation is strong when $r$ is small. The optimal dimension of the latent variables is two. Using this model, we set the embedding dimension to 1, $r$ to $0, 0.01, 0.1$, and the sample size to $N = 25, 50, 100, 200, 400$, for each parameter. We expect that the causality measures derived from PCCA and probabilistic PCCA are equivalent, so we compared PCCA with GSPCCA (the best performing method). We used $\sum_{d=1}^{20}\frac{1}{2}\log_2\frac{1}{1-\rho_d^2}$ as a causality measure for PCCA. For GSPCCA, we let $\rho_k$ be the correlation between $\langle Y_{t-1}(k,:) \mid Y_t\rangle$ and $\langle Y_{t-1}(k,:) \mid X_{t-1}\rangle$, and used $\sum_k\frac{1}{2}\log_2\frac{1}{1-\rho_k^2}$ as a causality measure, where the summation is over the active components.

Figure 7. Comparison of the stability of a causality measure, plotted against $N$ for (a) $r = 0$, (b) $r = 0.01$, and (c) $r = 0.1$. (a) and the left panels of (b) and (c) show the performance of GSPCCA (ours). The right panels of (b) and (c) show the performance of PCCA. The blue line shows the estimated causality measures for the true direction. The green line shows the estimates for the reverse direction.

Figure 7 shows the results. We have not included results if the solution could not be stably evaluated. The causality measure using PCCA diverged when $N$ was below 200, irrespective of $r$. This measure also increased in the direction of $y \to x$, so this measure is unreliable when $N$ is small. However, the measure using GSPCCA was zero in the $y \to x$ direction when $N$ was larger than 100, because the Bayesian model makes directions that have a negligible influence converge to zero. This behavior helps eliminate false causality relations, but this model may overlook true causality relations when the influence is small. In such cases, we could detect small influences by modifying the hyperparameters of the ARD prior. This measure tended to diverge when $r$ was less than 0.01 and $N = 50$, or $r = 0.1$ and $N = 25$. However, the measure in the $x \to y$ direction was larger than that in the $y \to x$ direction. The Bayesian model also becomes unstable when there is an insufficient number of samples.

5.3. Causality Measure with Real Data

Next, we applied GSPCCA and PCCA to meteorological data, using the Global Summary of the Day (GSOD) provided by the National Climatic Data Center (NCDC) on its website. For this experiment, we used data from the USA between December 24, 2008 and February 28, 2009. Figure 8 shows the observed jet stream during that same period. We selected seven types of variables that did not have a substantial amount of missing data: mean temperature, mean dew point, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature, and minimum temperature. Therefore, the time series has seven dimensions. The length was 66. We randomly chose 224 targets based on distance, after excluding targets with many missing values. We applied a zero-order hold to missing values. We set the embedding dimensions to 2, 3, and 4 and used the same causality measure as in the synthetic data experiments.

Figure 8. Weather information flow map for the USA on (a) 19/1/2009 and (b) 19/2/2009 (source: The California Regional Weather Server, San Francisco University).

Figure 9 shows our results, plotting the largest 50 index values. Because the causality measure that used PCCA with the embedding dimension of four diverged in some pairs, we have included all the index values that diverged. When the embedding dimension was two, GSPCCA and PCCA had a similar tendency to show a strong information flow from west to east over the eastern region, and from north to south in the central region. This is consistent with Figure 8. When the embedding dimension was four, the arrows drawn using PCCA were scattered over the mainland, although the index values using GSPCCA had a similar tendency to those with the embedding dimension of two. This result implies that PCCA overfits the data when the embedding dimension is high. Next, we calculated the average arrow length using the Hubeny formula (footnote 1). It was $1.0 \times 10^3$, $1.1 \times 10^3$, $1.1 \times 10^3$ km/day for GSPCCA and $9.8 \times 10^2$, $1.2 \times 10^3$, $1.7 \times 10^3$ km/day for PCCA. This shows that the causality measure using GSPCCA was more stable and similar to the actual air current, which was approximately $8.6 \times 10^2$ km/day (Shibuya et al., 2011), even when the embedding dimension was high. Because the true embedding dimension is unknown, GSPCCA is a more reliable method.
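The causality measure used in Sections 5.2 and 5.3 is a simple function of the partial canonical correlations. The following minimal sketch evaluates $\frac{1}{2}\sum_i\log_2\frac{1}{1-\rho_i^2}$ for a given list of correlations. The function name `transfer_entropy_from_correlations` is an illustrative assumption, and returning infinity when some $|\rho_i| \ge 1$ is a convention chosen here rather than something specified in the paper.

```python
import numpy as np

def transfer_entropy_from_correlations(rho):
    """Gaussian transfer entropy in bits from partial canonical correlations.

    Evaluates 0.5 * sum_i log2(1 / (1 - rho_i^2)). If any |rho_i| >= 1 the
    measure is unbounded, so infinity is returned (a convention of this sketch).
    """
    rho = np.asarray(rho, dtype=float)
    if np.any(np.abs(rho) >= 1.0):
        return float("inf")
    return float(0.5 * np.sum(np.log2(1.0 / (1.0 - rho ** 2))))

# The measure grows without bound as a correlation approaches one.
for r in (0.0, 0.5, 0.9, 0.99, 0.999999):
    print(r, transfer_entropy_from_correlations([r]))
```

Each term grows without bound as $\rho_i \to 1$, so any estimation procedure that produces sample correlations close to one yields a very large measure. The Bayesian estimate sums only over active components, which limits the number of such terms.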
Footnote 1: http://www.kashmir3d.com/kash/manual-e/std_siki.htm

Figure 9. Weather information flow map of USA (2008/12/24 - 2009/02/28). Maps on the left were calculated using GSPCCA, and maps on the right were calculated using PCCA. Average arrow lengths: (a) embedding dimension = 2: GSPCCA (ours) $1.0 \times 10^3$ km/day, PCCA $9.8 \times 10^2$ km/day; (b) embedding dimension = 3: GSPCCA (ours) $1.1 \times 10^3$ km/day, PCCA $1.2 \times 10^3$ km/day; (c) embedding dimension = 4: GSPCCA (ours) $1.1 \times 10^3$ km/day, PCCA $1.7 \times 10^3$ km/day.

6. Conclusion

We proposed a probabilistic interpretation of partial CCA. We also presented a Bayesian extension and an inference algorithm based on the probabilistic interpretation. Our experiments have demonstrated that the proposed methods are more appropriate for model selection and estimating causal relations from time series than existing methods, when there are a small number of samples or in high dimensions. We expect that PCCA and causality measures will be extensively applied to many areas using our methods. Our Bayesian partial CCA method can be extended to a robust estimation method using a Student distribution for the noise (Archambeau et al., 2006), or to an inference method using the online variational Bayes technique (Hoffman et al., 2013). Additionally, by considering the projection matrices as random variables, we can construct a more complex model that allows the causal relation to change over time.
References

Archambeau, Cédric, Delannay, Nicolas, and Verleysen, Michel. Robust probabilistic projections. In ICML, pp. 33-40, 2006.

Bach, Francis R and Jordan, Michael I. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.

Chávez, Mario, Martinerie, Jacques, and Le Van Quyen, Michel. Statistical assessment of nonlinear causality: application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2):113-128, 2003.

Damianou, Andreas, Ek, Carl, Titsias, Michalis K, and Lawrence, Neil D. Manifold relevance determination. In ICML, pp. 145-152, 2012.

Ek, Carl Henrik, Rihan, Jon, Torr, Philip HS, Rogez, Grégory, and Lawrence, Neil D. Ambiguity modeling in latent spaces. In MLMI, pp. 62-73, 2008.

Fujiwara, Yusuke, Miyawaki, Yoichi, and Kamitani, Yukiyasu. Estimating image bases for visual image reconstruction from human brain activity. In NIPS, pp. 576-584, 2009.

Granger, Clive WJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37(3):424-438, 1969.

Hardoon, David R and Shawe-Taylor, John. Sparse canonical correlation analysis. stat, 1050:19, 2009.

Hoffman, M, Blei, D, Wang, Chong, and Paisley, John. Stochastic variational inference. JMLR, 14:1303-1347, 2013.

Hotelling, Harold. Relations between two sets of variates. Biometrika, 28(3/4):321-377, 1936.

Klami, Arto and Kaski, Samuel. Local dependent components. In ICML, pp. 425-432, 2007.

Klami, Arto, Virtanen, Seppo, and Kaski, Samuel. Bayesian canonical correlation analysis. JMLR, 14:965-1003, 2013.

Kowalski, J, Tu, XM, Jia, G, Perlis, M, Frank, E, Crits-Christoph, P, and Kupfer, DJ. Generalized covariance-adjusted canonical correlation analysis with application to psychiatry. Statistics in Medicine, 22(4):595-610, 2003.

Lai, Pei Ling and Fyfe, Colin. Kernel and nonlinear canonical correlation analysis. IJNS, 10(05):365-377, 2000.

Leen, Gayle and Fyfe, Colin. A Gaussian process latent variable model formulation of canonical correlation analysis. In ESANN, pp. 413-418, 2006.

Liu, Dong C and Nocedal, Jorge. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.

Melzer, Thomas, Reiter, Michael, and Bischof, Horst. Kernel canonical correlation analysis. In ICANN, pp. 353-360, 2001.

Neal, Radford M. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.

Rao, B Raja. Partial canonical correlations. Trabajos de estadistica y de investigación operativa, 20(2):211-219, 1969.

Schreiber, Thomas. Measuring information transfer. Physical Review Letters, 85(2):461, 2000.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causality quantification and its applications: structuring and modeling of multivariate time series. In KDD, pp. 787-796, 2009.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Reliable index for measuring information flow. Physical Review E, 84(6):061109, 2011.

Sumioka, Hidenobu, Yoshikawa, Yuichiro, and Asada, Minoru. Development of joint attention related actions based on reproducing interaction contingency. In ICDL, pp. 256-261, 2008.

Verdes, PF. Assessing causality from multivariate time series. Physical Review E, 72(2):026222.1-026222.9, 2005.

Vía, Javier, Santamaría, Ignacio, and Pérez, Jesús. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Networks, 20(1):139-152, 2007.

Virtanen, Seppo, Klami, Arto, and Kaski, Samuel. Bayesian CCA via group sparsity. In ICML, pp. 457-464, 2011.

Wang, Chong. Variational Bayesian approach to canonical correlation analysis. IEEE Transactions on Neural Networks, 18(3):905-910, 2007.

Yamashita, Yuya, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causal flow. IEEE Transactions on Multimedia, 3(3):619-629, 2012.

Yger, Florian, Berar, Maxime, Gasso, Gilles, and Rakotomamonjy, Alain. Adaptive canonical correlation analysis based on matrix manifolds. In ICML, pp. 1071-1078, 2012.