
Probabilistic Partial Canonical Correlation Analysis

Yusuke Mukuta  MUKUTA@MI.T.U-TOKYO.AC.JP
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Tatsuya Harada  HARADA@MI.T.U-TOKYO.AC.JP
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

Partial canonical correlation analysis (partial CCA) is a statistical method that estimates a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. Partial CCA is known to be closely related to a causality measure between two time series. However, partial CCA requires the inverses of covariance matrices, so the calculation is not stable. This is particularly the case for high-dimensional data or small sample sizes. Additionally, we cannot estimate the optimal dimension of the subspace in the model. In this paper, we address these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model. Our numerical experiments demonstrate that our methods can stably estimate the model parameters, even in high dimensions or when there are a small number of samples.

1. Introduction

Partial canonical correlation analysis (partial CCA) was proposed by Rao (1969). It is a statistical method used to estimate a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. It is calculated using a CCA of the residuals of a linear regression on the third variable. This method is a generalized version of the partial correlation coefficient for multidimensional data. We define the variables $\{y_{1n}\}_{n=1}^N \in \mathbb{R}^{d_1}$ and $\{y_{2n}\}_{n=1}^N \in \mathbb{R}^{d_2}$, the third variable $\{x_n\}_{n=1}^N \in \mathbb{R}^{d_x}$, and the dimension of the subspace $d_z$. Then the partial CCA is calculated using the generalized eigenvalue problem

$$\Sigma_{12|x}\Sigma_{22|x}^{-1}\Sigma_{21|x}\,u_1 = \rho^2 \Sigma_{11|x}\,u_1, \qquad \Sigma_{21|x}\Sigma_{11|x}^{-1}\Sigma_{12|x}\,u_2 = \rho^2 \Sigma_{22|x}\,u_2, \qquad (1)$$

where $\Sigma_{m_1 m_2|x} = \Sigma_{m_1 m_2} - \Sigma_{m_1 x}\Sigma_{xx}^{-1}\Sigma_{x m_2}$ and $\Sigma_{ab}$ is a sample covariance matrix. Partial CCA has various applications in areas such as social science (Kowalski et al., 2003), and can be used as a causality measure.

Causality measures are indices that measure the influence of one time series on another. Transfer entropy (Schreiber, 2000) is a measure based on information theory. It measures the magnitude of a change to the conditional distribution of y given x, and is calculated using

$$T_{x \to y} = \iiint p\big(y, y_1^{(l)}, x_1^{(k)}\big) \log_2 \frac{p\big(y \mid y_1^{(l)}, x_1^{(k)}\big)}{p\big(y \mid y_1^{(l)}\big)} \, dy \, dy_1^{(l)} \, dx_1^{(k)}, \qquad (2)$$

where $k$ and $l$ denote the embedding dimensions, and $y_1^{(l)} = (y_{-1}^T, y_{-2}^T, \dots, y_{-l}^T)^T$ and $x_1^{(k)} = (x_{-1}^T, x_{-2}^T, \dots, x_{-k}^T)^T$ collect the past $l$ and $k$ observations. Shibuya et al. (2009) showed that when we assume that the variables are normally distributed and estimate the model parameters using maximum likelihood estimation, transfer entropy is equivalent to Granger causality (Granger, 1969). Granger causality is based on changes to the estimation error of an autoregressive model. Shibuya et al. (2011) showed that we can use the partial canonical correlations $\rho_\lambda$ calculated using partial CCA on $y$ and $x_1^{(k)}$ while eliminating the effect of $y_1^{(l)}$. Then, the transfer entropy can be calculated using $T_{x \to y} = \frac{1}{2}\sum_{\lambda=1}^{\min(d, k d_x)} \log_2 \frac{1}{1-\rho_\lambda^2}$.
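As a concrete illustration of Equation (1) and of the causality measure built from the resulting partial canonical correlations, the following is a minimal NumPy/SciPy sketch. It is not the authors' code; the function names, the rows-as-samples data layout, and the use of scipy.linalg.eigh for the generalized eigenvalue problem are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

def partial_cca(Y1, Y2, X, d_z):
    """Classical partial CCA via the generalized eigenvalue problem of Equation (1).

    Y1: (N, d1), Y2: (N, d2), X: (N, dx) data matrices with rows as samples.
    Returns the top d_z partial canonical correlations and the view-1 projections.
    """
    def cov(A, B):
        A, B = A - A.mean(0), B - B.mean(0)
        return A.T @ B / A.shape[0]

    blocks = {"1": Y1, "2": Y2, "x": X}
    S = {a + b: cov(blocks[a], blocks[b]) for a in blocks for b in blocks}

    def partial(a, b):
        # Sigma_{ab|x} = Sigma_ab - Sigma_ax Sigma_xx^{-1} Sigma_xb
        return S[a + b] - S[a + "x"] @ np.linalg.solve(S["xx"], S["x" + b])

    S11, S22, S12 = partial("1", "1"), partial("2", "2"), partial("1", "2")
    # S12 S22^{-1} S21 u1 = rho^2 S11 u1 (symmetric-definite pencil, eigenvalues ascending)
    evals, U1 = eigh(S12 @ np.linalg.solve(S22, S12.T), S11)
    rho2 = np.clip(evals[::-1][:d_z], 0.0, 1.0)
    return np.sqrt(rho2), U1[:, ::-1][:, :d_z]

def causality_measure(rho):
    """Transfer-entropy-style index: 0.5 * sum_k log2(1 / (1 - rho_k^2))."""
    return 0.5 * np.sum(np.log2(1.0 / (1.0 - rho ** 2)))
```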
Transfer entropy has many applications, such as brain analysis (Chávez et al., 2003), medical science (Verdes, 2005), cognitive development modelling (Sumioka et al., 2008), and detecting motion in a movie (Yamashita et al., 2012).

However, partial CCA requires the inverses of sample covariance matrices, so the calculation is unstable when the variables are highly correlated, the dimension of the data is large, or there are not enough data. Yamashita et al. (2012) regularized the covariance matrix to solve this problem, but an appropriate way to optimize the multiple regularization parameters has not been determined. Additionally, we cannot estimate the proper dimension of the subspace of the model.

We have addressed these problems by proposing a probabilistic interpretation of partial CCA, and by deriving a variational Bayesian estimation algorithm for the model parameters based on this probabilistic interpretation. Our experiments show that the proposed methods can more accurately estimate the subspace dimension, and can more stably estimate the model parameters on both synthetic and real data, even in high dimensions or when there are few samples.

2. Canonical Correlation Analysis and its Extension

In this section, we review canonical correlation analysis, which is a statistical method similar to partial CCA. We also consider it from a probabilistic perspective. Canonical correlation analysis (CCA) was proposed by Hotelling (1936). It is a method for finding statistical dependencies between two data sources. Given variables $\{y_{1n}\}_{n=1}^N \in \mathbb{R}^{d_1}$ and $\{y_{2n}\}_{n=1}^N \in \mathbb{R}^{d_2}$, and the dimension of the subspace $d_z \le \min(d_1, d_2)$, the CCA can be calculated using the generalized eigenvalue problem

$$\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,u_1 = \rho^2 \Sigma_{11}\,u_1, \qquad \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,u_2 = \rho^2 \Sigma_{22}\,u_2, \qquad (3)$$

where $\Sigma_{m_1 m_2}$ represents a sample covariance matrix between $y_{m_1}$ and $y_{m_2}$. The projection is a $d_z \times d_i$ $(i = 1, 2)$ matrix whose $d$-th row is the eigenvector corresponding to the $d$-th largest eigenvalue. Each eigenvalue $\rho^2$ equals the squared canonical correlation in the corresponding dimension. Numerous studies have extended CCA, including a nonlinear extension using kernels (Lai & Fyfe, 2000; Melzer et al., 2001), online inference of the model parameters (Vía et al., 2007; Yger et al., 2012), and sparse variants (Hardoon & Shawe-Taylor, 2009).

Bach and Jordan (2005) gave a probabilistic interpretation of CCA, such that the maximum likelihood estimates of the model parameters can be derived from the CCA. Given this probabilistic interpretation, we can extend CCA to probabilistic models. Figure 1 shows a graphical model of the interpretation, where $\{z_n\}_{n=1}^N \in \mathbb{R}^{d_z}$ are the latent variables.

Figure 1. Graphical model for probabilistic CCA.

The generative model is

$$z_n \sim N(0, I_{d_z}), \qquad y_{mn} \sim N(W_m z_n + \mu_m, \Psi_m), \qquad (4)$$

where $N(\mu, \Sigma)$ denotes the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$, and $I_d$ denotes the $d$-dimensional identity matrix. $W_m \in \mathbb{R}^{d_m \times d_z}$ and $\Psi_m \in \mathbb{R}^{d_m \times d_m}$ are the model parameters that we must estimate. We define the $U_{m d_z}$ matrices as having their $d$-th column equal to the $d$-th eigenvector, and $P_{d_z} \in \mathbb{R}^{d_z \times d_z}$ as a diagonal matrix with $d$-th element equal to the $d$-th canonical correlation of Equation (3). Then, the maximum likelihood solution is

$$W_m = \Sigma_{mm} U_{m d_z} M_m, \qquad \Psi_m = \Sigma_{mm} - W_m (W_m)^T, \qquad \mu_m = \bar{y}_m, \qquad (5)$$

where $M_m \in \mathbb{R}^{d_z \times d_z}$ are arbitrary matrices such that $M_1 M_2^T = P_{d_z}$ and the spectral norms of $M_m$ are smaller than one, and $\bar{y}_m$ is the sample mean $\frac{1}{N}\sum_{n=1}^N y_{mn}$.
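Bach and Jordan's result means that the probabilistic CCA parameters can be read off from a classical CCA solution. A minimal sketch of that mapping under the common (but not mandated) choice $M_1 = M_2 = P^{1/2}$ follows; the names and rows-as-samples layout are ours, and consistent sign pairing of the two eigenvector sets is assumed rather than enforced.

```python
import numpy as np
from scipy.linalg import eigh

def probabilistic_cca_ml(Y1, Y2, d_z):
    """Sketch of the maximum likelihood solution of probabilistic CCA, Equation (5),
    with M_1 = M_2 = P^{1/2}. Y1: (N, d1), Y2: (N, d2), rows are samples."""
    mu1, mu2 = Y1.mean(0), Y2.mean(0)
    Yc1, Yc2 = Y1 - mu1, Y2 - mu2
    N = Y1.shape[0]
    S11, S22, S12 = Yc1.T @ Yc1 / N, Yc2.T @ Yc2 / N, Yc1.T @ Yc2 / N

    # CCA via the generalized eigenvalue problems of Equation (3);
    # eigh normalizes the eigenvectors so that U^T S U = I.
    e1, U1 = eigh(S12 @ np.linalg.solve(S22, S12.T), S11)
    e2, U2 = eigh(S12.T @ np.linalg.solve(S11, S12), S22)
    U1, U2 = U1[:, np.argsort(e1)[::-1][:d_z]], U2[:, np.argsort(e2)[::-1][:d_z]]
    rho = np.sqrt(np.clip(np.sort(e1)[::-1][:d_z], 0.0, 1.0))

    # Equation (5): W_m = S_mm U_m M_m, Psi_m = S_mm - W_m W_m^T, mu_m = sample mean.
    M = np.diag(np.sqrt(rho))                     # M_1 = M_2 = P^{1/2}
    W1, W2 = S11 @ U1 @ M, S22 @ U2 @ M
    return (W1, S11 - W1 @ W1.T, mu1), (W2, S22 - W2 @ W2.T, mu2)
```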
There are some extensions of this probabilistic model. They include a robust estimation method that assumes a Student distribution for the noise (Archambeau et al., 2006), and a nonlinear extension that uses a Gaussian process latent variable model (Leen & Fyfe, 2006; Ek et al., 2008). Bayesian CCA (Klami & Kaski, 2007; Wang, 2007) assumes that the model parameters are also random variables. Wang used a Wishart prior for the precision matrices of the noise and an ARD prior (Neal, 1995) for each column of the projection matrices, and derived a variational Bayesian estimation algorithm for the posterior distribution of the parameters. Virtanen et al. (2011) reduced the number of model parameters by assuming that the noise was isotropic and by introducing non-shared latent variables. Klami et al. (2013) derived an algorithm that simultaneously inferred the projection matrices for the shared and non-shared variables. Damianou et al. (2012) studied a Bayesian extension of a Gaussian process latent variable model. Fujiwara et al. (2009) used Bayesian CCA to estimate image bases from fMRI data.

3. Probabilistic Interpretation of Partial CCA

In this section, we propose a generative model that estimates the maximum likelihood parameters using partial CCA. We also derive an expectation-maximization (EM) algorithm that estimates the model parameters and latent variables.

3.1. Generative Model

We consider a generative model that combines regressions on the variables whose effects we want to eliminate with shared latent variables, as shown in Figure 2.

Figure 2. Graphical model for probabilistic partial CCA.

The model is defined as

$$z_n \sim N(0, I_{d_z}), \qquad y_{mn} \sim N(W_{mx} x_n + W_{mz} z_n + \mu_m, \Psi_m). \qquad (6)$$

We will show that the maximum likelihood solution $\operatorname{argmax}_{W_x, W_z, \mu, \Psi} \log p(y \mid x; W_x, W_z, \mu, \Psi)$ can be calculated using partial CCA. To this end, we show that the proposed model can be reduced to the generative model of probabilistic CCA, Equation (4). When we define the log likelihood $L$ and

$$C = \begin{pmatrix} \Psi_1 & 0 \\ 0 & \Psi_2 \end{pmatrix} + \begin{pmatrix} W_{1z} \\ W_{2z} \end{pmatrix}\begin{pmatrix} W_{1z}^T & W_{2z}^T \end{pmatrix},$$

it holds that

$$\frac{\partial L}{\partial \mu} = \sum_{n=1}^N C^{-1}\left( \begin{pmatrix} y_{1n} \\ y_{2n} \end{pmatrix} - \mu - \begin{pmatrix} W_{1x} \\ W_{2x} \end{pmatrix} x_n \right).$$

Because $C$ is positive definite, the likelihood is maximized when $\mu$ is such that the partial derivative equals zero. Therefore,

$$\mu_m = \bar{y}_m - W_{mx} \bar{x}. \qquad (7)$$

We denote each datum minus the sample mean as $\tilde{y}_{1n} = y_{1n} - \bar{y}_1$, and substitute Equation (7). Then,

$$\frac{\partial L}{\partial W_x} = \sum_{n=1}^N C^{-1}\left( \begin{pmatrix} \tilde{y}_{1n} \\ \tilde{y}_{2n} \end{pmatrix} \tilde{x}_n^T - \begin{pmatrix} W_{1x} \\ W_{2x} \end{pmatrix} \tilde{x}_n \tilde{x}_n^T \right).$$

We can also show that if the data space is spanned by the samples, $L$ is a negative definite quadratic form of $W_x$. So $L$ is maximized when $W_x$ is such that the partial derivative is zero. Therefore,

$$W_{mx} = \Sigma_{mx}\Sigma_{xx}^{-1}. \qquad (8)$$

When we substitute this into Equation (6), the model is equivalent to the probabilistic CCA model with input variables $y'_{mn} = \tilde{y}_{mn} - \Sigma_{mx}\Sigma_{xx}^{-1}\tilde{x}_n$. Because the covariance matrices of these data are

$$\frac{1}{N}\sum_{n=1}^N y'_{m_1 n} (y'_{m_2 n})^T = \Sigma_{m_1 m_2} - \Sigma_{m_1 x}\Sigma_{xx}^{-1}\Sigma_{x m_2} = \Sigma_{m_1 m_2 | x}, \qquad (9)$$

the parameter estimation is reduced to partial CCA. To summarize, the maximum likelihood solution of the proposed model can be written as

$$W_{mx} = \Sigma_{mx}\Sigma_{xx}^{-1}, \quad W_{mz} = \Sigma_{mm|x} U_{m d_z} M_m, \quad \Psi_m = \Sigma_{mm|x} - W_{mz} W_{mz}^T, \quad \mu_m = \bar{y}_m - W_{mx}\bar{x}, \qquad (10)$$

where $U_{m d_z}$ denotes the matrices that have their $d$-th column equal to the $d$-th eigenvector, $P_{d_z}$ denotes the diagonal matrix with its $d$-th element equal to the $d$-th canonical correlation of Equation (1), and $M_m$ are arbitrary matrices that satisfy $M_1 M_2^T = P_{d_z}$ and have spectral norms smaller than one. From this point, we assume that the samples have zero mean and we do not infer a sample mean.

3.2. EM Parameter Estimation

As with CCA, we can estimate the latent variables using the EM algorithm without integrating them out. In this case, $z_n$ follows a normal distribution, and the update rule at iteration $t$ is

$$\Sigma^{(z)} = \left(I + (W_z)^T \Psi^{-1} W_z\right)^{-1}, \qquad \langle Z\rangle = \Sigma^{(z)} (W_z)^T \Psi^{-1} (Y - W_x X),$$
$$W_m^{t+1} = Y_m \begin{pmatrix} X^T & \langle Z\rangle^T \end{pmatrix} \begin{pmatrix} XX^T & X\langle Z\rangle^T \\ \langle Z\rangle X^T & \langle ZZ^T\rangle \end{pmatrix}^{-1}, \qquad \Psi_m^{t+1} = \frac{1}{N}\left( Y_m Y_m^T - W_m^{t+1} \begin{pmatrix} X \\ \langle Z\rangle \end{pmatrix} Y_m^T \right), \qquad (11)$$

where $\Psi$ is the block-diagonal matrix with $\Psi_m$ on its diagonal, $W_x$ and $W_z$ are the matrices obtained by stacking $W_{mx}$ and $W_{mz}$ over the views, $A_m$ denotes the block of a stacked matrix $A$ corresponding to view $m$, $Y_m$ is the matrix that has the $y_{mn}$ in its columns, $Y = (Y_1^T\; Y_2^T)^T$, $X$ and $Z$ are the matrices with $x_n$ and $z_n$ in their columns, and $\langle\cdot\rangle$ denotes the expectations of the random variables.
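The EM recursion of Equation (11) is short to write down in NumPy. The sketch below is our own illustration rather than the authors' code: the function name, the features-by-samples layout, and the random initialization are assumptions, and the data are taken to be zero-mean as assumed at the end of Section 3.1.

```python
import numpy as np

def em_probabilistic_partial_cca(Y, X, dims, d_z, n_iter=100, seed=0):
    """EM updates of Equation (11) for probabilistic partial CCA (a sketch).

    Y: (d1 + d2, N) stacked zero-mean observations, X: (dx, N) third variable,
    dims = (d1, d2). Returns W = [W_x  W_z] stacked over views and block-diagonal Psi.
    """
    rng = np.random.default_rng(seed)
    d1, d2 = dims
    d, N = Y.shape
    dx = X.shape[0]
    W = 0.1 * rng.standard_normal((d, dx + d_z))   # [W_x  W_z]
    Psi = np.eye(d)

    for _ in range(n_iter):
        Wx, Wz = W[:, :dx], W[:, dx:]
        Psi_inv = np.linalg.inv(Psi)

        # E-step: posterior of the shared latent variables z_n
        Sz = np.linalg.inv(np.eye(d_z) + Wz.T @ Psi_inv @ Wz)
        Ez = Sz @ Wz.T @ Psi_inv @ (Y - Wx @ X)          # <Z>, shape (d_z, N)
        EzzT = N * Sz + Ez @ Ez.T                        # sum_n <z_n z_n^T>

        # M-step: Equation (11)
        XZ = np.vstack([X, Ez])
        G = np.block([[X @ X.T, X @ Ez.T], [Ez @ X.T, EzzT]])
        W = (Y @ XZ.T) @ np.linalg.inv(G)
        Psi_full = (Y @ Y.T - W @ XZ @ Y.T) / N
        # Noise is independent across views: keep only the per-view blocks.
        Psi = np.zeros_like(Psi_full)
        Psi[:d1, :d1] = Psi_full[:d1, :d1]
        Psi[d1:, d1:] = Psi_full[d1:, d1:]
    return W, Psi
```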
4. Bayesian Partial CCA

To address the previously mentioned weakness of partial CCA, we propose a hierarchical Bayesian approach to the probabilistic partial CCA proposed in the previous section.

4.1. Model that Directly Uses Probabilistic Partial CCA

In this section, we follow Wang's approach (2007) and consider the generative model shown in Figure 3. It treats the model parameters proposed in the previous section as random variables. We use an ARD prior (Neal, 1995) for each column of the projection matrices, and an inverse Wishart prior for the covariance matrices of the noise.

Figure 3. Graphical model for BPCCA.

The generative model is

$$\alpha_{mk} \sim \mathrm{Gamma}(a_0, b_0), \quad W_{m,:,k} \sim N\big(0, (\alpha_{mk})^{-1} I_{d_m}\big), \quad \Psi_m \sim \mathrm{IW}(\nu_{m0}, K_{m0}),$$
$$z_n \sim N(0, I_{d_z}), \quad y_{mn} \sim N(W_{mx} x_n + W_{mz} z_n, \Psi_m), \qquad (12)$$

where the prior for the third variable $p(x)$ does not affect the inference when $p(x_n) > 0$ for each sample, because we consider the conditional distribution given $x_n$. Here $\mathrm{Gamma}(a, b)$ is the Gamma distribution with shape parameter $a$ and scale parameter $b$, and $\mathrm{IW}(\nu, K)$ is the inverse Wishart distribution. $W_m = (W_{mx}\; W_{mz})$, and $W_{m,:,k}$ is the $k$-th column of $W_m$. The hyperparameters $a_0, b_0, \nu_{m0}, K_{m0}$ should be small so that the priors are broad, but from the definition of the Wishart distribution, $\nu_{m0} > d_m - 1$. In our experiments, we set $a_0 = b_0 = 10^{-14}$, $\nu_{m0} = d_m$, and $K_{m0} = 10^{-14} I_{d_m}$. The ARD prior drives unnecessary components to zero, so we can estimate the dimensions of the latent variables by choosing a sufficiently large $d_z$, or by first choosing a small $d_z$ and then gradually increasing it according to the output projection matrices. We refer to this model as Bayesian PCCA (BPCCA).

Next, we propose a variational Bayesian inference algorithm. The full posterior $p(Z, \theta \mid X, Y)$ is approximated as

$$q(Z, \theta) = q(Z) \prod_{m=1}^2 \left( q(\Psi_m)\, q(\alpha_m) \prod_{j=1}^{d_m} q(w_{mj}) \right), \qquad (13)$$

where $w_{mj}$ is the $j$-th row of $W_m$. We apply standard cyclical updates to the separate terms of $q$. When the factorized distribution $q$ has the form $q(\theta) = \prod_i q(\theta_i)$, the update rule is

$$q(\theta_i) \propto \exp\big\langle \log p(X, Y, Z, \theta) \big\rangle_{Z,\, \theta_k: k \neq i}, \qquad q(Z) \propto \exp\big(\big\langle \log p(X, Y, Z, \theta) \big\rangle_{\theta}\big). \qquad (14)$$

Because $p(X)$ is independent of the other variables, it follows that

$$q(\theta_i) \propto \exp\big\langle \log p(Y, Z, \theta \mid X) \big\rangle_{Z,\, \theta_k: k \neq i}, \qquad q(Z) \propto \exp\big(\big\langle \log p(Y, Z, \theta \mid X) \big\rangle_{\theta}\big), \qquad (15)$$

where $\langle\cdot\rangle$ with subscripts denotes the expectation with respect to the approximate posterior distribution of the corresponding variables. The approximate posterior distribution has the shape

$$q(z_n) = N(\mu_{z_n}, \Sigma_{z_n}), \quad q(\Psi_m) = \mathrm{IW}(\nu_m, K_m), \quad q(w_{mj}) = N(\mu_{mj}, \Sigma_{mj}), \quad q(\alpha_m) = \prod_k \mathrm{Gamma}(a_m, b_{mk}). \qquad (16)$$

Furthermore, the parameters are updated as

$$\Sigma_{z_n} = \Big( I + \sum_m \big\langle (W_{mz})^T (\Psi_m)^{-1} W_{mz} \big\rangle \Big)^{-1},$$
$$\mu_{z_n} = \Sigma_{z_n} \sum_m \Big( \langle (W_{mz})^T \rangle \langle (\Psi_m)^{-1} \rangle y_{mn} - \big\langle (W_{mz})^T (\Psi_m)^{-1} W_{mx} \big\rangle x_n \Big),$$
$$K_m = K_{m0} + Y_m (Y_m)^T + \Big\langle W_m \begin{pmatrix} XX^T & XZ^T \\ ZX^T & ZZ^T \end{pmatrix} (W_m)^T \Big\rangle - Y_m \begin{pmatrix} X^T & \langle Z\rangle^T \end{pmatrix} \langle (W_m)^T \rangle - \langle W_m \rangle \begin{pmatrix} X \\ \langle Z\rangle \end{pmatrix} (Y_m)^T,$$
$$\nu_m = \nu_{m0} + N,$$
$$\Sigma_{mj} = \Big( \mathrm{diag}\langle \alpha_m \rangle + \big\langle (\Psi_m^{-1})_{j,j} \big\rangle \begin{pmatrix} XX^T & X\langle Z\rangle^T \\ \langle Z\rangle X^T & \langle ZZ^T\rangle \end{pmatrix} \Big)^{-1},$$
$$\mu_{mj} = \Sigma_{mj} \Big( \big\langle (\Psi_m^{-1})_{j,:} \big\rangle Y_m \begin{pmatrix} X^T & \langle Z\rangle^T \end{pmatrix} - \sum_{l \neq j} \big\langle (\Psi_m^{-1})_{j,l} \big\rangle \langle w_{ml} \rangle \begin{pmatrix} XX^T & X\langle Z\rangle^T \\ \langle Z\rangle X^T & \langle ZZ^T\rangle \end{pmatrix} \Big)^T,$$
$$a_m = a_0 + d_m/2, \qquad b_{mk} = b_0 + \big\langle \| W_{m,:,k} \|^2 \big\rangle / 2, \qquad (17)$$

where $\mathrm{diag}\langle \alpha_m \rangle$ is the diagonal matrix with $k$-th element $\langle \alpha_{mk} \rangle$.
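To make the hierarchy of Equation (12) concrete, here is a small sampling sketch of one view of the BPCCA generative model. It is our own illustration, not the paper's code; the function name and data layout are assumptions, and the demo uses moderate values of a0 and b0 so that the draw is numerically tame (the paper's 10^-14 values make the priors broad but would give an extreme direct draw).

```python
import numpy as np
from scipy.stats import invwishart

def sample_bpcca_view(N, d_m, d_x, d_z, a0=1.0, b0=1.0, seed=0):
    """Draw one view's data from the BPCCA generative model, Equation (12)."""
    rng = np.random.default_rng(seed)
    nu0, K0 = d_m, 1e-14 * np.eye(d_m)          # inverse Wishart hyperparameters as in the paper

    alpha = rng.gamma(shape=a0, scale=b0, size=d_x + d_z)   # ARD precision per column of W_m
    W = rng.standard_normal((d_m, d_x + d_z)) / np.sqrt(alpha)
    Psi = invwishart(df=nu0, scale=K0).rvs()                 # noise covariance

    X = rng.standard_normal((d_x, N))            # the prior on x does not affect inference
    Z = rng.standard_normal((d_z, N))
    mean = W[:, :d_x] @ X + W[:, d_x:] @ Z
    noise = rng.multivariate_normal(np.zeros(d_m), Psi, size=N).T
    return mean + noise, X, Z, W, Psi
```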
4.2. Model with Isotropic Noise

The model proposed in the previous subsection requires a large number of calculations to infer the noise precision matrices. Additionally, the prior distribution has a large influence when there are a small number of samples, because $\nu_{m0} > d_m - 1$. Therefore, following the approach used by Klami et al. (2013), we propose a model that uses isotropic noise and non-shared latent variables. The generative model is

$$z_n \sim N(0, I_{d_z}), \qquad z_{mn} \sim N(0, I_{d_{z_m}}), \qquad y_{mn} \sim N\big(W_{mx} x_n + A_m z_n + B_m z_{mn}, (\tau_m)^{-1} I_{d_m}\big). \qquad (18)$$

When the $z_{mn}$ are integrated out, this model is equivalent to the model proposed in the previous subsection with $\Psi_m = B_m (B_m)^T + (\tau_m)^{-1} I_{d_m}$. So we can consider this model as equivalent to imposing a low-rank assumption on the covariance matrices. To simultaneously estimate $A$ and $B$, we write

$$W_z = \begin{pmatrix} A_1 & B_1 & 0 \\ A_2 & 0 & B_2 \end{pmatrix}, \qquad W_m = \begin{pmatrix} W_{mx} & W_{mz} \end{pmatrix},$$

and consider the model

$$\alpha_{mk} \sim \mathrm{Gamma}(a_0, b_0), \quad W_{m,:,k} \sim N\big(0, (\alpha_{mk})^{-1} I_{d_m}\big), \quad \tau_m \sim \mathrm{Gamma}(a_0, b_0),$$
$$z_n \sim N\big(0, I_{d_z + d_{z_1} + d_{z_2}}\big), \quad y_{mn} \sim N\big(W_{mx} x_n + W_{mz} z_n, (\tau_m)^{-1} I_{d_m}\big), \qquad (19)$$

as shown in Figure 4.

Figure 4. Graphical model for GSPCCA.

This representation reduces the number of model parameters. We refer to this model as group sparse PCCA (GSPCCA). This model also requires small hyperparameters; we have used $a_0 = b_0 = 10^{-14}$ in our experiments. Additionally, we choose the approximate posterior

$$q(Z, \theta) = q(Z) \prod_m \big( q(W_m)\, q(\alpha_m)\, q(\tau_m) \big), \qquad (20)$$

with the shape

$$q(Z) = \prod_n N(\mu_{z_n}, \Sigma_z), \quad q(W_m) = \prod_d N(M_{W_m, d,:}, \Sigma_{W_m}), \quad q(\alpha_m) = \prod_k \mathrm{Gamma}(a_m, b_{mk}), \quad q(\tau_m) = \mathrm{Gamma}(a_{\tau_m}, b_{\tau_m}). \qquad (21)$$

The parameters are updated as

$$\Sigma_{W_m} = \Big( \mathrm{diag}\langle \alpha_m \rangle + \langle \tau_m \rangle \begin{pmatrix} XX^T & X\langle Z\rangle^T \\ \langle Z\rangle X^T & \langle ZZ^T\rangle \end{pmatrix} \Big)^{-1}, \qquad M_{W_m} = \langle \tau_m \rangle\, Y_m \begin{pmatrix} X^T & \langle Z\rangle^T \end{pmatrix} \Sigma_{W_m},$$
$$\Sigma_z = \Big( I + \sum_m \langle \tau_m \rangle \big\langle (W_{mz})^T W_{mz} \big\rangle \Big)^{-1}, \qquad \langle Z\rangle = \Sigma_z \sum_m \langle \tau_m \rangle \Big( \langle (W_{mz})^T \rangle Y_m - \big\langle (W_{mz})^T W_{mx} \big\rangle X \Big),$$
$$a_m = a_0 + d_m/2, \qquad b_{mk} = b_0 + \big\langle ((W_m)^T W_m)_{k,k} \big\rangle / 2, \qquad a_{\tau_m} = a_0 + N d_m/2,$$
$$b_{\tau_m} = b_0 + \tfrac{1}{2}\Big( \mathrm{Tr}\big( Y_m (Y_m)^T \big) - 2\, \mathrm{Tr}\Big( Y_m \begin{pmatrix} X^T & \langle Z\rangle^T \end{pmatrix} \langle (W_m)^T \rangle \Big) + \mathrm{Tr}\Big( \big\langle (W_m)^T W_m \big\rangle \begin{pmatrix} XX^T & X\langle Z\rangle^T \\ \langle Z\rangle X^T & \langle ZZ^T\rangle \end{pmatrix} \Big) \Big). \qquad (22)$$

4.3. Optimization of the Linear Transformation of the Latent Variables

The maximum likelihood solution of probabilistic partial CCA has the same degrees of freedom as a linear transformation of the latent variables. In the Bayesian model, we optimize this transformation in each iteration to obtain an approximate distribution that is closer to the prior distribution. We expect that this speeds up the convergence and that the latent variables become more independent. The function to be maximized is similar to that in (Virtanen et al., 2011), and is defined as

$$L(R) = -\frac{1}{2}\mathrm{Tr}\big( R^{-1} \langle ZZ^T\rangle R^{-T} \big) + (d_1 + d_2 - N)\log|R| - \frac{1}{2}\sum_{m=1}^2 d_m \sum_{k=1}^{d_z} \log\big( r_k^T \langle (W_{mz})^T W_{mz} \rangle r_k \big), \qquad (23)$$

where $r_k$ denotes the $k$-th column of $R$. To solve this, we use the L-BFGS method (Liu & Nocedal, 1989) initialized with the identity matrix. Using the optimal $R$, the approximate distributions are transformed into

$$\langle Z\rangle \leftarrow R^{-1}\langle Z\rangle, \qquad \Sigma_z \leftarrow R^{-1}\Sigma_z R^{-T}, \qquad M_{W_{mz}} \leftarrow M_{W_{mz}} R, \qquad \Sigma_{W_{mz}} \leftarrow R^T \Sigma_{W_{mz}} R. \qquad (24)$$
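Equation (23) is a smooth objective in the $d_z \times d_z$ matrix $R$, so a generic quasi-Newton routine is enough to optimize it. Below is a rough sketch using scipy.optimize.minimize with the L-BFGS-B method and numerical gradients; the paper uses L-BFGS with details not reproduced here, and the function name and the guard against non-positive determinants are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_rotation(EzzT, EWtW_per_view, dims, N):
    """Maximize the rotation objective of Equation (23), starting from the identity.

    EzzT: <Z Z^T> (dz, dz); EWtW_per_view: list of <W_mz^T W_mz>, each (dz, dz);
    dims = (d1, d2); N: number of samples. Returns the optimal R.
    """
    dz = EzzT.shape[0]
    d1, d2 = dims

    def neg_L(r):
        R = r.reshape(dz, dz)
        sign, logdet = np.linalg.slogdet(R)
        if sign <= 0:
            return np.inf                      # keep R away from singular/negative-determinant region
        Rinv = np.linalg.inv(R)
        val = -0.5 * np.trace(Rinv @ EzzT @ Rinv.T) + (d1 + d2 - N) * logdet
        for d_m, EWtW in zip((d1, d2), EWtW_per_view):
            quad = np.einsum('ki,ij,jk->k', R.T, EWtW, R)   # r_k^T <W^T W> r_k for each column k
            val -= 0.5 * d_m * np.sum(np.log(quad))
        return -val

    res = minimize(neg_L, np.eye(dz).ravel(), method="L-BFGS-B")
    return res.x.reshape(dz, dz)
```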
5. Experiments

We applied our methods to synthetic and real data, to verify that they can be used with a small number of samples or with high-dimensional data. We compared the stability of the model selection and of the causality measures.

5.1. Model Selection

We first investigated the estimates of $W_x$ and $d_z$ using synthetic data. We did not consider $W_z$ because the maximum likelihood solution of $W_z$ is not unique. We compared our methods (BPCCA, GSPCCA) with model selection techniques based on the Bayesian information criterion (BIC) and five-fold cross validation (CV). In our methods, we considered that a component $k$ of the solution was active when $\langle \alpha_{mk}\rangle \le 50$, and let $d_z$ be an estimate of the number of $k$ that are active for each view. We set $d_1 = 5$, $d_2 = 4$, $d_x = 3$, and $d_z = 2$ for low-dimensional data, and $d_1 = 50$, $d_2 = 50$, $d_x = 5$, and $d_z = 5$ for high-dimensional data. In each setting, we generated $N = 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000$ samples from a generative model. Each column of the projection matrix was sampled from a normal distribution with zero mean and unit variance, and the noise covariance matrices were $I_{d_m} + \sum_{\lambda=1}^{\lfloor d_m/2\rfloor} u_\lambda u_\lambda^T$, for $u_\lambda \sim N(0, I_{d_m})$. $W_z$ had five columns for low-dimensional data, and 10 columns for high-dimensional data. We conducted 50 experiments for each parameter. For the Bayesian methods, we determined that the method had converged if the relative change in the variational lower bound was below $10^{-4}$. As the algorithm converges to a local maximum, we initialized the model by randomly sampling the latent variables from the prior, ran the algorithm 10 times, and chose the solution with the best variational lower bound. For each method, we calculated the mean of the relative error of $W_x$ using $\mathrm{Tr}\big((W_x - \hat{W}_x)^T (W_x - \hat{W}_x)\big) / \mathrm{Tr}(W_x^T W_x)$, where $\hat{W}_x$ is an estimate of $W_x$. We also recorded the accuracy rate of the estimated $d_z$.

The results are presented in Figure 5. In the right subfigure of Figure 5(a), the CV result is hidden because it has been overwritten by the BIC result. Because we cannot stably calculate the BIC and CV of the non-Bayesian methods when $D = 50$ and $N = 25, 50$, and the BIC and CV of BPCCA cannot be calculated when $D = 50$ and $N = 25$, we have not included these results. These plots show that the existing model selection methods perform poorly and that their accuracy decreases to zero in high dimensions. Conversely, the two Bayesian methods are very accurate, even for high-dimensional data. BPCCA's performance degrades when $D = 50$ and $N = 50$, but GSPCCA's performance degrades more gradually. The estimate of $W_x$ follows a similar trend. These results demonstrate that our methods perform model selection and parameter estimation more accurately than the non-Bayesian methods, and that GSPCCA is the best method.

Figure 5. Comparison of the $W_x$ estimation error and the model accuracy. The left panels show the relative estimation error of $W_x$; the right panels show the accuracy of $d_z$. (a) High-dimensional data. (b) Low-dimensional data.

Next, we compared the model-selection performance of GSPCCA by varying the number of columns of $W_z$ to 6, 7, 8, 9, and 10 for low-dimensional data, and to 12, 14, 16, 18, and 20 for high-dimensional data, with $N = 25, 50, 200, 800$. The performance was measured using the mean of the number of active components divided by the true $d_z$. The results are shown in Figure 6. In high dimensions, the performance is almost one for all the parameters. In low dimensions, if $N = 25$ the performance decreases gradually. However, this effect can be ignored because the true $d_z$ is two. These results indicate that the number of columns in $W_z$ has little effect on the performance, if it is sufficiently large.

Figure 6. Comparison of the estimates of $d_z$ (output $d_z$ divided by the true $d_z$, against the number of columns of $W_z$). The left panel shows the performance on low-dimensional data. The right panel corresponds to high-dimensional data.

5.2. Causality Measure with Synthetic Data

To evaluate the stability of the causality calculations for a small sample of high-dimensional data, we generated a time series using the following linear model:

$$x_t = 0.5\, x_{t-1} + \epsilon_{t,x}, \qquad y2_t = 0.5\, y2_{t-1} + W x_{t-1} + \epsilon_{t,y2}, \qquad y_t = (y2_t^T, y2_t^T)^T + \epsilon_{t,y}, \qquad (25)$$

where the first two columns of $W$ are sampled from $N(0, 0.5\, I_{20})$ and the other columns are zero. $\epsilon_{t,x}$ and $\epsilon_{t,y2}$ denote Gaussian noise with zero mean and unit variance. $\epsilon_{t,y}$ is 0 when $r = 0$, and is Gaussian noise with zero mean and variance $r I_{40}$ otherwise. The true causality direction is $x \to y$. The first and second halves of $y$ are strongly correlated, and this correlation is strong when $r$ is small. The optimal dimension of the latent variables is two. Using this model, we set the embedding dimension to 1, $r$ to $0, 0.01, 0.1$, and the sample size to $N = 25, 50, 100, 200, 400$, for each parameter.
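The generative process of Equation (25) is straightforward to simulate; the sketch below is our own illustration. The dimensionality of $x$ is not restated beyond "the first two columns of $W$ are nonzero", so d_x = 5 here is an assumption, while the 20-dimensional y2 and 40-dimensional y follow the text.

```python
import numpy as np

def generate_causal_series(N, r=0.01, d_x=5, seed=0):
    """Simulate the linear model of Equation (25); the true causal direction is x -> y."""
    rng = np.random.default_rng(seed)
    W = np.zeros((20, d_x))
    W[:, :2] = rng.normal(0.0, np.sqrt(0.5), size=(20, 2))   # first two columns ~ N(0, 0.5 I_20)

    x = np.zeros((N, d_x))
    y2 = np.zeros((N, 20))
    y = np.zeros((N, 40))
    for t in range(1, N):
        x[t] = 0.5 * x[t - 1] + rng.standard_normal(d_x)
        y2[t] = 0.5 * y2[t - 1] + W @ x[t - 1] + rng.standard_normal(20)
        eps_y = np.sqrt(r) * rng.standard_normal(40) if r > 0 else 0.0
        y[t] = np.concatenate([y2[t], y2[t]]) + eps_y
    return x, y
```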
We expect that the causality measures derived from PCCA and probabilistic PCCA are equivalent, so we compared PCCA with GSPCCA (the best performing method). We used $\sum_{d=1}^{20} \frac{1}{2}\log_2\frac{1}{1-\rho_d^2}$ as the causality measure for PCCA. For GSPCCA, we let $\rho_k$ be the correlation between $\langle Y_1(k,:) \mid Y\rangle$ and $\langle Y_1(k,:) \mid X_1\rangle$, and used $\sum_k \frac{1}{2}\log_2\frac{1}{1-\rho_k^2}$ as the causality measure, where the summation is over the active components. Figure 7 shows the results; we have not included results where the solution could not be stably evaluated.

Figure 7. Comparison of the stability of a causality measure. Panel (a) and the left panels of (b) and (c) show the performance of GSPCCA (ours); the right panels of (b) and (c) show the performance of PCCA. The blue line shows the estimated causality measures for the true direction; the green line shows the estimates for the reverse direction. (a) $r = 0$. (b) $r = 0.01$. (c) $r = 0.1$.

The causality measure using PCCA diverged when $N$ was below 200, irrespective of $r$. This measure also increased in the direction $y \to x$, so it is unreliable when $N$ is small. In contrast, the measure using GSPCCA was zero in the $y \to x$ direction when $N$ was larger than 100, because the Bayesian model makes directions that have a negligible influence converge to zero. This behavior helps eliminate false causality relations, but the model may overlook true causality relations when the influence is small. In such cases, we could detect small influences by modifying the hyperparameters of the ARD prior. This measure tended to diverge when $r$ was less than 0.01 and $N = 50$, or $r = 0.1$ and $N = 25$. However, the measure in the $x \to y$ direction was larger than that in the $y \to x$ direction. The Bayesian model also becomes unstable when there is an insufficient number of samples.

5.3. Causality Measure with Real Data

Next, we applied GSPCCA and PCCA to meteorological data, using the Global Summary of the Day (GSOD) provided by the National Climatic Data Center (NCDC) on its website. For this experiment, we used data from the USA between December 24, 2008 and February 28, 2009. Figure 8 shows the observed jet stream during that same period.

Figure 8. Weather information flow map for the USA (source: The California Regional Weather Server, San Francisco University). (a) 19/1/2009. (b) 19/2/2009.

We selected seven types of variables that did not have a substantial amount of missing data: mean temperature, mean dew point, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature, and minimum temperature. Therefore, the time series has seven dimensions. The length was 66. We randomly chose 224 targets based on distance, after excluding targets with many missing values. We conducted a zero-order hold for missing values. We set the embedding dimensions to 2, 3, and 4, and used the same causality measure as in the synthetic data experiments. Figure 9 shows our results, plotting the largest 50 index values. Because the causality measure that used PCCA with the embedding dimension of four diverged in some pairs, we have included all the index values that diverged. When the embedding dimension was two, GSPCCA and PCCA had a similar tendency to show a strong information flow from west to east over the eastern region, and from north to south in the central region. This is consistent with Figure 8. When the embedding dimension was four, the arrows drawn using PCCA were scattered over the mainland, although the index values using GSPCCA had a similar tendency to those with the embedding dimension of two. This result implies that PCCA overfits the data when the embedding dimension is high. Next, we calculated the average arrow length using the Hubeny formula [1]. It was 1.0 × 10^3, 1.1 × 10^3, and 1.1 × 10^3 km/day for GSPCCA, and 9.8 × 10^2, 1.2 × 10^3, and 1.7 × 10^3 km/day for PCCA. This shows that the causality measure using GSPCCA was more stable and closer to the actual air current, which was approximately 8.6 × 10^2 km/day (Shibuya et al., 2011), even when the embedding dimension was high. Because the true embedding dimension is unknown, GSPCCA is a more reliable method.
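As preprocessing for the weather experiment, the text mentions a zero-order hold for missing values and embedding dimensions of 2 to 4. A plain NumPy sketch of those two steps (our own helper names; a time-by-variables array with NaN for missing entries is assumed) might look like this.

```python
import numpy as np

def zero_order_hold(series):
    """Forward-fill missing values (NaN), as in the zero-order hold used for the GSOD data."""
    filled = series.copy()
    for t in range(1, len(filled)):
        mask = np.isnan(filled[t])
        filled[t][mask] = filled[t - 1][mask]
    return filled

def delay_embed(series, k):
    """Stack the past k observations into x_1^{(k)} = (x_{-1}^T, ..., x_{-k}^T)^T.

    series: (T, d) array. Returns targets (T-k, d) and embeddings (T-k, k*d),
    aligned so that row t of the embedding holds the k values preceding targets[t].
    """
    T, d = series.shape
    targets = series[k:]
    emb = np.stack([series[k - j - 1: T - j - 1] for j in range(k)], axis=1)
    return targets, emb.reshape(T - k, k * d)
```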
[1] http://www.kashmir3d.com/kash/manual-e/std_siki.htm

Figure 9. Weather information flow map of the USA (2008/12/24–2009/02/28). Maps on the left were calculated using GSPCCA (ours), and maps on the right were calculated using PCCA. (a) Embedding dimension = 2: average arrow length 1.0 × 10^3 km/day for GSPCCA and 9.8 × 10^2 km/day for PCCA. (b) Embedding dimension = 3: 1.1 × 10^3 km/day for GSPCCA and 1.2 × 10^3 km/day for PCCA. (c) Embedding dimension = 4: 1.1 × 10^3 km/day for GSPCCA and 1.7 × 10^3 km/day for PCCA.

6. Conclusion

We proposed a probabilistic interpretation of partial CCA. We also presented a Bayesian extension and an inference algorithm based on the probabilistic interpretation. Our experiments have demonstrated that the proposed methods are more appropriate than existing methods for model selection and for estimating causal relations from time series when there are a small number of samples or in high dimensions. We expect that PCCA and causality measures will be extensively applied to many areas using our methods. Our Bayesian partial CCA method can be extended to a robust estimation method using a Student distribution for the noise (Archambeau et al., 2006), or to an inference method using the online variational Bayes technique (Hoffman et al., 2013). Additionally, by considering the projection matrices as random variables, we can construct a more complex model that allows the causal relation to change over time.

References

Archambeau, Cédric, Delannay, Nicolas, and Verleysen, Michel. Robust probabilistic projections. In ICML, pp. 33-40, 2006.

Bach, Francis R. and Jordan, Michael I. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.

Chávez, Mario, Martinerie, Jacques, and Le Van Quyen, Michel. Statistical assessment of nonlinear causality: application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2):113-128, 2003.

Damianou, Andreas, Ek, Carl, Titsias, Michalis K., and Lawrence, Neil D. Manifold relevance determination. In ICML, pp. 145-152, 2012.

Ek, Carl Henrik, Rihan, Jon, Torr, Philip H. S., Rogez, Grégory, and Lawrence, Neil D. Ambiguity modeling in latent spaces. In MLMI, pp. 62-73, 2008.

Fujiwara, Yusuke, Miyawaki, Yoichi, and Kamitani, Yukiyasu. Estimating image bases for visual image reconstruction from human brain activity. In NIPS, pp. 576-584, 2009.

Granger, Clive W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37(3):424-438, 1969.

Hardoon, David R. and Shawe-Taylor, John. Sparse canonical correlation analysis. stat, 1050:19, 2009.

Hoffman, M., Blei, D., Wang, Chong, and Paisley, John. Stochastic variational inference. JMLR, 14:1303-1347, 2013.

Hotelling, Harold. Relations between two sets of variates. Biometrika, 28(3/4):321-377, 1936.

Klami, Arto and Kaski, Samuel. Local dependent components. In ICML, pp. 425-432, 2007.

Klami, Arto, Virtanen, Seppo, and Kaski, Samuel. Bayesian canonical correlation analysis. JMLR, 14:965-1003, 2013.

Kowalski, J., Tu, X. M., Jia, G., Perlis, M., Frank, E., Crits-Christoph, P., and Kupfer, D. J. Generalized covariance-adjusted canonical correlation analysis with application to psychiatry. Statistics in Medicine, 22(4):595-610, 2003.

Lai, Pei Ling and Fyfe, Colin. Kernel and nonlinear canonical correlation analysis. IJNS, 10(05):365-377, 2000.

Leen, Gayle and Fyfe, Colin. A Gaussian process latent variable model formulation of canonical correlation analysis. In ESANN, pp. 413-418, 2006.

Liu, Dong C. and Nocedal, Jorge. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.

Melzer, Thomas, Reiter, Michael, and Bischof, Horst. Kernel canonical correlation analysis. In ICANN, pp. 353-360, 2001.

Neal, Radford M. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.
Rao, B. Raja. Partial canonical correlations. Trabajos de estadística y de investigación operativa, 20(2):211-219, 1969.

Schreiber, Thomas. Measuring information transfer. Physical Review Letters, 85(2):461, 2000.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causality quantification and its applications: structuring and modeling of multivariate time series. In KDD, pp. 787-796, 2009.

Shibuya, Takashi, Harada, Tatsuya, and Kuniyoshi, Yasuo. Reliable index for measuring information flow. Physical Review E, 84(6):061109, 2011.

Sumioka, Hidenobu, Yoshikawa, Yuichiro, and Asada, Minoru. Development of joint attention related actions based on reproducing interaction contingency. In ICDL, pp. 256-261, 2008.

Verdes, P. F. Assessing causality from multivariate time series. Physical Review E, 72(2):026222.1-026222.9, 2005.

Vía, Javier, Santamaría, Ignacio, and Pérez, Jesús. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Networks, 20(1):139-152, 2007.

Virtanen, Seppo, Klami, Arto, and Kaski, Samuel. Bayesian CCA via group sparsity. In ICML, pp. 457-464, 2011.

Wang, Chong. Variational Bayesian approach to canonical correlation analysis. IEEE Transactions on Neural Networks, 18(3):905-910, 2007.

Yamashita, Yuya, Harada, Tatsuya, and Kuniyoshi, Yasuo. Causal flow. IEEE Transactions on Multimedia, 3(3):619-629, 2012.

Yger, Florian, Berar, Maxime, Gasso, Gilles, and Rakotomamonjy, Alain. Adaptive canonical correlation analysis based on matrix manifolds. In ICML, pp. 1071-1078, 2012.