Linear Regression and Correlation Notes
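Suppose there is a data set of $n$ data points $(x_i, y_i)$, $i = 1, \dots, n$. We have plotted these using a scatterplot, and it appears that a linear relationship between them is reasonable. Then the least-squares line (regression line) that best fits these data, $\hat{y} = \hat{m}x + \hat{b}$, has regression coefficients $\hat{m}$ and $\hat{b}$ chosen so as to minimize the sum of the squared errors

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (\hat{m}x_i + \hat{b})\right)^2.$$

This says that the regression line that "best fits" the data is the line chosen so as to provide the smallest total squared difference between the data points ($y_i$) and the values predicted by the regression line ($\hat{y}_i$).

The values of the regression coefficients are calculated from

$$\hat{m} = \frac{S_{xy}}{S_{xx}},$$

where

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),$$

and

$$\hat{b} = \bar{y} - \hat{m}\bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the means defined by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}.$$

The correlation coefficient is defined to be

$$\hat{r} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},$$

where

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Note that $-1 \le \hat{r} \le 1$.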
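The coefficient formulas above translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original notes: the function name `least_squares_fit` and the five example points are hypothetical, chosen only for illustration. It computes $\hat{m}$, $\hat{b}$ and $\hat{r}$ from the deviation forms of $S_{xx}$, $S_{xy}$ and $S_{yy}$, and assumes at least two distinct $x$ values.

```python
import math


def least_squares_fit(xs, ys):
    """Hypothetical helper: return (m_hat, b_hat, r_hat) for y_hat = m_hat*x + b_hat."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation forms of the sums of squares and cross-products
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    m_hat = s_xy / s_xx            # slope: S_xy / S_xx
    b_hat = y_bar - m_hat * x_bar  # intercept: y_bar - m_hat * x_bar
    # Correlation coefficient; undefined (NaN) when all y values are equal
    r_hat = s_xy / math.sqrt(s_xx * s_yy) if s_yy > 0 else math.nan
    return m_hat, b_hat, r_hat


if __name__ == "__main__":
    m, b, r = least_squares_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(f"y_hat = {m:.3f} x + {b:.3f}, r_hat = {r:.4f}")
    # prints: y_hat = 0.600 x + 2.200, r_hat = 0.7746
```

For the illustrative points $(1,2), (2,4), (3,5), (4,4), (5,5)$ this gives $\bar{x} = 3$, $\bar{y} = 4$, $S_{xx} = 10$, $S_{xy} = 6$ and $S_{yy} = 6$, hence $\hat{m} = 0.6$, $\hat{b} = 2.2$ and $\hat{r} = 6/\sqrt{60} \approx 0.775$. The sketch uses the deviation forms rather than the computational forms such as $\sum x_i^2 - (\sum x_i)^2/n$: the two are algebraically equal, but subtracting two large, nearly equal sums can lose precision in floating-point arithmetic.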
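Define the Total Sum of Squares (TSS) of the data set as

$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

(note that $\mathrm{TSS} = S_{yy}$), and define the Sum of Squares due to Regression (SSR, written RSS below) as

$$\mathrm{RSS} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Here RSS is the regression sum of squares, not the residual sum of squares. For the least-squares line, TSS splits as $\mathrm{TSS} = \mathrm{RSS} + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, so since the $y_i$ will not lie exactly on the regression line, $\mathrm{TSS} > \mathrm{RSS}$ (unless the points lie exactly on a line, in which case $\mathrm{TSS} = \mathrm{RSS}$). The closer the points are to the regression line, the closer TSS is to RSS. The Coefficient of Determination is defined to be

$$r^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}},$$

so as the data points get close to lying exactly on a line, RSS gets close to TSS and so $r^2$ gets close to 1. When $r^2$ is close to 1, the points are said to be highly correlated, which means that a very large proportion of the Total Sum of Squares is accounted for by the regression (SSR). Thus the Coefficient of Determination is a measure of the strength of the straight-line relationship.

Because $\hat{b} = \bar{y} - \hat{m}\bar{x}$, each fitted deviation is $\hat{y}_i - \bar{y} = \hat{m}(x_i - \bar{x})$, so

$$\mathrm{RSS} = \hat{m}^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}},$$

and therefore

$$r^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \hat{r}^2,$$

so the correlation coefficient can be thought of as measuring how well the regression line fits the data set.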
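The identities $\mathrm{RSS} = S_{xy}^2/S_{xx}$ and $r^2 = S_{xy}^2/(S_{xx}S_{yy})$ can be checked numerically. The sketch below is illustrative only: the function name `regression_sums_of_squares` and the data are hypothetical, and it assumes at least two distinct $x$ values and non-constant $y$ values. It computes TSS and RSS directly from the fitted values $\hat{y}_i$, using RSS in the sense of these notes, $\sum (\hat{y}_i - \bar{y})^2$.

```python
import math


def regression_sums_of_squares(xs, ys):
    """Hypothetical helper: return (TSS, RSS, S_xx, S_xy) for the least-squares line.

    RSS is the sum of squares due to regression, sum((y_hat_i - y_bar)**2),
    not the residual sum of squares.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("need at least two distinct x values")
    m_hat = s_xy / s_xx
    b_hat = y_bar - m_hat * x_bar
    y_hat = [m_hat * x + b_hat for x in xs]     # fitted values on the line
    tss = sum((y - y_bar) ** 2 for y in ys)     # total SS, equal to S_yy
    rss = sum((f - y_bar) ** 2 for f in y_hat)  # SS due to regression
    return tss, rss, s_xx, s_xy


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
    tss, rss, s_xx, s_xy = regression_sums_of_squares(xs, ys)
    r_squared = rss / tss
    # Check RSS = S_xy^2 / S_xx and r^2 = S_xy^2 / (S_xx * S_yy), with S_yy = TSS
    assert math.isclose(rss, s_xy ** 2 / s_xx)
    assert math.isclose(r_squared, s_xy ** 2 / (s_xx * tss))
    print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, r^2 = {r_squared:.2f}")
    # prints: TSS = 6.00, RSS = 3.60, r^2 = 0.60
```

For the illustrative points $(1,2), (2,4), (3,5), (4,4), (5,5)$, the fitted values are $2.8, 3.4, 4.0, 4.6, 5.2$, so $\mathrm{TSS} = 6$, $\mathrm{RSS} = 3.6 = 6^2/10$ and $r^2 = 0.6 = 36/60$, which matches $\hat{r}^2$ for the same data.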