Analysis of Variance: Some Review and Some New Ideas
Remember the concepts of variance and the standard deviation…
Variance is the square of the standard deviation
Standard deviation (s) - the square root of the sum of the squared deviations from the mean divided by the number of cases.
See p. 47 in the text.
We now want to use these concepts in regression analysis.
We will be learning a new statistical test, the F test, which we will use to assess the statistical significance of a regression equation as a whole (not just the individual coefficients).
We will also use Analysis of Variance (ANOVA)…
To compare differences among more than two means…
Which, for two means, we've done to date with a t test.
Mean
Variance
Standard Deviation
Coefficient of Variation
Steps for calculating variance
1. Calculate the mean of a variable
2. Find the deviations from the mean: subtract the variable mean from each case
3. Square each of the deviations from the mean
4. The variance is the mean of the squared deviations from the mean, so sum the squared deviations from step 3 and divide by the number of cases
(When we did these steps before we were interested in going on to calculate a standard deviation and coefficient of variation. Now we'll just stick with variance.)
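The four steps above can be sketched directly in code. The data values here are made up for illustration; they are not from the slides.

```python
# A minimal sketch of the four variance steps, using a small made-up list
# of values (illustrative only, not the slides' data).
data = [4, 8, 6, 5, 7]

# Step 1: calculate the mean of the variable
mean = sum(data) / len(data)

# Step 2: find the deviations from the mean
deviations = [x - mean for x in data]

# Step 3: square each of the deviations
squared = [d ** 2 for d in deviations]

# Step 4: sum the squared deviations and divide by the number of cases
variance = sum(squared) / len(data)

print(mean, variance)   # 6.0 2.0
```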
Calculating Variance, cont.
3. Square each of the deviations from the mean
4. The variance is the mean of the squared deviations from the mean, so sum the squared deviations from step 3 and divide by the number of cases
The sum of the squared deviations = 198.950
Variance = 198.950/20 = 9.948
A New Concept: Sum of Squares
The sum of the squared deviations from the mean is called the Sum of Squares.
Remember: when we know nothing else about an interval variable, the best estimate of any case is its mean.
By extension, the sum of squares is the best estimate of the sum of squared deviations if we know nothing else about the variable.
But… when we have more information, for example in a statistically significant bivariate regression model, we can improve on the mean as an estimate of the dependent variable by using information from the independent variable.
The regression equation is a better estimator of food costs than the mean of food costs.
Statistics: TOTAL FOOD COSTS
N (Valid): 638
N (Missing): 0
Mean: 270.2310
Variance: 8127.019
Calculating Total Sum of Squares
Multiply the variance by N − 1, so Total Sum of Squares = 8127.019 × (638 − 1)
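The slide's arithmetic can be checked in a couple of lines. The figures are from the SPSS output above; note the result differs slightly from the 5,176,911.308 used later in the deck because the printed variance is rounded.

```python
# Total Sum of Squares from the SPSS output: variance times (N - 1).
variance = 8127.019     # sample variance of TOTAL FOOD COSTS (from the slides)
n = 638                 # number of valid cases

tss = variance * (n - 1)
print(tss)              # ≈ 5,176,911.1 (the deck's 5,176,911.308 reflects
                        # rounding of the printed variance)
```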
Calculations for the Regression Sum of Squares
Regression sum of squares equals the sum of the squared deviations between ŷ (predicted y) and ȳ (the mean of y):
RSS = Σ(ŷᵢ − ȳ)²
Residual Sum of Squares = TSS − RSS
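The whole partition can be demonstrated on a tiny made-up data set. The x and y values here are illustrative, not the food-cost data; the code fits a bivariate least-squares line and then computes each sum of squares from the definitions above.

```python
# Sketch of the sum-of-squares partition on a tiny made-up data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the bivariate model y = a + b*x
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]          # predicted y for each case

tss = sum((yi - y_bar) ** 2 for yi in y)        # total sum of squares
rss = sum((yh - y_bar) ** 2 for yh in y_hat)    # regression sum of squares
resid_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS

print(tss, rss, resid_ss)   # 6.0 3.6 2.4 — and RSS + residual SS = TSS
```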
Now we want to estimate how much better the regression equation estimates the dependent variable than the mean does.
To do that, we use the sum of squares calculations.
We partition the total sum of squares (TSS), i.e., the sum of squared deviations from the mean, into two parts.
The first part is the sum of squared deviations accounted for by the regression equation (the Regression Sum of Squares).
The second part is the sum of squared deviations left over, i.e., not accounted for by the regression equation; more formally, TSS − Regression Sum of Squares = the Residual Sum of Squares.
Now let’s look at what we’ve accomplished…
To do that, we’ll calculate an F test
We need to add information about degrees of freedom.
Remember the concept: how many values are free to vary while still yielding the statistic. If we want to know the mean, and we know all the values, we can calculate the mean. If we know the mean, and we know all the values but one, we can calculate that last value. So one degree of freedom is used up.
For the F test, we need the degrees of freedom in the regression model. The formula is k − 1, where k is the number of parameters to be estimated. For the bivariate model, those are a and b, so 2 − 1 = 1.
Degrees of freedom continued…
For the Residual Sum of Squares, the degrees of freedom is N-k, so for this model, 638-2 = 636.
We then calculate the mean squares by dividing each sum of squares by its degrees of freedom.
The F statistic is the regression mean square divided by the residual mean square.
The probability of the F statistic is read from the F distribution table.
Another Way to Think about R Square
The Regression Sum of Squares divided by the Total Sum of Squares is a measure of the proportion of variance explained by the model.
So 2070301.432/5176911.308 = .39991, or ~40%.
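The same division, using the slides' figures:

```python
# R-squared as Regression Sum of Squares over Total Sum of Squares,
# using the figures from the slides.
reg_ss = 2070301.432
tss = 5176911.308

r_squared = reg_ss / tss
print(round(r_squared, 4))  # 0.3999, i.e. about 40% of variance explained
```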