
Copyright © Cengage Learning. All rights reserved. 13 Nonlinear and Multiple Regression

Copyright © Cengage Learning. All rights reserved. 13.4 Multiple Regression Analysis

Multiple Regression Analysis

In multiple regression, the objective is to build a probabilistic model that relates a dependent variable y to more than one independent or predictor variable. Let k represent the number of predictor variables (k ≥ 2) and denote these predictors by x1, x2, ..., xk.

For example, in attempting to predict the selling price of a house, we might have k = 3 with x1 = size (ft²), x2 = age (years), and x3 = number of rooms.

Multiple Regression Analysis

Definition: The general additive multiple regression model equation is

Y = β0 + β1x1 + β2x2 + ... + βkxk + ε    (13.15)

where E(ε) = 0 and V(ε) = σ². In addition, for purposes of testing hypotheses and calculating CIs or PIs, it is assumed that ε is normally distributed.

Multiple Regression Analysis

Let x1*, x2*, ..., xk* be particular values of x1, ..., xk. Then (13.15) implies that the mean value of Y when x1 = x1*, ..., xk = xk* is

β0 + β1x1* + ... + βkxk*    (13.16)

Thus just as β0 + β1x describes the mean Y value as a function of x in simple linear regression, the true (or population) regression function β0 + β1x1 + ... + βkxk gives the expected value of Y as a function of x1, ..., xk. The βi's are the true (or population) regression coefficients.

Multiple Regression Analysis

The regression coefficient β1 is interpreted as the expected change in Y associated with a 1-unit increase in x1 while x2, ..., xk are held fixed. Analogous interpretations hold for β2, ..., βk.

Models with Interaction and Quadratic Predictors

Models with Interaction and Quadratic Predictors

If an investigator has obtained observations on y, x1, and x2, one possible model is Y = β0 + β1x1 + β2x2 + ε. However, other models can be constructed by forming predictors that are mathematical functions of x1 and/or x2. For example, with x3 = x1² and x4 = x1x2, the model

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

has the general form of (13.15).

Models with Interaction and Quadratic Predictors

In general, it is not only permissible for some predictors to be mathematical functions of others but also often highly desirable in the sense that the resulting model may be much more successful in explaining variation in y than any model without such predictors. This discussion also shows that polynomial regression is indeed a special case of multiple regression. For example, the quadratic model Y = β0 + β1x + β2x² + ε has the form of (13.15) with k = 2, x1 = x, and x2 = x².
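As a quick illustration (my own sketch, not from the text), a quadratic model can be fit as a multiple regression by creating the derived predictor columns x1 = x and x2 = x²; the data below is synthetic and purely illustrative.

```python
# Tiny sketch: the quadratic model fit as a multiple regression with the
# derived predictors x1 = x and x2 = x^2 (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 25)
y = 1 + 2 * x - 0.15 * x ** 2 + rng.normal(0, 0.5, x.size)

X = np.column_stack([np.ones_like(x), x, x ** 2])   # columns: 1, x1 = x, x2 = x^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)           # least squares estimates
print(b)                                             # estimates of beta0, beta1, beta2
```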

Models with Interaction and Quadratic Predictors

For the case of two independent variables, x1 and x2, consider the following four derived models.

1. The first-order model: Y = β0 + β1x1 + β2x2 + ε

2. The second-order no-interaction model: Y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + ε

3. The model with first-order predictors and interaction: Y = β0 + β1x1 + β2x2 + β3x1x2 + ε

Models with Interaction and Quadratic Predictors

4. The complete second-order or full quadratic model: Y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε

Understanding the differences among these models is an important first step in building realistic regression models from the independent variables under study. The first-order model is the most straightforward generalization of simple linear regression. It states that for a fixed value of either variable, the expected value of Y is a linear function of the other variable and that the expected change in Y associated with a unit increase in x1 (x2) is β1 (β2) independent of the level of x2 (x1).

Models with Interaction and Quadratic Predictors

Thus if we graph the regression function as a function of x1 for several different values of x2, we obtain as contours of the regression function a collection of parallel lines, as pictured in Figure 13.13(a).

[Figure 13.13 Contours of four different regression functions; (a) E(Y) = –1 + .5x1 – x2]

Models with Interaction and Quadratic Predictors

The function y = β0 + β1x1 + β2x2 specifies a plane in three-dimensional space; the first-order model says that each observed value of the dependent variable corresponds to a point which deviates vertically from this plane by a random amount ε. According to the second-order no-interaction model, if x2 is fixed, the expected change in Y for a 1-unit increase in x1 is β1 + β3(2x1 + 1).

Models with Interaction and Quadratic Predictors

Because this expected change does not depend on x2, the contours of the regression function for different values of x2 are still parallel to one another. However, the dependence of the expected change on the value of x1 means that the contours are now curves rather than straight lines. This is pictured in Figure 13.13(b).

[Figure 13.13 Contours of four different regression functions (cont'd)]

Models with Interaction and Quadratic Predictors

In this case, the regression surface is no longer a plane in three-dimensional space but is instead a curved surface. The contours of the regression function for the first-order interaction model are nonparallel straight lines. This is because the expected change in Y when x1 is increased by 1 is β1 + β3x2.

Models with Interaction and Quadratic Predictors

This expected change depends on the value of x2, so each contour line must have a different slope, as in Figure 13.13(c).

[Figure 13.13 Contours of four different regression functions; (c) E(Y) = –1 + .5x1 – x2 + x1x2]

Models with Interaction and Quadratic Predictors

The word interaction reflects the fact that an expected change in Y when one variable increases in value depends on the value of the other variable. Finally, for the complete second-order model, the expected change in Y when x2 is held fixed while x1 is increased by 1 unit is β1 + β3 + 2β3x1 + β5x2, which is a function of both x1 and x2.

Models with Interaction and Quadratic Predictors

This implies that the contours of the regression function are both curved and not parallel to one another, as illustrated in Figure 13.13(d).

[Figure 13.13 Contours of four different regression functions (cont'd)]

Models with Interaction and Quadratic Predictors

Similar considerations apply to models constructed from more than two independent variables. In general, the presence of interaction terms in the model implies that the expected change in Y depends not only on the variable being increased or decreased but also on the values of some of the fixed variables. As in ANOVA, it is possible to have higher-way interaction terms (e.g., x1x2x3), making model interpretation more difficult.

Models with Interaction and Quadratic Predictors

Note that if the model contains interaction or quadratic predictors, the generic interpretation of a βi given previously will not usually apply. This is because it is not then possible to increase xi by 1 unit and hold the values of all other predictors fixed.
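The following sketch (my own, not from the text) fits the four derived models to synthetic data using statsmodels formula notation, where I(x1**2) and x1:x2 create the quadratic and interaction predictors; all variable names and numbers are illustrative assumptions.

```python
# Illustrative sketch: fitting the four derived models for two predictors x1, x2.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({"x1": rng.uniform(0, 10, n), "x2": rng.uniform(0, 5, n)})
df["y"] = 2 + 0.5 * df.x1 - 1.0 * df.x2 + 0.3 * df.x1 * df.x2 + rng.normal(0, 1, n)

models = {
    "first-order":               "y ~ x1 + x2",
    "second-order, no inter.":   "y ~ x1 + x2 + I(x1**2) + I(x2**2)",
    "first-order + interaction": "y ~ x1 + x2 + x1:x2",
    "full quadratic":            "y ~ x1 + x2 + I(x1**2) + I(x2**2) + x1:x2",
}
for name, formula in models.items():
    fit = smf.ols(formula, data=df).fit()
    print(f"{name:28s} R^2 = {fit.rsquared:.3f}  adj R^2 = {fit.rsquared_adj:.3f}")
```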

Models with Predictors for Categorical Variables

Models with Predictors for Categorical Variables

Thus far we have explicitly considered the inclusion of only quantitative (numerical) predictor variables in a multiple regression model. Using simple numerical coding, qualitative (categorical) variables, such as bearing material (aluminum or copper/lead) or type of wood (pine, oak, or walnut), can also be incorporated into a model.

Models with Predictors for Categorical Variables

Let's first focus on the case of a dichotomous variable, one with just two possible categories (male or female, U.S. or foreign manufacture, and so on). With any such variable, we associate a dummy or indicator variable x whose possible values 0 and 1 indicate which category is relevant for any particular observation.

Example 13.11

The article "Estimating Urban Travel Times: A Comparative Study" (Trans. Res., 1980: 173–175) described a study relating the dependent variable y = travel time between locations in a certain city and the independent variable x2 = distance between locations. Two types of vehicles, passenger cars and trucks, were used in the study. Let

x1 = 1 if the vehicle is a truck, and x1 = 0 if it is a passenger car

Example 13.11 (cont'd)

One possible multiple regression model is Y = β0 + β1x1 + β2x2 + ε. The mean value of travel time depends on whether a vehicle is a car or a truck:

mean time = β0 + β2x2 when x1 = 0 (cars)
mean time = β0 + β1 + β2x2 when x1 = 1 (trucks)

Example 13.11 (cont'd)

The coefficient β1 is the difference in mean times between trucks and cars with distance held fixed; if β1 > 0, on average it will take trucks longer to traverse any particular distance than it will for cars. A second possibility is a model with an interaction predictor:

Y = β0 + β1x1 + β2x2 + β3x1x2 + ε

Example 13.11 (cont'd)

Now the mean times for the two types of vehicles are

mean time = β0 + β2x2 when x1 = 0
mean time = β0 + β1 + (β2 + β3)x2 when x1 = 1

Example 13.11 (cont'd)

For each model, the graph of the mean time versus distance is a straight line for either type of vehicle, as illustrated in Figure 13.14.

[Figure 13.14 Regression functions for models with one dummy variable (x1) and one quantitative variable x2: (a) no interaction, (b) interaction]

Example 13.11 (cont'd)

The two lines are parallel for the first (no-interaction) model, but in general they will have different slopes when the second model is correct. For this latter model, the change in mean travel time associated with a 1-mile increase in distance depends on which type of vehicle is involved; that is, the two variables "vehicle type" and "distance" interact. Indeed, data collected by the authors of the cited article suggested the presence of interaction.

Models with Predictors for Categorical Variables

You might think that the way to handle a three-category situation is to define a single numerical variable with coded values such as 0, 1, and 2 corresponding to the three categories. This is incorrect, because it imposes an ordering on the categories that is not necessarily implied by the problem context. The correct approach to incorporating three categories is to define two different dummy variables.

Models with Predictors for Categorical Variables

Suppose, for example, that y is the lifetime of a certain cutting tool, x1 is cutting speed, and that there are three brands of tool being investigated. Then let

x2 = 1 if a brand A tool is used, 0 otherwise
x3 = 1 if a brand B tool is used, 0 otherwise

When an observation on a brand A tool is made, x2 = 1 and x3 = 0, whereas for a brand B tool, x2 = 0 and x3 = 1.

Models with Predictors for Categorical Variables

An observation made on a brand C tool has x2 = x3 = 0, and it is not possible that x2 = x3 = 1 because a tool cannot simultaneously be both brand A and brand B. The no-interaction model would have only the predictors x1, x2, and x3. The following interaction model allows the mean change in lifetime associated with a 1-unit increase in speed to depend on the brand of tool:

Y = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + ε

Models with Predictors for Categorical Variables

Construction of a picture like Figure 13.14 with a graph for each of the three possible (x2, x3) pairs gives three nonparallel lines (unless β4 = β5 = 0).

Models with Predictors for Categorical Variables

More generally, incorporating a categorical variable with c possible categories into a multiple regression model requires the use of c – 1 indicator variables (e.g., five brands of tools would necessitate using four indicator variables). Thus even one categorical variable can add many predictors to a model.
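A hedged sketch of the c – 1 indicator coding for the three-brand tool example; the lifetimes, speeds, and column names below are hypothetical, and pandas/statsmodels are assumed to be available.

```python
# Sketch: code a 3-category brand variable (A, B, C) with c - 1 = 2 indicators,
# taking brand C as the baseline category (x2 = x3 = 0).
import pandas as pd
import statsmodels.formula.api as smf

tools = pd.DataFrame({
    "speed": [400, 450, 500, 400, 450, 500, 400, 450, 500],
    "brand": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "life":  [28.0, 25.1, 21.7, 30.2, 27.5, 24.0, 26.3, 23.9, 20.8],  # hypothetical lifetimes
})
coded = pd.get_dummies(tools, columns=["brand"])
coded = coded.drop(columns=["brand_C"]).rename(columns={"brand_A": "x2", "brand_B": "x3"})
coded[["x2", "x3"]] = coded[["x2", "x3"]].astype(int)
print(coded)

# Interaction model: life = b0 + b1*speed + b2*x2 + b3*x3 + b4*speed*x2 + b5*speed*x3 + e
fit = smf.ols("life ~ speed + x2 + x3 + speed:x2 + speed:x3", data=coded).fit()
print(fit.params)
```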

Estimating Parameters

Estimating Parameters

The data in simple linear regression consists of n pairs (x1, y1), ..., (xn, yn). Suppose that a multiple regression model contains two predictor variables, x1 and x2. Then the data set will consist of n triples (x11, x21, y1), (x12, x22, y2), ..., (x1n, x2n, yn). Here the first subscript on x refers to the predictor and the second to the observation number. More generally, with k predictors, the data consists of n (k + 1)-tuples (x11, x21, ..., xk1, y1), (x12, x22, ..., xk2, y2), ..., (x1n, x2n, ..., xkn, yn), where xij is the value of the ith predictor xi associated with the observed value yj.

Estimating Parameters

The observations are assumed to have been obtained independently of one another according to the model (13.15). To estimate the parameters β0, β1, ..., βk using the principle of least squares, form the sum of squared deviations of the observed yj's from a trial function y = b0 + b1x1 + ... + bkxk:

f(b0, b1, ..., bk) = Σ [yj – (b0 + b1x1j + b2x2j + ... + bkxkj)]²    (13.17)

The least squares estimates are those values of the bi's that minimize f(b0, ..., bk).

Estimating Parameters

Taking the partial derivative of f with respect to each bi (i = 0, 1, ..., k) and equating all partials to zero yields the following system of normal equations:

b0n + b1Σx1j + b2Σx2j + ... + bkΣxkj = Σyj    (13.18)

together with one analogous equation for each of the k predictors.

Estimating Parameters

These equations are linear in the unknowns b0, b1, ..., bk. Solving (13.18) yields the least squares estimates β̂0, β̂1, ..., β̂k. This is best done by utilizing a statistical software package.
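A minimal sketch (my own illustration, synthetic data) of what the software does: form the design matrix X with a leading column of 1s and solve the normal equations, which numpy's least squares routine does equivalently.

```python
# Sketch: least squares estimates via the normal equations X'Xb = X'y and via lstsq.
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 4
x = rng.uniform(10, 50, size=(n, k))          # columns x1..xk (synthetic)
y = 5 + x @ np.array([0.2, 0.5, 0.1, 0.25]) + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x])          # design matrix with a leading 1 column
b_normal = np.linalg.solve(X.T @ X, X.T @ y)  # solves the normal equations (13.18)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_normal, b_lstsq))         # True: same least squares estimates
print(b_lstsq)                                # [b0, b1, ..., bk]
```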

Example 13.12

The article "How to Optimize and Control the Wire Bonding Process: Part II" (Solid State Technology, Jan. 1991: 67–72) described an experiment carried out to assess the impact of the variables x1 = force (gm), x2 = power (mW), x3 = temperature (°C), and x4 = time (msec) on y = ball bond shear strength (gm).

Example 13.12 cont’d The following data was generated to be consistent with the information given in the article:

Example 13.12 (cont'd)

A statistical computer package gave the following least squares estimates:

β̂0 = –37.48, β̂1 = .2117, β̂2 = .4983, β̂3 = .1297, β̂4 = .2583

Thus we estimate that .1297 gm is the average change in strength associated with a 1-degree increase in temperature when the other three predictors are held fixed; the other estimated coefficients are interpreted in a similar manner.

Example 13.12 (cont'd)

The estimated regression equation is

ŷ = –37.48 + .2117x1 + .4983x2 + .1297x3 + .2583x4

A point prediction of strength resulting from a force of 35 gm, power of 75 mW, temperature of 200°C, and time of 20 msec is

ŷ = –37.48 + .2117(35) + .4983(75) + .1297(200) + .2583(20) = 38.41 gm

This is also a point estimate of the mean value of strength for the specified values of force, power, temperature, and time.
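The point prediction is just a dot product of the estimated coefficient vector with (1, x1*, ..., x4*); a tiny sketch using the coefficients printed above:

```python
# Sketch reproducing the point prediction arithmetic with the printed coefficients.
import numpy as np

beta_hat = np.array([-37.48, 0.2117, 0.4983, 0.1297, 0.2583])   # b0, b1, b2, b3, b4
x_new = np.array([1.0, 35, 75, 200, 20])                        # 1, force, power, temp, time
y_hat = beta_hat @ x_new
print(round(y_hat, 2))                                          # approximately 38.41 gm
```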

R2 and σ̂2

R2 and σ̂2

Predicted or fitted values, residuals, and the various sums of squares are calculated as in simple linear and polynomial regression. The predicted value ŷ1 results from substituting the values of the various predictors from the first observation into the estimated regression function:

ŷ1 = β̂0 + β̂1x11 + β̂2x21 + ... + β̂kxk1

The remaining predicted values ŷ2, ..., ŷn come from substituting values of the predictors from the 2nd, 3rd, ..., and finally nth observations into the estimated function.

R2 and σ̂2

For example, the values of the 4 predictors for the last observation in Example 13.12 are x1,30 = 35, x2,30 = 75, x3,30 = 200, x4,30 = 20, so

ŷ30 = –37.48 + .2117(35) + .4983(75) + .1297(200) + .2583(20) = 38.41

The residuals are the differences between the observed and predicted values. The last residual in Example 13.12 is 40.3 – 38.41 = 1.89.

R2 and σ̂2

The closer the residuals are to 0, the better the job our estimated regression function is doing in making predictions corresponding to observations in the sample. Error or residual sum of squares is SSE = Σ(yi – ŷi)². It is again interpreted as a measure of how much variation in the observed y values is not explained by (not attributed to) the model relationship. The number of df associated with SSE is n – (k + 1) because k + 1 df are lost in estimating the k + 1 coefficients.

R2 and σ̂2

Total sum of squares, a measure of total variation in the observed y values, is SST = Σ(yi – ȳ)². Regression sum of squares SSR = Σ(ŷi – ȳ)² = SST – SSE is a measure of explained variation. Then the coefficient of multiple determination R2 is

R2 = 1 – SSE/SST = SSR/SST

It is interpreted as the proportion of observed y variation that can be explained by the multiple regression model fit to the data.

R2 and σ̂2

Because there is no preliminary picture of multiple regression data analogous to a scatter plot for bivariate data, the coefficient of multiple determination is our first indication of whether the chosen model is successful in explaining y variation. Unfortunately, there is a problem with R2: Its value can be inflated by adding lots of predictors into the model even if most of these predictors are rather frivolous.

R2 and σ̂2

For example, suppose y is the sale price of a house. Then sensible predictors include
x1 = the interior size of the house,
x2 = the size of the lot on which the house sits,
x3 = the number of bedrooms,
x4 = the number of bathrooms, and
x5 = the house's age.

Now suppose we add in x6 = the diameter of the doorknob on the coat closet, x7 = the thickness of the cutting board in the kitchen, x8 = the thickness of the patio slab, and so on.

R2 and σ̂2

Unless we are very unlucky in our choice of predictors, using n – 1 predictors (one fewer than the sample size) will yield R2 = 1. So the objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted. It is thus desirable to adjust R2, as was done in polynomial regression, to take account of the size of the model:

adjusted R2 = 1 – [(n – 1)/(n – (k + 1))] · SSE/SST

R2 and σ̂2

Because the ratio in front of SSE/SST exceeds 1, adjusted R2 is smaller than R2. Furthermore, the larger the number of predictors k relative to the sample size n, the smaller adjusted R2 will be relative to R2. Adjusted R2 can even be negative, whereas R2 itself must be between 0 and 1. A value of adjusted R2 that is substantially smaller than R2 itself is a warning that the model may contain too many predictors. The positive square root of R2 is called the multiple correlation coefficient and is denoted by R.

R2 and σ̂2

It can be shown that R is the sample correlation coefficient calculated from the (ŷi, yi) pairs (that is, use ŷi in place of xi in the formula for r from Section 12.5). SSE is also the basis for estimating the remaining model parameter:

σ̂2 = s2 = SSE/[n – (k + 1)]
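A small helper (an assumption of mine, not code from the text) collecting the quantities just defined, given observed y values, fitted values, and the number of predictors k:

```python
# Sketch: SSE, SST, SSR, R^2, adjusted R^2, and the estimate s^2 of sigma^2.
import numpy as np

def fit_summary(y, y_hat, k):
    """k = number of predictors in the fitted model (k + 1 estimated coefficients)."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)                     # unexplained variation
    sst = np.sum((y - np.mean(y)) ** 2)                # total variation
    ssr = sst - sse                                    # explained variation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (n - 1) / (n - (k + 1)) * sse / sst   # adjusted for model size
    s2 = sse / (n - (k + 1))                           # estimate of sigma^2
    return {"SSE": sse, "SST": sst, "SSR": ssr, "R2": r2, "adjR2": adj_r2, "s2": s2}
```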

Example 13.13 Investigators carried out a study to see how variouscharacteristics of concrete are influenced by x1 = % limestone powder and x2 = water-cement ratio, resulting in the accompanying data (“Durability of Concrete with Addition of Limestone Powder,” Magazine of Concrete Research, 1996: 131–137).

Example 13.13 (cont'd)

Consider first compressive strength as the dependent variable y. Fitting the first-order model results in

ŷ = 84.82 + .1643x1 – 79.67x2, SSE = 72.52 (df = 6), R2 = .741, adjusted R2 = .654

whereas including an interaction predictor gives

ŷ = 6.22 + 5.779x1 + 51.33x2 – 9.357x1x2, SSE = 29.35 (df = 5), R2 = .895, adjusted R2 = .831

Example 13.13 (cont'd)

Based on this latter fit, a prediction for compressive strength when % limestone = 14 and water–cement ratio = .60 is

ŷ = 6.22 + 5.779(14) + 51.33(.60) – 9.357(8.4) = 39.32

Fitting the full quadratic relationship results in virtually no change in the R2 value. However, when the dependent variable is adsorbability, the following results are obtained: R2 = .747 when just two predictors are used, .802 when the interaction predictor is added, and .889 when the five predictors for the full quadratic relationship are used.

R2 and σ̂2

In general, β̂i can be interpreted as an estimate of the average change in Y associated with a 1-unit increase in xi while values of all other predictors are held fixed. Sometimes, though, it is difficult or even impossible to increase the value of one predictor while holding all others fixed. In such situations, there is an alternative interpretation of the estimated regression coefficients. For concreteness, suppose that k = 2, and let β̂1 denote the estimate of β1 in the regression of y on the two predictors x1 and x2.

R2 and σ̂2

Then

1. Regress y against just x2 (a simple linear regression) and denote the resulting residuals by g1, g2, ..., gn. These residuals represent variation in y after removing or adjusting for the effects of x2.

2. Regress x1 against x2 (that is, regard x1 as the dependent variable and x2 as the independent variable in this simple linear regression), and denote the residuals by f1, ..., fn. These residuals represent variation in x1 after removing or adjusting for the effects of x2.

R2 and σ̂2

Now consider plotting the residuals from the first regression against those from the second; that is, plot the pairs (f1, g1), ..., (fn, gn). The result is called a partial residual plot or adjusted residual plot. If a regression line is fit to the points in this plot, the slope turns out to be exactly β̂1 (furthermore, the residuals from this line are exactly the residuals e1, ..., en from the multiple regression of y on x1 and x2). Thus β̂1 can be interpreted as the estimated change in y associated with a 1-unit increase in x1 after removing or adjusting for the effects of any other model predictors.
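This slope property can be checked numerically; the sketch below (my own illustration on synthetic data) carries out Steps 1 and 2 and compares the partial-residual slope with the multiple regression coefficient of x1.

```python
# Sketch: the slope of g (y adjusted for x2) regressed on f (x1 adjusted for x2)
# equals the multiple regression coefficient of x1.
import numpy as np

rng = np.random.default_rng(2)
n = 40
x2 = rng.uniform(0, 10, n)
x1 = 1.5 * x2 + rng.normal(0, 1, n)              # x1 correlated with x2
y = 3 + 2.0 * x1 - 0.7 * x2 + rng.normal(0, 1, n)

def slr_residuals(resp, pred):
    X = np.column_stack([np.ones(n), pred])
    b, *_ = np.linalg.lstsq(X, resp, rcond=None)
    return resp - X @ b

g = slr_residuals(y, x2)                         # Step 1: y adjusted for x2
f = slr_residuals(x1, x2)                        # Step 2: x1 adjusted for x2
slope = np.sum(f * g) / np.sum(f ** 2)           # slope of the line fit to (f, g); both residual sets have mean 0

X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
print(np.isclose(slope, b_full[1]))              # True
```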

R2 and σ̂2

The same interpretation holds for other estimated coefficients regardless of the number of predictors in the model (there is nothing special about k = 2; the foregoing argument remains valid if y is regressed against all predictors other than x1 in Step 1 and x1 is regressed against the other k – 1 predictors in Step 2). As an example, suppose that y is the sale price of an apartment building and that the predictors are number of apartments, age, lot size, number of parking spaces, and gross building area (ft²). It may not be reasonable to increase the number of apartments without also increasing gross area.

R2 and σ̂2

However, if the estimated coefficient of gross building area is 16.00, then we estimate that a $16 increase in sale price is associated with each extra square foot of gross area after adjusting for the effects of the other four predictors.

A Model Utility Test

A Model Utility Test

The absence of an informative picture of multivariate data (analogous to a scatter plot for bivariate data) and the aforementioned difficulty with R2 provide compelling reasons for seeking a formal test of model utility.

A Model Utility Test

The model utility test in simple linear regression involved the null hypothesis H0: β1 = 0, according to which there is no useful relation between y and the single predictor x. Here we consider the assertion that β1 = 0, β2 = 0, ..., βk = 0, which says that there is no useful relationship between y and any of the k predictors. If at least one of these β's is not 0, the corresponding predictor(s) is (are) useful. The test is based on a statistic that has a particular F distribution when H0 is true.

A Model Utility Test

Null hypothesis: H0: β1 = β2 = ... = βk = 0
Alternative hypothesis: Ha: at least one βi ≠ 0 (i = 1, ..., k)

Test statistic value: f = [R2/k] / [(1 – R2)/(n – (k + 1))] = MSR/MSE, where MSR = SSR/k and MSE = SSE/[n – (k + 1)].

Rejection region for a level α test: f ≥ Fα,k,n–(k+1)

A Model Utility Test

Except for a constant multiple, the test statistic here is R2/(1 – R2), the ratio of explained to unexplained variation. If the proportion of explained variation is high relative to unexplained, we would naturally want to reject H0 and confirm the utility of the model; this explains why the test is upper-tailed (only large values of f argue against H0). However, if k is large relative to n, the factor [n – (k + 1)]/k will decrease f considerably.
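A short sketch (hypothetical helper and illustrative numbers, not from the text) of the model utility F statistic expressed through R2, with its P-value taken from the upper tail of the F distribution:

```python
# Sketch: model utility F statistic f = [R^2/k] / [(1 - R^2)/(n - (k + 1))].
from scipy import stats

def model_utility_f(r2, n, k):
    f = (r2 / k) / ((1 - r2) / (n - (k + 1)))
    p_value = stats.f.sf(f, k, n - (k + 1))      # upper-tail area under F_{k, n-(k+1)}
    return f, p_value

# e.g. a model with k = 4 predictors, n = 30 observations, R^2 = .70 (illustrative numbers)
print(model_utility_f(0.70, 30, 4))
```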

Example 13.14

Returning to the bond shear strength data of Example 13.12, a model with k = 4 predictors was fit, so the relevant hypotheses are

H0: β1 = β2 = β3 = β4 = 0
Ha: at least one of these four β's is not 0

Figure 13.15 shows output from the JMP statistical package.

[Figure 13.15 Multiple regression output from JMP for the data of Example 13.14]

Example 13.14 (cont'd)

[Figure 13.15 Multiple regression output from JMP for the data of Example 13.14 (cont'd)]

Example 13.14 The values of s (Root Mean Square Error), R2, and adjusted R2 certainly suggest a useful model. The value of the model utility F ratio is cont’d

Example 13.14 This value also appears in the F Ratio column of the ANOVA table in Figure 13.15.The largest F critical value for 4 numerator and 25 denominator df is 6.49, which captures an upper-tail area of .001. Thus P-value < .001. The ANOVA table in the JMP output shows that P -value < .0001. This is a highly significant result. cont’d

Example 13.14 The null hypothesis should be rejected at any reasonable significance level. We conclude that there is a useful linear relationship between y and at least one of the four predictors in the model. This does not mean that all four predictors are useful;we will say more about this subsequently. cont’d

Inferences in Multiple Regression

Inferences in Multiple Regression

Before testing hypotheses, constructing CIs, and making predictions, the adequacy of the model should be assessed and the impact of any unusual observations investigated. Methods for doing this are described at the end of the present section and in the next section. Because each β̂i is a linear function of the yi's, the standard deviation of each β̂i is the product of σ and a function of the xij's. An estimate of this SD is obtained by substituting s for σ.

Inferences in Multiple Regression

The function of the xij's is quite complicated, but all standard statistical software packages compute and show the estimated standard deviations s_β̂i. Inferences concerning a single βi are based on the standardized variable

T = (β̂i – βi)/S_β̂i

which has a t distribution with n – (k + 1) df.

Inferences in Multiple Regression

The point estimate of μY·x1*,...,xk*, the expected value of Y when x1 = x1*, ..., xk = xk*, is

μ̂ = β̂0 + β̂1x1* + ... + β̂kxk*

The estimated standard deviation of the corresponding estimator is again a complicated expression involving the sample xij's. However, appropriate software will calculate it on request. Inferences about μY·x1*,...,xk* are based on standardizing its estimator to obtain a t variable having n – (k + 1) df.

Inferences in Multiple Regression

A 100(1 – α)% CI for μY·x1*,...,xk*, the mean Y value when x1 = x1*, ..., xk = xk*, is

μ̂ ± tα/2,n–(k+1) · s_μ̂

where s_μ̂ is the estimated standard deviation of μ̂. A 100(1 – α)% PI for a future Y value at these predictor values is

μ̂ ± tα/2,n–(k+1) · (s² + s_μ̂²)^(1/2)
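In practice these intervals come from software; the sketch below is not the text's Minitab session but an illustration on synthetic data with assumed column names (pctN, pctF), using statsmodels to get coefficient CIs plus a CI for the mean response and a PI at new predictor values.

```python
# Sketch: coefficient CIs, a CI for the mean response, and a PI for a new observation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 18
df = pd.DataFrame({"pctN": rng.uniform(0, 30, n), "pctF": rng.uniform(0, 8, n)})
df["hardness"] = 7 + 0.06 * df.pctN - 0.1 * df.pctF + rng.normal(0, 0.05, n)

fit = smf.ols("hardness ~ pctN + pctF", data=df).fit()
print(fit.conf_int(alpha=0.05))                       # 95% CIs for beta0, beta1, beta2

new = pd.DataFrame({"pctN": [20], "pctF": [1]})
pred = fit.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",  # CI for the mean response
            "obs_ci_lower", "obs_ci_upper"]])          # PI for a single new observation
```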

Example 13.15

The article "Independent but Additive Effects of Fluorine and Nitrogen Substitution on Properties of a Calcium Aluminosilicate Glass" (J. of the Amer. Ceramic Soc., 2012: 600–606) used multiple regression analyses to investigate various properties of glasses in the Ca-Si-Al-O-N-F system. The following data on microhardness (GPa) resulted from various compositions of 28Ca:57Si:15Al:(100 – x – y)O:xN:yF glasses:

Example 13.15 cont’d

Example 13.15 (cont'd)

The model fit by the investigators was Y = β0 + β1x1 + β2x2 + ε, where x1 = N (nitrogen percentage) and x2 = F (fluorine percentage). Figure 13.16 shows output from Minitab:

[Figure 13.16 Minitab output for the microhardness regression of Example 13.15]

Example 13.15 (cont'd)

In addition, the point estimate of mean microhardness when N = 20 and F = 1 and the estimated standard deviation of the corresponding estimator were computed. The very high R2 indicates that almost all of the observed variation in microhardness can be attributed to the model relationship and the fact that both nitrogen % and fluorine % are varying.

Example 13.15 (cont'd)

The F ratio of 456.28 with a corresponding P-value of .000 in the ANOVA table resoundingly confirms the utility of the fitted model. Inferences about the individual regression coefficients are based on the t distribution with 18 – (2 + 1) = 15 df (degrees of freedom for error in the ANOVA table), and t.025,15 = 2.131. A 95% CI for β1 is

β̂1 ± t.025,15 · s_β̂1 = .061823 ± (2.131)(.002099) = (.0573, .0663)

Example 13.15 (cont'd)

We estimate that the expected change in microhardness associated with an increase of 1% in N while holding F fixed is between .0573 GPa and .0663 GPa. A similar calculation gives (2.0626, 2.0148) as a 95% CI for β2. The Bonferroni technique implies that the simultaneous confidence level for both intervals is at least 90%. The t ratio for testing H0: β1 = 0 is t = .061823/.002099 = 29.46. The corresponding P-value is twice the area under the t curve to the right of 29.46, which according to the Minitab output is .000.

Example 13.15 (cont'd)

Thus even with F remaining in the model, the predictor N provides additional useful information about microhardness. The evidence for testing H0: β2 = 0 versus Ha: β2 ≠ 0 is not quite so compelling; Figure 13.16 shows the P-value to be .004. So at significance level .05 or .01, H0: β2 = 0 would be rejected; it appears that F also provides useful information over and above what is contained in N.

Example 13.15 (cont'd)

There is no reason to delete either predictor from the model. Since neither 95% CI contains 0, it is no surprise that both null hypotheses are rejected at significance level .05. A 95% CI for true average hardness when N = 20 and F = 1 follows from the interval formula given earlier.

Example 13.15 (cont'd)

A 95% prediction interval for the hardness resulting from a single observation when N = 20 and F = 1 is obtained analogously. The PI is about three times as wide as the CI, reflecting the extra uncertainty in prediction.

Inferences in Multiple Regression

An F Test for a Group of Predictors. The model utility F test was appropriate for testing whether there is useful information about the dependent variable in any of the k predictors (i.e., whether β1 = ... = βk = 0). In many situations, one first builds a model containing k predictors and then wishes to know whether any of the predictors in a particular subset provide useful information about Y.

Inferences in Multiple Regression

For example, a model to be used to predict students' test scores might include a group of background variables such as family income and education levels and also some school characteristic variables such as class size and spending per pupil. One interesting hypothesis is that the school characteristic predictors can be dropped from the model. Let's label the predictors as x1, x2, ..., xl, xl+1, ..., xk, so that it is the last k – l that we are considering deleting.

Inferences in Multiple Regression

The relevant hypotheses are as follows:

H0: βl+1 = βl+2 = ... = βk = 0 (so that the reduced model Y = β0 + β1x1 + ... + βlxl + ε is correct)
Ha: at least one of βl+1, ..., βk is not 0

The test statistic value is

f = [(SSEl – SSEk)/(k – l)] / [SSEk/(n – (k + 1))]

where SSEk and SSEl are the error sums of squares for the full and reduced models, respectively, and the rejection region for a level α test is f ≥ Fα,k–l,n–(k+1).

Inferences in Multiple Regression

The test is carried out by fitting both the full and reduced models. Because the full model contains not only the predictors of the reduced model but also some extra predictors, it should fit the data at least as well as the reduced model. That is, if we let SSEk be the sum of squared residuals for the full model and SSEl be the corresponding sum for the reduced model, then SSEk ≤ SSEl.

Inferences in Multiple Regression

Intuitively, if SSEk is a great deal smaller than SSEl, the full model provides a much better fit than the reduced model; the appropriate test statistic should then depend on the reduction SSEl – SSEk in unexplained variation.
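A small sketch (hypothetical helper, not from the text) of this F test for a group of predictors; the trailing call plugs in the SSE values quoted later in Example 13.16 under the assumption that n = 17, so the full model has 7 error df.

```python
# Sketch: F test for a group of predictors from the full- and reduced-model SSEs.
from scipy import stats

def group_f_test(sse_full, sse_reduced, n, k, l):
    num = (sse_reduced - sse_full) / (k - l)        # reduction in unexplained variation per dropped predictor
    den = sse_full / (n - (k + 1))                  # MSE of the full model
    f = num / den
    p_value = stats.f.sf(f, k - l, n - (k + 1))     # upper-tail area
    return f, p_value

# Example 13.16-style numbers: full second-order model (k = 9) vs. first-order model (l = 3);
# n = 17 is an assumption consistent with 7 error df for the full model.
print(group_f_test(sse_full=0.2152, sse_reduced=11.4281, n=17, k=9, l=3))
```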

Example 13.16

Soluble dietary fiber (SDF) can provide health benefits by lowering blood cholesterol and glucose levels. The article "Effects of Twin-Screw Extrusion on Soluble Dietary Fiber and Physicochemical Properties of Soybean Residue" (Food Chemistry, 2013: 884–889) reported the following data on y = SDF content (%) in soybean residue and the three predictors extrusion temperature (x1, in °C), feed moisture (x2, in %), and screw speed (x3, in rpm) of a twin-screw extrusion process.

Example 13.16

Example 13.16 (cont'd)

The authors of the cited article fit the complete second-order model with predictors x1, x2, x3, x1², x2², x3², x1x2, x1x3, and x2x3. Figure 13.17 shows Minitab output resulting from fitting this model.

Example 13.16

Example 13.16 (cont'd)

Note that R2 = .987, so almost all of the observed variation in y can be explained by the model relationship, and adjusted R2 is only slightly smaller than R2 itself. Furthermore, the F ratio for model utility is 59.93 with a corresponding P-value of .000 (the area under the F curve to the right of 59.93). So the null hypothesis that all nine β's corresponding to predictors have value 0 is resoundingly rejected. There appears to be a useful relationship between the dependent variable and at least one of the predictors.

Example 13.16 (cont'd)

Is the inclusion of the second-order predictors justified? That is, should the reduced model consisting of just the predictors x1, x2, and x3 (l = 3) be used? The hypotheses to be tested are

H0: β4 = β5 = ... = β9 = 0 versus Ha: at least one of β4, ..., β9 is not 0

Example 13.16 (cont'd)

SSE = .2152 for the full model (from the ANOVA table of Figure 13.17). Now we need SSE for the reduced model that contains only the three first-order predictors x1, x2, and x3. It is actually not necessary to fit this model because of the "Sequential Sums of Squares" information at the bottom of Figure 13.17. Each number in the last column gives the increase in SSR (explained variation) when another predictor is entered into the model.

Example 13.16 (cont'd)

So SSR for the reduced model is 1.2561 + 3.4585 + .6555 = 5.3701. Subtracting this from SST = 16.7982 (which is the same for both models) gives SSE = 11.4281. The value of the F statistic is then

f = [(11.4281 – .2152)/(9 – 3)] / [.2152/7] ≈ 60.8

The P-value is the area under the F6,7 curve to the right of this value, which unsurprisingly is 0.

Example 13.16 So the null hypothesis is resoundingly rejected. There is very convincing evidence for concluding that at least one of the second-order predictors is providing useful information over and above what is provided by the three first-order predictors. This conclusion makes intuitive sense because the full model leaves very little variation unexplained (SSE quite close to 0), whereas the reduced model has a rather substantial amount of unexplained variation relative to SST.

Example 13.16 (cont'd)

The t ratios of Figure 13.17 suggest that perhaps only the three quadratic predictors are useful and that the three interaction predictors can be eliminated. So let's now consider testing H0: β7 = β8 = β9 = 0 (the coefficients of x1x2, x1x3, and x2x3) against the alternative that at least one of these three β's is not 0. Again the sequential sums of squares information in Figure 13.17 allows us to obtain SSE for the reduced model (containing just the six predictors x1, x2, x3, x1², x2², x3²) without actually fitting that model: SSR = the sum of the first six numbers in the Seq SS column = 16.2930, whence SSE = 16.7982 – 16.2930 = .5052.

Example 13.16 (cont'd)

Then the F statistic is computed from these two error sums of squares just as before; comparing it with the critical values in Table A.9 implies that the P-value is between .01 and .05. In particular, at a significance level of .01, the null hypothesis would not be rejected. The conclusion at that level is that none of the three interaction predictors provides additional useful information.

Assessing Model Adequacy

Assessing Model Adequacy

The standardized residuals in multiple regression result from dividing each residual by its estimated standard deviation; the formula for these standard deviations is substantially more complicated than in the case of simple linear regression. We recommend a normal probability plot of the standardized residuals as a basis for validating the normality assumption. If the pattern in this plot departs substantially from linearity, the t and F procedures developed in this section should not be used to make inferences.

Assessing Model Adequacy

Plots of the standardized residuals versus each predictor and versus ŷ should show no discernible pattern. Adjusted residual plots can also be helpful in this endeavor. The book by Neter et al. is an extremely useful reference.
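A sketch (illustrative data, not the article's) of two of the recommended diagnostic displays: a normal probability plot of the standardized residuals and a plot of standardized residuals against fitted values.

```python
# Sketch: diagnostic plots for a fitted multiple regression (synthetic data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 40
X = sm.add_constant(rng.uniform(0, 10, size=(n, 2)))     # intercept plus two predictors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)

fit = sm.OLS(y, X).fit()
std_resid = fit.get_influence().resid_studentized_internal  # standardized residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(std_resid, dist="norm", plot=ax1)          # normality check
ax1.set_title("Normal probability plot")
ax2.scatter(fit.fittedvalues, std_resid)                  # pattern check vs fitted values
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("fitted values")
ax2.set_ylabel("standardized residuals")
plt.tight_layout()
plt.show()
```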

Example 13.17

Figure 13.18 shows a normal probability plot of the standardized residuals for the microhardness data and fitted model given in Example 13.15. The straightness of the plot casts little doubt on the assumption that the random deviation ε is normally distributed.

Example 13.17 (cont'd)

Figure 13.19 shows the other suggested plots for the microhardness data (fewer than 18 points appear because various observed and calculated values are duplicated). Given the rather small sample size, there is not much evidence of a pattern in any of the first three plots other than randomness.

Example 13.17 cont’d