6-4 Other Aspects of Regression

Uploaded by tawny-fly on 2018-11-22.



Presentation Transcript

6-4.1 Polynomial Models

Suppose that we wanted to test the contribution of the second-order terms to this model. In other words, what is the value of expanding the model to include the additional terms?

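Full model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \epsilon$

Reduced model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

$H_0: \beta_{12} = \beta_{11} = \beta_{22} = 0$

Partial F statistic: $f_0 = 46.72$; tabled $F = f_{0.05} = 2.44$ (p-value < 0.0001). Therefore $H_0$ is rejected, and the second-order terms contribute significantly to the model.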
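The partial F test compares the error sums of squares of the two models: $f_0 = [(SS_E(\text{reduced}) - SS_E(\text{full}))/r] / [SS_E(\text{full})/(n-p)]$. Here $r$ is the number of coefficients set to zero under $H_0$, and $p$ is the number of parameters in the full model. The minimal Python sketch below shows only that arithmetic. The function name is hypothetical, and the SSE values are made up for illustration; they are not the SAS output for Example 6-9.

```python
def partial_f(sse_reduced: float, sse_full: float, r: int, n: int, p: int) -> float:
    """Extra-sum-of-squares (partial) F statistic.

    r: number of coefficients set to zero under H0 (3 here: beta_12, beta_11, beta_22)
    n: number of observations
    p: number of parameters in the full model, including the intercept,
       so the full model has n - p error degrees of freedom.
    """
    extra_ss = sse_reduced - sse_full      # SS explained by the extra terms
    mse_full = sse_full / (n - p)          # error mean square of the full model
    return (extra_ss / r) / mse_full

# Hypothetical SSE values; 16 runs, full quadratic model with p = 6 parameters.
f0 = partial_f(sse_reduced=100.0, sse_full=40.0, r=3, n=16, p=6)
print(f0)  # 5.0
```

$H_0$ is rejected when $f_0$ exceeds $f_{\alpha, r, n-p}$.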

Example 6-9

OPTIONS NOOVP NODATE NONUMBER;
DATA ex69;
  INPUT YIELD TEMP RATIO;
  TEMPC = TEMP - 1212.5;       /* centered temperature */
  RATIOC = RATIO - 12.444;     /* centered ratio */
  TEMRATC = TEMPC*RATIOC;      /* cross-product (interaction) term */
  TEMPCSQ = TEMPC**2;          /* second-order terms */
  RATIOCSQ = RATIOC**2;
CARDS;
49.0 1300 7.5
50.2 1300 9.0
50.5 1300 11.0
48.5 1300 13.5
47.5 1300 17.0
44.5 1300 23.0
28.0 1200 5.3
31.5 1200 7.5
34.5 1200 11.0
35.0 1200 13.5
38.0 1200 17.0
38.5 1200 23.0
15.0 1100 5.3
17.0 1100 7.5
20.5 1100 11.0
29.5 1100 17.0
;
PROC REG DATA=EX69;
  MODEL YIELD = TEMPC RATIOC TEMRATC TEMPCSQ RATIOCSQ / VIF;
  TITLE 'QUADRATIC REGRESSION MODEL - FULL MODEL';
PROC REG DATA=EX69;
  MODEL YIELD = TEMPC RATIOC / VIF;
  TITLE 'LINEAR REGRESSION MODEL - REDUCED MODEL';
RUN;
QUIT;
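The SAS step centers TEMP and RATIO at their means before forming the squared and cross-product terms (1212.5 and 12.444 are the averages over the 16 runs). Centering mainly reduces the near-collinearity between a regressor and its own square, which the VIF option would otherwise flag. The Python sketch below checks this for the TEMP column only. The helper `corr` is hypothetical, and the TEMP values are read from the data lines above.

```python
def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

# TEMP for the 16 runs: six at 1300, six at 1200, four at 1100.
temp = [1300] * 6 + [1200] * 6 + [1100] * 4
tempc = [t - 1212.5 for t in temp]                 # TEMPC = TEMP - 1212.5

raw = corr(temp, [t ** 2 for t in temp])           # TEMP vs TEMP**2
centered = corr(tempc, [t ** 2 for t in tempc])    # TEMPC vs TEMPCSQ
print(round(raw, 4), round(centered, 4))
```

Uncentered, TEMP and TEMP² are almost perfectly correlated. After centering, the correlation is much weaker.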


Residual Plots

(b) The variance of the observations may be increasing with time or with the magnitude of $y_i$ or $x_i$. A data transformation on the response $y$ (for example, $\ln y$ or $1/y$) is often used to eliminate this problem.

(c) Plots of residuals against $\hat{y}_i$ and $x_i$ can also indicate inequality of variance.

(d) Indicates model inadequacy. In that case, higher-order terms should be added to the model, a transformation on the $x$-variable or the $y$-variable (or both) should be considered, or other regressors should be considered (for example, a quadratic or exponential model).
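Case (d) mentions exponential models. When $y$ grows exponentially in $x$, a transformation on the $y$-variable can make the relationship linear; regressing LOGPRICE on QUANTITY in the next example tries exactly this. The Python sketch below uses made-up data that follow $y = e^{1 + 0.5x}$ exactly, and the helper `simple_fit` is hypothetical. It fits a straight line to $y$ and to $\ln y$.

```python
import math

def simple_fit(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - sse / sst

# Hypothetical data following an exponential model exactly.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [math.exp(1 + 0.5 * xi) for xi in x]

_, _, r2_raw = simple_fit(x, y)                            # straight line on y
b0, b1, r2_log = simple_fit(x, [math.log(v) for v in y])   # straight line on ln y
print(round(r2_raw, 3), round(b0, 3), round(b1, 3), round(r2_log, 3))
```

On the log scale the fit recovers the intercept 1 and slope 0.5 with $R^2 \approx 1$, while the straight line on the raw scale leaves clear curvature in the residuals.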

Example

OPTIONS NOOVP NODATE NONUMBER;
DATA BIDS;
  INFILE 'C:\Users\korea\Desktop\Working Folder 2017\imen214-stats\ch06\data\bids.dat';
  INPUT PRICE QUANTITY BIDS;
  LOGPRICE = LOG(PRICE);       /* natural-log transformation of the response */
  RECPRICE = 1/PRICE;          /* reciprocal transformation of the response */
  QUANSQ = QUANTITY**2;        /* quadratic term */
ODS GRAPHICS ON;
PROC SGPLOT DATA=BIDS;         /* SG stands for Statistical Graphics */
  SCATTER X=QUANTITY Y=PRICE;
  TITLE 'Scatter Plot of PRICE vs. QUANTITY';
PROC REG DATA=BIDS;
  MODEL PRICE = QUANTITY;
  TITLE 'LINEAR REGRESSION OF PRICE VS. QUANTITY';
PROC REG DATA=BIDS;
  MODEL LOGPRICE = QUANTITY;
  TITLE 'LINEAR REGRESSION OF LOGPRICE VS. QUANTITY';
PROC REG DATA=BIDS;
  MODEL RECPRICE = QUANTITY;
  TITLE 'LINEAR REGRESSION OF RECPRICE VS. QUANTITY';
PROC REG DATA=BIDS;
  MODEL PRICE = QUANTITY QUANSQ;
  TITLE 'QUADRATIC REGRESSION OF PRICE VS. QUANTITY';
RUN;
ODS GRAPHICS OFF;
QUIT;

Data (bids.dat):

PRICE    QUANTITY   BIDS
153.32   1          4
74.11    7.2        10
29.72    16.7       5
54.67    11.9       5
68.39    9.3        4
119.04   3.7        5
116.14   1.7        6
146.49   0.1        9
81.81    7.8        5
19.58    18.4       9
141.08   2.9        5
101.72   4.7        10
24.88    17.4       10
19.43    18.4       4
39.63    11.2       9
151.13   1.6        7
79.18    7.3        5
204.94   0.2        9
81.06    6.8        4
37.62    11.4       8
17.13    20         3
37.81    13.4       4
130.72   1.8        2
26.07    18.5       2
39.59    14.7       5
66.2     9.1        4

HW 6-44

6-4.2 Categorical Regressors

Many problems involve qualitative, or categorical, variables. The usual way to represent the different levels of a qualitative variable is with indicator variables. For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable as follows:

$x = 0$ if the observation is from operator 1, and $x = 1$ if the observation is from operator 2.

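Example 6-10

Y = gas mileage, $x_1$ = engine displacement, $x_2$ = horsepower, and $x_3 = 0$ for an automatic transmission, $x_3 = 1$ for a manual transmission.

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$

If automatic ($x_3 = 0$), then $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$.

If manual ($x_3 = 1$), then $Y = (\beta_0 + \beta_3) + \beta_1 x_1 + \beta_2 x_2 + \epsilon$.

This model is unreasonable because it does not let the effects of $x_1$ and $x_2$ depend on $x_3$: the two transmission types differ only in the intercept.

Interaction model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \epsilon$

If automatic ($x_3 = 0$), then $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$.

If manual ($x_3 = 1$), then $Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_1 + (\beta_2 + \beta_{23}) x_2 + \epsilon$.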

Dummy Variables

A qualitative variable is often needed in a regression model. This can be accomplished by creating dummy (indicator) variables.

If a qualitative variable has $k$ levels, you will need $k - 1$ dummy variables. Notice that in ANOVA, a treatment with $k$ levels has $k - 1$ degrees of freedom. The $i$th dummy variable is defined as $x_i = 1$ if the observation is at level $i$ and $x_i = 0$ otherwise, for $i = 1, \dots, k - 1$.

This can be done automatically in PROC GLM by using the CLASS statement, as we did in ANOVA. Any dummy variables defined with respect to a qualitative variable must be treated as a group. Individual t-tests are not meaningful, so partial F-tests must be performed on the group of dummy variables.
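The $k - 1$ rule can be sketched in Python as below. The helper `dummy_code` is hypothetical. The example uses the first six COND values from the appraisal data listed later in this section. E serves as the baseline, so COND1 marks F and COND2 marks G, matching the IF statements in that SAS step.

```python
def dummy_code(values, levels, prefix="X"):
    """Build k - 1 indicator (dummy) columns for a qualitative variable with k levels.

    levels[0] is the baseline level: every indicator is 0 for it.
    Column prefix + str(j) is 1 when the observation is at levels[j], else 0.
    """
    return {
        f"{prefix}{j}": [1 if v == level else 0 for v in values]
        for j, level in enumerate(levels[1:], start=1)
    }

cond = ["F", "G", "G", "E", "G", "G"]   # first six COND values in the listing
dummies = dummy_code(cond, levels=["E", "F", "G"], prefix="COND")
print(dummies)
# {'COND1': [1, 0, 0, 0, 0, 0], 'COND2': [0, 1, 1, 0, 1, 1]}
```

Three levels give two indicator columns. A third column for E would be redundant with the intercept.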

Example 6-11

OPTIONS NOOVP NODATE NONUMBER;
DATA EX611;
  INPUT FORM SCENT COLOR RESIDUE REGION QUALITY @@;
  /* If REGION=1, then REGION1=0 (East); if REGION=2, then REGION1=1 (West) */
  IF REGION=1 THEN REGION1=0; ELSE REGION1=1;
  FR = FORM*REGION1;        /* FORM x region interaction */
  RR = RESIDUE*REGION1;     /* RESIDUE x region interaction */
CARDS;
6.3 5.3 4.8 3.1 1 91 4.4 4.9 3.5 3.9 1 87
3.9 5.3 4.8 4.7 1 82 5.1 4.2 3.1 3.6 1 83
5.6 5.1 5.5 5.1 1 83 4.6 4.7 5.1 4.1 1 84
4.8 4.8 4.8 3.3 1 90 6.5 4.5 4.3 5.2 1 84
8.7 4.3 3.9 2.9 1 97 8.3 3.9 4.7 3.9 1 93
5.1 4.3 4.5 3.6 1 82 3.3 5.4 4.3 3.6 1 84
5.9 5.7 7.2 4.1 2 87 7.7 6.6 6.7 5.6 2 80
7.1 4.4 5.8 4.1 2 84 5.5 5.6 5.6 4.4 2 84
6.3 5.4 4.8 4.6 2 82 4.3 5.5 5.5 4.1 2 79
4.6 4.1 4.3 3.1 2 81 3.4 5.0 3.4 3.4 2 83
6.4 5.4 6.6 4.8 2 81 5.5 5.3 5.3 3.8 2 84
4.7 4.1 5.0 3.7 2 83 4.1 4.0 4.1 4.0 2 80
;
PROC REG DATA=EX611;
  MODEL QUALITY = FORM RESIDUE REGION1;
  TITLE 'MODEL WITH DUMMY VARIABLE';
PROC REG DATA=EX611;
  MODEL QUALITY = FORM RESIDUE REGION1 FR RR;
  TITLE 'INTERACTION MODEL WITH DUMMY VARIABLE';
RUN;
QUIT;


OPTIONS NOOVP NODATE NONUMBER;
DATA appraise;
  INPUT price units age size parking area cond$ @@;
  /* COND has three levels (E, F, G); E is the baseline level */
  IF COND='F' THEN COND1=1; ELSE COND1=0;
  IF COND='G' THEN COND2=1; ELSE COND2=0;
CARDS;
90300 4 82 4635 0 4266 F 384000 20 13 17798 0 14391 G
157500 5 66 5913 0 6615 G 676200 26 64 7750 6 34144 E
165000 5 55 5150 0 6120 G 300000 10 65 12506 0 14552 G
108750 4 82 7160 0 3040 G 276538 11 23 5120 0 7881 G
420000 20 18 11745 20 12600 G 950000 62 71 21000 3 39448 G
560000 26 74 11221 0 30000 G 268000 13 56 7818 13 8088 F
290000 9 76 4900 0 11315 E 173200 6 21 5424 6 4461 G
323650 11 24 11834 8 9000 G 162500 5 19 5246 5 3828 G
353500 20 62 11223 2 13680 F 134400 4 70 5834 0 4680 E
187000 8 19 9075 0 7392 G 93600 4 82 6864 0 3840 F
110000 4 50 4510 0 3092 G 573200 14 10 11192 0 23704 E
79300 4 82 7425 0 3876 F 272000 5 82 7500 0 9542 E
;
ODS GRAPHICS ON;
PROC REG DATA=APPRAISE;
  MODEL PRICE = UNITS AGE AREA COND1 COND2 / R;
  TITLE 'REDUCED MODEL WITH DUMMY VARIABLE';
RUN;
ODS GRAPHICS OFF;
QUIT;

Example (6-3 Multiple Regression)

Model                       R-sq     Adj R-sq   S (root MSE)
Full model                  0.9801   0.9746     34123
Full model with dummy       0.9892   0.9845     26682
Reduced model               0.9771   0.9737     34721
Reduced model with dummy    0.9860   0.9821     28673
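As a quick consistency check on the table, adjusted R² can be recomputed from R² as $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p)$. The Python sketch below assumes n = 24, the number of properties in the appraisal data listing. It also assumes p = 6 for the reduced model with dummies: the intercept plus UNITS, AGE, AREA, COND1, and COND2. The function name is hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2; p counts all model parameters, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

# Reduced model with dummy variables: R-sq = 0.9860 from the table.
print(round(adjusted_r2(0.9860, n=24, p=6), 4))  # 0.9821
```

The result matches the tabled adjusted R-sq of 0.9821.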

HW 6-39

6-4.3 Variable Selection Procedures

Best Subsets Regressions

Selection criteria: $R^2$, MSE, and $C_p$.
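For a candidate subset model with $p$ parameters (including the intercept), these criteria are usually defined as follows. $\hat{\sigma}^2$ is taken as the MSE of the model containing all candidate regressors. This is the standard textbook form, written out for reference rather than taken from the slides.

```latex
\begin{aligned}
R_p^2 &= 1 - \frac{SS_E(p)}{SS_T}, \\
R_{adj,p}^2 &= 1 - \frac{SS_E(p)/(n-p)}{SS_T/(n-1)}, \\
MS_E(p) &= \frac{SS_E(p)}{n-p}, \\
C_p &= \frac{SS_E(p)}{\hat{\sigma}^2} - n + 2p .
\end{aligned}
```

A subset model with little bias has $C_p \approx p$.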

Backward Elimination

All regressors start in the model. Using t-tests, the regressor with the smallest absolute t-value is eliminated first.

Minitab result, using a significance cut-off: form, residue, region.

Forward Selection

No regressors start in the model. The regressor with the largest absolute t-value is added first.

Minitab result, using a significance cut-off: form, residue, region, scent.

Stepwise Regression

Begins with a forward step, then applies backward elimination, with $t_{in} = t_{out}$.

Minitab result, using a significance cut-off: form, residue, region.

Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
  INFILE 'C:\Users\korea\Desktop\Working Folder 2018\imen214-stats\ch06\data\sales.dat';
  INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC CORR DATA=SALES;
  VAR SALES;
  WITH TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
  TITLE 'CORRELATIONS OF DEPENDENT WITH INDEPENDENTS';
PROC CORR DATA=SALES;
  VAR TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
  TITLE 'CORRELATIONS BETWEEN INDEPENDENT VARIABLES';
PROC REG DATA=SALES;
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING / VIF R;
  TITLE 'REGRESSION MODEL WITH ALL VARIABLES';
RUN;
QUIT;
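The VIF option in the MODEL statement requests variance inflation factors, $VIF_j = 1/(1 - R_j^2)$. Here $R_j^2$ is the coefficient of determination from regressing $x_j$ on the remaining regressors. Values above about 10 are a common warning sign of multicollinearity. Below is a minimal pure-Python sketch. The helpers `ols_sse` and `vif` are hypothetical, and solving the normal equations by Gaussian elimination is a simple choice suited only to tiny examples. The three short columns are made-up data.

```python
def ols_sse(cols, y):
    """SSE of the least-squares fit of y on the predictor columns in `cols` plus an intercept."""
    n = len(y)
    X = [[1.0] * n] + [[float(v) for v in c] for c in cols]
    k = len(X)
    # Normal equations (X'X) b = X'y.
    A = [[sum(X[i][t] * X[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[i][t] * y[t] for t in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        rhs[i], rhs[piv] = rhs[piv], rhs[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (rhs[i] - sum(A[i][c] * b[c] for c in range(i + 1, k))) / A[i][i]
    fitted = [sum(b[j] * X[j][t] for j in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))


def vif(cols, j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    xj = [float(v) for v in cols[j]]
    mean = sum(xj) / len(xj)
    sst = sum((v - mean) ** 2 for v in xj)
    others = [c for i, c in enumerate(cols) if i != j]
    r2 = 1 - ols_sse(others, xj) / sst
    return 1 / (1 - r2)


# Hypothetical predictors: x1 and x2 are orthogonal; x3 is almost a copy of x1.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
x3 = [-1.0, -0.9, 1.0, 0.95]

v_orth = vif([x1, x2], 0)   # x1 regressed on x2
v_coll = vif([x1, x3], 0)   # x1 regressed on x3
print(v_orth, round(v_coll, 1))
```

Orthogonal regressors give VIF = 1. A nearly duplicated regressor gives a very large VIF.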


All Possible Regressions

This is the brute-force method of modeling. It is feasible if the number of independent variables is small (fewer than 10 or so) and the sample size is not too large. Some common quantities to look at are:

1) R-square should be large, and it should increase appreciably when an additional variable is added.
2) Adjusted R-square should not be much less than R-square, and it should increase when a variable is added.
3) Mallows $C_p$ should be approximately equal to the number of parameters in the model (including the y-intercept). This is a good measure for narrowing down the possible models quickly; then use 1) and 2) to pick the final models.
4) The model should make sense.

Note: Many of the better model selection methods are too time-consuming to apply to all possible regressions. Instead, choose a number of good models and then apply the better methods to them.
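The brute-force search described above can be sketched as follows. Every non-empty subset of the candidate regressors is fit, and R², adjusted R², and Mallows $C_p = SS_E(p)/\hat{\sigma}^2 - n + 2p$ are reported for each. $\hat{\sigma}^2$ is estimated by the MSE of the model containing all regressors. The helper names and the small data set are hypothetical. With $k$ regressors there are $2^k - 1$ subsets, which is why the method is limited to a small number of variables.

```python
from itertools import combinations


def ols_sse(cols, y):
    """SSE of the least-squares fit of y on the predictor columns in `cols` plus an intercept."""
    n = len(y)
    X = [[1.0] * n] + [[float(v) for v in c] for c in cols]
    k = len(X)
    A = [[sum(X[i][t] * X[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[i][t] * y[t] for t in range(n)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        rhs[i], rhs[piv] = rhs[piv], rhs[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (rhs[i] - sum(A[i][c] * b[c] for c in range(i + 1, k))) / A[i][i]
    fitted = [sum(b[j] * X[j][t] for j in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))


def all_possible_regressions(cols, names, y):
    """R^2, adjusted R^2 and Mallows Cp for every non-empty subset of the predictors."""
    n, k = len(y), len(cols)
    my = sum(y) / n
    sst = sum((v - my) ** 2 for v in y)
    mse_full = ols_sse(cols, y) / (n - k - 1)     # sigma^2 estimate from the full model
    rows = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            sse = ols_sse([cols[i] for i in subset], y)
            p = size + 1                          # parameters, including the intercept
            rows.append({
                "vars": tuple(names[i] for i in subset),
                "r2": 1 - sse / sst,
                "adj_r2": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "cp": sse / mse_full - n + 2 * p,
            })
    return rows


# Hypothetical data.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]

rows = all_possible_regressions([x1, x2, x3], ["x1", "x2", "x3"], y)
for row in sorted(rows, key=lambda r: r["cp"]):
    print(row["vars"], round(row["r2"], 4), round(row["adj_r2"], 4), round(row["cp"], 2))
```

By construction the full model always has $C_p = p$ exactly, so the comparison $C_p \approx p$ is informative only for the smaller subsets.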

Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
  INFILE 'C:\Users\korea\Desktop\Working Folder 2018\imen214-stats\ch06\data\sales.dat';
  INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC REG DATA=SALES CP;
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
        / SELECTION=RSQUARE ADJRSQ RMSE SSE SELECT=10;
  TITLE 'ALL POSSIBLE REGRESSIONS';
RUN;
QUIT;


Stepwise Regression

Forward Selection:
Begins with no variables in the model. A simple linear model is fit for each X, and the most significant variable is added (if its p-value is below the stated limit). Next, every model containing the already-added variables plus one variable not yet added is fit, and the most significant new variable is added (again, only if its p-value is below the stated limit). This process continues until no more variables can be added.

Backward Elimination:
The model with all variables is fit. The least significant variable is removed (if its p-value is greater than the specified limit), and the model is refit without this variable. This process continues until no more variables can be removed.
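Both procedures can be sketched with the t-statistic form used on the earlier Minitab slides (largest |t| enters, smallest |t| leaves) instead of p-values. For a single coefficient, $t^2$ equals the partial F statistic, so a cut-off on |t| plays the role of the p-value limit. This pure-Python sketch is illustrative only: the helper names are hypothetical, the cut-off of 2.0 for both $t_{in}$ and $t_{out}$ is hypothetical, and the data are made up. It is not how SAS or Minitab implement these options.

```python
def ols_sse(cols, y):
    """SSE of the least-squares fit of y on the predictor columns in `cols` plus an intercept."""
    n = len(y)
    X = [[1.0] * n] + [[float(v) for v in c] for c in cols]
    k = len(X)
    A = [[sum(X[i][t] * X[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[i][t] * y[t] for t in range(n)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        rhs[i], rhs[piv] = rhs[piv], rhs[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (rhs[i] - sum(A[i][c] * b[c] for c in range(i + 1, k))) / A[i][i]
    fitted = [sum(b[j] * X[j][t] for j in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))


def t_value(cols, y, model, j):
    """|t| of regressor j in `model`, via t^2 = [SSE(model without j) - SSE(model)] / MSE(model)."""
    n = len(y)
    sse = ols_sse([cols[i] for i in model], y)
    sse_without = ols_sse([cols[i] for i in model if i != j], y)
    mse = sse / (n - len(model) - 1)
    return (max(0.0, sse_without - sse) / mse) ** 0.5


def forward_selection(cols, y, t_in=2.0):
    selected, remaining = [], list(range(len(cols)))
    while remaining:
        # |t| each candidate would have if added to the current model
        scores = {j: t_value(cols, y, selected + [j], j) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < t_in:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(cols, y, t_out=2.0):
    selected = list(range(len(cols)))
    while selected:
        scores = {j: t_value(cols, y, selected, j) for j in selected}
        worst = min(scores, key=scores.get)
        if scores[worst] >= t_out:
            break
        selected.remove(worst)
    return selected


# Hypothetical 2^3 factorial data: y depends on x1 and x2 only; the small "noise"
# term is the x1*x2*x3 contrast, which is orthogonal to every regressor.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [5 + 2 * a - 3 * b + 0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]

print(forward_selection([x1, x2, x3], y))      # [1, 0]: x2 enters first, then x1
print(backward_elimination([x1, x2, x3], y))   # [0, 1]: x3 is removed
```

Here both procedures agree. As the slides note, that is not guaranteed in general.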

Stepwise Regression

Stepwise Technique:
This technique is a variation on forward selection. After a variable is added, the least significant variable in the model is removed if its p-value is greater than the specified limit. This accounts for multicollinearity to some degree.

Typically you do not run a stepwise procedure if you run all possible regressions, and vice versa. Stepwise procedures are more economical than all possible regressions for large data sets. There is no guarantee that the different stepwise procedures will end up with the same model, or with the "best" model.

Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
  INFILE 'C:\Users\korea\Desktop\Working Folder 2018\imen214-stats\ch06\data\sales.dat';
  INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC REG DATA=SALES;
  /* SLENTRY specifies the significance level for entry into the model.
     The defaults are 0.50 for FORWARD and 0.15 for STEPWISE. */
  TITLE 'STEPWISE REGRESSION USING FORWARD SELECTION';
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
        / SELECTION=FORWARD SLENTRY=0.25;
RUN;
  /* SLSTAY specifies the significance level for staying in the model.
     The defaults are 0.10 for BACKWARD and 0.15 for STEPWISE. */
  TITLE 'STEPWISE REGRESSION USING BACKWARD ELIMINATION';
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
        / SELECTION=BACKWARD;
RUN;
  TITLE 'STEPWISE REGRESSION USING THE STEPWISE TECHNIQUE';
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
        / SELECTION=STEPWISE;
RUN;
QUIT;


PRESS Statistic

The main purpose of many regression analyses is to predict Y for a future set of X's. The problem is that we have only the present Y's and X's with which to build the model, but we would like to evaluate the model by how well it predicts Y's for new X's.

The PRESS statistic tries to overcome this problem. It is similar to DFFITS in that observations are removed one at a time. With observation $i$ removed, the parameters are re-estimated and $\hat{y}$ is calculated at the X's of the removed observation. Once these predictions have been calculated for every observation (call them $\hat{y}_{(i)}$), the PRESS statistic can be calculated:

$PRESS = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2$

Notice that this is very similar to SSE. Computed by refitting the model n times, however, it is very computation-intensive. The PRESS statistic is obtained in SAS by using the R option on the MODEL statement.
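A PRESS residual is $y_i - \hat{y}_{(i)}$. For least squares it can be shown that $y_i - \hat{y}_{(i)} = e_i/(1 - h_{ii})$, where $e_i$ is the ordinary residual and $h_{ii}$ is the leverage of observation $i$. PRESS can therefore also be obtained from a single fit. The sketch below computes PRESS both ways. It uses hypothetical helper names, made-up data, and simple linear regression (where $h_{ii} = 1/n + (x_i - \bar{x})^2/S_{xx}$) for brevity.

```python
def simple_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def press_refit(x, y):
    """PRESS by definition: drop each observation, refit, and predict it."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = simple_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

def press_hat(x, y):
    """PRESS from one fit, using y_i - yhat_(i) = e_i / (1 - h_ii)."""
    n = len(x)
    b0, b1 = simple_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - mx) ** 2 / sxx    # leverage of this observation
        e = yi - (b0 + b1 * xi)             # ordinary residual
        total += (e / (1 - h)) ** 2
    return total

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]       # hypothetical data
press_loo, press_fast = press_refit(x, y), press_hat(x, y)
print(round(press_loo, 6), round(press_fast, 6))
```

The two results agree up to rounding. The single-fit form avoids the n refits.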

Validation Data Split

1) Split the data into a fitting portion and a validation portion. This should be done randomly.
2) Perform the model-fitting routine as discussed earlier, using the data in the fitting portion only.
3) For each viable model, compute the SSE using the observations in the validation portion. The best model is the one that minimizes this SSE.
4) Refit the chosen model using the entire data set.

Notice that this procedure requires a data set large enough to split off a validation portion and still have adequate data to evaluate models. The process is tedious in SAS, requiring multiple runs or fancy programming.
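Steps 1 to 3 can be sketched for a single candidate model as follows. The sketch assumes a hypothetical 70/30 random split with a fixed seed and, for brevity, a simple linear model. The helper names and data are hypothetical. In practice you would compute the validation SSE for each viable model and keep the smallest.

```python
import random

def simple_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def validation_sse(x, y, frac_fit=0.7, seed=1):
    """Randomly split the rows, fit on the fitting portion, return SSE on the validation portion."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)            # random split, reproducible via the seed
    cut = round(frac_fit * len(idx))
    fit_rows, val_rows = idx[:cut], idx[cut:]
    b0, b1 = simple_fit([x[i] for i in fit_rows], [y[i] for i in fit_rows])
    sse = sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in val_rows)
    return sse, len(fit_rows), len(val_rows)

x = list(range(10))
y = [3 + 2 * xi for xi in x]   # hypothetical, exactly linear data
print(validation_sse(x, y))
```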

Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
  INFILE 'C:\Users\korea\Desktop\Working Folder 2018\imen214-stats\ch06\data\sales.dat';
  INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC REG DATA=SALES;
  /* The R option prints residual diagnostics, including the PRESS statistic */
  MODEL SALES = POTENT ADVERT SHARE ACCOUNTS / R;
  MODEL SALES = POTENT ADVERT SHARE CHANGE ACCOUNTS / R;
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE / R;
  MODEL SALES = TIME POTENT ADVERT SHARE ACCOUNTS / R;
  MODEL SALES = TIME POTENT ADVERT SHARE CHANGE WORKLOAD / R;
RUN;
QUIT;


Model                                      R-sq     Adj R-sq   Root MSE   PRESS
POTENT ADVERT SHARE ACCOUNTS               0.9004   0.8805     453.8362   5804450
POTENT ADVERT SHARE CHANGE ACCOUNTS        0.9119   0.8888     437.9516   5470022
TIME POTENT ADVERT SHARE CHANGE            0.9108   0.8873     440.7473   5681706
TIME POTENT ADVERT SHARE ACCOUNTS          0.9064   0.8817     451.6049   6339858
TIME POTENT ADVERT SHARE CHANGE WORKLOAD   0.9109   0.8812     452.6253   6286583

No single model is clearly best. Confidence intervals, or the principle of parsimony, might help decide which model to choose.

HW 6-56