Lecture 6: Multiple and Poly Linear Regression


Presentation Transcript

1. Lecture 6: Multiple and Poly Linear Regression

2. Announcements. Office Hours: more office hours; the schedule will be posted soon. Online office hours are for everyone, please take advantage of them. Projects: project guidelines and project descriptions will be posted Thursday 9/25. Milestone 1: signup for projects is Wed 10/2.

3. Summary from last lecture. We assume a simple form of the statistical model, \(y = f(x) + \epsilon\), with a linear form for \(f\): \(f(x) = \beta_0 + \beta_1 x\).

4. Summary from last lecture. We fit the model, i.e. estimate \(\hat\beta_0\) and \(\hat\beta_1\), by minimizing a loss function, which we assume to be the MSE: \(\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\).
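
A minimal sketch of this fit in Python, using the closed-form least-squares estimates on illustrative synthetic data (all names and data here are assumptions, not the lecture's own code):

```python
import numpy as np

# Illustrative data: one predictor x and a noisy linear response y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

# Closed-form estimates that minimize the MSE.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x
mse = np.mean((y - y_hat) ** 2)
print(beta0_hat, beta1_hat, mse)
```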

5. Summary from last lecture. We acknowledge that because there are errors in measurements and the sample is limited, there is an inherent uncertainty in the estimation of \(\beta_0\) and \(\beta_1\). We used the bootstrap to estimate the distributions of \(\hat\beta_0\) and \(\hat\beta_1\).

6. Summary from last lecture. We calculate the confidence intervals, which are the ranges of values such that the true value of \(\beta\) is contained in the interval with a given probability (e.g., 68% or 95%).
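
A sketch of how the bootstrap produces these intervals, assuming the same illustrative data as above; the 95% interval is read off the percentiles of the bootstrap distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)  # illustrative data

def fit_line(x, y):
    """Least-squares estimates (beta0_hat, beta1_hat)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

B = 1000
betas = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, len(x), len(x))  # resample rows with replacement
    betas[b] = fit_line(x[idx], y[idx])

# 95% percentile confidence intervals from the bootstrap distributions.
print(np.percentile(betas[:, 0], [2.5, 97.5]))  # CI for beta0
print(np.percentile(betas[:, 1], [2.5, 97.5]))  # CI for beta1
```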

7. Summary from last lecture. We evaluate the importance of predictors using hypothesis testing, with t-statistics and p-values.

8. Summary from last lecture:
- Model fitness: how does the model perform when predicting?
- Comparison of two models: how do we choose between two different models?
- Evaluating significance of predictors: does the outcome depend on the predictors?
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
This lecture.

9. Summary
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
- Multilinear regression: formulate it in linear algebra; categorical variables; interaction terms.
- Polynomial regression: linear algebra formulation.

10. Summary
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
- Multilinear regression: formulate it in linear algebra; categorical variables; interaction terms.
- Polynomial regression: linear algebra formulation.

11. How well do we know \(\hat{f}(x)\)? Our confidence in \(\hat{f}\) is directly connected with the confidence in the \(\hat\beta\)s. So for each bootstrap sample, we have one \(\hat{f}\), which we can use to predict y for all x's.

12. How well do we know \(\hat{f}(x)\)? Here we show two different sets of models given the fitted coefficients.

13. How well do we know \(\hat{f}(x)\)? There is one such regression line for every bootstrapped sample.

14. How well do we know \(\hat{f}(x)\)? Below we show all regression lines for a thousand such bootstrapped samples. For a given \(x\), we examine the distribution of \(\hat{f}(x)\) and determine the mean and standard deviation.

15. How well do we know \(\hat{f}(x)\)? Below we show all regression lines for a thousand such sub-samples. For a given \(x\), we examine the distribution of \(\hat{f}(x)\) and determine the mean and standard deviation.

16. How well do we know \(\hat{f}(x)\)? Below we show all regression lines for a thousand such sub-samples. For a given \(x\), we examine the distribution of \(\hat{f}(x)\) and determine the mean and standard deviation.

17. How well do we know \(\hat{f}(x)\)? For every \(x\), we calculate the mean of the models (shown with a dotted line) and the 95% CI of those models (shaded area).
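
A sketch of how the dotted mean line and the shaded band can be computed from the bootstrapped fits (same illustrative setup as above, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

def fit_line(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

grid = np.linspace(0, 10, 50)
lines = np.empty((1000, grid.size))
for b in range(1000):
    idx = rng.integers(0, len(x), len(x))
    b0, b1 = fit_line(x[idx], y[idx])
    lines[b] = b0 + b1 * grid  # one regression line per bootstrap sample

mean_model = lines.mean(axis=0)                     # dotted line
lo, hi = np.percentile(lines, [2.5, 97.5], axis=0)  # 95% CI band (shaded area)
```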

18. Confidence in predicting \(y\).

19. Confidence in predicting \(y\). For a given x, we have a distribution of models \(\hat{f}(x)\); for each of these, the prediction for y is \(\hat{y} = \hat{f}(x) + \epsilon\).

20. Confidence in predicting \(y\). For a given x, we have a distribution of models \(\hat{f}(x)\); for each of these, the prediction for y is \(\hat{y} = \hat{f}(x) + \epsilon\). The prediction confidence intervals are then computed from the distribution of \(\hat{f}(x) + \epsilon\); they are wider than the model confidence intervals because they also include the spread of the noise.
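
One way to sketch a prediction interval numerically is to add a resampled residual (standing in for \(\epsilon\)) to each bootstrapped model's prediction; this is an illustrative choice, not necessarily the lecture's method:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

def fit_line(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = fit_line(x, y)
resid = y - (b0 + b1 * x)  # empirical stand-in for the noise epsilon

x0 = 5.0
preds = np.empty(1000)
for b in range(1000):
    idx = rng.integers(0, len(x), len(x))
    bb0, bb1 = fit_line(x[idx], y[idx])
    preds[b] = bb0 + bb1 * x0 + rng.choice(resid)  # model uncertainty + noise

print(np.percentile(preds, [2.5, 97.5]))  # 95% prediction interval at x0
```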

21. Lecture Outline
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
- Multilinear regression: brute force; exact method; gradient descent.
- Polynomial regression.

22. Multiple Linear Regression. If you had to guess someone's height, would you rather be told:
- their weight, only
- their weight and gender
- their weight, gender, and income
- their weight, gender, income, and favorite number?
Of course, you'd always want as much data about a person as possible. Even though height and favorite number may not be strongly related, at worst you could just ignore the information on favorite number. We want our models to be able to take in lots of data as they make their predictions.

23. Response vs. Predictor Variables. An excerpt of the Advertising data (n observations, p predictors):

TV     radio  newspaper | sales
230.1  37.8   69.2      | 22.1
44.5   39.3   45.1      | 10.4
17.2   45.9   69.3      | 9.3
151.5  41.3   58.5      | 18.5
180.8  10.8   58.4      | 12.9

Y: outcome, response variable, dependent variable. X: predictors, features, covariates.

24. Multilinear Models. In practice, it is unlikely that any response variable Y depends solely on one predictor x. Rather, we expect Y to be a function of multiple predictors, \(x_1, x_2, \ldots, x_p\). Using the notation we introduced last lecture, \(Y = f(x_1, \ldots, x_p) + \epsilon\). In this case, we can still assume a simple form for \(f\), a multilinear form: \(f(x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p\). Hence, \(\hat{f}\) has the form \(\hat{f} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p\).

25. Multiple Linear Regression. Again, to fit this model means to compute the \(\hat\beta_j\) that minimize a loss function; we will again choose the MSE as our loss function. Given a set of n observations, the data and the model can be expressed in vector notation: \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\).

26. Multilinear Model, example. For our data, in linear algebra notation:
\[ \mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} \]
so that \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\).

27. Multiple Linear Regression. The model takes a simple algebraic form: \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\). Thus, the MSE can be expressed in vector notation as \(\mathrm{MSE}(\boldsymbol{\beta}) = \frac{1}{n}\,\|\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}\|^2\). Minimizing the MSE using vector calculus yields \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{Y}\).
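
A sketch of this solution in NumPy on illustrative data; np.linalg.lstsq solves the same least-squares problem as the normal equations, but more stably:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
beta_true = np.array([2.0, 0.5, -1.0, 0.25])                # intercept + p slopes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # leading column of 1s
y = X @ beta_true + rng.normal(0, 1, size=n)

# Equivalent to beta_hat = (X^T X)^{-1} X^T y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```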

28. Standard Errors for Multiple Linear Regression. As with simple linear regression, the standard errors can be calculated either using statistical modeling (from the covariance matrix \(\sigma^2 (\mathbf{X}^\top\mathbf{X})^{-1}\) of \(\hat{\boldsymbol{\beta}}\)) or using the bootstrap.

29. Collinearity. Collinearity refers to the case in which two or more predictors are correlated (related). We will revisit collinearity in the next lecture when we address overfitting, but for now we want to examine how collinearity affects our confidence in the coefficients and, consequently, in the importance of those coefficients.

30. Collinearity. Three individual models (one predictor each) versus one model with all three predictors:

TV-only model:
           Coef.   Std.Err.   t       P>|t|      [0.025   0.975]
const      6.679   0.478      13.957  2.804e-31  5.735    7.622
TV         0.048   0.00271    17.303  1.802e-41  0.042    0.053

RADIO-only model:
           Coef.   Std.Err.   t       P>|t|      [0.025   0.975]
const      9.567   0.553      17.279  2.133e-41  8.475    10.659
radio      0.195   0.020      9.429   1.134e-17  0.154    0.236

NEWS-only model:
           Coef.   Std.Err.   t       P>|t|      [0.025   0.975]
const      11.55   0.576      20.036  1.628e-49  10.414   12.688
newspaper  0.074   0.014      5.134   6.734e-07  0.0456   0.102

One model:
           Coef.   Std.Err.   t       P>|t|      [0.025   0.975]
const      2.602   0.332      7.820   3.176e-13  1.945    3.258
TV         0.046   0.0015     29.887  6.314e-75  0.043    0.049
radio      0.175   0.0094     18.576  4.297e-45  0.156    0.194
newspaper  0.013   0.028      2.338   0.0203     0.008    0.035
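
A sketch of how tables like these can be produced with statsmodels, assuming the Advertising data is available locally as a CSV with columns TV, radio, newspaper, and sales (the file path is illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("Advertising.csv")  # assumed local copy of the dataset

# Three individual models, one predictor each.
for col in ["TV", "radio", "newspaper"]:
    fit = smf.ols(f"sales ~ {col}", data=df).fit()
    print(fit.summary().tables[1])

# One model with all three predictors: a predictor that is correlated
# with another (e.g. newspaper) can lose its apparent significance here.
joint = smf.ols("sales ~ TV + radio + newspaper", data=df).fit()
print(joint.summary().tables[1])
```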

31. Finding Significant Predictors: Hypothesis Testing. For checking the significance of linear regression coefficients: we set up our hypotheses, \(H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0\) versus \(H_a:\) at least one \(\beta_j\) is non-zero; we choose the F-stat to evaluate the null hypothesis.

32. Finding Significant Predictors: Hypothesis Testing. We can compute the F-stat for linear regression models by \(F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}\). If \(F \approx 1\), we consider this evidence for \(H_0\); if \(F \gg 1\), we consider this evidence against \(H_0\).
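
A small numerical sketch of the F-stat on synthetic data (illustrative names throughout, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 0.5, -1.0, 0.0]) + rng.normal(0, 1, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)  # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares

F = ((tss - rss) / p) / (rss / (n - p - 1))
print(F)  # much larger than 1, evidence against H0
```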

33. Qualitative Predictors. So far, we have assumed that all variables are quantitative. But in practice, often some predictors are qualitative. Example: the Credit data set contains information about balance, age, cards, education, income, limit, and rating for a number of potential customers.

Income  Limit  Rating  Cards  Age  Education  Gender  Student  Married  Ethnicity  Balance
14.890  3606   283     2      34   11         Male    No       Yes      Caucasian  333
106.02  6645   483     3      82   15         Female  Yes      Yes      Asian      903
104.59  7075   514     4      71   11         Male    No       No       Asian      580
148.92  9504   681     3      36   11         Female  No       No       Asian      964
55.882  4897   357     2      68   16         Male    No       Yes      Caucasian  331

34. Qualitative Predictors. If the predictor takes only two values, then we create an indicator or dummy variable that takes on two possible numerical values. For example, for gender we create a new variable: \(x_i = 1\) if the i-th person is female, and \(x_i = 0\) if the i-th person is male. We then use this variable as a predictor in the regression equation: \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\).
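
A minimal pandas sketch of such a dummy variable, on a few illustrative rows in the spirit of the Credit data:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Balance": [333, 903, 580, 964],
})

# Dummy variable: 1 for female, 0 for male.
df["is_female"] = (df["Gender"] == "Female").astype(int)
print(df)
```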

35. Qualitative Predictors. Question: What is the interpretation of \(\beta_0\) and \(\beta_1\)?

36. Qualitative Predictors. Question: What is the interpretation of \(\beta_0\) and \(\beta_1\)? \(\beta_0\) is the average credit card balance among males, \(\beta_0 + \beta_1\) is the average credit card balance among females, and \(\beta_1\) is the average difference in credit card balance between females and males. Example: Calculate \(\hat\beta_0\) and \(\hat\beta_1\) for the Credit data.

37. More than two levels: one-hot encoding. Often, the qualitative predictor takes more than two values (e.g. ethnicity in the Credit data). In this situation, a single dummy variable cannot represent all possible values. We create additional dummy variables, for example: \(x_{i,1} = 1\) if the i-th person is Asian (0 otherwise), and \(x_{i,2} = 1\) if the i-th person is Caucasian (0 otherwise); the level without its own dummy serves as the baseline.

38. More than two levels: one-hot encoding. We then use these variables as predictors; the regression equation becomes \(y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \epsilon_i\). Question: What is the interpretation of \(\beta_0\), \(\beta_1\), \(\beta_2\)?
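
A sketch of this encoding with pandas; drop_first=True removes one level so that it serves as the baseline absorbed by \(\beta_0\) (rows here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["Caucasian", "Asian", "Asian", "Caucasian", "African American"],
    "Balance": [333, 903, 580, 964, 331],
})

# One-hot encode; the dropped level becomes the baseline category.
dummies = pd.get_dummies(df["Ethnicity"], drop_first=True, dtype=int)
X = pd.concat([df["Balance"], dummies], axis=1)
print(X)
```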

39. Beyond Linearity. In the Advertising data, we assumed that the effect on sales of increasing one advertising medium is independent of the amount spent on the other media. If we assume a linear model, then the average effect on sales of a one-unit increase in TV is always \(\beta_1\), regardless of the amount spent on radio. A synergy effect or interaction effect occurs when an increase in the radio budget changes the effectiveness of TV spending on sales.

40. Beyond Linearity. We change \(\mathrm{sales} = \beta_0 + \beta_1 \cdot \mathrm{TV} + \beta_2 \cdot \mathrm{radio} + \epsilon\) to \(\mathrm{sales} = \beta_0 + \beta_1 \cdot \mathrm{TV} + \beta_2 \cdot \mathrm{radio} + \beta_3 \cdot (\mathrm{TV} \times \mathrm{radio}) + \epsilon\).

41. What does it mean? Rearranging, \(\mathrm{sales} = \beta_0 + (\beta_1 + \beta_3 \cdot \mathrm{radio}) \cdot \mathrm{TV} + \beta_2 \cdot \mathrm{radio} + \epsilon\): the effective coefficient of TV now depends on the radio budget, so the effect of one medium is no longer independent of the other.
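
With statsmodels, the interaction model can be written directly in formula notation; "TV * radio" expands to TV + radio + TV:radio (same assumed CSV as above):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("Advertising.csv")  # assumed local copy of the dataset

# 'TV * radio' fits sales ~ TV + radio + TV:radio (the interaction term).
fit = smf.ols("sales ~ TV * radio", data=df).fit()
print(fit.params)
```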

42. Predictors, predictors, predictors. We have a lot of predictors! Is that a problem? Yes: computational cost. Yes: overfitting. Wait, there is more…

43. (figure only)

44. Residuals. We started with \(y = f(x) + \epsilon\). We assumed the exact form of \(f\) to be linear, \(f(x) = \beta_0 + \beta_1 x\), and then estimated the \(\beta\)s. What if that is not correct? Instead, the truth may be \(y = g(x) + \epsilon\), but we model it as \(y = f(x) + \epsilon\). Then the residual \(r_i = y_i - \hat{y}_i\) contains both the noise \(\epsilon\) and the mismatch between \(g\) and our \(\hat{f}\).

45. Residuals: Residual Analysis. When we estimated the variance of \(\epsilon\), we assumed that the residuals were uncorrelated and normally distributed with mean 0 and fixed variance. These assumptions need to be verified using the data. In residual analysis, we typically create two types of plots (a sketch of both follows below):
- a plot of the residuals with respect to \(x\) or \(\hat{y}\). This allows us to compare the distribution of the noise at different values of \(x\).
- a histogram of the residuals. This allows us to explore the distribution of the noise independent of \(x\) or \(\hat{y}\).
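
A sketch of both plots with matplotlib, on illustrative synthetic data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Fit a line and compute the residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(x, resid, s=10)      # residuals vs x: look for leftover structure
ax1.axhline(0, color="k", lw=1)
ax1.set(xlabel="x", ylabel="residual")
ax2.hist(resid, bins=20)         # histogram: roughly normal with mean 0?
ax2.set(xlabel="residual")
plt.show()
```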

46. Residual Analysis.

47. Lecture Outline
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
- Multilinear regression: brute force; exact method; gradient descent.
- Polynomial regression.

48. Polynomial Regression

49. Polynomial Regression. The simplest non-linear model we can consider, for a response Y and a predictor X, is a polynomial model of degree M: \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_M x^M + \epsilon\). Just as in the case of linear regression with cross terms, polynomial regression is a special case of linear regression: we treat each power \(x^m\) as a separate predictor. Thus, we can write \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\), where the design matrix \(\mathbf{X}\) has columns \(1, x, x^2, \ldots, x^M\).

50. Polynomial Regression. Again, minimizing the MSE using vector calculus yields \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{Y}\).
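
A sketch of the polynomial fit as plain linear regression on a power-expanded design matrix (illustrative data; np.vander builds the columns \(1, x, \ldots, x^M\)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0, 1, size=200)

M = 3
X = np.vander(x, M + 1, increasing=True)  # columns: 1, x, x^2, x^3

# Same least-squares solution as in the multilinear case.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```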

51. Polynomial Regression (cont.)

52. Polynomial Regression (cont.)

53. Polynomial Regression (cont.)

54. Polynomial Regression (cont.)

55. Polynomial Regression (cont.)

56. Polynomial Regression (cont.)

57. Overfitting. In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably." More on this on Wednesday.

58. Summary
- How well do we know \(\hat\beta\)? The confidence intervals of our \(\hat\beta\)s.
- Multilinear regression: formulate it in linear algebra; categorical variables; interaction terms.
- Polynomial regression: linear algebra formulation.

59. Afternoon Exercises. Quiz, to be completed in the next 10 min: Sway: Lecture 6: Multi and poly Regression. Programmatic, to be completed by lab time tomorrow: Lessons: Lecture 6.