Introduction to Statistics for the Social Sciences

debby-jeon . @debby-jeon
Uploaded On 2019-12-05

Presentation Transcript

Introduction to Statistics for the Social Sciences
SBS200 - Lecture Section 001, Spring 2018
Room 150 Harvill Building
9:00 - 9:50 Mondays, Wednesdays & Fridays
Welcome 4/16/18

[Seating chart: Harvill 150 (renumbered), Rows A through P, showing the lecturer’s desk, the screens, the projection booth, and the left-handed desks]

Schedule of readings
Before our fourth and final exam (April 30th):
OpenStax Chapters 1 – 13 (Chapter 12 is emphasized)
Plous Chapter 17: Social Influences
Plous Chapter 18: Group Judgments and Decisions

Lab sessions: Project 4 continues next week

Homework Review

The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05.
Correlation: r = +0.92 (positive, strong)
Slope: 6.0857
Intercept: 55.286
Regression equation: y' = 6.0857x + 55.286
Predicted values: 207.43 and 85.71
Proportion of variance: r² = .846231 or 84%. 84% of the total variance of “weekly pay” is accounted for by “hours worked”.
For each additional hour worked, weekly pay will increase by $6.09.

[Scatterplot: Wait Time (vertical axis, 280 to 400) by Number of Operators (horizontal axis, 4 to 8)]

The relationship between wait time and number of operators working is negative and moderate. This correlation is not significant, r(3) = -0.73; n.s.
Correlation: r = -.73 (negative, strong)
As the number of operators increases, wait time decreases.
Intercept: 458
Slope: -18.5
Regression equation: y' = -18.5x + 458
Predicted values: 365 seconds and 328 seconds
Proportion of variance: r² = .53695 or 54%. The proportion of total variance of wait time accounted for by number of operators is 54%.
For each additional operator added, wait time will decrease by 18.5 seconds.
Critical r: 0.878
Significant? No

[Scatterplot: Percent of BAs (vertical axis, 21 to 39) by Median Income (horizontal axis, 45 to 66)]

The relationship between median income and percent of residents with a BA degree is strong and positive. This correlation is significant, r(8) = 0.89; p < 0.05.
Correlation: r = 0.8875 (positive, strong)
As median income goes up, so does the percent of residents who have a BA degree.
Predicted variable: percent of residents with a BA degree
n = 10, df = 8
Slope: 0.0005
Intercept: 3.1819
Regression equation: y' = 0.0005x + 3.1819
Predicted values: 25% of residents and 35% of residents
Proportion of variance: r² = .78766 or 78%. The proportion of total variance of % of BAs accounted for by median income is 78%.
For each additional $1 in income, percent of BAs increases by .0005.
Critical r: 0.632
Significant? Yes

[Scatterplot: Crime Rate (vertical axis, 12 to 30) by Median Income (horizontal axis, 45 to 66)]

The relationship between crime rate and median income is negative and moderate. This correlation is not significant, r(8) = -0.63; n.s.
Correlation: r = -0.6293 (negative, moderate)
As median income goes up, crime rate tends to go down.
Predicted variable: crime rate
n = 10, df = 8
Slope: -0.0499
Intercept: 4662.5
Regression equation: y' = -0.0499x + 4662.5
Predicted values: 2,417 thefts and 1,418.5 thefts
Proportion of variance: r² = .396 or 40%. The proportion of variance of thefts accounted for by median income is 40%.
For each additional $1 in income, thefts go down by .0499.
Critical r: 0.632
Significant? No (0.6293 is not bigger than critical of 0.632)
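The significance decisions in the four homework problems above all follow one rule: compare the size of r, ignoring its sign, to the critical r for the degrees of freedom. The sketch below applies that rule using the r values and critical values quoted in the homework answers. The function names are made up for illustration, and the critical value for the first problem is assumed to be the same 0.878 quoted for the second problem, since both report df = 3.

```python
# Each entry: (description, r, degrees of freedom, critical r).
# The r values and critical values are the ones quoted in the homework answers;
# reusing 0.878 for the first problem is an assumption (same df = 3).
HOMEWORK = [
    ("hours worked vs. weekly pay", 0.92, 3, 0.878),
    ("number of operators vs. wait time", -0.73, 3, 0.878),
    ("median income vs. percent of BAs", 0.8875, 8, 0.632),
    ("median income vs. crime rate", -0.6293, 8, 0.632),
]


def is_significant(r: float, critical_r: float) -> bool:
    """True when the size of r (ignoring its sign) is bigger than the critical r."""
    return abs(r) > critical_r


def proportion_of_variance(r: float) -> float:
    """r squared: the proportion of variance accounted for."""
    return r ** 2


for description, r, df, critical_r in HOMEWORK:
    verdict = "significant" if is_significant(r, critical_r) else "not significant"
    print(f"{description}: r({df}) = {r:+.2f}, "
          f"r squared = {proportion_of_variance(r):.3f}, {verdict}")
```

Taking the absolute value first is what lets a negative correlation such as -0.6293 be compared against a positive critical value.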

Regression Example
Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling, and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.

Rory’s Regression: Predicting sales from number of visits (sales calls)
Dependent variable: sales. Independent variable: number of sales calls.
Regression line (and equation): r = 0.71, b = 11.579 (slope), a = 20.526 (intercept)
Describe relationship. Correlation: This is a strong positive correlation. Sales tend to increase as sales calls increase.
Predict using regression line (and regression equation). Slope: as sales calls increase by 1, sales should increase by 11.579. Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems.
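To make the "predict using the regression equation" step concrete, here is a minimal sketch that plugs a number of sales calls into Y' = a + bX using the slope and intercept from the slide. The function name and the example inputs (1, 2, and 3 calls) are chosen only for illustration.

```python
# Slope and intercept as given on the slide.
INTERCEPT = 20.526  # a
SLOPE = 11.579      # b


def predict_sales(sales_calls: float) -> float:
    """Predicted systems sold (Y') for a given number of sales calls (X)."""
    return INTERCEPT + SLOPE * sales_calls


# Illustrative inputs only; they are not taken from Rory's data.
for calls in (1, 2, 3):
    print(f"{calls} sales calls -> {predict_sales(calls):.3f} predicted sales")
```

Each extra sales call raises the prediction by exactly the slope, which is the sense in which "sales should increase by 11.579".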

Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
How did Ava do? Ava sold 14.7 more than expected, taking into account how many sales calls she made (over performing).
The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score): 70 - 55.3 = 14.7

Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
How did Jacob do? Jacob sold 23.684 fewer than expected, taking into account how many sales calls he made (under performing).
The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score): 20 - 43.7 = -23.7

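Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score).
[Scatterplot with the regression line and each salesperson labeled (Madison, Isabella, Ava, Emma, Emily, Jacob, Joshua); the residuals marked on the plot include 14.7, -23.7, -6.8, and 7.9]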
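The staff evaluation above comes down to one subtraction per person: actual Y minus expected Y'. The sketch below repeats that arithmetic for Ava and Jacob using the actual and expected values shown on the slides. The function name and the dictionary layout are assumptions made for the example.

```python
def residual(actual: float, predicted: float) -> float:
    """Residual = actual Y minus predicted Y'."""
    return actual - predicted


# (actual sales, expected sales) as shown on the slides.
staff = {
    "Ava": (70, 55.3),
    "Jacob": (20, 43.7),
}

for name, (actual, predicted) in staff.items():
    r = residual(actual, predicted)
    if r > 0:
        label = "over performing"
    elif r < 0:
        label = "under performing"
    else:
        label = "exactly as expected"
    print(f"{name}: {actual} - {predicted} = {r:.1f} ({label})")
```

A positive residual means the person sold more than the line predicts for their number of sales calls, and a negative residual means they sold fewer.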

How do we find the average amount of error in our prediction?
The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line: how much we are wrong in our predictions.
How would we find our “average residual”?
Step 1: Find error for each value (just the residuals): Y – Y’
Step 2: Find average: $\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$
Sound familiar?? Deviation scores: Diallo is 0”, Mike is -4”, Hunter is -2”, Preston is 2”; their average is $\frac{\sum x}{N}$

These would be helpful to know by heart – please memorize these formulas
Standard error of the estimate (line) = $\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$

Standard error of the estimate (line)
Slope doesn’t give “variability” info. Intercept doesn’t give “variability” info. Correlation “r” does give “variability” info. Residuals do give “variability” info.
How well does the prediction line predict the predicted variable when using the predictor variable? What if we want to know the “average deviation score”?
Finding the standard error of the estimate (line)
Standard error of the estimate:
a measure of the average amount of predictive error
the average amount that Y’ scores differ from Y scores
a mean of the lengths of the green lines
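As a rough illustration of the standard error of the estimate, the sketch below squares each residual, sums the squares, divides by n - 2, and takes the square root. The actual and predicted values are invented for the example and are not from the lecture; the function name is also an assumption.

```python
import math


def standard_error_of_estimate(actual, predicted):
    """Square root of the sum of squared residuals divided by (n - 2)."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n <= 2:
        raise ValueError("need more than two data points")
    sum_of_squares = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sum_of_squares / (n - 2))


# Hypothetical values, made up purely to show the arithmetic.
actual_y = [10, 12, 15, 19]
predicted_y = [11, 12, 16, 17]

# Residuals are -1, 0, -1, 2; squared they sum to 6; 6 / (4 - 2) = 3.
se = standard_error_of_estimate(actual_y, predicted_y)
print(f"standard error of the estimate = {se:.3f}")
```

Squaring before averaging keeps the positive and negative residuals from cancelling, which is why this behaves like a standard deviation rather than a plain mean of the residuals.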

Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate how much “error” we have?
The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score), for example 14.7 and -23.7. The green lines show how much “error” there is in our prediction line: how much we are wrong in our predictions.
Perfect correlation = +1.00 or -1.00: each variable perfectly predicts the other, there is no variability in the scatterplot, and the dots approximate a straight line.

Regression Analysis – Least Squares Principle
When we calculate the regression line we try to minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines). Remember that because the negative and positive values cancel each other out, we have to square those distances (deviations). So we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”.
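The least squares principle can be checked numerically. The sketch below fits a line to a small made-up data set with the usual textbook formulas for slope and intercept, then shows two things: the raw residuals cancel to zero, and nudging the slope or intercept away from the fitted values makes the sum of squared residuals larger. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for Y' = intercept + slope * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between actual Y and predicted Y'."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

slope, intercept = fit_line(xs, ys)
raw_residual_total = sum(y - (intercept + slope * x) for x, y in zip(xs, ys))
best = sum_squared_residuals(xs, ys, slope, intercept)

print(f"fitted line: Y' = {intercept:.2f} + {slope:.2f}X")
print(f"sum of raw residuals = {raw_residual_total:.6f}")
print(f"sum of squared residuals = {best:.2f}")

# Any nudged line does worse on the squared criterion.
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    worse = sum_squared_residuals(xs, ys, slope + d_slope, intercept + d_intercept)
    print(f"slope {slope + d_slope:.2f}, intercept {intercept + d_intercept:.2f}: {worse:.2f}")
```

Because the raw residuals total zero for the fitted line, they cannot serve as the thing to minimize, which is the reason the slide gives for squaring them first.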

What is r²?
r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable
Example: If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height?
.64 because (.8)² = .64

What is r²?
r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable
Example: If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height?
.36 because (1.0 - .64) = .36, or 36% because 100% - 64% = 36%

What is r²?
r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable
Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature?
.25 because (.5)² = .25

What is r²?
r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable
Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature?
.75 because (1.0 - .25) = .75, or 75% because 100% - 25% = 75%
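The four r² examples above use the same two steps: square r to get the proportion accounted for, then subtract from 1 to get the proportion not accounted for. A minimal sketch of that arithmetic, with function names chosen only for illustration:

```python
def variance_accounted_for(r: float) -> float:
    """Proportion of variance accounted for: r squared."""
    return r ** 2


def variance_not_accounted_for(r: float) -> float:
    """Proportion of variance left over: 1 minus r squared."""
    return 1 - r ** 2


# The two correlations used in the examples: heights (.8) and ice cream sales (.5).
for r in (0.8, 0.5):
    explained = variance_accounted_for(r)
    unexplained = variance_not_accounted_for(r)
    print(f"r = {r}: accounted for = {explained:.2f} ({explained:.0%}), "
          f"not accounted for = {unexplained:.2f} ({unexplained:.0%})")
```

Since r is squared, a correlation of -.8 would give the same .64, which is why r² carries no direction information.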

Some useful terms
Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent).
Coefficient of correlation is the name for “r”.
Coefficient of determination is the name for “r²” (remember it is never negative, so it gives no direction info).
Standard error of the estimate is our measure of the variability of the dots around the regression line (the average deviation of each data point from the regression line, like a standard deviation).

Review

Summary
Slope: as sales calls increase by one, 11.579 more systems should be sold.
Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems.

Pop Quiz – First 5 Questions
1. What is regression used for? Include an example.
2. What is a residual? How would you find it?
3. What is Standard Error of the Estimate? (How is it related to residuals?)
4. Give one fact about r²
5. How is the regression line like a mean?

Pop Quiz – First 5 Questions
1. What is regression used for? Include an example.
Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on the x axis) to predict values for the dependent variable (on the y axis).

Pop Quiz – First 5 Questions
2. What is a residual? How would you find it?
Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess: Y – Y’

Pop Quiz – First 5 Questions
3. What is Standard Error of the Estimate? (How is it related to residuals?)
The average length of the residuals
The average error of our guess
The average length of the green lines
The standard deviation of the data points around the regression line

Pop Quiz – First 5 Questions
4. Give one fact about r²
Calculate it by squaring “r”
It is the proportion of variance accounted for
5. How is the regression line like a mean?
The regression line attempts to hit every mean for each subgroup

Thank you! See you next time!!