Cluster Analysis: Grouping Cases or Variables

Uploaded by emery (@emery)
Uploaded On 2023-09-25

Presentation Transcript

1. Cluster Analysis: Grouping Cases or Variables

2. Clustering Cases
Goal is to cluster cases into groups based on shared characteristics.
Start out with each case being a one-case cluster.
The clusters are located in k-dimensional space, where k is the number of variables.
Compute the squared Euclidean distance between each case and each other case.

3. Squared Euclidean Distance
The sum across variables (from i = 1 to v) of the squared difference between the score on variable i for the one case (Xi) and the score on variable i for the other case (Yi).
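
The verbal definition above can be sketched in a few lines of Python. The function name and the two example cases are made up for illustration; they are not taken from the faculty data.

```python
def squared_euclidean(x, y):
    """Sum over variables of the squared difference between two cases' scores."""
    if len(x) != len(y):
        raise ValueError("both cases need a score on every variable")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Two hypothetical cases scored on v = 3 variables (illustrative values only).
case_a = [1.0, 2.0, 3.0]
case_b = [2.0, 4.0, 6.0]

# (1-2)^2 + (2-4)^2 + (3-6)^2 = 1 + 4 + 9
d2 = squared_euclidean(case_a, case_b)
print(d2)
```

A case compared with itself gives a distance of zero, which is why two cases with identical scores show a coefficient of .000 in the agglomeration schedule.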

4. Agglomerate
The two cases closest to each other are agglomerated into a cluster.
The distances between entities (clusters and cases) are recomputed.
The two entities closest to each other are agglomerated.
This continues until all cases end up in one cluster.
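
The loop described above might look like the following minimal sketch. The slides do not say how the distance to a multi-case cluster is recomputed, so this example assumes average linkage (the mean squared Euclidean distance over all cross-cluster pairs of cases). The function names and the toy scores are hypothetical.

```python
from itertools import combinations


def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def average_linkage(cluster_a, cluster_b, cases):
    """Mean squared Euclidean distance over all cross-cluster pairs (assumed rule)."""
    total = sum(squared_euclidean(cases[i], cases[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(cases):
    """Merge the two closest entities until one cluster remains.

    Returns a list of (stage, members_a, members_b, distance) tuples.
    """
    clusters = [(i,) for i in range(len(cases))]  # every case starts alone
    schedule = []
    stage = 0
    while len(clusters) > 1:
        # Recompute distances between all current entities; pick the closest pair.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], cases))
        dist = average_linkage(a, b, cases)
        stage += 1
        schedule.append((stage, a, b, dist))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return schedule


# Made-up scores for five cases on two variables (illustrative only).
toy_cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (10.0, 0.0)]
schedule = agglomerate(toy_cases)
for stage, a, b, dist in schedule:
    print(stage, a, b, round(dist, 3))
```

With n cases there are always n - 1 stages, and the last stage puts every case in a single cluster. Recomputing every pairwise distance at each stage is slow but keeps the sketch close to the description on the slide.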

5. What is the Correct Solution?
You may have theoretical reasons to expect a certain k-cluster solution.
Look at that solution and see if it matches your expectations.
Alternatively, you may try to make sense out of solutions at two or more levels of the analysis.

6. Faculty Salaries
Subjects were faculty in Psychology at ECU.
Variables were rank, experience, number of publications, course load, and salary.
Data are at ClusterAnonFaculty.sav
Also see the statistical output.

7. Analyze, Classify, Hierarchical Cluster

8. Statistics

9. Plots

10. Method

11. Save

12. Proximity Matrix
We did not request this, but if we had, it would display a measure of dissimilarity for each pair of entities.
The pair of cases with the smallest squared Euclidean distance is clustered first.

13.
Stage   Cluster 1   Cluster 2   Coefficient
1       32          33          .000
Look at the Agglomeration Schedule.
Cases 32 and 33 are clustered. They are very similar (distance = 0.000).

14. Agglomeration Schedule
        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2         Next Stage
1       32          33          .000           0           0                 9
2       41          42          .000           0           0                 6
3       43          44          .000           0           0                 6
4       37          38          .000           0           0                 5
5       37          39          .001           4           0                 7
6       41          43          .002           2           3                 27
Steps 2 Through 5

15. Stages 2-5
The agglomeration schedule shows that in Stage 2 cases 41 and 42 are clustered.
In Stage 3 cases 43 and 44 are clustered.
In Stage 4 cases 37 and 38 are clustered.
In Stage 5 case 39 is added to the cluster that contains cases 37 and 38.
And so on.
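
One way to read the schedule is to replay it stage by stage. The sketch below uses the first six rows of the agglomeration schedule from the slides. It assumes, as those rows suggest, that a merged cluster keeps the label in the "Cluster 1" column. The function name is hypothetical.

```python
# First six rows of the agglomeration schedule shown in the slides:
# (stage, cluster 1, cluster 2, coefficient)
schedule = [
    (1, 32, 33, 0.000),
    (2, 41, 42, 0.000),
    (3, 43, 44, 0.000),
    (4, 37, 38, 0.000),
    (5, 37, 39, 0.001),
    (6, 41, 43, 0.002),
]


def clusters_after(schedule, last_stage):
    """Replay merges up to last_stage; return multi-case clusters keyed by label."""
    members = {}  # cluster label -> set of case numbers
    for stage, first, second, _coef in schedule:
        if stage > last_stage:
            break
        # A label not seen before is still a one-case cluster.
        merged = members.pop(first, {first}) | members.pop(second, {second})
        members[first] = merged  # assumption: merged cluster keeps the first label
    return members


result = clusters_after(schedule, 6)
for label in sorted(result):
    print(label, sorted(result[label]))
```

Replaying through Stage 5 shows case 39 joining the cluster that already holds cases 37 and 38, as described above. Stage 6 then combines the clusters formed at Stages 2 and 3.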

16. Vertical Icicle, Two Clusters
Look at the top of the display (next slide).
You can see two clusters:
  On the left, Boris through Willy
  On the right, Deanna through Sunila
The two-cluster solution was adjuncts versus full-time faculty.

17.

18. Vertical Icicle, Three Clusters
Look at the second highest white bar in the icicle plot.
Now there are three clusters:
  Adjuncts
  Junior faculty (Deanna through Mickey)
  Senior faculty (Lawrence through Roslyn)

19.

20. Vertical Icicle, Four Clusters
Look at the white bar furthest to the right.
Now there are four clusters:
  Adjuncts
  Junior faculty
  The acting chair (Lawrence)
  The rest of the senior faculty (Catalina through Roslyn)

21.

22. The Dendrogram
At the very bottom you can see the two-cluster solution: adjuncts vs. full time.
The next step up shows the three-cluster solution.
The next step up shows the four-cluster solution.
And so on.
Truncated and rotated dendrogram on next slide.

23.

24. Compare Two Clusters
The two-cluster solution was adjuncts versus everybody else.
Look at the t tests in the output.
Adjuncts had lower rank, experience, number of publications, course load, and salary.

25.

26. Compare Three Clusters
One-way ANOVAs comparing the three clusters were significant for every variable.
The senior faculty had higher salary, experience, rank, and number of publications.

27.
Variable   Cluster          N    Mean         Std. Deviation
Salary     Senior Faculty   10   80277.4080   18259.10829
           Others           24   51672.1825   10875.28739
           Adjuncts         10   5956.4080    2101.01288
FTE        Senior Faculty   10   1.0000       .00000
           Others           24   1.0000       .00000
           Adjuncts         10   .3750        .13176
Rank       Senior Faculty   10   4.80         .422
           Others           24   3.00         .885
           Adjuncts         10   1.00         .000

28.
Variable     Cluster          N    Mean    Std. Deviation
Articles     Senior Faculty   10   32.90   17.483
             Others           24   7.42    8.577
             Adjuncts         10   1.90    4.771
Experience   Senior Faculty   10   26.80   5.534
             Others           24   6.96    7.178
             Adjuncts         10   4.70    10.688
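
The comparison of the three clusters on a single variable can be checked from the summary statistics alone. The sketch below computes a one-way ANOVA F ratio from the N, mean, and standard deviation reported for Salary in the descriptives table. It is a hand calculation under the usual equal-variance ANOVA formula, not a reproduction of the SPSS output, and the function name is made up.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from per-group (n, mean, sd) summaries."""
    total_n = sum(n for n, _mean, _sd in groups)
    grand_mean = sum(n * mean for n, mean, _sd in groups) / total_n
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _sd in groups)
    # Within-groups sum of squares: recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _mean, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Salary rows of the descriptives table: (N, mean, standard deviation).
salary = [
    (10, 80277.4080, 18259.10829),  # senior faculty
    (24, 51672.1825, 10875.28739),  # others
    (10, 5956.4080, 2101.01288),    # adjuncts
]
f_ratio, df1, df2 = anova_from_summary(salary)
print(f"F({df1}, {df2}) = {f_ratio:.1f}")
```

The same function can be applied to the FTE, Rank, Articles, and Experience rows by swapping in their N, mean, and standard deviation values.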

29. Compare Four Clusters
The acting chair had a higher salary and number of publications.

30. I Could Not Help Myself
With these data on hand, I could not resist predicting salary from the other variables.
Salary was well correlated with Rank, FTEs, Publications, and Experience.
In the multiple regression, only Rank and FTEs had significant unique effects.
The residuals suggest who was being overpaid and who underpaid.

31. Split by Sex
For men, the unique effect of number of publications was positive: more publications, higher salary.
For women it was negative: more publications, lower salary.
Curious.

32. Workaholism
Aziz & Zickar (2005)
Workaholics may be defined as those
  High in work involvement,
  High in drive to work, and
  Low in work enjoyment.
For each case, a score was obtained for each of these three dimensions.

33. The Three Cluster Solution
Workaholics
  High work involvement
  High drive to work
  Low work enjoyment
Positively engaged workers
  High work involvement
  Medium drive to work
  High work enjoyment

34. Unengaged workers
  Low work involvement
  Low drive to work
  Low work enjoyment
Past research/theory indicated there should be six clusters, but the theorized six clusters were not obtained.

35. Clustering Variables
FactBeer.sav
The statistical output.
Analyze, Classify, Hierarchical Cluster

36.

37. Statistics

38. Plots

39. Method

40. Proximity Matrix
The proximity matrix is simply the intercorrelation matrix.
The two most correlated variables are Color and Aroma (r = .909); they are clustered on the first step.
Stage 2: Size and Alcohol (r = .904) are clustered.
Stage 3: Taste is added to the cluster that already contains Color (r = .903) and Aroma (r = .87).

41. Stage 4: Cost is added to the cluster that already contains Size (r = .832) and Alcohol (r = .767).
Stage 5: The two clusters are combined.
But they are not very similar (similarity coefficient = .038).
Now we have one cluster with six variables and one with one (Reputation).
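
When variables are clustered, the correlation acts as a similarity, so the largest value is merged first rather than the smallest. The sketch below uses only the six correlations quoted in the slides, because the rest of the matrix is not reported there. It assumes average linkage for the similarity between a variable and an existing cluster, which the slides do not confirm. The helper name is hypothetical.

```python
# Correlations quoted in the slides (other pairs are not reported there).
r = {
    frozenset({"Color", "Aroma"}): 0.909,
    frozenset({"Size", "Alcohol"}): 0.904,
    frozenset({"Taste", "Color"}): 0.903,
    frozenset({"Taste", "Aroma"}): 0.87,
    frozenset({"Cost", "Size"}): 0.832,
    frozenset({"Cost", "Alcohol"}): 0.767,
}


def average_similarity(variable, cluster):
    """Mean correlation between one variable and each member of a cluster
    (average linkage is an assumption, not stated in the slides)."""
    return sum(r[frozenset({variable, member})] for member in cluster) / len(cluster)


# Stage 1: the single most correlated pair is clustered first.
first_pair = max(r, key=r.get)

# Stages 3 and 4: similarity of the joining variable to the existing cluster.
taste_to_cluster = average_similarity("Taste", ["Color", "Aroma"])
cost_to_cluster = average_similarity("Cost", ["Size", "Alcohol"])

print(sorted(first_pair), round(taste_to_cluster, 4), round(cost_to_cluster, 4))
```

Under this assumption Taste joins its cluster at a higher similarity than Cost joins its own, which matches the order of Stages 3 and 4 in the slides.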

42. My Decision
The three-cluster solution makes most sense to me.
  Cheap Drunk: Cost, Size, Alcohol
  Aesthetic Quality: Aroma, Taste, Color
  Reputation