Advanced Classification techniques



Presentation Transcript

1. Advanced Classification techniques
David Kauchak
CS 159 – Fall 2014

2. Admin
Quiz #3: mean 25.25 (87%), median 26 (90%)
Assignment 5 graded
ML lab next Tue (there will be candy to be won)

3. Admin
Project proposal: tonight at 11:59pm
Assignment 7: Friday at 5pm
See my e-mail (Wednesday): both p(*|positive) and p(*|negative) should use exactly the same set of features, specifically all the words that were seen during training (with either label); this is one of the main reasons we need smoothing!

4. Machine Learning: A Geometric View

5. Apples vs. Bananas

Weight  Color   Label
4       Red     Apple
5       Yellow  Apple
6       Yellow  Banana
3       Red     Apple
7       Yellow  Banana
8       Yellow  Banana
6       Yellow  Apple

Can we visualize this data?

6. Apples vs. Bananas
Turn features into numerical values:

Weight  Color  Label
4       0      Apple
5       1      Apple
6       1      Banana
3       0      Apple
7       1      Banana
8       1      Banana
6       1      Apple

[Scatter plot: Weight vs. Color (0/1), points marked A/B]

We can view examples as points in an n-dimensional space, where n is the number of features, called the feature space.
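Below is a minimal sketch (not from the slides) of how the categorical Color feature might be mapped to a number so that each example becomes a point in a 2-dimensional feature space; the mapping dictionary and variable names are illustrative only.

```python
# Encode the Apples vs. Bananas examples as points in feature space.
# The color-to-number mapping (Red=0, Yellow=1) follows the table above.
color_to_num = {"Red": 0, "Yellow": 1}

examples = [
    (4, "Red", "Apple"), (5, "Yellow", "Apple"), (6, "Yellow", "Banana"),
    (3, "Red", "Apple"), (7, "Yellow", "Banana"), (8, "Yellow", "Banana"),
    (6, "Yellow", "Apple"),
]

# Each example becomes a point (weight, color) in a 2-dimensional feature space.
points = [((weight, color_to_num[color]), label) for weight, color, label in examples]
print(points[0])  # ((4, 0), 'Apple')
```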

7. Examples in a feature space
[Scatter plot: points with labels 1, 2, and 3 on axes feature1 and feature2]

8. Test example: what class?
[Scatter plot: labels 1, 2, and 3 on axes feature1 and feature2, plus an unlabeled test example]

9. Test example: what class?
[Same plot; the test example is closest to red]

10. Another classification algorithm?
To classify an example d: label d with the label of the closest example to d in the training set.

11. What about this example?
[Scatter plot: labels 1, 2, and 3 on axes feature1 and feature2, plus a new test example]

12. What about this example?
[Same plot; the test example is closest to red, but…]

13. What about this example?
[Same plot; most of the next closest examples are blue]

14. k-Nearest Neighbor (k-NN)
To classify an example d:
- Find the k nearest neighbors of d
- Choose as the label the majority label within the k nearest neighbors

15. k-Nearest Neighbor (k-NN)
To classify an example d:
- Find the k nearest neighbors of d
- Choose as the label the majority label within the k nearest neighbors
How do we measure “nearest”?

16. Euclidean distance
Euclidean distance! (or L1 or …)
D((a1, a2, …, an), (b1, b2, …, bn)) = sqrt((a1 - b1)² + (a2 - b2)² + … + (an - bn)²)
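A minimal k-NN sketch in Python, following the algorithm on slides 14-16 with Euclidean distance; the function names and the tiny training set (the encoded apples/bananas examples) are illustrative, not part of the lecture materials.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors a and b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(d, training_set, k=3):
    """Label example d with the majority label among its k nearest neighbors.

    training_set is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(training_set, key=lambda ex: euclidean(d, ex[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy usage with the numerically encoded apples/bananas examples.
train = [((4, 0), "Apple"), ((5, 1), "Apple"), ((6, 1), "Banana"),
         ((3, 0), "Apple"), ((7, 1), "Banana"), ((8, 1), "Banana"),
         ((6, 1), "Apple")]
print(knn_classify((5, 0), train, k=3))  # "Apple"
```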

17. Decision boundaries
[Plot: labels 1, 2, and 3 with boundaries between them]
The decision boundaries are places in the feature space where the classification of a point/example changes.
Where are the decision boundaries for k-NN?

18. k-NN decision boundaries
k-NN gives locally defined decision boundaries between classes.
[Plot: labels 1, 2, and 3 with the k-NN boundaries]

19. K Nearest Neighbour (kNN) Classifier
K = 1
What is the decision boundary for k-NN for this one?

20. K Nearest Neighbour (kNN) Classifier
K = 1
[Plot of the resulting k-NN decision boundary]

21. Machine learning models
Some machine learning approaches make strong assumptions about the data:
- If the assumptions are true, this can often lead to better performance
- If the assumptions aren't true, they can fail miserably
Other approaches don't make many assumptions about the data:
- This can allow us to learn from more varied data
- But they are more prone to overfitting and generally require more training data

22. What is the data generating distribution?

23. What is the data generating distribution?

24. What is the data generating distribution?

25. What is the data generating distribution?

26. What is the data generating distribution?

27. What is the data generating distribution?

28. Actual model

29. Model assumptions
If you don't have strong assumptions about the model, it can take you longer to learn.
Assume now that our model of the blue class is two circles.

30. What is the data generating distribution?

31. What is the data generating distribution?

32. What is the data generating distribution?

33. What is the data generating distribution?

34. What is the data generating distribution?

35. Actual model

36. What is the data generating distribution?
Knowing the model beforehand can drastically improve learning and reduce the number of examples required.

37. What is the data generating distribution?

38. Make sure your assumption is correct, though!

39. Machine learning models
What were the model assumptions (if any) that k-NN and NB made about the data?
Are there training data sets that could never be learned correctly by these algorithms?

40. k-NN model
K = 1

41. Linear models
A strong assumption is linear separability:
- in 2 dimensions, you can separate labels/classes by a line
- in higher dimensions, you need hyperplanes
A linear model is a model that assumes the data is linearly separable.

42. Hyperplanes
A hyperplane is a line/plane in a high-dimensional space.
What defines a line? What defines a hyperplane?

43. Defining a line
Any pair of values (w1, w2) defines a line through the origin:
0 = w1*f1 + w2*f2

44. Defining a line
Any pair of values (w1, w2) defines a line through the origin:
0 = 1*f1 + 2*f2
What does this line look like?

45. Defining a line
Any pair of values (w1, w2) defines a line through the origin: 0 = 1*f1 + 2*f2

f1:  -2    -1    0     1     2
f2:   1    0.5   0   -0.5   -1

46. Defining a line
Any pair of values (w1, w2) defines a line through the origin: 0 = 1*f1 + 2*f2

f1:  -2    -1    0     1     2
f2:   1    0.5   0   -0.5   -1

[Plot of these points and the line on axes f1, f2]

47. Defining a line
Any pair of values (w1, w2) defines a line through the origin: 0 = w1*f1 + w2*f2
We can also view it as the line perpendicular to the weight vector w = (1, 2).
[Plot: the line and the vector (1, 2) on axes f1, f2]

48. Classifying with a line
w = (1, 2)
Mathematically, how can we classify points based on a line?
[Plot: BLUE point (1, 1) and RED point (1, -1) on axes f1, f2]

49. Classifying with a line
w = (1, 2)
Mathematically, how can we classify points based on a line?
(1, 1):  1*1 + 2*1 = 3
(1, -1): 1*1 + 2*(-1) = -1
The sign indicates which side of the line.
[Plot on axes f1, f2]

50. Defining a line
Any pair of values (w1, w2) defines a line through the origin: 0 = w1*f1 + w2*f2
How do we move the line off of the origin?

51. Defining a line
Any pair of values (w1, w2) plus an offset defines a line: -1 = 1*f1 + 2*f2
f1:  -2    -1    0     1     2
f2:   ?

52. Defining a line
Any pair of values (w1, w2) plus an offset defines a line: -1 = 1*f1 + 2*f2
f1:  -2    -1    0     1     2
f2:  0.5    0  -0.5   -1   -1.5
Now intersects (the f1 axis) at -1
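A few lines of Python reproducing the table above, under the assumption (mine, not stated explicitly on the slide) that the offset line is -1 = 1*f1 + 2*f2:

```python
# Points on the (assumed) offset line -1 = 1*f1 + 2*f2, i.e. f2 = (-1 - f1) / 2.
w1, w2, a = 1.0, 2.0, -1.0
for f1 in [-2, -1, 0, 1, 2]:
    f2 = (a - w1 * f1) / w2
    print(f1, f2)  # 0.5, 0.0, -0.5, -1.0, -1.5 -- matching the table above
```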

53. Linear models
A linear model in n-dimensional space (i.e. n features) is defined by n+1 weights:
- In two dimensions, a line: 0 = w1*f1 + w2*f2 + b (where b = -a)
- In three dimensions, a plane: 0 = w1*f1 + w2*f2 + w3*f3 + b
- In n dimensions, a hyperplane: 0 = b + w1*f1 + w2*f2 + … + wn*fn

54. Classifying with a linear model
We can classify with a linear model by checking the sign:
Given an example (f1, f2, …, fm), the classifier computes b + w1*f1 + … + wm*fm.
If the result is positive, predict a positive example; if negative, a negative example.
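A small sketch of sign-based classification with a linear model; the function name is mine, and the worked values reuse the w = (1, 2) example from slide 49.

```python
def linear_classify(w, b, x):
    """Predict +1 if b + w·x is positive, -1 if it is negative."""
    score = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1

# Worked example from slide 49: w = (1, 2), line through the origin (b = 0).
w, b = (1, 2), 0
print(linear_classify(w, b, (1, 1)))   # +1 (w·x = 3, one side of the line)
print(linear_classify(w, b, (1, -1)))  # -1 (w·x = -1, the other side)
```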

55. Learning a linear model
Geometrically, we know what a linear model represents.
Given a linear model (i.e. a set of weights and b), we can classify examples.
[Diagram: training data (data with labels) → learn → model]
How do we learn a linear model?

56. Which hyperplane would you choose?

57. Large margin classifiers
Choose the line where the distance to the nearest point(s) is as large as possible.
[Plot: separating line with the margin marked on both sides]

58. Large margin classifiers
The margin of a classifier is the distance to the closest points of either class.
Large margin classifiers attempt to maximize this.
[Plot: the margin marked on both sides of the separating line]

59. Large margin classifier setup
Select the hyperplane with the largest margin where the points are classified correctly!
Setup as a constrained optimization problem:
  maximize  margin(w, b)
  subject to: yi (w·xi + b) > 0 for all i
What does this say?
yi: label for example i, either 1 (positive) or -1 (negative)
xi: our feature vector for example i

60. Large margin classifier setup
  maximize  margin(w, b)   subject to: yi (w·xi + b) > 0
  maximize  margin(w, b)   subject to: yi (w·xi + b) ≥ c, for some c > 0
Are these equivalent?

61. Large margin classifier setup
  maximize  margin(w, b)   subject to: yi (w·xi + b) > 0
  maximize  margin(w, b)   subject to: yi (w·xi + b) ≥ c, for some c > 0
Any separating w can be rescaled to satisfy the second constraint: w = (0.5, 1), w = (1, 2), w = (2, 4), … all define the same hyperplane.

62. Large margin classifier setup
subject to: yi (w·xi + b) ≥ 1
We'll assume c = 1; however, any c > 0 works.

63. Measuring the margin
How do we calculate the margin?

64. Support vectors
For any separating hyperplane, there exists some set of “closest points”.
These are called the support vectors.

65. Measuring the margin
The margin is the distance to the support vectors, i.e. the “closest points”, on either side of the hyperplane.

66. Distance from the hyperplane
w = (1, 2)
How far away is the point (-1, -2) from the hyperplane?
[Plot on axes f1, f2]

67. Distance from the hyperplane
w = (1, 2)
How far away is the point (-1, -2) from the hyperplane?
[Plot showing the distance on axes f1, f2]

68. Distance from the hyperplane
w = (1, 2)
How far away is the point (1, 1) from the hyperplane?
[Plot on axes f1, f2]

69. Distance from the hyperplane
w = (1, 2)
How far away is the point (1, 1) from the hyperplane?
Use the length-normalized weight vector: d(x) = (w·x + b) / ||w||

70. Distance from the hyperplane
w = (1, 2)
How far away is the point (1, 1) from the hyperplane?
d((1, 1)) = (1*1 + 2*1) / sqrt(1² + 2²) = 3 / sqrt(5) ≈ 1.34

71. Distance from the hyperplane
w = (1, 2)
Why length normalized?
[Plot: point (1, 1) and the length-normalized weight vector on axes f1, f2]

72. Distance from the hyperplane
w = (2, 4)
Why length normalized?
[Plot: point (1, 1) and the length-normalized weight vector on axes f1, f2]

73. Distance from the hyperplane
w = (0.5, 1)
Why length normalized?
[Plot: point (1, 1) and the length-normalized weight vector on axes f1, f2]
w = (1, 2), (2, 4), and (0.5, 1) all define the same hyperplane; normalizing by ||w|| gives the same distance regardless of how w is scaled.
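A short sketch illustrating the length normalization: the formula (w·x + b)/||w|| is the standard signed distance, and the three weight vectors are the ones from slides 71-73.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from x to the hyperplane w·x + b = 0,
    using the length-normalized weight vector: (w·x + b) / ||w||."""
    dot = b + sum(wi * xi for wi, xi in zip(w, x))
    return dot / math.sqrt(sum(wi ** 2 for wi in w))

# (1, 2), (2, 4) and (0.5, 1) all define the same hyperplane through the origin;
# after length normalization they give the same distance for the point (1, 1).
for w in [(1, 2), (2, 4), (0.5, 1)]:
    print(w, signed_distance(w, 0, (1, 1)))  # ~1.342 each time
```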

74. Measuring the margin
Thought experiment: someone gives you the optimal support vectors. Where is the max margin hyperplane?
[Plot with the margin marked]

75. Measuring the margin
Margin = (d+ - d-)/2
The max margin hyperplane is halfway in between the positive support vectors and the negative support vectors. Why?

76. Measuring the margin
Margin = (d+ - d-)/2
The max margin hyperplane is halfway in between the positive support vectors and the negative support vectors:
- All support vectors are the same distance
- To maximize, the hyperplane should be directly in between

77. Measuring the margin
Margin = (d+ - d-)/2
What is w·x + b for the support vectors?
Hint: subject to: yi (w·xi + b) ≥ 1

78. Measuring the margin
subject to: yi (w·xi + b) ≥ 1
The support vectors have yi (w·xi + b) = 1. Otherwise, we could make the margin larger!

79. Measuring the margin
Margin = (d+ - d-)/2
For a positive support vector, d+ = (w·x + b)/||w|| = 1/||w||; for a negative example (a negative support vector), d- = -1/||w||.
So margin = (1/||w|| - (-1/||w||)) / 2 = 1/||w||.

80. Maximizing the margin
margin = 1/||w||, subject to: yi (w·xi + b) ≥ 1
Maximizing the margin is equivalent to minimizing ||w||! (subject to the separating constraints)

81. Maximizing the margin
minimize ||w||, subject to: yi (w·xi + b) ≥ 1
Maximizing the margin is equivalent to minimizing ||w||! (subject to the separating constraints)

82. Maximizing the margin
minimize ||w||, subject to: yi (w·xi + b) ≥ 1
The constraints:
- make sure the data is separable
- encourage w to be larger (once the data is separable)
The minimization criterion wants w to be as small as possible.

83. Maximizing the margin: the real problem
minimize ||w||², subject to: yi (w·xi + b) ≥ 1
What's the difference?

84. Maximizing the margin: the real problem
minimize ||w||², subject to: yi (w·xi + b) ≥ 1
Why the squared?

85. Maximizing the margin: the real problem
minimize ||w||,  subject to: yi (w·xi + b) ≥ 1
minimize ||w||², subject to: yi (w·xi + b) ≥ 1
Minimizing ||w|| is equivalent to minimizing ||w||².
The sum of the squared weights is a convex function!

86. Support vector machine problem
minimize ||w||², subject to: yi (w·xi + b) ≥ 1
This is a version of a quadratic optimization problem:
- Maximize/minimize a quadratic function
- Subject to a set of linear constraints
Many, many variants of solving this problem (we'll see one in a bit).
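The slides leave the solver open ("many, many variants"); one possibility, sketched here, is scikit-learn's SVC with a linear kernel, where a very large C approximates the hard-margin problem above. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [-1.0, -1.0], [-2.0, -0.5], [-0.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A linear kernel with a very large C approximates the hard-margin problem.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```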

87. Support vector machines
One of the most successful (if not the most successful) classification approaches:
- Support vector machine
- k nearest neighbor
- decision tree
- Naïve Bayes

88. Trends over time

89. Other successful classifiers in NLP
Perceptron algorithm:
- Linear classifier
- Trains “online”
- Fast and easy to implement
- Often used for tuning parameters (not necessarily for classifying)
Logistic regression classifier (aka Maximum entropy classifier):
- Probabilistic classifier
- Doesn't have the NB constraints
- Performs very well
- More computationally intensive to train than NB

90. Resources
SVM:
- SVM light: http://svmlight.joachims.org/
- Others, but this one is awesome!
Maximum Entropy classifier:
- http://nlp.stanford.edu/software/classifier.shtml
General ML frameworks:
- Python: scikit-learn, MLpy
- Java: Weka (http://www.cs.waikato.ac.nz/ml/weka/)
- Many others…
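For completeness, a tiny, illustrative scikit-learn pipeline (one of the frameworks listed above) that trains a linear SVM text classifier; the documents and labels are toy data, not from the course.

```python
# Toy text-classification pipeline with scikit-learn (documents/labels made up).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["great movie, loved it", "terrible plot, boring",
        "fantastic acting", "awful, fell asleep"]
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["boring plot but great acting"]))
```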