MODULE 7: Nearest Neighbour Classifier and its variants
LESSON 11: Nearest Neighbour Classifier
Keywords: K Neighbours, Weighted, Nearest Neighbour
Nearest neighbour classifiers

This is amongst the simplest of all classification algorithms in supervised learning. It is a method of classifying patterns based on the class label of the closest training patterns in the feature space. The common algorithms used here are the nearest neighbour (NN) algorithm, the k-nearest neighbour (kNN) algorithm, and the modified k-nearest neighbour (mkNN) algorithm.

These are non-parametric methods where no model is fitted using the training patterns.

The accuracy obtained using nearest neighbour classifiers is good. The method is guaranteed to yield an error rate no worse than twice the Bayes error rate (explained in Module 10), which is the optimal error rate.

There is no training time required for this classifier. In other words, there is no design time for training the classifier.

Every time a test pattern is to be classified, it has to be compared with all the training patterns to find the closest pattern. This classification time could be large if the training patterns are large in number or if the dimensionality of the patterns is high.

Nearest neighbour algorithm

If there are n patterns X_1, X_2, ..., X_n in the training data, and a test pattern P, and if X_j is the most similar pattern to P among X_1, X_2, ..., X_n, then the class of P is the class of X_j. The similarity is usually measured by computing the distance from P to the patterns X_1, X_2, ..., X_n. If d(P, X_i) is the distance from P to X_i, then P is assigned the class label of X_j, where

d(P, X_j) = min d(P, X_i), where i = 1, ..., n
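As a minimal sketch of this rule (the function name nn_classify, the array layout, and the use of Euclidean distance are illustrative choices, not part of the lesson), the nearest neighbour algorithm can be written in a few lines of Python:

import numpy as np

def nn_classify(train_X, train_y, P):
    # train_X: (n, d) array of training patterns X_1, ..., X_n
    # train_y: length-n sequence of class labels
    # P: length-d test pattern
    distances = np.linalg.norm(train_X - P, axis=1)  # d(P, X_i) for every i
    j = np.argmin(distances)                         # index of the closest pattern X_j
    return train_y[j]                                # P gets the class label of X_j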

k-Nearest Neighbour (kNN) classification algorithm

An object is classified by a majority vote of the classes of its neighbours. The object is assigned to the class most common amongst its k nearest neighbours. If k = 1, this becomes the nearest neighbour algorithm.

This algorithm may give a more correct classification for boundary patterns than the NN algorithm.

The value of k has to be specified by the user and the best choice depends on the data. Larger values of k reduce the effect of noise on the classification. The value of k can be arbitrarily increased when the training data set is large in size. The value can be chosen by using a validation set and picking the value that gives the best accuracy on the validation set.

The main disadvantage of the kNN algorithm is that it is very time consuming, especially when the training data is large. To overcome this problem, a number of algorithms have been proposed to access the k nearest patterns as fast as possible.
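A minimal sketch of this majority-vote rule, again with illustrative names (knn_classify) and Euclidean distance assumed:

import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, P, k=3):
    distances = np.linalg.norm(train_X - P, axis=1)   # distance from P to every training pattern
    nearest = np.argsort(distances)[:k]               # indices of the k closest patterns
    votes = Counter(train_y[i] for i in nearest)      # count the class labels among them
    return votes.most_common(1)[0][0]                 # majority class (ties broken arbitrarily)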

Modified k-Nearest Neighbour (mkNN) classifier

The contribution of each neighbour to the classification is weighted according to its distance from the test pattern. Hence, the nearest neighbour contributes more to the classification decision than the neighbours further away.
One weighting scheme would be to give each neighbour a weight of 1/d, where d is the distance from P to the neighbour.

Another weighting scheme computes the weight of the i-th nearest neighbour as

w_i = (d_k - d_i) / (d_k - d_1)   if d_k is not equal to d_1
w_i = 1                           if d_k = d_1

where i = 1, ..., k and d_i is the distance from P to the i-th nearest neighbour. The value of w_i varies from 1 for the closest pattern to 0 for the farthest pattern among the k closest patterns.

This modification would mean that outliers will not affect the classification as much as in the kNN classifier.
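The weighted vote can be sketched as follows; mknn_classify and its arguments are illustrative, and the weight computation follows the second scheme above:

import numpy as np
from collections import defaultdict

def mknn_classify(train_X, train_y, P, k=5):
    distances = np.linalg.norm(train_X - P, axis=1)
    order = np.argsort(distances)[:k]        # the k nearest patterns, closest first
    d = distances[order]
    d1, dk = d[0], d[-1]                     # distances of the closest and farthest of the k
    class_weight = defaultdict(float)
    for pos, idx in enumerate(order):
        # weight is 1 for the closest pattern, 0 for the farthest; all 1 if the distances are equal
        w = 1.0 if dk == d1 else (dk - d[pos]) / (dk - d1)
        class_weight[train_y[idx]] += w
    return max(class_weight, key=class_weight.get)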

Example

Figure 1: Two class problem (feature space with axes f1 and f2)

Consider the two class problem shown in Figure 1. There are four patterns in Class 1, marked as X, and there are five patterns in Class 2, marked as +. The test pattern is P. Using the nearest neighbour algorithm, the closest pattern to P is X_1, whose class is 1. Therefore P will be assigned to Class 1.

If the kNN algorithm is used, after X_1, P is closest to X_6 and X_7. So, if k = 3, P will be assigned to Class 2. It can be seen that the value of k is crucial to the classification. If k = 1, it reduces to the NN classifier. In this case, if k = 4, the next closest pattern could be X_2. If k = 5 and another Class 1 pattern is closer to P than the remaining Class 2 patterns, then again, due to the majority vote, P will be assigned to Class 1. This shows how important the value of k is to the classification. If P is an outlier of one class but is closest to a pattern of another class, taking a majority vote can prevent the misclassification of P.

Figure 2: Another two class problem (axes f1 and f2)

For example, if we consider Figure 2, we can see that the test pattern P is closest to a pattern which belongs to Class 1, and therefore it would be classified as belonging to Class 1 if the NN classifier is used. That pattern is an outlier of Class 1, and it can be seen that classifying P as belonging to Class 2 would be more meaningful. If the kNN algorithm is used with k = 3, then P would be classified as belonging to Class 2.

Using mkNN, the classification depends on the distances of the closest patterns from the test pattern. In the kNN algorithm, all the k nearest patterns have equal importance. In mkNN, the closest pattern is given more significance than the farthest pattern. The weightage given to the class of the first closest pattern is more than for the second closest pattern, and so on.

For example, suppose the 5 nearest neighbours to P are X_1, X_2, X_3, X_4 and X_5, where d(X_1, P) = 1, d(X_2, P) = 2, d(X_3, P) = 2.5, d(X_4, P) = 4 and d(X_5, P) = 5, and suppose X_1 and X_4 belong to Class 1 while X_2, X_3 and X_5 belong to Class 2.

The weight given to Class 1 by X_1 will be w_11 = (5 - 1)/(5 - 1) = 1.0.
The weight given to Class 1 by X_4 will be w_14 = (5 - 4)/(5 - 1) = 0.25.
The total weight of Class 1 will be w_1 = w_11 + w_14 = 1.0 + 0.25 = 1.25.

The weight given to Class 2 by X_2 will be w_22 = (5 - 2)/(5 - 1) = 0.75.
The weight given to Class 2 by X_3 will be w_23 = (5 - 2.5)/(5 - 1) = 0.625.
The weight given to Class 2 by X_5, which is w_25, will be 0, since X_5 is the farthest of the 5 neighbours.
The total weight of Class 2 will be w_2 = w_22 + w_23 + w_25 = 0.75 + 0.625 + 0 = 1.375.

Since w_2 > w_1, P is classified as belonging to Class 2.
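The arithmetic above can be checked with a short snippet (the distances and class labels are taken from the example; the weight formula is the one given earlier):

# distances of the 5 nearest neighbours from P, closest first, and their classes
distances = [1.0, 2.0, 2.5, 4.0, 5.0]
classes   = [1,   2,   2,   1,   2]

d1, dk = distances[0], distances[-1]
weights = {1: 0.0, 2: 0.0}
for d, c in zip(distances, classes):
    weights[c] += (dk - d) / (dk - d1)   # 1 for the closest pattern, 0 for the farthest

print(weights)   # {1: 1.25, 2: 1.375} -> Class 2 has the larger total weight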

If we consider Figure 1, the closest points to P are X_1, X_6, X_7, X_2 and X_5. If the distances from P to X_1, X_6, X_7, X_2 and X_5 are 0.3, 1.0, 1.1, 1.5 and 1.6, then the weights given to the two classes are calculated as follows.

The weight given to Class 1 by X_1 will be w_11 = (1.6 - 0.3)/(1.6 - 0.3) = 1.0.
The weight given to Class 1 by X_2 will be w_12 = (1.6 - 1.5)/(1.6 - 0.3) = 0.077.
The total weight of Class 1 will be w_1 = w_11 + w_12 = 1.0 + 0.077 = 1.077.

The weight given to Class 2 by X_6 will be w_26 = (1.6 - 1.0)/(1.6 - 0.3) = 0.462.
The weight given to Class 2 by X_7 will be w_27 = (1.6 - 1.1)/(1.6 - 0.3) = 0.385.
The weight given to Class 2 by X_5, which is w_25, is 0, since X_5 is the farthest of the 5 neighbours.
The total weight of Class 2 will be w_2 = w_26 + w_27 + w_25 = 0.462 + 0.385 + 0 = 0.847.

Since w_1 > w_2, P is classified as belonging to Class 1.

One point to be noted here is that while the kNN algorithm classifies P as belonging to Class 2, the mkNN algorithm classifies it as belonging to Class 1. It can therefore be seen that the classification decisions of kNN and mkNN may differ.
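To make this disagreement concrete, the following illustrative snippet applies both the plain majority vote and the weighted vote to the Figure 1 distances above:

from collections import Counter

# five nearest neighbours of P in Figure 1: (distance, class), closest first
neighbours = [(0.3, 1), (1.0, 2), (1.1, 2), (1.5, 1), (1.6, 2)]

# kNN with k = 5: plain majority vote over the class labels
knn_class = Counter(c for _, c in neighbours).most_common(1)[0][0]

# mkNN: distance-weighted vote using the scheme from this lesson
d1, dk = neighbours[0][0], neighbours[-1][0]
weights = {1: 0.0, 2: 0.0}
for d, c in neighbours:
    weights[c] += (dk - d) / (dk - d1)
mknn_class = max(weights, key=weights.get)

print(knn_class, mknn_class)   # 2 1 -> kNN chooses Class 2, mkNN chooses Class 1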