
Published by ellena-manuel


MODULE 7
Nearest Neighbour Classifier and its variants

LESSON 11
Nearest Neighbour Classifier

Keywords: K Neighbours, Weighted, Nearest Neighbour


Nearest neighbour classifiers

This is amongst the simplest of all classification algorithms in supervised learning. It is a method of classifying patterns based on the class label of the closest training patterns in the feature space. The common algorithms used here are the nearest neighbour (NN) algorithm, the k-nearest neighbour (kNN) algorithm, and the modified k-nearest neighbour (mkNN) algorithm.

These are non-parametric methods where no model is fitted using the training patterns.

The accuracy of nearest neighbour classifiers is good. As the number of training patterns grows large, the NN error rate is guaranteed to be no worse than twice the Bayes error rate (explained in Module 10), which is the optimal error rate.

There is no training time required for this classifier. In other words, there is no design time for training the classifier.

Every time a test pattern is to be classified, it has to be compared with all the training patterns to find the closest pattern. This classification time could be large if the training patterns are large in number or if the dimensionality of the patterns is high.

Nearest neighbour algorithm

If there are n patterns $X_1, X_2, \ldots, X_n$ in the training data X and a test pattern P, and $X_k$ is the most similar pattern to P from X, then the class of P is the class of $X_k$.

The similarity is usually measured by computing the distance from P to the patterns $X_1, X_2, \ldots, X_n$. If $d(P, X_i)$ is the distance from P to $X_i$, then P is assigned the class label of $X_k$, where

$d(P, X_k) = \min_i \{ d(P, X_i) \}$, where $i = 1 \ldots n$

k-Nearest Neighbour (kNN) classification algorithm

An object is classified by a majority vote of the classes of its neighbours. The object is assigned to the class most common amongst its k nearest neighbours. If k=1, this becomes the nearest neighbour algorithm.

This algorithm may give a more correct classification for boundary patterns than the NN algorithm.

The value of k has to be specified by the user and the best choice depends on the data. Larger values of k reduce the effect of noise on the classification. The value of k can be increased arbitrarily when the training data set is large in size. The value of k can be chosen by using a validation set and picking the value that gives the best accuracy on the validation set.

The main disadvantage of the kNN algorithm is that it is very time consuming, especially when the training data is large. To overcome this problem, a number of algorithms have been proposed to access the k nearest patterns as fast as possible.

Modified k-Nearest Neighbour (mkNN) classifier

The contribution of each neighbour to the classification is weighted according to its distance from the test pattern. Hence, the nearest neighbour contributes more to the classification decision than the neighbours further away.
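The NN and kNN rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean distance is assumed as the distance measure, the function names `knn_classify` and `choose_k` are hypothetical, and the toy patterns are invented for the example and are not taken from the figures in this lesson.

```python
import math
from collections import Counter

def knn_classify(patterns, labels, p, k=1):
    """Return the most common class among the k training patterns closest to p.

    Assumes Euclidean distance; with k=1 this is the plain NN rule.
    Ties in the vote are not handled specially.
    """
    # Rank every training pattern by its distance to the test pattern.
    order = sorted(range(len(patterns)), key=lambda i: math.dist(patterns[i], p))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def choose_k(train_x, train_y, val_x, val_y, candidates):
    """Pick the candidate k with the best accuracy on a validation set."""
    def accuracy(k):
        hits = sum(knn_classify(train_x, train_y, x, k) == y
                   for x, y in zip(val_x, val_y))
        return hits / len(val_x)
    return max(candidates, key=accuracy)

# Hypothetical two-class toy data (not the patterns of Figure 1).
train_x = [(2.2, 2.1), (0.8, 1.0), (0.5, 0.5),   # Class 1
           (2.8, 2.6), (3.0, 2.3), (3.5, 3.5)]   # Class 2
train_y = [1, 1, 1, 2, 2, 2]
P = (2.0, 2.0)

nn_label = knn_classify(train_x, train_y, P, k=1)
knn_label = knn_classify(train_x, train_y, P, k=3)
print(nn_label, knn_label)
```

Note that every call scans all training patterns, which is the classification-time cost mentioned above; faster neighbour search structures are outside the scope of this sketch.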


One weighting scheme would be to give each neighbour a weight of $1/d$, where d is the distance from P to the neighbour.

Another weighting scheme finds the weight from the neighbour as

$w_i = \frac{d_k - d_i}{d_k - d_1}$ if $d_k \neq d_1$
$w_i = 1$ if $d_k = d_1$

where i=1,...,k and $d_i$ is the distance from P to its i-th closest pattern. The value of $w_i$ varies from 1 for the closest pattern to 0 for the farthest pattern among the k closest patterns.

This modification would mean that outliers will not affect the classification as much as they do in the kNN classifier.

Example

Figure 1: Two class problem (axes f1 and f2)

Consider the two class problem shown in Figure 1. There are four patterns in Class 1 marked as 'X' and there are five patterns in Class 2 marked as '+'. The test pattern is P. Using the nearest neighbour algorithm, the closest pattern to P is $X_1$, whose class is 1. Therefore P will be assigned to Class 1. If the kNN algorithm is used, after $X_1$, P is closest to $X_6$ and $X_7$. So, if k=3, P will be assigned to Class 2. It can be seen that the value of k is crucial to the classification. If k=1, it reduces to the NN classifier. In this case, if k=4, the next closest pattern could be $X_2$. If k=5 and the fifth closest pattern is another Class 1 pattern rather than a Class 2 pattern, then again due to majority vote, P will be assigned to Class 1. This shows how important the value of k is to the classification.

If P is an outlier of one class but is closest to a pattern of another class, taking a majority vote can prevent the misclassification of P.

Figure 2: Another two class problem (axes f1 and f2)

For example, if we consider Figure 2, we can see that the test pattern P is closest to a pattern which belongs to Class 1 and therefore it would be classified as belonging to Class 1 if the NN classifier is used. That pattern is an outlier of Class 1, and it can be seen that classifying P as belonging to Class 2 would be more meaningful. If the kNN algorithm is used with k=3, then P would be classified as belonging to Class 2.

Using mkNN, the classification depends on the distances of the closest patterns from the test pattern. In the kNN algorithm, all the k closest patterns have equal importance. In mkNN, the closest pattern is given more significance than the farthest pattern. The weight given to the class of the first closest pattern is more than that for the second closest pattern, and so on.
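The two weighting schemes mentioned for mkNN can be written down as a small sketch. The function names are hypothetical, the second scheme is read here as $(d_k - d_i)/(d_k - d_1)$ with a weight of 1 when $d_k = d_1$, and the distances in the usage lines are invented for illustration.

```python
def inverse_distance_weights(distances):
    """Scheme 1: weight 1/d per neighbour (assumes every distance is > 0)."""
    return [1.0 / d for d in distances]

def linear_weights(distances):
    """Scheme 2: (d_k - d_i) / (d_k - d_1), or 1 for every neighbour if d_k == d_1.

    The distances are sorted first, so d_1 is the closest and d_k the
    farthest of the k neighbours passed in.
    """
    d = sorted(distances)
    d1, dk = d[0], d[-1]
    if dk == d1:
        # All k neighbours are equally far away: no basis for preferring one.
        return [1.0] * len(d)
    return [(dk - di) / (dk - d1) for di in d]

# Hypothetical distances from a test pattern to its 4 nearest neighbours.
dists = [1.0, 2.0, 4.0, 5.0]
print(inverse_distance_weights(dists))  # [1.0, 0.5, 0.25, 0.2]
print(linear_weights(dists))            # [1.0, 0.75, 0.25, 0.0]
```

Under the second scheme the farthest of the k neighbours always receives weight 0, so it never influences the decision; the first scheme keeps every neighbour's vote but needs separate handling when a distance is exactly zero.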

For example, if the 5 nearest neighbours to P are $X_1, X_2, X_3, X_4$ and $X_5$, where $d(X_1,P) = 1$, $d(X_2,P) = 2$, $d(X_3,P) = 2.5$, $d(X_4,P) = 4$ and $d(X_5,P) = 5$, and if $X_1$ and $X_4$ belong to Class 1 and $X_2$, $X_3$ and $X_5$ belong to Class 2, then:

The weight given to Class 1 by $X_1$ will be $w_{11} = 1$
The weight given to Class 1 by $X_4$ will be $w_{14} = \frac{5-4}{5-1} = 0.25$
The total weight of Class 1 will be $w_1 = w_{11} + w_{14} = 1.0 + 0.25 = 1.25$

The weight given to Class 2 by $X_2$ will be $w_{22} = \frac{5-2}{5-1} = 0.75$
The weight given to Class 2 by $X_3$ will be $w_{23} = \frac{5-2.5}{5-1} = 0.625$
The weight given to Class 2 by $X_5$, which is $w_{25}$, will be 0 since it is the farthest of the 5 neighbours.
The total weight of Class 2 will be $w_2 = w_{22} + w_{23} + w_{25} = 0.75 + 0.625 + 0 = 1.375$

Since $w_2 > w_1$, P is classified as belonging to Class 2.

If we consider Figure 1, the closest points to P are $X_1, X_6, X_7, X_2$ and $X_5$. If the distances from P to $X_1, X_6, X_7, X_2$ and $X_5$ are 0.3, 1.0, 1.1, 1.5 and 1.6, then the weights given to the two classes are calculated as follows:

The weight given to Class 1 by $X_1$ will be $w_{11} = 1$
The weight given to Class 1 by $X_2$ will be $w_{12} = \frac{1.6-1.5}{1.6-0.3} = 0.077$
The total weight of Class 1 will be $w_1 = w_{11} + w_{12} = 1 + 0.077 = 1.077$

The weight given to Class 2 by $X_6$ will be $w_{26} = \frac{1.6-1.0}{1.6-0.3} = 0.462$
The weight given to Class 2 by $X_7$ will be $w_{27} = \frac{1.6-1.1}{1.6-0.3} = 0.385$
The weight given to Class 2 by $X_5$, which is $w_{25}$, is 0, since $X_5$ is the farthest of the 5 neighbours.
The total weight of Class 2 will be $w_2 = w_{26} + w_{27} + w_{25} = 0.462 + 0.385 + 0 = 0.847$

Since $w_1 > w_2$, P is classified as belonging to Class 1.

One point to be noted here is that while the kNN algorithm with k=5 classifies P as belonging to Class 2, the mkNN algorithm classifies P as belonging to Class 1. It can therefore be seen that the classification decisions of kNN and mkNN may differ.
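The two worked examples can be checked with a short sketch. The helper name `mknn_classify` is hypothetical, and each neighbour is given as a (distance, class) pair. Two readings of the text are assumed here: the third distance in the first example is taken as 2.5, the value consistent with the stated weight of 0.625, and in the second example the distances 1.0 and 1.1 are assigned to the Class 2 patterns and 1.5 to the second Class 1 pattern, as the stated weights imply.

```python
from collections import Counter, defaultdict

def mknn_classify(neighbours):
    """Classify from the k nearest neighbours given as (distance, class) pairs.

    Each neighbour contributes (d_k - d_i) / (d_k - d_1) to its class,
    or 1 if all k distances are equal. Returns (winning class, class totals).
    """
    ordered = sorted(neighbours)
    d1, dk = ordered[0][0], ordered[-1][0]
    totals = defaultdict(float)
    for d, label in ordered:
        totals[label] += 1.0 if dk == d1 else (dk - d) / (dk - d1)
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# First example: distances 1, 2, 2.5, 4, 5 (2.5 is an assumed reading).
first = [(1.0, 1), (2.0, 2), (2.5, 2), (4.0, 1), (5.0, 2)]
# Second example (Figure 1): distances 0.3, 1.0, 1.1, 1.5, 1.6.
second = [(0.3, 1), (1.0, 2), (1.1, 2), (1.5, 1), (1.6, 2)]

print(mknn_classify(first))   # Class 2 wins: totals 1.25 vs 1.375
print(mknn_classify(second))  # Class 1 wins: about 1.077 vs 0.846

# Plain kNN majority vote on the second example, for comparison.
knn_vote = Counter(label for _, label in second).most_common(1)[0][0]
print(knn_vote)               # 2
```

The Class 2 total in the second example comes out as about 0.846 here because the weights are summed before rounding; summing the rounded weights 0.462 and 0.385 gives 0.847.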

© 2020 docslides.com Inc.

All rights reserved.