
Questions and Topics Review Nov. 22, 2011

1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to learning the classification algorithm?

2. How is region discovery in spatial datasets different from traditional clustering?

3. What are the unique characteristics of hierarchical clustering?

4. Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} and {(3,2), (3,3)}. To be discussed in the review on Dec. 1!

Silhouette: For an individual point i,
Calculate a = average distance of i to the points in its own cluster
Calculate b = min (average distance of i to points in another cluster)
The silhouette coefficient for the point is then given by: s = (b − a) / max(a, b)
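
Since the computation itself is deferred to the Dec. 1 review, here is a minimal numpy sketch of the definition above for checking a hand calculation; the points and labels are those of question 4, and the helper name silhouette is our own:

```python
# A minimal sketch of the Silhouette definition above, using plain numpy.
# The points and cluster labels come from question 4.
import numpy as np

points = np.array([(0, 0), (0, 1), (2, 2), (3, 2), (3, 3)], dtype=float)
labels = np.array([0, 0, 0, 1, 1])  # {(0,0),(0,1),(2,2)} vs. {(3,2),(3,3)}

def silhouette(points, labels):
    # pairwise Euclidean distances
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    s = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        own[i] = False                       # exclude the point itself
        a = dist[i, own].mean()              # avg distance within own cluster
        b = min(dist[i, labels == c].mean()  # min avg distance to another cluster
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

s = silhouette(points, labels)
print(s, s.mean())  # per-point coefficients and the overall Silhouette
```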

5. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!

6. K-NN is a lazy approach; what does that mean? What are the disadvantages of K-NN's lazy approach? Do you see any advantages in using K-NN's lazy approach?

7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space?

8. What is the role of slack variables in the Linear SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?

Support Vector Machines

What if the problem is not linearly separable?

[Figures: two example datasets whose true decision boundaries are a square and a circle, respectively; no linear hyperplane separates the classes.]
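
To make the circle case concrete, a small sketch (our own illustration, not from the slide): under the quadratic feature map phi(x1, x2) = (x1^2, x2^2), points that only a circle can separate in the original space become linearly separable in the mapped space.

```python
# Sketch: a class boundary that is a circle in the original 2-D space
# becomes a linear boundary (a hyperplane) after the quadratic feature
# map phi(x1, x2) = (x1^2, x2^2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)  # class 1 = inside unit circle

Z = X**2                                     # mapped space: z1 = x1^2, z2 = x2^2
pred = (Z[:, 0] + Z[:, 1] < 1).astype(int)   # linear boundary z1 + z2 = 1
print((pred == y).all())                     # True: the mapped data is separable
```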

Linear SVM for Non-linearly Separable Problems

What if the problem is not linearly separable?

Introduce slack variables ξ_i; a slack variable allows constraint violation to a certain degree.

Need to minimize:

f(w) = ||w||^2 / 2 + C (Σ_{i=1..N} ξ_i)

Subject to (i = 1, .., N):

y_i (w · x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0

Here Σ ξ_i measures the error, ||w||^2 / 2 is the inverse size of the margin between the hyperplanes, and the parameter C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
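
As a sketch of that selection step (ours, using scikit-learn's SVC on synthetic data; the candidate grid for C is arbitrary):

```python
# Sketch: choosing C on a validation set, trading margin width
# (small C) against training error (large C).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

best_C, best_acc = None, -1.0
for C in (0.01, 0.1, 1, 10, 100):
    acc = SVC(kernel='linear', C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
print('best C = %g (validation accuracy %.3f)' % (best_C, best_acc))
```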

Answers Review Nov. 22, 2011

1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to learning the classification algorithm?

Redundant attributes; irrelevant attributes.
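
A minimal sketch of both cases (ours; mutual_info_classif is scikit-learn's, and the 0.05 / 0.95 thresholds are arbitrary choices): an attribute with near-zero mutual information with the class is irrelevant, and one that nearly duplicates another attribute is redundant.

```python
# Sketch: flagging irrelevant attributes (near-zero mutual information
# with the class) and redundant ones (near-duplicates of another attribute).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
signal = rng.normal(size=500)
X = np.column_stack([
    signal,                                     # relevant attribute
    signal + rng.normal(scale=0.01, size=500),  # redundant copy of column 0
    rng.normal(size=500),                       # irrelevant noise
])
y = (signal > 0).astype(int)

mi = mutual_info_classif(X, y, random_state=0)
irrelevant = np.where(mi < 0.05)[0]
corr = np.abs(np.corrcoef(X, rowvar=False))
redundant = [j for j in range(X.shape[1]) for i in range(j) if corr[i, j] > 0.95]
print('irrelevant:', irrelevant, 'redundant:', redundant)
```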

2. How is region discovery in spatial datasets different from traditional clustering?

Clustering is performed in the subspace of the spatial attributes; e.g., clusters are contiguous in the space of the spatial attributes. There is a separation between spatial and non-spatial attributes: non-spatial attributes are only used by objective functions/cluster evaluation measures.

3. What are the unique characteristics of hierarchical clustering?

It computes a dendrogram and multiple clusterings; a dendrogram captures hierarchical relationships between clusters; HC relies on agglomerative/divisive approaches to compute clusters based on a union operator: C = C1 ∪ C2 ∪ …
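
For instance, a short scipy sketch (ours; it reuses the five points of question 4) showing that a single linkage computation yields the nested clusterings for every k at once:

```python
# Sketch: agglomerative clustering computes one dendrogram, i.e. a whole
# family of nested clusterings (reusing the five points of question 4).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([(0, 0), (0, 1), (2, 2), (3, 2), (3, 3)], dtype=float)
Z = linkage(points, method='single')   # sequence of merges (union operator)
for k in (2, 3, 4):
    print(k, 'clusters:', fcluster(Z, t=k, criterion='maxclust'))
```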

5. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!

DT: many, rectangular for numerical attributes; K-NN: many, convex polygons (Voronoi cells); SVM: one, a single hyperplane.
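
A quick scikit-learn sketch (ours, on synthetic data) that makes the counts tangible: the fitted linear SVM exposes exactly one (w, b), while the tree's leaf count and 1-NN's per-point Voronoi cells illustrate the "many" cases.

```python
# Sketch: the fitted linear SVM has exactly one hyperplane (w, b), while
# DT and K-NN induce many boundary pieces on the same 2-D data.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=1)

svm = SVC(kernel='linear').fit(X, y)
print('SVM: one hyperplane, w =', svm.coef_[0], 'b =', svm.intercept_[0])

tree = DecisionTreeClassifier(random_state=1).fit(X, y)
print('DT: %d leaves = rectangular regions' % tree.get_n_leaves())

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print('1-NN: one Voronoi cell per training point =', len(X))
```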

7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space?

To make them linearly separable.

Answers2 Review Nov. 22, 2011

6. K-NN is a lazy approach; what does that mean? What are the disadvantages of K-NN's lazy approach? Do you see any advantages in using K-NN's lazy approach?

Lazy: postpones "learning a model".

Disadvantages: slow, as the model is "obtained" when classifying an object and not beforehand. (Another disadvantage, not asked for here: there is no explicit model, and therefore no way to show one to a domain expert, which makes it difficult to establish trust in the learnt model.)

Advantage of "being lazy": for quickly changing streaming data, learning a model might be a waste of time, as the model changes quickly over time, and a lazy approach might be better.
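
A tiny sketch of that trade-off (ours; algorithm='brute' makes scikit-learn's K-NN store the training data verbatim):

```python
# Sketch: with algorithm='brute', K-NN's fit() merely stores the data
# (lazy); the real work happens at predict() time.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 20))
y = (X[:, 0] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5, algorithm='brute')
t0 = time.perf_counter(); knn.fit(X, y);  t1 = time.perf_counter()
knn.predict(X[:1000]);                    t2 = time.perf_counter()
print('fit: %.4fs   predict 1000 points: %.4fs' % (t1 - t0, t2 - t1))
```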

8. What is the role of slack variables in the Linear SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?

The slack variable ξ_i measures the distance of an object i to the object's class hyperplane if it is on the wrong side of that hyperplane, and is 0 if the example is on the correct side; that is, ξ_i measures the error associated with the i-th example. Soft-margin SVMs solve a 2-objective optimization problem, trying to minimize the errors associated with the examples (Σ_i ξ_i) while keeping the margins as wide as possible (minimizing ||w||); the parameter C determines how much emphasis is put on each of the two objectives.
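
To see the slack variables concretely, a final sketch (ours): recover ξ_i = max(0, 1 − y_i (w · x_i + b)) from a fitted linear SVM; only examples on the wrong side of their class's margin hyperplane get ξ_i > 0.

```python
# Sketch: recovering the slack values xi_i = max(0, 1 - y_i (w . x_i + b))
# from a fitted soft-margin linear SVM; xi_i > 0 only for examples on the
# wrong side of their class's margin hyperplane.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y01 = make_classification(n_samples=100, n_features=2, n_redundant=0,
                             flip_y=0.2, random_state=0)
y = 2 * y01 - 1                        # relabel the classes as -1 / +1

svm = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print('examples with xi > 0:', int((xi > 1e-9).sum()), ' total error:', xi.sum())
```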