
Questions Review October - PowerPoint Presentation

briana-ranney
Uploaded On 2016-11-23



Presentation Transcript

Slide1

Questions Review October 12, 2010

How does decision tree post-pruning work? What is the purpose of applying post-pruning in decision tree learning?

What are the characteristics of representative-based/prototype-based clustering algorithms, and what do they all have in common?

K-means is one of the most popular clustering algorithms. Give reasons why K-means is so popular!

Which of the following cluster shapes is K-means capable of discovering? a) triangles b) clusters inside clusters c) the letter 'T' d) any polygon of 5 points e) the letter 'I'

5. Assume we apply K-medoids for k=3 to a dataset consisting of 5 objects numbered 1 to 5 with the following distance matrix (symmetric; upper triangle shown):

     1  2  3  4  5
1    0  2  4  5  1
2       0  2  3  3
3          0  1  5
4             0  2
5                0

The current set of representatives is {1,3,4}; indicate all computations that K-medoids (PAM) performs in its next iteration.

What are the characteristics of a border point in DBSCAN?

If you increase the MinPts parameter of DBSCAN, how will this affect the clustering results?

DBSCAN supports the notion of outliers. Why is this desirable?

What are the

How is region discovery in spatial datasets different from traditional clustering?

What are the unique characteristics of hierarchical clustering?

Slide2

Some Answers Review October 12, 2010

How does decision tree post-pruning work? What is the purpose of applying post-pruning in decision tree learning?

No answer to the first question!

To obtain a low generalization error! Post-pruning tries to find the amount of model complexity that leads to a low generalization error, i.e., it removes parts of the tree that fit the training data but do not generalize.
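The slide gives no answer to how post-pruning works. As an illustration only, here is a minimal sketch of reduced-error pruning, one common post-pruning scheme (not necessarily the one covered in the course): the full tree is grown first, then, bottom-up, a subtree is replaced by a leaf predicting its majority class whenever that does not increase the error on a held-out validation set. The `Node` class, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node; internal nodes test x[feature] <= threshold.
    majority: int                      # majority training label at this node
    feature: Optional[int] = None      # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x) -> int:
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority


def errors(node: Node, data) -> int:
    # Number of misclassified (x, y) pairs.
    return sum(predict(node, x) != y for x, y in data)


def reduced_error_prune(node: Node, val) -> Node:
    """Prune the children first, then try collapsing this node into a leaf."""
    if node.is_leaf():
        return node
    # Route the validation examples to the two subtrees.
    left_val = [(x, y) for x, y in val if x[node.feature] <= node.threshold]
    right_val = [(x, y) for x, y in val if x[node.feature] > node.threshold]
    node.left = reduced_error_prune(node.left, left_val)
    node.right = reduced_error_prune(node.right, right_val)
    leaf = Node(majority=node.majority)
    # Prefer the simpler model when it is at least as accurate on validation data.
    if errors(leaf, val) <= errors(node, val):
        return leaf
    return node
```

The validation set plays the role of an estimate of the generalization error: a split survives only if it pays for its extra complexity on data the tree was not grown on.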

What are the characteristics of representative-based/prototype-based clustering algorithms, and what do they all have in common?

a) They form clusters by assigning each object in the dataset to the closest prototype/representative (using 1-NN queries). b) They are iterative algorithms that change the current partitioning until a predefined termination condition is met. [c) Cluster shapes are limited to convex polygons.]

K-means is one of the most popular clustering algorithms. Give reasons why K-means is so popular!

K-means is popular because it is relatively efficient (its runtime complexity is basically O(n), more precisely O(t·k·n) for t iterations and k clusters, and its storage complexity is O(n)) and easy to use. It uses an implicit fitness function (SSE) and terminates at a local optimum of this fitness function. Its properties are well understood.
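A minimal sketch of K-means in the Lloyd style, assuming points are tuples of numbers and that the initial centroids are drawn at random from the data (both assumptions; the function names are illustrative). Each iteration performs one 1-NN query per point against k centroids, which is where the O(t·k·n) runtime comes from, and the recorded SSE never increases from one iteration to the next, so the loop stops at a local optimum.

```python
import random


def sq_dist(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centroids, labels):
    # Sum of squared errors: the fitness function K-means implicitly minimizes.
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means; returns (centroids, labels, SSE after each update)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # initial centroids drawn from the data
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: 1-NN query against the k centroids for every point, O(k*n).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:           # partition unchanged: local optimum reached
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members (O(k*n) as written).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        history.append(sse(points, centroids, labels))
    return centroids, labels, history
```

Only the points and k centroids are stored, so memory stays O(n); the result depends on the initial centroids, which is why K-means is often rerun with different seeds.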

Which of the following cluster shapes is K-means capable of discovering? a) triangles b) clusters inside clusters c) the letter 'T' d) any polygon of 5 points e) the letter 'I'

Only a) and e)!
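One way to see why only convex shapes such as a) and e) qualify: K-means assigns every point to its nearest centroid, so the region belonging to a cluster is the set of points closer to its centroid than to any other (a Voronoi cell), and such cells are always convex. The sketch below checks this numerically for arbitrary, hypothetical centroids by sampling two points in the same cell and verifying that a point on the segment between them lands in that cell too. The sampling range and function names are assumptions made for illustration.

```python
import random


def nearest(p, centroids):
    # Index of the closest centroid (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def cells_are_convex(centroids, trials=10_000, seed=1):
    """Sample pairs of points in the same cell; points between them must stay in it."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        q = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        if nearest(p, centroids) != nearest(q, centroids):
            continue
        t = rng.random()
        m = tuple(t * a + (1 - t) * b for a, b in zip(p, q))   # a point on segment pq
        if nearest(m, centroids) != nearest(p, centroids):
            return False
    return True
```

A shape that needs a "dent" (the letter 'T') or one cluster enclosed by another would require a non-convex region, which a nearest-centroid partition cannot produce.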

5. Assume we apply K-medoids for k=3 to a dataset consisting of 5 objects numbered 1 to 5 with the following distance matrix:

The current set of representatives is {1,3,4}; indicate all computations that K-medoids (PAM) performs in its next iteration.

Distance matrix (symmetric; upper triangle shown):

     1  2  3  4  5
1    0  2  4  5  1
2       0  2  3  3
3          0  1  5
4             0  2
5                0
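The slides leave question 5 unanswered. Below is a sketch of the computations one PAM iteration performs on the matrix above, assuming the classic formulation: compute the cost of the current medoid set (each object contributes the distance to its closest medoid), evaluate every swap of a medoid with a non-medoid, and accept the cheapest swap only if it lowers the cost. With medoids {1,3,4} and non-medoids {2,5} that is 3 × 2 = 6 candidate swaps. The names `UPPER`, `cost`, and `pam_iteration` are illustrative.

```python
from itertools import product

# Upper triangle of the distance matrix above: UPPER[(i, j)] for i < j.
UPPER = {
    (1, 2): 2, (1, 3): 4, (1, 4): 5, (1, 5): 1,
    (2, 3): 2, (2, 4): 3, (2, 5): 3,
    (3, 4): 1, (3, 5): 5,
    (4, 5): 2,
}
OBJECTS = [1, 2, 3, 4, 5]


def d(i, j):
    # Symmetric lookup; an object's distance to itself is 0.
    return 0 if i == j else UPPER[(min(i, j), max(i, j))]


def cost(medoids):
    # Sum over all objects of the distance to the closest medoid.
    return sum(min(d(o, m) for m in medoids) for o in OBJECTS)


def pam_iteration(medoids):
    current = cost(medoids)
    non_medoids = [o for o in OBJECTS if o not in medoids]
    swaps = {}
    for m, o in product(medoids, non_medoids):
        candidate = [o if x == m else x for x in medoids]   # replace medoid m by object o
        swaps[(m, o)] = cost(candidate)
    best = min(swaps, key=swaps.get)    # first cheapest swap in iteration order
    if swaps[best] < current:
        return current, swaps, [best[1] if x == best[0] else x for x in medoids]
    return current, swaps, list(medoids)


if __name__ == "__main__":
    current, swaps, new = pam_iteration([1, 3, 4])
    print("cost of {1,3,4}:", current)
    for (m, o), c in swaps.items():
        print(f"swap {m} -> {o}: cost {c}")
    print("next medoids:", new)
```

On this matrix the current cost is 3 (object 2 contributes 2, object 5 contributes 1). The six swaps cost 4 (1→2), 3 (1→5), 2 (3→2), 3 (3→5), 2 (4→2), and 3 (4→5). Two swaps tie at cost 2; this sketch keeps the first one and moves to {1,2,4}, but tie-breaking is an implementation choice.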

Slide3

Answers Review October 12, 2010 (cont.)

What are the characteristics of a border point in DBSCAN?

It is not a core point, but it lies within the radius (Eps) of one or more core points.
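A sketch of how core, border, and noise points can be told apart following the definition above. It assumes Euclidean distance and that a point counts itself in its own Eps-neighborhood (a common convention, though implementations differ); the parameter names `eps` and `min_pts` are illustrative.

```python
import math


def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (brute force, O(n^2))."""
    # Eps-neighborhood of every point, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):   # within eps of at least one core point
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```

For example, in a "plus" shape with eps = 1 and min_pts = 5, only the center is a core point; the four arm tips are border points because they lie within eps of the center.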

If you increase the MinPts parameter of DBSCAN, how will this affect the clustering results?

There will be more outliers! It is hard to say whether the number of clusters will increase or decrease, because two effects interact: some clusters die (fewer clusters), while some other, bigger clusters are split into multiple smaller clusters (more clusters).
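The "more outliers" part can be checked directly: with Eps fixed, raising MinPts can only shrink the set of core points, so the set of noise points can only grow (it never loses a point). The sketch below assumes Euclidean distance and a hypothetical random 2-D dataset, and counts noise points for several MinPts values. It deliberately says nothing about the number of clusters, which, as the answer notes, can move either way.

```python
import math
import random


def noise_points(points, eps, min_pts):
    # Noise: not a core point and no core point within eps.
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
            for p in points]
    core = {i for i, nb in enumerate(nbrs) if len(nb) >= min_pts}
    return {i for i, nb in enumerate(nbrs)
            if i not in core and not any(j in core for j in nb)}


rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(150)]
for min_pts in (3, 5, 8, 12):
    print(min_pts, len(noise_points(data, eps=0.4, min_pts=min_pts)))
```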

DBSCAN supports the notion of outliers. Why is this desirable?

a) More descriptive and compact clusters. b) No need to remove outliers prior to clustering.

DBSCAN has a complexity of O(n^2), which can be reduced to O(n log n) by using spatial index structures. Explain!

For each point in the dataset we have to decide whether it is a core point, which takes O(n) without supporting data structures; because there are n points in the dataset, we obtain O(n^2). For each core point c we also have to compute all the points that are density-reachable from c, but this is O(n) or less…
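The O(n^2) comes from answering each Eps-neighborhood query by scanning all n points. The sketch below swaps in a uniform grid with cell width Eps as the spatial index. This is a substitution for illustration: the O(n log n) figure usually refers to tree-based indexes such as the R*-tree used in the original DBSCAN paper. A grid query only inspects the 3 × 3 block of cells around the point and returns exactly the same neighbors as the brute-force scan. 2-D points are assumed, and the names are illustrative.

```python
import math
from collections import defaultdict


def brute_force_neighbors(points, i, eps):
    # O(n) per query, hence O(n^2) for all points.
    return sorted(j for j, q in enumerate(points) if math.dist(points[i], q) <= eps)


class Grid:
    """Uniform grid index with cell width eps (2-D points)."""

    def __init__(self, points, eps):
        self.points, self.eps = points, eps
        self.cells = defaultdict(list)
        for j, (x, y) in enumerate(points):
            self.cells[self._cell(x, y)].append(j)

    def _cell(self, x, y):
        return (math.floor(x / self.eps), math.floor(y / self.eps))

    def neighbors(self, i):
        # Points within eps differ by at most one cell index per coordinate,
        # so only the 3x3 block of cells around point i needs to be checked.
        cx, cy = self._cell(*self.points[i])
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in self.cells.get((cx + dx, cy + dy), []):
                    if math.dist(self.points[i], self.points[j]) <= self.eps:
                        found.append(j)
        return sorted(found)
```

How much the grid saves depends on how evenly the points are spread: if most points fall into a few cells, a query again scans O(n) points.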

How is region discovery in spatial datasets different from traditional clustering?

a) It supports plug-in fitness functions. b) It finds clusters in the subspace of spatial attributes, not in the complete attribute space!