Presentation Transcript

Slide1

Announcements

Midterm

Grading over the next few days

Scores will be included in mid-semester grades

Assignments:

HW6

Out late tonight

Due date: Tue, 3/24, 11:59 pm

Slide2

Plan

Last time

Nearest Neighbor Classification

kNN

Non-parametric vs parametric

Today

Decision Trees!

Slide3

Introduction to Machine Learning

Decision Trees

Instructor: Pat Virtue

Slide4

k-NN classifier (k=5)

[Figure: a test document and its 5 nearest neighbors among the classes Whales, Seals, and Sharks.]

Slide5

k-Nearest Neighbor Classification

Given a training dataset D = {(x^(i), y^(i))}, i = 1, ..., N, and a test input x, predict the class label ŷ:
Find the k closest points in the training data to x.
Return the majority class among those k points, ŷ = argmax_c N_c, where N_c is the number of the k neighbors with class label c.
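A minimal NumPy sketch of this procedure (the array names and the Euclidean distance choice are illustrative assumptions, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    dists = np.linalg.norm(X_train - np.asarray(x_query), axis=1)  # Euclidean distance to each training point
    nearest = np.argsort(dists)[:k]                                # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]          # majority vote: argmax_c N_c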

 

Slide6

k-NN on Fisher Iris Data

[Figure: k-NN decision boundaries on the Fisher Iris data.]

Special Case: Nearest Neighbor (k = 1)

Slide7

k-NN on Fisher Iris Data

[Figure: k-NN decision boundaries on the Fisher Iris data.]

Slide8

k-NN on Fisher Iris Data

[Figure: k-NN decision boundaries on the Fisher Iris data.]

Special Case: Majority Vote

Slide9

Decision Trees

First, a few tools:
Majority vote: predict the most common class, ŷ = argmax_c N_c.
Classification error rate: what fraction did we predict incorrectly? error(h, D) = (1/N) Σ_i 1[h(x^(i)) ≠ y^(i)]
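A small runnable sketch of both tools (the function names are mine, not from the slides):

import numpy as np
from collections import Counter

def majority_vote(labels):
    """Return the most common class label."""
    return Counter(labels).most_common(1)[0][0]

def error_rate(y_pred, y_true):
    """Fraction of predictions that do not match the true labels."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean(y_pred != y_true)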

 

Slide10

Decision trees

Popular representation for classifiers
Even among humans!
I’ve just arrived at a restaurant: should I stay (and wait for a table) or go elsewhere?

Slide11

Decision trees

It’s Friday night and you’re hungry.
You arrive at your favorite cheap but really cool, happening burger place.
It’s full up and you have no reservation, but there is a bar.
The host estimates a 45 minute wait.
There are alternatives nearby, but it’s raining outside.

A decision tree partitions the input space and assigns a label to each partition.

Slide12

Expressiveness

Discrete decision trees can express any function of the input.
E.g., for Boolean functions, build a path from root to leaf for each row of the truth table.
True/false: there is a consistent decision tree that fits any training set exactly.
But a tree that simply records the examples is essentially a lookup table.
To get generalization to new examples, we need a compact tree.

Slide13

Tree to Predict C-Section Risk

Figure from Tom Mitchell

Slide14

Decision Stumps

Split data based on a single attribute

Dataset: Output Y, Attributes A, B, C

Y  A  B  C
-  1  0  0
-  1  0  1
-  1  0  0
+  0  0  1
+  1  1  0
+  1  1  1
+  1  1  0
+  1  1  1
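A decision stump can be sketched as a single split followed by a majority vote on each side; the data layout (a list of (feature dict, label) pairs) and the function name below are illustrative assumptions:

from collections import Counter

def stump_predict(samples, attribute, x):
    """Split samples on one binary attribute and predict for x by majority vote
    within the side of the split that x falls into."""
    side = [y for features, y in samples if features[attribute] == x[attribute]]
    if not side:                       # unseen attribute value: fall back to the overall majority
        side = [y for _, y in samples]
    return Counter(side).most_common(1)[0][0]

# The dataset above, split on B; an example with B = 1 is predicted '+'.
data = [({"A": 1, "B": 0, "C": 0}, "-"), ({"A": 1, "B": 0, "C": 1}, "-"),
        ({"A": 1, "B": 0, "C": 0}, "-"), ({"A": 0, "B": 0, "C": 1}, "+"),
        ({"A": 1, "B": 1, "C": 0}, "+"), ({"A": 1, "B": 1, "C": 1}, "+"),
        ({"A": 1, "B": 1, "C": 0}, "+"), ({"A": 1, "B": 1, "C": 1}, "+")]
print(stump_predict(data, "B", {"A": 1, "B": 1, "C": 0}))   # '+'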

Slide15

Building a decision tree

Function BuildTree(n, A)    // n: samples, A: set of attributes
    If empty(A) or all n(L) are the same
        status = leaf
        class = most common class in n(L)
    else
        status = internal
        a ← bestAttribute(n, A)
        LeftNode = BuildTree(n(a=1), A \ {a})
        RightNode = BuildTree(n(a=0), A \ {a})
    end
end

Slide16

Building a decision tree

Function BuildTree(n, A)    // n: samples, A: set of attributes
    If empty(A) or all n(L) are the same
        status = leaf
        class = most common class in n(L)
    else
        status = internal
        a ← bestAttribute(n, A)
        LeftNode = BuildTree(n(a=1), A \ {a})
        RightNode = BuildTree(n(a=0), A \ {a})
    end
end

n(L): Labels for samples in this set

Decision: Which attribute?

Recursive calls to create left and right subtrees, n(a=1) is the set of samples in n for which the attribute a is 1
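A minimal runnable sketch of this recursion (the Node class, the list-of-(features, label) data layout, and passing bestAttribute in as a function are my assumptions for illustration, not the course's reference implementation):

from collections import Counter

class Node:
    def __init__(self, label=None, attribute=None, left=None, right=None):
        self.label = label          # set for leaf nodes
        self.attribute = attribute  # set for internal nodes
        self.left, self.right = left, right   # a = 1 branch, a = 0 branch

def build_tree(samples, attributes, best_attribute):
    """samples: non-empty list of (features: dict, label); attributes: set of attribute names."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: no attributes left, or all labels the same
    if not attributes or len(set(labels)) == 1:
        return Node(label=majority)
    a = best_attribute(samples, attributes)            # greedy choice of the split attribute
    left = [s for s in samples if s[0][a] == 1]        # n(a=1)
    right = [s for s in samples if s[0][a] == 0]       # n(a=0)
    if not left or not right:                          # split does not separate the data
        return Node(label=majority)
    rest = attributes - {a}
    return Node(attribute=a,
                left=build_tree(left, rest, best_attribute),
                right=build_tree(right, rest, best_attribute))

Here best_attribute is left as a parameter; the mutual-information criterion introduced later in the lecture is one possible choice.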

Slide17

Decision Trees as a Search Problem

Slide18

Background: Greedy Search

Goal:
Search space consists of nodes and weighted edges.
Goal is to find the lowest (total) weight path from the root (the start state) to a leaf (an end state).
Greedy Search:
At each node, select the edge with the lowest (immediate) weight.
Heuristic method of search (i.e., it does not necessarily find the best path).

[Figure: example search tree with weighted edges from the start state at the root to end states at the leaves.]
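A tiny sketch of greedy descent on a weighted tree (the dict-based tree representation and node names are hypothetical, not the tree from the figure):

def greedy_path(tree, root):
    """Follow the cheapest immediate edge at each node until reaching a leaf.

    tree maps a node to a list of (edge_weight, child) pairs; leaves map to [].
    Returns the path taken and its total weight. The greedy choice is only locally
    optimal: it does not necessarily find the lowest-total-weight root-to-leaf path."""
    path, total, node = [root], 0, root
    while tree.get(node):
        weight, node = min(tree[node])   # cheapest immediate edge
        total += weight
        path.append(node)
    return path, total

# Greedy follows the weight-1 edge first, even though the other branch is cheaper overall.
tree = {"S": [(1, "A"), (2, "B")], "A": [(9, "A1"), (8, "A2")], "B": [(1, "B1"), (2, "B2")]}
print(greedy_path(tree, "S"))   # (['S', 'A', 'A2'], 9), while S -> B -> B1 costs only 3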

Slide19

Background: Greedy Search

[Figure: greedy search on the example tree, continued.]

Slide20

Background: Greedy Search

[Figure: greedy search on the example tree, continued.]

Slide21

Building a decision tree

Function BuildTree(n, A)    // n: samples, A: set of attributes
    If empty(A) or all n(L) are the same
        status = leaf
        class = most common class in n(L)
    else
        status = internal
        a ← bestAttribute(n, A)
        LeftNode = BuildTree(n(a=1), A \ {a})
        RightNode = BuildTree(n(a=0), A \ {a})
    end
end

n(L): Labels for samples in this set

Decision: Which attribute?

Recursive calls to create left and right subtrees, n(a=1) is the set of samples in n for which the attribute a is 1

Slide22

Identifying ‘bestAttribute’

There are many possible ways to select the best attribute for a given set.

We will discuss one possible way, which is based on information theory.

Slide23

Entropy

Quantifies the amount of uncertainty associated with a specific probability distribution.
The higher the entropy, the less confident we are in the outcome.
Definition: H(X) = -Σ_x P(X = x) log2 P(X = x)

Claude Shannon (1916–2001); most of the work was done at Bell Labs.

Slide24

Entropy

Definition: H(X) = -Σ_x P(X = x) log2 P(X = x)
So, if P(X = 1) = 1, then H(X) = 0.
If P(X = 1) = 0.5, then H(X) = 1 bit.

[Figure: entropy H(X) of a binary variable as a function of P(X = 1).]
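A quick numerical check of these two cases (a small sketch; the helper name is mine):

import numpy as np

def entropy(probs):
    """H(X) = sum_x P(X = x) * log2(1 / P(X = x)), ignoring zero-probability outcomes."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return np.sum(probs * np.log2(1.0 / probs))

print(entropy([1.0, 0.0]))   # 0.0 -> no uncertainty
print(entropy([0.5, 0.5]))   # 1.0 -> maximum uncertainty for a binary variable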

Slide25

Mutual Information

For a decision tree, we can use the mutual information of the output class Y and some attribute X on which to split as a splitting criterion.
Given a dataset D of training examples, we can estimate the required probabilities as…

Slide26

Mutual Information

For a decision tree, we can use the mutual information of the output class Y and some attribute X on which to split as a splitting criterion: I(Y; X) = H(Y) - H(Y | X).
Given a dataset D of training examples, we can estimate the required probabilities as…

Informally, we say that mutual information is a measure of the following: if we know X, how much does this reduce our uncertainty about Y?

Entropy measures the expected number of bits needed to code one random draw from X. For a decision tree, we want to reduce the entropy of the random variable we are trying to predict!

Conditional entropy is the expected value of the specific conditional entropy: H(Y | X) = E_{P(X = x)}[H(Y | X = x)] = Σ_x P(X = x) H(Y | X = x)
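A small sketch that estimates these quantities by plugging in empirical frequencies from the data (the function names and NumPy-based layout are mine):

import numpy as np

def entropy_of_labels(y):
    """Empirical entropy H(Y) from an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(y, x):
    """I(Y; X) = H(Y) - H(Y | X), with H(Y | X) = sum_v P(X = v) H(Y | X = v)."""
    y, x = np.asarray(y), np.asarray(x)
    h_y_given_x = 0.0
    for v in np.unique(x):
        mask = (x == v)
        h_y_given_x += mask.mean() * entropy_of_labels(y[mask])
    return entropy_of_labels(y) - h_y_given_x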

Slide27

Decision Tree Learning Example

Which attribute would mutual information select for the next split?
(a) A
(b) B
(c) A or B (tie)
(d) Neither

Dataset: Output Y, Attributes A and B

Y  A  B
-  1  0
-  1  0
+  1  0
+  1  0
+  1  1
+  1  1
+  1  1
+  1  1

Slide28

Decision Tree Learning Example

Dataset: Output Y, Attributes A and B

Y  A  B
-  1  0
-  1  0
+  1  0
+  1  0
+  1  1
+  1  1
+  1  1
+  1  1
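Continuing from the mutual_information sketch above, a worked check on this dataset (variable names are mine):

Y = ['-', '-', '+', '+', '+', '+', '+', '+']
A = [1, 1, 1, 1, 1, 1, 1, 1]
B = [0, 0, 0, 0, 1, 1, 1, 1]

print(mutual_information(Y, A))   # 0.0    -> A is constant, so it tells us nothing about Y
print(mutual_information(Y, B))   # ~0.311 -> splitting on B reduces our uncertainty about Y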