Slide1
Homework 1: Solutions
CS4445/B12
Provided by: Kenneth J. Loomis
Slide2
Entropy of the original set
genre   critics-reviews  rating  IMAX   likes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes
Entropy (target attribute): likes has 9 yes and 5 no among the 14 instances, which gives an entropy of about .9403.
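The entropy of the target attribute can be checked with a few lines of Python. This is a minimal sketch, assuming the standard base-2 Shannon entropy used by ID3; the names `likes` and `entropy` are illustrative and do not come from the slides.

```python
from collections import Counter
from math import log2

# Target attribute "likes" for the 14 training instances (5 no, 9 yes).
likes = ["no"] * 5 + ["yes"] * 9

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    # p * log2(1/p) summed over the classes that actually occur.
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

print(round(entropy(likes), 4))  # 0.9403
```

A homogeneous set (all yes or all no) gives 0, which is why ID3 stops splitting once a subset is pure.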
Slide3
Determine the root node attribute
Split on genre (= comedy, = drama, = action):
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
Entropy (genre) = .6935
Slide4
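The value .6935 for the genre split is the average of the entropies of the three genre partitions, weighted by partition size. A worked version is sketched below, assuming the usual ID3 definition with the convention 0 log 0 = 0; the symbols H and E are our notation, not the slides'.

```latex
\begin{align*}
H(p_{\text{yes}}, p_{\text{no}}) &= -p_{\text{yes}}\log_2 p_{\text{yes}} - p_{\text{no}}\log_2 p_{\text{no}} \\
E(\text{genre}) &= \underbrace{\tfrac{5}{14}\,H\!\left(\tfrac{2}{5},\tfrac{3}{5}\right)}_{\text{comedy}}
                 + \underbrace{\tfrac{4}{14}\,H\!\left(\tfrac{4}{4},\tfrac{0}{4}\right)}_{\text{drama}}
                 + \underbrace{\tfrac{5}{14}\,H\!\left(\tfrac{3}{5},\tfrac{2}{5}\right)}_{\text{action}} \\
               &\approx \tfrac{5}{14}(0.9710) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.9710) \approx 0.6935
\end{align*}
```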
Determine the root node attribute
Split on critics-reviews (= thumbs-up, = neutral, = thumbs-down):
genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
drama   thumbs-up        R       FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes
Entropy (critics-reviews) = .9111
Slide5
Determine the root node attribute
Split on rating (= PG-13, = R):
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
action  neutral          R       FALSE  yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        R       FALSE  yes
Entropy (rating) = .7885
Slide6
Determine the root node attribute
Split on IMAX (= FALSE, = TRUE):
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
drama   thumbs-up        R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
Entropy (IMAX) = .8922
Slide7
Determine the root node attribute
Entropy (genre) = .6935
Entropy (critics-reviews) = .9111
Entropy (rating) = .7885
Entropy (IMAX) = .8922
genre
  = comedy
  = drama
  = action
We can see that genre provides us with the lowest entropy, thus it becomes the root node of our ID3 tree.
Slide8
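The root-node comparison can be reproduced from the training table. The following is a minimal sketch, assuming the weighted-average entropy criterion described in the slides; `ROWS`, `ATTRS`, and `split_entropy` are illustrative names, not part of the original material.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["genre", "critics-reviews", "rating", "IMAX"]

# The 14 training instances: (genre, critics-reviews, rating, IMAX, likes).
ROWS = [
    ("comedy", "thumbs-up", "R", "FALSE", "no"),
    ("comedy", "thumbs-up", "R", "TRUE", "no"),
    ("comedy", "neutral", "R", "FALSE", "no"),
    ("action", "thumbs-down", "PG-13", "TRUE", "no"),
    ("action", "neutral", "R", "TRUE", "no"),
    ("comedy", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("comedy", "neutral", "PG-13", "TRUE", "yes"),
    ("drama", "thumbs-up", "R", "FALSE", "yes"),
    ("drama", "thumbs-down", "PG-13", "TRUE", "yes"),
    ("drama", "neutral", "R", "TRUE", "yes"),
    ("drama", "thumbs-up", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "R", "FALSE", "yes"),
    ("action", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "PG-13", "FALSE", "yes"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def split_entropy(rows, col):
    """Size-weighted average entropy of 'likes' after partitioning on column col."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[col]].append(row[-1])
    return sum(len(p) / len(rows) * entropy(p) for p in parts.values())

scores = {name: round(split_entropy(ROWS, i), 4) for i, name in enumerate(ATTRS)}
print(scores)                       # genre .6935, critics-reviews .9111, rating .7885, IMAX .8922
print(min(scores, key=scores.get))  # genre
```

Minimizing this weighted entropy is equivalent to maximizing information gain, since the entropy of the unsplit set is the same for every candidate attribute.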
Determine the left child attribute
genre
  = comedy -> ?
  = drama
  = action
We now move on to the left child node of our tree. What attribute do we choose for this node?
Options: critics-reviews, rating, or IMAX?
Slide9
Determine the left child attribute
genre
  = comedy -> critics-reviews (= thumbs-up, = neutral, = thumbs-down)
  = drama
  = action
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Entropy (critics-reviews) = .4000
Slide10
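The value .4000 for critics-reviews on the comedy subset can be worked out by hand: only the neutral partition is mixed, so it alone contributes. A sketch follows, where H(p_yes, p_no) denotes the binary entropy in bits with 0 log 0 = 0 (our notation, not the slides').

```latex
\begin{align*}
E(\text{critics-reviews} \mid \text{genre} = \text{comedy})
  &= \underbrace{\tfrac{2}{5}\,H\!\left(\tfrac{0}{2},\tfrac{2}{2}\right)}_{\text{thumbs-up}}
   + \underbrace{\tfrac{2}{5}\,H\!\left(\tfrac{1}{2},\tfrac{1}{2}\right)}_{\text{neutral}}
   + \underbrace{\tfrac{1}{5}\,H\!\left(\tfrac{1}{1},\tfrac{0}{1}\right)}_{\text{thumbs-down}} \\
  &= \tfrac{2}{5}(0) + \tfrac{2}{5}(1) + \tfrac{1}{5}(0) = 0.4
\end{align*}
```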
Determine the left child attribute
genre
  = comedy -> rating (= R, = PG-13)
  = drama
  = action
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Slide11
Determine the left child attribute
genre
  = comedy -> IMAX (= FALSE, = TRUE)
  = drama
  = action
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes
Slide12
Determine the left child attribute
genre
  = comedy -> rating (= R, = PG-13)
  = drama
  = action
Entropy (critics-reviews) = .4000
Entropy (rating) = 0.0
Entropy (IMAX) = .9510
We can see that rating provides us with the lowest entropy, thus it becomes the left child node of our ID3 tree.
Slide13
Determine the left child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama
  = action
Splitting on rating also makes each partition homogeneous, so we can add our leaf nodes here.
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Slide14
Determine the center child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action
We can see that genre = drama provides us with a homogeneous subset, so we can place a leaf node here.
genre   critics-reviews  rating  IMAX   likes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
Slide15
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> ?
We now move on to the right child node of our tree. What attribute do we choose for this node?
Options: critics-reviews, rating, or IMAX?
Slide16
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> critics-reviews (= thumbs-up, = neutral, = thumbs-down)
genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes
Slide17
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> rating (= R, = PG-13)
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
Slide18
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> IMAX (= TRUE, = FALSE)
genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
Slide19
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> IMAX (= TRUE, = FALSE)
Entropy (critics-reviews) = .9510
Entropy (rating) = .9510
Entropy (IMAX) = 0.0
We can see that IMAX provides us with the lowest entropy, thus it becomes the right child node of our ID3 tree.
Slide20
Determine the right child attribute
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> IMAX: = TRUE -> [no]; = FALSE -> [yes]
Splitting on IMAX also makes each partition homogeneous, so we can add our leaf nodes here.
genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
Slide21
ID3 Decision tree is complete
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> IMAX: = TRUE -> [no]; = FALSE -> [yes]
Since we have only leaf nodes remaining, we are finished building our tree.
Slide22
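The whole construction (pick the attribute with the lowest weighted entropy, partition, recurse until each subset is homogeneous) can be sketched recursively. This is an illustrative sketch under the same assumptions as the slides' walkthrough, not the course's reference implementation; `ROWS`, `ATTRS`, `partition`, `split_entropy`, and `id3` are names chosen here.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ("genre", "critics-reviews", "rating", "IMAX")

# The 14 training instances: (genre, critics-reviews, rating, IMAX, likes).
ROWS = [
    ("comedy", "thumbs-up", "R", "FALSE", "no"),
    ("comedy", "thumbs-up", "R", "TRUE", "no"),
    ("comedy", "neutral", "R", "FALSE", "no"),
    ("action", "thumbs-down", "PG-13", "TRUE", "no"),
    ("action", "neutral", "R", "TRUE", "no"),
    ("comedy", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("comedy", "neutral", "PG-13", "TRUE", "yes"),
    ("drama", "thumbs-up", "R", "FALSE", "yes"),
    ("drama", "thumbs-down", "PG-13", "TRUE", "yes"),
    ("drama", "neutral", "R", "TRUE", "yes"),
    ("drama", "thumbs-up", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "R", "FALSE", "yes"),
    ("action", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "PG-13", "FALSE", "yes"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def partition(rows, col):
    """Group rows by their value in column col."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[col]].append(row)
    return parts

def split_entropy(rows, col):
    """Size-weighted average entropy of 'likes' after splitting on column col."""
    return sum(len(p) / len(rows) * entropy([r[-1] for r in p])
               for p in partition(rows, col).values())

def id3(rows, cols):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    # Leaf: the subset is homogeneous, or no attributes are left to split on.
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = min(cols, key=lambda c: split_entropy(rows, c))
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {value: id3(subset, rest)
                          for value, subset in partition(rows, best).items()}}

tree = id3(ROWS, list(range(len(ATTRS))))
print(tree)
```

On this data the sketch yields genre at the root, rating under comedy, a [yes] leaf under drama, and IMAX under action, matching the tree built by hand.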
Handling missing values during prediction
genre
  = comedy -> rating: = R -> [no]; = PG-13 -> [yes]
  = drama  -> [yes]
  = action -> IMAX: = TRUE -> [no]; = FALSE -> [yes]
How can we handle missing values using this decision tree?
Given an instance:
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
How do we classify it?
Slide23
Handling missing values during prediction: a solution
Consider adding frequency counts to each leaf node, shown here in curly braces.
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide24
Handling missing values during prediction: a solution
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
Traverse the tree.
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide25
Handling missing values during prediction: a solution
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
Traverse the decision tree normally when the attribute value is known. Here Genre = action is known, so we follow the action branch to the IMAX node.
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide26
Handling missing values during prediction: a solution
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
Traverse every possible path when a missing value is encountered. Here IMAX is missing, so we follow both the TRUE and the FALSE branch.
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide27
Handling missing values during prediction: a solution
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
Traverse every possible path when a missing value is encountered.
Sum the frequency counts of all like leaf nodes that are reached:
  yes: {3} = 3
  no:  {2} = 2
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide28
Handling missing values during prediction: a solution
  Genre = action
  Critics-reviews = ?
  Rating = R
  IMAX = ?
  likes = yes
Follow every possible path when a missing value is encountered.
Determine the frequency count by summing like classification frequencies:
  yes: {3} = 3
  no:  {2} = 2
Classify based on the highest frequency count: 3 > 2, so likes = yes.
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide29
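Handling missing values during prediction: 2nd example
Consider this 2nd example:
  Genre = ?
  Critics-reviews = ?
  Rating = R
  IMAX = TRUE
  likes = no
Genre is missing, so all three branches are followed: comedy with Rating = R reaches [no] {3}, drama reaches [yes] {4}, and action with IMAX = TRUE reaches [no] {2}.
  yes: {4} = 4
  no:  {3} + {2} = 5
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}
Slide30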
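Handling missing values during prediction: 3rd example
Consider if all attribute values are unknown:
  Genre = ?
  Critics-reviews = ?
  Rating = ?
  IMAX = ?
  likes = yes
Every leaf is reached, so the counts are those of the whole training set:
  yes: {2} + {4} + {3} = 9
  no:  {3} + {2} = 5
genre
  = comedy -> rating: = R -> [no] {3}; = PG-13 -> [yes] {2}
  = drama  -> [yes] {4}
  = action -> IMAX: = TRUE -> [no] {2}; = FALSE -> [yes] {3}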
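The procedure used in the three examples (follow the matching branch when a value is known, follow every branch when it is missing, sum the leaf counts per class, pick the larger sum) can be sketched as follows. The tree encoding, the use of `None` for a missing value, and the names `TREE`, `leaf_counts`, and `classify` are assumptions of this sketch, not part of the slides.

```python
from collections import Counter

# Finished tree with leaf frequency counts.
# Internal node: (attribute, {value: child}); leaf: {class: count}.
TREE = ("genre", {
    "comedy": ("rating", {"R": {"no": 3}, "PG-13": {"yes": 2}}),
    "drama": {"yes": 4},
    "action": ("IMAX", {"TRUE": {"no": 2}, "FALSE": {"yes": 3}}),
})

def leaf_counts(node, instance):
    """Sum leaf counts over every path consistent with the instance."""
    if isinstance(node, dict):            # leaf: {class: count}
        return Counter(node)
    attr, children = node
    value = instance.get(attr)            # None (or absent) means missing
    if value is None:
        branches = children.values()      # missing: follow every branch
    else:
        branches = [children[value]]      # known: follow the matching branch
    total = Counter()
    for child in branches:
        total += leaf_counts(child, instance)
    return total

def classify(instance):
    """Predict the class with the highest summed frequency count."""
    return leaf_counts(TREE, instance).most_common(1)[0][0]

example1 = {"genre": "action", "critics-reviews": None, "rating": "R", "IMAX": None}
example2 = {"genre": None, "critics-reviews": None, "rating": "R", "IMAX": "TRUE"}
example3 = {}  # every attribute value unknown

for ex in (example1, example2, example3):
    print(dict(leaf_counts(TREE, ex)), "->", classify(ex))
```

Note that critics-reviews never affects the result, since the tree does not test it; a tie between the summed counts would need a tie-breaking rule, which the slides do not specify.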