Slide1
Distinguish Wild Mushrooms with Decision Tree
Shiqin Yan
Slide2
Objective
Use the existing mushroom database to build a decision tree that helps determine whether a mushroom is poisonous.
Slide3
DataSet
Existing records drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf
Number of Instances: 8124 (classified as either edible or poisonous)
Number of Attributes: 22
Training: 5416, Tuning: 1354, Testing: 1354
Missing attribute values: 2480 (denoted by “?”), all for attribute 11
Slide4
Mushroom Features
1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4. bruises?: bruises=t, no=f
5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f
…
Slide5
Slide6
Approach
Use mutual information to choose the feature on which to split each node of the tree.
Mutual information I(Y;X), where Y: label, X: feature
Choose the feature X that maximizes I(Y;X)
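The split criterion can be sketched in a few lines of Python, using the standard identity I(Y;X) = H(Y) - H(Y|X) for a categorical feature. This is a minimal illustration, not the code behind these slides: the function names (entropy, mutual_information, best_feature) and the four toy records are assumptions made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(feature_values, labels):
    """I(Y;X) = H(Y) - H(Y|X) for one categorical feature X."""
    n = len(labels)
    # Group the labels by the value the feature takes on each record.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # H(Y|X) is the entropy of each group weighted by the group's share.
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_cond


def best_feature(rows, labels, feature_names):
    """Return the feature name whose mutual information with the label is largest."""
    return max(
        feature_names,
        key=lambda f: mutual_information([r[f] for r in rows], labels),
    )


# Toy records, invented for illustration (not taken from the dataset).
rows = [
    {"odor": "f", "cap-shape": "x"},
    {"odor": "f", "cap-shape": "b"},
    {"odor": "a", "cap-shape": "x"},
    {"odor": "a", "cap-shape": "b"},
]
labels = ["p", "p", "e", "e"]

print(best_feature(rows, labels, ["odor", "cap-shape"]))
```

In this toy data odor separates the two classes perfectly while cap-shape carries no information about the label, so odor is selected. A full tree builder would apply the same choice recursively to each subset of records.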
Slide7
Slide8
Most informative features extracted from decision tree:
odor
spore-print-color
habitat
population
Slide9
Prior Research
by Wlodzislaw Duch, Department of Computer Methods, Nicholas Copernicus University
Slide10
Future
Add cross-validation to improve the accuracy
Prune the tree to avoid over-fitting
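The cross-validation idea could look roughly like the sketch below, which splits the records into k folds and averages the accuracy over them. It is an assumption-laden illustration: k_fold_indices, cross_validate, train_fn and predict_fn are hypothetical names, and a majority-class baseline stands in for the decision tree.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    # The first n % k folds get one extra record so every record is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(records, labels, k, train_fn, predict_fn):
    """Mean accuracy over k folds for a model given by train_fn / predict_fn."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(records), k):
        model = train_fn(
            [records[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        correct = sum(predict_fn(model, records[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Stand-in model (hypothetical): always predict the most common training label.
def train_majority(train_records, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, record):
    return model


# Toy data, invented for illustration; real records should be shuffled first.
records = list(range(10))
labels = ["e", "p", "e", "e", "p", "e", "e", "p", "e", "e"]
mean_accuracy = cross_validate(records, labels, 5, train_majority, predict_majority)
print(mean_accuracy)
```

Swapping train_majority and predict_majority for a tree builder and a tree classifier would give a cross-validated accuracy estimate for the decision tree itself.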