Model Evaluation Metrics for Performance Evaluation

Presentation Transcript

1. Model Evaluation
- Metrics for Performance Evaluation: how to evaluate the performance of a model?
- Methods for Performance Evaluation: how to obtain reliable estimates?
- Methods for Model Comparison: how to compare the relative performance of different models?

2. Metrics for Performance Evaluation
- Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
- Confusion matrix:

                            PREDICTED CLASS
                            Class=Yes    Class=No
  ACTUAL      Class=Yes     a (TP)       b (FN)
  CLASS       Class=No      c (FP)       d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
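The following is a minimal sketch (not part of the original slides) of how the four confusion-matrix counts can be tallied from actual and predicted labels; the function name and the "Yes"/"No" label values are illustrative assumptions.

```python
# Tally the confusion-matrix cells a (TP), b (FN), c (FP), d (TN)
# for a binary problem with a designated positive class.
def confusion_counts(actual, predicted, positive="Yes"):
    tp = fn = fp = tn = 0
    for y, y_hat in zip(actual, predicted):
        if y == positive:
            if y_hat == positive:
                tp += 1      # a: actual Yes, predicted Yes
            else:
                fn += 1      # b: actual Yes, predicted No
        else:
            if y_hat == positive:
                fp += 1      # c: actual No, predicted Yes
            else:
                tn += 1      # d: actual No, predicted No
    return tp, fn, fp, tn

actual    = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]
a, b, c, d = confusion_counts(actual, predicted)
print(a, b, c, d)  # 2 1 1 2
```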

3. Metrics for Performance Evaluation…
- Most widely used metric: Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)

                            PREDICTED CLASS
                            Class=Yes    Class=No
  ACTUAL      Class=Yes     a (TP)       b (FN)
  CLASS       Class=No      c (FP)       d (TN)

4. Limitation of Accuracy
- Consider a 2-class problem:
  - Number of Class 0 examples = 9990
  - Number of Class 1 examples = 10
- If the model predicts everything to be Class 0, accuracy is 9990/10000 = 99.9%
- Accuracy is misleading because the model does not detect any Class 1 example

5. Cost Matrix

                            PREDICTED CLASS
  C(i|j)                    Class=Yes     Class=No
  ACTUAL      Class=Yes     C(Yes|Yes)    C(No|Yes)
  CLASS       Class=No      C(Yes|No)     C(No|No)

- C(i|j): cost of misclassifying a class j example as class i

6. Computing Cost of Classification

Cost matrix:
                            PREDICTED CLASS
  C(i|j)                    +        -
  ACTUAL      +            -1      100
  CLASS       -             1        0

Model M1:
                            PREDICTED CLASS
                            +        -
  ACTUAL      +           150       40
  CLASS       -            60      250
  Accuracy = 80%, Cost = 3910

Model M2:
                            PREDICTED CLASS
                            +        -
  ACTUAL      +           250       45
  CLASS       -             5      200
  Accuracy = 90%, Cost = 4255
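As a sanity check, the small sketch below (an assumption, not from the slides) reproduces the accuracy and cost figures above from the cost matrix and the two confusion matrices.

```python
# Total cost = sum over cells of (count in confusion matrix) * (cost of that cell).
# Rows/columns are ordered (actual +, actual -) x (predicted +, predicted -).
cost_matrix = [[-1, 100],   # actual +: C(+|+) = -1, C(-|+) = 100
               [ 1,   0]]   # actual -: C(+|-) =  1, C(-|-) =   0

m1 = [[150,  40],           # model M1: TP=150, FN=40
      [ 60, 250]]           #           FP=60,  TN=250
m2 = [[250,  45],           # model M2: TP=250, FN=45
      [  5, 200]]           #           FP=5,   TN=200

def total_cost(conf, costs):
    return sum(conf[i][j] * costs[i][j] for i in range(2) for j in range(2))

def accuracy(conf):
    n = sum(sum(row) for row in conf)
    return (conf[0][0] + conf[1][1]) / n

for name, conf in [("M1", m1), ("M2", m2)]:
    print(name, f"accuracy = {accuracy(conf):.0%}", f"cost = {total_cost(conf, cost_matrix)}")
# M1 accuracy = 80% cost = 3910
# M2 accuracy = 90% cost = 4255
```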

7. Cost vs Accuracy

Count:
                            PREDICTED CLASS
                            Class=Yes    Class=No
  ACTUAL      Class=Yes     a            b
  CLASS       Class=No      c            d

Cost:
                            PREDICTED CLASS
                            Class=Yes    Class=No
  ACTUAL      Class=Yes     p            q
  CLASS       Class=No      q            p

N = a + b + c + d
Accuracy = (a + d) / N
Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N [q - (q - p) * Accuracy]

Accuracy is proportional to cost if:
1. C(Yes|No) = C(No|Yes) = q
2. C(Yes|Yes) = C(No|No) = p
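A tiny numerical check (assumed, not on the slide) that the identity Cost = N[q - (q - p) * Accuracy] holds for arbitrary counts when the cost matrix has the symmetric p/q form above:

```python
# Symmetric cost matrix: correct predictions cost p, errors cost q.
p, q = 1, 4
a, b, c, d = 150, 40, 60, 250        # arbitrary confusion-matrix counts
N = a + b + c + d

accuracy = (a + d) / N
cost_direct  = p * (a + d) + q * (b + c)
cost_formula = N * (q - (q - p) * accuracy)
print(cost_direct, cost_formula)     # both print the same value
```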

8. Cost-Sensitive Measures
- Precision p = a / (a + c): biased towards C(Yes|Yes) & C(Yes|No)
- Recall r = a / (a + b): biased towards C(Yes|Yes) & C(No|Yes)
- F-measure F = 2rp / (r + p) = 2a / (2a + b + c): biased towards all cells except C(No|No)
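The sketch below (an assumption, not from the slides) computes the three measures directly from the confusion-matrix cells a, b, c, d; the example counts are arbitrary.

```python
def precision(a, b, c, d):
    return a / (a + c)            # uses TP and FP only

def recall(a, b, c, d):
    return a / (a + b)            # uses TP and FN only

def f_measure(a, b, c, d):
    p, r = precision(a, b, c, d), recall(a, b, c, d)
    return 2 * p * r / (p + r)    # uses a, b, c but not d (TN)

# Example counts: a = TP, b = FN, c = FP, d = TN
a, b, c, d = 150, 40, 60, 250
print(precision(a, b, c, d), recall(a, b, c, d), f_measure(a, b, c, d))
```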

9. Model Evaluation
- Metrics for Performance Evaluation: how to evaluate the performance of a model?
- Methods for Performance Evaluation: how to obtain reliable estimates?
- Methods for Model Comparison: how to compare the relative performance of different models?

10. Methods for Performance Evaluation
- How to obtain a reliable estimate of performance?
- Performance of a model may depend on other factors besides the learning algorithm:
  - Class distribution
  - Cost of misclassification
  - Size of training and test sets

11. Learning Curve
- A learning curve shows how accuracy changes with varying sample size
- Requires a sampling schedule for creating the learning curve
- Effect of small sample size:
  - Bias in the estimate
  - Variance of the estimate
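One possible way to produce such a curve is sketched below; it is an assumption that relies on scikit-learn's learning_curve helper and a synthetic data set, neither of which appears on the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

# Synthetic data stands in for the slide's example; the sampling schedule is
# the list of training-set fractions passed as train_sizes.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.05, 1.0, 10),  # the sampling schedule
    cv=5, scoring="accuracy")

for n, acc in zip(sizes, test_scores.mean(axis=1)):
    print(f"n = {int(n):5d}  mean accuracy = {acc:.3f}")
```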

12. Methods of Estimation
- Holdout: reserve 2/3 for training and 1/3 for testing
- Random subsampling: repeated holdout
- Cross-validation:
  - Partition data into k disjoint subsets
  - k-fold: train on k-1 partitions, test on the remaining one
  - Leave-one-out: k = n
- Bootstrap: sampling with replacement
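The sketch below (an assumption, using scikit-learn utilities and a synthetic data set not mentioned on the slides) contrasts a single holdout estimate with a 10-fold cross-validation estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)

# Holdout: reserve 2/3 for training and 1/3 for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
holdout_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)

# k-fold cross-validation: k disjoint partitions, train on k-1, test on the remaining one.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)

print(f"holdout accuracy : {holdout_acc:.3f}")
print(f"10-fold accuracy : {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```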

13. Model Evaluation
- Metrics for Performance Evaluation: how to evaluate the performance of a model?
- Methods for Performance Evaluation: how to obtain reliable estimates?
- Methods for Model Comparison: how to compare the relative performance of different models?

14. ROC (Receiver Operating Characteristic)
- Developed in the 1950s for signal detection theory, to analyze noisy signals
- Characterizes the trade-off between positive hits and false alarms
- The ROC curve plots TPR (on the y-axis) against FPR (on the x-axis)

                            PREDICTED CLASS
                            Yes         No
  ACTUAL      Yes           a (TP)      b (FN)
  CLASS       No            c (FP)      d (TN)

15. ROC (Receiver Operating Characteristic)
- The performance of each classifier is represented as a point on the ROC curve
- Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point

16. ROC Curve
- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive
- At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88

17. ROC Curve
- (TPR, FPR):
  - (0, 0): declare everything to be the negative class
  - (1, 1): declare everything to be the positive class
  - (1, 0): ideal
- Diagonal line: random guessing
- Below the diagonal line: prediction is the opposite of the true class

                            PREDICTED CLASS
                            Yes         No
  ACTUAL      Yes           a (TP)      b (FN)
  CLASS       No            c (FP)      d (TN)

18. Using ROC for Model Comparison
- No model consistently outperforms the other:
  - M1 is better for small FPR
  - M2 is better for large FPR
- Area Under the ROC Curve (AUC):
  - Ideal: area = 1
  - Random guess: area = 0.5

19. How to Construct an ROC Curve
- Use a classifier that produces a posterior probability P(+|A) for each test instance A
- Sort the instances according to P(+|A) in decreasing order
- Apply a threshold at each unique value of P(+|A)
- Count the number of TP, FP, TN, FN at each threshold
  - TP rate, TPR = TP / (TP + FN)
  - FP rate, FPR = FP / (FP + TN)

  Instance    P(+|A)    True Class
  1           0.95      +
  2           0.93      +
  3           0.87      -
  4           0.85      -
  5           0.85      -
  6           0.85      +
  7           0.76      -
  8           0.53      +
  9           0.43      -
  10          0.25      +
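The following sketch (not from the slides) applies this procedure to the ten instances in the table; the loop structure and output format are assumptions.

```python
# Posterior P(+|A) and true class for the ten instances listed above.
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+',  '+',  '-',  '-',  '-',  '+',  '-',  '+',  '-',  '+']

P = labels.count('+')           # total positives
N = labels.count('-')           # total negatives

# Apply a threshold at each unique score: predict '+' when P(+|A) >= threshold.
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == '+')
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == '-')
    tpr = tp / P                # TPR = TP / (TP + FN)
    fpr = fp / N                # FPR = FP / (FP + TN)
    print(f"threshold >= {t:.2f}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Plotting the resulting (FPR, TPR) pairs, together with the starting point (0, 0), gives the ROC curve referred to on the next slide.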

20. How to construct an ROC curve
[Table: TP, FP, TN, FN counts and the resulting TPR, FPR at each threshold (predict + when P(+|A) >= threshold), with the ROC curve plotted from the (FPR, TPR) pairs]

21. Ensemble Methods
- Construct a set of classifiers from the training data
- Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers

22. General Idea

23. Why does it work?
- Suppose there are 25 base classifiers
- Each classifier has error rate ε = 0.35
- Assume the classifiers are independent
- Probability that the ensemble classifier (majority vote) makes a wrong prediction, i.e. that 13 or more base classifiers are wrong:
  P(wrong) = Σ (i = 13 to 25) C(25, i) ε^i (1 - ε)^(25-i) ≈ 0.06
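The sum can be evaluated with a few lines of standard-library Python; the sketch below is an assumption, not part of the slides.

```python
from math import comb

eps = 0.35   # error rate of each of the 25 independent base classifiers
n = 25

# The majority-vote ensemble errs when 13 or more base classifiers are wrong.
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"P(ensemble wrong) = {p_wrong:.3f}")   # approximately 0.06
```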

24. Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
  - Bagging
  - Boosting

25. Bagging
- Sampling with replacement
- Build a classifier on each bootstrap sample
- Each record has probability (1 - 1/n)^n of not being selected for a given bootstrap sample, i.e. probability 1 - (1 - 1/n)^n ≈ 0.632 of being selected at least once
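A compact sketch of bagging by majority vote is shown below; it is an assumption that uses scikit-learn decision trees as base classifiers and a synthetic data set, neither of which the slides specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
n = len(X)

# Build one classifier per bootstrap sample (sampling with replacement).
models = []
for _ in range(25):
    idx = rng.integers(0, n, size=n)          # bootstrap indices
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Predict by majority vote across the 25 base classifiers.
votes = np.stack([m.predict(X) for m in models])          # shape (25, n)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)   # majority for 0/1 labels
print("training accuracy of the bagged ensemble:", (ensemble_pred == y).mean())

# Fraction of distinct records expected in a bootstrap sample:
print("expected coverage:", 1 - (1 - 1/n)**n)             # roughly 0.632 for large n
```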

26. Boosting
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
- Initially, all N records are assigned equal weights
- Unlike bagging, weights may change at the end of each boosting round

27. Boosting
- Records that are wrongly classified will have their weights increased
- Records that are classified correctly will have their weights decreased
- Example 4 is hard to classify: its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

28. Example: AdaBoost
- Base classifiers: C1, C2, ..., CT
- Data pairs: (xi, yi), i = 1, ..., N, with record weights wi
- Error rate of classifier Cj:
  εj = Σi wi δ(Cj(xi) ≠ yi) / Σi wi
  (the weighted fraction of records misclassified by Cj; δ(·) is 1 when its argument is true, 0 otherwise)
- Importance of a classifier:
  αj = (1/2) ln((1 - εj) / εj)

29. Example: AdaBoost
- Classification: C*(x) = argmax_y Σj αj δ(Cj(x) = y)
- Weight update after boosting round j (classifier Cj, importance αj, normalization factor Zj):
  wi(j+1) = (wi(j) / Zj) * exp(-αj)  if Cj(xi) = yi   (correctly classified: weight decreases)
  wi(j+1) = (wi(j) / Zj) * exp(+αj)  if Cj(xi) ≠ yi   (misclassified: weight increases)
  where Zj is chosen so that the updated weights sum to 1
- If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/N
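The sketch below (an assumption, using decision stumps as base classifiers and labels in {-1, +1}) implements the weight update and final vote described above; the rule for rounds with error above 50% is included in simplified form (the weights are reset, without resampling).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, random_state=1)
y = np.where(y01 == 1, 1, -1)            # relabel classes as -1 / +1
N, T = len(X), 20

w = np.full(N, 1.0 / N)                  # initially all records get equal weight 1/N
stumps, alphas = [], []

for t in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    eps = max(np.sum(w * (pred != y)), 1e-12)    # weighted error rate of this round
    if eps >= 0.5:                               # slide's rule: revert weights to 1/N
        w = np.full(N, 1.0 / N)
        continue
    alpha = 0.5 * np.log((1 - eps) / eps)        # importance of the classifier
    w *= np.exp(-alpha * y * pred)               # decrease if correct, increase if wrong
    w /= w.sum()                                 # normalization factor Z
    stumps.append(stump)
    alphas.append(alpha)

# Final classification: sign of the alpha-weighted vote of the base classifiers.
ensemble = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print("training accuracy:", (ensemble == y).mean())
```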

30. Illustrating AdaBoost
[Figure: data points for training, with the initial (equal) weights assigned to each data point]

31. Illustrating AdaBoost