
Model Evaluation

CRISP-DM

CRISP-DM Phases

Business Understanding

Initial phase

Focuses on:

Understanding the project objectives and requirements from a business perspective

Converting this knowledge into a data mining problem definition, and a preliminary plan designed to achieve the objectives

Data Understanding

Starts with an initial data collection

Proceeds with activities aimed at getting familiar with the data, identifying data quality problems, discovering first insights into the data, and detecting interesting subsets to form hypotheses for hidden information

CRISP-DM Phases

Data Preparation

Covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data

Data preparation tasks are likely to be performed multiple times, and not in any prescribed order

Tasks include table, record, and attribute selection, as well as transformation and cleaning of data for modeling tools

Modeling

Various modeling techniques are selected and applied, and their parameters are calibrated to optimal values

Typically, there are several techniques for the same data mining problem type

Some techniques have specific requirements on the form of data; therefore, stepping back to the data preparation phase is often needed

CRISP-DM Phases

Evaluation

At this stage, a model (or models) that appears to have high quality, from a data analysis perspective, has been built

Before proceeding to final deployment of the model, it is important to more thoroughly evaluate the model, and review the steps executed to construct the model, to be certain it properly achieves the business objectives

A key objective is to determine if there is some important business issue that has not been sufficiently considered

At the end of this phase, a decision on the use of the data mining results should be reached

CRISP-DM Phases

Deployment

Creation of the model is generally not the end of the project

Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it

Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process

In many cases it will be the customer, not the data analyst, who will carry out the deployment steps

However, even if the analyst will not carry out the deployment effort, it is important for the customer to understand up front what actions will need to be carried out in order to actually make use of the created models

Evaluating Classification Systems

Two issues

What evaluation measure should we use?

How do we ensure reliability of our model?

Evaluation

How do we ensure reliability of our model?

How do we ensure reliability?

Heavily dependent on the training data

Data Partitioning

Randomly partition the data into a training set and a test set

Training set: the data used to train/build the model. Estimate parameters (e.g., for a linear regression), build a decision tree, build an artificial neural network, etc.

Test set: a set of examples not used for model induction; the model's performance is evaluated on unseen data. Also known as out-of-sample data.

Generalization error: the model's error on the test data.
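As a concrete sketch of this partitioning (not from the slides; the data and model below are synthetic and purely illustrative), scikit-learn's train_test_split can hold out a test set and estimate the generalization error:

```python
# Minimal holdout sketch: train on one partition, estimate error on the other.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                          # synthetic features (illustrative)
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # synthetic labels (illustrative)

# Hold out 30% as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
test_error = 1 - accuracy_score(y_test, model.predict(X_test))
print(f"Estimated generalization error: {test_error:.3f}")
```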

Complexity and Generalization

[Figure: a score function (e.g., squared error) plotted against model complexity, where complexity is the number of degrees of freedom in the model (e.g., the number of variables). The training score S_train(θ) and the test score S_test(θ) are shown, with the optimal model complexity marked where the test score is lowest.]

Holding out data

The holdout method reserves a certain amount of the data for testing and uses the remainder for training

Usually: one third for testing, the rest for training

For "unbalanced" datasets, random samples might not be representative: few or no instances of some classes

Stratified sample: make sure that each class is represented with approximately equal proportions in both subsets

Repeated holdout method

Holdout estimate can be made more reliable by repeating the process with different subsamples

In each iteration, a certain proportion is randomly selected for training (possibly with stratification)

The error rates on the different iterations are averaged to yield an overall error rate

This is called the repeated holdout method
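A minimal sketch of repeated holdout, assuming scikit-learn; StratifiedShuffleSplit draws several stratified random train/test partitions, and the per-split error rates are averaged (synthetic data, illustrative only):

```python
# Repeated (stratified) holdout: average the error over several random splits.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))                          # synthetic features
y = (X[:, 0] + rng.normal(size=600) > 0).astype(int)   # synthetic labels

splitter = StratifiedShuffleSplit(n_splits=10, test_size=1/3, random_state=0)
errors = []
for train_idx, test_idx in splitter.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
print(f"Repeated-holdout error estimate: {np.mean(errors):.3f}")
```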

Cross-validation

Most popular and effective type of repeated holdout is cross-validation

Cross-validation avoids overlapping test sets

First step: the data is split into k subsets of equal size

Second step: each subset in turn is used for testing and the remainder for training

This is called k-fold cross-validation

Often the subsets are stratified before the cross-validation is performed

Cross-validation example:

[Figure: an illustration of the folds; each subset is held out in turn while the rest is used for training.]
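A small k-fold cross-validation sketch (synthetic data, with a logistic regression standing in for whatever model is being evaluated); each fold is used once for testing and the per-fold errors are averaged:

```python
# Stratified k-fold cross-validation: every fold serves once as the test set.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
errors = []
for train_idx, test_idx in kf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
print(f"10-fold CV error estimate: {np.mean(errors):.3f}")
```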

More on cross-validation

Standard data-mining method for evaluation: stratified ten-fold cross-validation

Why ten? Extensive experiments have shown that this is the best choice to get an accurate estimate

Stratification reduces the estimate's variance

Even better: repeated stratified cross-validation

E.g., ten-fold cross-validation is repeated ten times and the results are averaged (reduces the sampling variance)

The error estimate is the mean across all repetitions
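Repeated stratified ten-fold cross-validation can be sketched with scikit-learn's RepeatedStratifiedKFold; the data here is again synthetic and only illustrative:

```python
# Ten repetitions of stratified ten-fold CV, i.e. 100 accuracy values averaged.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)   # 100 accuracy scores
print(f"Error estimate: {1 - scores.mean():.3f} (spread {scores.std():.3f})")
```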

Leave-One-Out cross-validation

Leave-One-Out: a particular form of cross-validation

Set the number of folds to the number of training instances

I.e., for n training instances, build the classifier n times

Makes the best use of the data

Involves no random subsampling

Computationally expensive, but good performance
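A leave-one-out sketch, assuming scikit-learn's LeaveOneOut splitter (synthetic data; note that n model fits are required):

```python
# Leave-one-out CV: n models, each tested on a single held-out instance.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)

scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(f"LOO error estimate: {1 - scores.mean():.3f}")   # mean over 100 single-instance tests
```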

Leave-One-Out-CV and stratification

Disadvantage of Leave-One-Out-CV: stratification is not possible

It guarantees a non-stratified sample because there is only one instance in the test set!

Extreme example: a random dataset split equally into two classes

The best model predicts the majority class

That gives 50% accuracy on fresh data, but the Leave-One-Out-CV estimate is 100% error (leaving one instance out makes the opposite class the majority in the training data, so every held-out prediction is wrong)!

Three way data splits

One problem with CV is that, since the data is used jointly to fit the model and estimate its error, the error estimate can be biased downward.

If the goal is a real estimate of error (as opposed to which model is best), you may want a three way split:

Training set: examples used for learning

Validation set: used to tune parameters

Test set: never used in the model fitting process; used at the end for an unbiased estimate of the holdout error
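A rough sketch of such a three-way split using two calls to train_test_split; the 60/20/20 proportions are an assumption, not a prescription:

```python
# Three-way split: training / validation / test.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# First carve off the final test set; never touch it while tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into training and validation sets for tuning.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # roughly 600 / 200 / 200
```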

The Bootstrap

The statistician Brad Efron proposed a very simple and clever idea for mechanically estimating confidence intervals:

The Bootstrap

The idea is to take multiple resamples of your original dataset

Compute the statistic of interest on each resample

You thereby estimate the distribution of this statistic!

Sampling with Replacement

Draw a data point at random from the data set.

Then throw it back in

Draw a second data point.

Then throw it back in…

Keep going until we've got 1000 data points

You might call this a "pseudo" data set

This is not merely re-sorting the data

Some of the original data points will appear more than once; others won't appear at all

Sampling with Replacement

In fact, there is a chance of (1 - 1/1000)^1000 ≈ 1/e ≈ 0.368 that any one of the original data points won't appear at all if we sample with replacement 1000 times

So any data point is included with probability ≈ 0.632

Intuitively, we treat the original sample as the "true population in the sky"

Each resample simulates the process of taking a sample from the "true" distribution
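A minimal bootstrap sketch for a simple statistic (the mean of a hypothetical sample): resample with replacement, recompute the statistic, and read a confidence interval off the resulting distribution:

```python
# Bootstrap estimate of the sampling distribution of the mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)   # hypothetical original sample

boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()   # one resample with replacement
    for _ in range(5000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```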

Bootstrapping & Validation

This is interesting in its own right.

But bootstrapping also relates back to model validation, along the lines of cross-validation

You can fit models on bootstrap resamples of your data

For each resample, test the model on the ≈ 36.8% of the data not in your resample

The estimate will be biased, but corrections are available

Get a spectrum of ROC curves
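A sketch of this bootstrap-validation idea (synthetic data; logistic regression and AUC are illustrative choices, and no bias correction is applied): fit on each resample and score the model on the instances the resample left out:

```python
# Bootstrap validation: train on each resample, test on the left-out instances.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

aucs = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=len(X))       # bootstrap resample indices
    oob = np.setdiff1d(np.arange(len(X)), idx)       # instances not drawn at all (~36.8%)
    model = LogisticRegression().fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))
print(f"Left-out AUC: mean {np.mean(aucs):.3f}, spread {np.std(aucs):.3f}")
```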

Closing Thoughts

The "cross-validation" approach has several nice features:

Relies on the data, not likelihood theory, etc.

Comports nicely with the lift curve concept

Allows model validation that has both business and statistical meaning

Is generic: can be used to compare models generated from competing techniques… or even pre-existing models

Can be performed on different sub-segments of the data

Is very intuitive, easily grasped

Closing Thoughts

Bootstrapping has a family resemblance to cross-validation: use the data to estimate features of a statistic or a model that we previously relied on statistical theory to give us

Classic examples of the "data mining" (in the non-pejorative sense of the term!) mindset: leverage modern computers to "do it yourself" rather than look up a formula in a book!

Generic tools that can be used creatively

Can be used to estimate model bias and variance

Can be used to estimate (simulate) distributional characteristics of very difficult statistics

Ideal for many actuarial applications

Metrics

What evaluation measure should we use?

Evaluation of Classification

Accuracy = (a+d) / (a+b+c+d)

Not always the best choice

Assume 1% fraud,

model predicts no fraud

What is the accuracy?

Confusion matrix (rows = predicted outcome, columns = actual outcome):

                        actual 1    actual 0
  predicted 1              a           b
  predicted 0              c           d

Fraud example (the model predicts "No Fraud" for everyone):

                        actual Fraud    actual No Fraud
  predicted Fraud            0                 0
  predicted No Fraud        10               990
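The fraud example can be checked directly; a model that predicts "no fraud" for everyone is 99% accurate yet catches nothing (the labels below reproduce the 1% fraud setup):

```python
# Accuracy can be misleading on imbalanced data.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([1] * 10 + [0] * 990)   # 1% fraud, as in the slide
y_pred = np.zeros_like(y_true)            # model predicts "no fraud" everywhere

print(accuracy_score(y_true, y_pred))     # 0.99 -- looks great
print(recall_score(y_true, y_pred))       # 0.0  -- not a single fraud case caught
```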

Evaluation of Classification

Other options:

Recall or sensitivity (how many of those that are really positive did you predict?): a/(a+c)

Precision (how many of those predicted positive really are?): a/(a+b)

Precision and recall are always in tension: increasing one tends to decrease the other

(Using the same confusion matrix as above: rows = predicted outcome, columns = actual outcome.)

Evaluation of Classification

Yet another option:

Recall or sensitivity (how many of the positives did you get right?): a/(a+c)

Specificity (how many of the negatives did you get right?): d/(b+d)

Sensitivity and specificity have the same tension

Different fields use different metrics

(Again with rows = predicted outcome and columns = actual outcome.)
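A small sketch that counts a, b, c, d from hypothetical label vectors (rows = predicted, columns = actual, as above) and computes these metrics:

```python
# Recall, precision and specificity from the four confusion-matrix cells.
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # hypothetical actual outcomes
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])   # hypothetical predictions

a = np.sum((y_pred == 1) & (y_true == 1))   # true positives
b = np.sum((y_pred == 1) & (y_true == 0))   # false positives
c = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
d = np.sum((y_pred == 0) & (y_true == 0))   # true negatives

recall      = a / (a + c)   # sensitivity: 3/4 = 0.75
precision   = a / (a + b)   # 3/4 = 0.75
specificity = d / (b + d)   # 5/6 ≈ 0.83
print(recall, precision, specificity)
```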

Evaluation for a Thresholded Response

Many classification models output probabilities

These probabilities get thresholded to make a prediction.

Classification accuracy depends on the threshold – good models give low probabilities to Y=0 and high probabilities to Y=1.

Test Data

The model's predicted probabilities are thresholded and compared with the actual outcomes. Suppose we use a cutoff of 0.5; the resulting confusion matrix (rows = predicted outcome, columns = actual outcome) is:

                  actual 1    actual 0
  predicted 1        8           3
  predicted 0        0           9

With a cutoff of 0.5:

sensitivity = 8/(8+0) = 100%

specificity = 9/(9+3) = 75%

We want both of these to be high.

Suppose we use a cutoff of 0.8 instead (rows = predicted outcome, columns = actual outcome):

                  actual 1    actual 0
  predicted 1        6           2
  predicted 0        2          10

sensitivity = 6/(6+2) = 75%

specificity = 10/(10+2) ≈ 83%
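The same cutoff sweep can be scripted; the predicted probabilities and outcomes below are hypothetical, not the slide's test data:

```python
# How the choice of cutoff moves sensitivity and specificity.
import numpy as np

p_hat  = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.3, 0.2, 0.1])
y_true = np.array([1,    1,   1,    0,   1,   0,   0,    1,    0,   0,   0,   0  ])

for cutoff in (0.5, 0.8):
    y_pred = (p_hat >= cutoff).astype(int)
    sens = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
    spec = np.sum((y_pred == 0) & (y_true == 0)) / np.sum(y_true == 0)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```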

Note there are 20 possible thresholds

Plotting all values of sensitivity vs. specificity gives a sense of model performance by seeing the tradeoff with different thresholds

Note: if the threshold is at the minimum, c = d = 0, so sensitivity = 1 and specificity = 0

If the threshold is at the maximum, a = b = 0, so sensitivity = 0 and specificity = 1

If the model is perfect, sensitivity = 1 and specificity = 1

ROC curve plots sensitivity vs. (1-specificity) – also known as false positive rate

Always goes from (0,0) to (1,1)

The more area in the upper left, the better

Random model is on the diagonal

Area under the curve (AUC) is a common measure of predictive performance
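A sketch of computing and plotting an ROC curve with scikit-learn (synthetic data and model; matplotlib is used only for the plot):

```python
# ROC curve and AUC for a probabilistic classifier on held-out data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)
print(f"AUC: {roc_auc_score(y_te, scores):.3f}")

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], "--", label="random")   # diagonal = random model
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```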

ROC Curves

Receiver Operating Characteristic curve

ROC curves were developed in the 1950s as a by-product of research into making sense of radio signals contaminated by noise. More recently it has become clear that they are remarkably useful in decision-making.

They are a performance graphing method: the true positive and false positive fractions are plotted as we move the dividing threshold.

ROC Space

ROC graphs are two-dimensional graphs in which TP rate is plotted on the Y axis and FP rate is plotted on the X axis.

An ROC graph depicts relative trade-offs between benefits (true positives) and costs (false positives).

The figure shows an ROC graph with five classifiers labeled A through E.

A discrete classifier is one that outputs only a class label.

Each discrete classifier produces an (FP rate, TP rate) pair corresponding to a single point in ROC space. The classifiers in the figure are all discrete classifiers.

Several Points in ROC Space

The lower left point (0, 0) represents the strategy of never issuing a positive classification; such a classifier commits no false positive errors but also gains no true positives.

The upper right corner (1, 1) represents the opposite strategy of unconditionally issuing positive classifications.

Point (0, 1) represents perfect classification. D's performance is perfect as shown.

Informally, one point in ROC space is better than another if it is to the northwest of the first: its TP rate is higher, its FP rate is lower, or both.

Specific Example

[Figure: distributions of a test result for patients with the disease and patients without the disease.]

[Figure: a threshold on the test result divides the patients; those below the threshold are called negative, those above it are called positive.]

Some definitions: the true positives are the patients with the disease whose test result falls above the threshold (i.e., who are called positive).

The false positives are the patients without the disease whose test result falls above the threshold (called positive).

The true negatives are the patients without the disease whose test result falls below the threshold (called negative).

The false negatives are the patients with the disease whose test result falls below the threshold (called negative).

Moving the threshold to the right: fewer patients are called "+", so there are fewer false positives but more false negatives.

Moving the threshold to the left: more patients are called "+", so there are more false positives but fewer false negatives.

ROC curve

[Figure: the ROC curve plots the true positive rate (sensitivity, 0% to 100%) against the false positive rate (1 - specificity, 0% to 100%) as the threshold moves.]

ROC curve comparison

[Figure: two ROC curves on true positive rate vs. false positive rate axes. A good test bows toward the upper left corner; a poor test stays close to the diagonal.]

ROC curve extremes

[Figure: best test vs. worst test. For the best test, the two distributions don't overlap at all and the ROC curve passes through the upper left corner; for the worst test, the distributions overlap completely and the ROC curve lies on the diagonal.]

How to Construct ROC Curve for one Classifier

Sort the instances according to their Ppos.

Move a threshold across the sorted instances.

For each threshold, define a classifier with its confusion matrix.

Plot the TP and FP rates of these classifiers.

  Ppos    True Class
  0.99    pos
  0.98    pos
  0.7     neg
  0.6     pos
  0.43    neg

For example, a threshold between 0.7 and 0.6 gives (rows = predicted, columns = true):

                  true pos    true neg
  predicted pos      2           1
  predicted neg      1           1
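A manual sketch of this construction using the five scores above: sort by Ppos, sweep the threshold past one instance at a time, and record the (FP rate, TP rate) pair at each step:

```python
# Build the ROC points by sweeping a threshold over scores sorted by Ppos.
import numpy as np

p_pos  = np.array([0.99, 0.98, 0.7, 0.6, 0.43])   # scores from the slide
y_true = np.array([1,    1,    0,   1,   0])       # pos = 1, neg = 0

order = np.argsort(-p_pos)            # sort descending by Ppos
y_sorted = y_true[order]
P, N = y_true.sum(), len(y_true) - y_true.sum()

points = [(0.0, 0.0)]
tp = fp = 0
for label in y_sorted:                # each step lowers the threshold past one instance
    tp += label
    fp += 1 - label
    points.append((fp / N, tp / P))   # (FP rate, TP rate) at this threshold
print(points)   # the threshold between 0.7 and 0.6 gives the 2/1/1/1 matrix above
```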

Creating an ROC Curve

A classifier produces a single ROC point.

If the classifier has a "sensitivity" parameter, varying it produces a series of ROC points (confusion matrices).

Alternatively, if the classifier is produced by a learning algorithm, a series of ROC points can be generated by varying the class ratio in the training set.

ROC for one Classifier

[Figures: example ROC curves for a single classifier at varying quality levels.]

Good separation between the classes: convex curve.

Reasonable separation between the classes: mostly convex.

Fairly poor separation between the classes: mostly convex.

Poor separation between the classes: large and small concavities.

Random performance.

The AUC Metric

The area under ROC curve (AUC) assesses the ranking in terms of separation of the classes.

AUC estimates the probability that a randomly chosen positive instance will be ranked before a randomly chosen negative instance.
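That ranking interpretation can be checked directly: the fraction of (positive, negative) pairs in which the positive instance receives the higher score matches roc_auc_score (the scores below are synthetic):

```python
# AUC as the probability that a positive outranks a negative.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores = y + rng.normal(scale=1.5, size=200)   # noisy scores, hypothetical

pos, neg = scores[y == 1], scores[y == 0]
pairwise = (pos[:, None] > neg[None, :]).mean() + 0.5 * (pos[:, None] == neg[None, :]).mean()
print(pairwise, roc_auc_score(y, scores))      # the two estimates agree
```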

Comparing Models

Highest AUC wins

But pay attention to Occam's Razor: "the best theory is the smallest one that describes all the facts"

Also known as the "parsimony principle"

If two models are similar, pick the simpler one