Classification: Hands-on Activity

debby-jeon
Uploaded On 2016-05-02

Presentation Transcript

Slide1

Classification: Hands-on Activity

Slide2

Hands-on activity

Predicting gaming the system in RapidMiner

Slide3

Open RapidMiner and open classifier.xml

We currently have the model set up to build a decision tree with action-level cross-validation

Click the Run button (blue triangle)

Slide4

Check model goodness

Go to AUC (AUC is another way of referring to A’)
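
To make the number concrete: A’ can be read as the probability that the model gives a randomly chosen positive example (here, a gaming action) a higher confidence than a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute force in plain Python. The function name, scores, and labels are made up for illustration and are not taken from the RapidMiner process.

```python
from itertools import product

def a_prime(scores, labels):
    """Probability that a randomly chosen positive example is scored
    above a randomly chosen negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model confidences; label 1 = gaming, 0 = not gaming.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(a_prime(scores, labels))
```

A value of 0.5 means the model ranks no better than chance, and 1.0 means every positive example is ranked above every negative one.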

Pretty good, eh?

What’s the problem?

Slide5

Check model goodness

This model uses data from the same student both to train and test

And more seriously…

Slide6

Re-run

Move W-J48 above XValidation

Disable XValidation

Right-click XValidation and click on Enable Operator

Run the model

Click on text view

What do you see?

Slide7

Check model goodness

W-J48

J48 pruned tree
------------------

pknow <= 0.071168
|   student = N46z59pQP58: N (10.0)
|   student = N5LMy832c47: N (5.0)
|   student = N668lBbaKFE: N (16.0)
|   student = N6O31vedZbI: N (1.0)
|   student = tB4vqSxzqo: G (10.0)
|   student = N3hSu07XfGd: N (6.0)
|   student = N6bJ4auIa8L: N (10.0)
|   …

Slide8

What’s wrong with this?

Slide9

What’s wrong with this?

It’s fitting to the student!
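
One way to see why a split on student cannot carry over: a tree branch per student ID amounts to a lookup table of each student's most common label. The toy sketch below builds such a table in plain Python. The student IDs, labels, and function names are hypothetical, and this is not what J48 does internally, only an analogy for the branches in the tree.

```python
from collections import Counter, defaultdict

def fit_student_lookup(rows):
    """Learn the majority label for each student id, which is roughly what
    a tree that splits on student ends up encoding."""
    by_student = defaultdict(Counter)
    for student, label in rows:
        by_student[student][label] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in by_student.items()}

def predict(lookup, student):
    # Returns None for a student the model has never seen.
    return lookup.get(student)

# Hypothetical (student, label) pairs; N = not gaming, G = gaming.
train = [("s1", "N"), ("s1", "N"), ("s2", "G"), ("s2", "G"), ("s3", "N")]
lookup = fit_student_lookup(train)

train_acc = sum(predict(lookup, s) == y for s, y in train) / len(train)
print(train_acc)              # accuracy on students seen during training
print(predict(lookup, "s4"))  # a new student has no branch to follow
```

The table scores well on the students it was built from and has nothing to say about anyone else.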

Definitely not a model that could be used with new students!

Slide10

Which features should we remove

For a model that has some hope of being generalizable?

(Open the data set in Excel to take a look)

WEKA-CTA1Z04-fordev.csv on your USB flash drive

Slide11

Which features should we remove

For a model that has some hope of being generalizable?

Num

Student

Lesson

Lspair

Skill

Cell

Leave group in, it’s a special case

Slide12

So…

Let’s quickly go into Excel, do that, and re-save (with a new file name)
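
If you would rather script this step than do it by hand in Excel, a minimal sketch with Python's standard csv module could look like the following. It assumes the file's header row uses the column names listed on the previous slide; the exact spelling and capitalization in the file are not confirmed here, so names are matched case-insensitively. The function name is hypothetical.

```python
import csv

# Columns named on the slide. The exact header spelling in the file is an
# assumption, so names are compared in lower case.
DROP = {"num", "student", "lesson", "lspair", "skill", "cell"}

def drop_columns(src_path, dst_path, drop=DROP):
    """Copy a CSV file, leaving out every column whose header is in `drop`.
    Returns the list of headers that were kept."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        keep = [i for i, name in enumerate(header)
                if name.strip().lower() not in drop]
        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
    return [header[i] for i in keep]
```

For example, `drop_columns("WEKA-CTA1Z04-fordev.csv", "fordev-reduced.csv")` would write the reduced copy, where the output file name is only a placeholder. The group column is not in the drop set, so it is kept.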

Now go back to RapidMiner

Change the file name in CSVExampleSource to your new file name

Run again

Slide13

So…

Anything wrong here?

Slide14

So…

Yup, no model.

So everything we had was over-fitting

Slide15

Let’s take a more extensive data set

Specifically, one with additional distilled features

WEKA-CTA1Z04-allfeatures.csv

Slide16

What to do

Remove the same over-fitting features as before

Change the file name in CSVExampleSource to your new file name

Run!

What A’ do you get?

Slide17

But wait…

We still are using data from the same student in both the training set and test set

Replace XValidation with BatchXValidation

Right-click on XValidation, click Replace Operator, go to Validation, go to Other

If you look at ChangeAttributeRole, you’ll see that we are using “group” to set up the cross-validation level (and group refers to a pre-chosen group of students, approximately equal in number of students and number of actions)
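
The idea behind splitting on group can be sketched outside RapidMiner as well. The plain-Python example below builds one fold per group value, so all actions from a group (and therefore from each student in it) end up on the same side of the split. It illustrates the concept only and is not RapidMiner's implementation; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def batch_folds(groups):
    """Yield (train_idx, test_idx) pairs, one per distinct group value.
    Every row of a group lands in the same fold, so no group is ever
    split between training and test."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for g, idx in by_group.items()
                     if g != held_out for i in idx]
        yield train_idx, test_idx

# Hypothetical data: one entry per action, with its student and group.
students = ["s1", "s1", "s2", "s3", "s3", "s4"]
groups = [0, 0, 0, 1, 1, 1]

for train_idx, test_idx in batch_folds(groups):
    train_students = {students[i] for i in train_idx}
    test_students = {students[i] for i in test_idx}
    # Holds as long as each student belongs to exactly one group.
    assert not (train_students & test_students)
    print(sorted(train_students), sorted(test_students))
```

With action-level folds, by contrast, rows are assigned to folds without regard to who produced them, which is how the same student ends up in both training and test.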

Run!

What A’ do you get?

Slide18

A’

What A’ do you get?

For me, the incorrect cross-validation did not actually make a difference… this time.

Did it make a difference for any of you?

(It does make a difference sometimes!)

Slide19

More things worth trying

Adding interaction features

F1 * F2…
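
As a rough picture of what an interaction feature is, the sketch below multiplies every pair of distinct features in a single row. Whether the RapidMiner operator also produces squared terms (F1 * F1) is not stated in the slides, so this version assumes it does not. The feature names other than pknow and the function name are hypothetical.

```python
from itertools import combinations

def add_interactions(row, feature_names):
    """Return a copy of `row` with one product feature per pair of
    distinct features in `feature_names`."""
    out = dict(row)
    for a, b in combinations(feature_names, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# Hypothetical numeric features for one action.
row = {"pknow": 0.5, "time": 4.0, "errors": 2.0}
expanded = add_interactions(row, ["pknow", "time", "errors"])
print(expanded)
```

With n original features this adds n(n-1)/2 new columns, so the feature count grows quadratically.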

You just need to enable a disabled operator…

Slide20

That takes a while!

Can you filter out the interaction parameters that are closely correlated to each other?

Slide21

Trying out other algorithms

What if you want to use step regression instead of J48?

Or logistic regression?

Or decision stumps?

Can you replace the W-J48 operator with these?

Slide22

Thanks!