
Feature Engineering Studio - PowerPoint Presentation

Uploaded by sherrill-nordquist on 2018-03-13

Presentation Transcript


Feature Engineering Studio

September 23, 2013

Welcome to Mucking Around Day

Sort into pairs

Partner with the person next to you

One group of 3 is allowed

Sort into pairs

Do we have a group of 3?

One of the 3 will work with me

Sort into pairs

Go over your reports together

A maximum of 5 minutes apiece

5 minutes for the first person

5 minutes for the second person

Re-assemble into one big group

Who here found something really cool while mucking around?

Show us, tell us

Who here found a histogram with a normal distribution?

Show us, tell us

Who here found a histogram with a hypermode?

Show us, tell us

Who here found a histogram with a flat distribution?

Show us, tell us

Who here found a histogram with a skewed distribution?

Show us, tell us

Who here found a histogram with a bimodal distribution?

Show us, tell us

Who here found a histogram with something else interesting?

Show us, tell us

Who here found something surprising with their min, max, average, stdev?

Categorical variables

Who here found something curious, weird, or interesting in the distribution of their categorical variables?

Who here hasn’t spoken yet (and has analyzed data)?

Tell us something interesting you found in your data

Who here played with pivot tables?

What did you learn?

My turn to play with pivot tables

Who wants to volunteer their data?

(I might request a 2nd or 3rd data set, depending on how the 1st one goes)
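The pivot-table demo itself isn't in the transcript, but a basic pivot (students in Rows, Count and Average of a column in Values) can be approximated in Python. This is a sketch on a hypothetical action log with columns student, correct (0/1), and seconds; the names `actions` and `pivot` are illustrative, not from the course.

```python
from collections import defaultdict

# Hypothetical action log: (student, correct, seconds), in time order.
actions = [
    ("s1", 1, 12.0), ("s1", 0, 30.0), ("s2", 1, 8.0),
    ("s1", 1, 5.0), ("s2", 0, 40.0), ("s2", 0, 22.0),
]

def pivot(rows, key_index, value_index):
    """Group rows by one column and report the count and mean of another,
    roughly like a pivot table with Count and Average in the Values area."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: {"count": len(vals), "mean": sum(vals) / len(vals)}
            for key, vals in groups.items()}

# Count of actions and proportion correct per student
summary = pivot(actions, key_index=0, value_index=1)
print(summary)
```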

Who here played with vlookup?

What did you learn?

My turn to play with vlookup

Using the same volunteered data set(s)
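A common use of vlookup in feature engineering is copying a per-student value (for example, from a pivot table) back onto every action row. A minimal Python analogue of an exact-match lookup is a dictionary lookup; the table contents and the helper `vlookup_exact` below are hypothetical.

```python
# Hypothetical per-student table (e.g. copied out of a pivot table):
# student -> proportion correct
student_table = {"s1": 0.67, "s2": 0.33}

# Hypothetical action rows: (student, correct)
actions = [("s1", 1), ("s2", 0), ("s3", 1)]

def vlookup_exact(key, table, default=None):
    """Exact-match lookup, similar to VLOOKUP(key, range, col, FALSE).
    Returns `default` where Excel would show #N/A."""
    return table.get(key, default)

# Attach the per-student value to each action row
enriched = [(s, c, vlookup_exact(s, student_table)) for s, c in actions]
print(enriched)  # s3 is not in the table, so its looked-up value is None
```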

Other cool things you can create with a few simple formulas (plus demos!)

Identifying specific cases of interest

Did event of interest ever occur for student?
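One way to compute this is to take the maximum of a 0/1 event column per student. The sketch assumes a hypothetical log where each row is (student, event) and event is 1 when the action of interest (say, a help request) happened.

```python
from collections import defaultdict

# Hypothetical log: (student, event), event = 1 if the event of interest occurred
log = [("s1", 0), ("s1", 1), ("s1", 0), ("s2", 0), ("s2", 0)]

def ever_occurred(rows):
    """Return {student: 1 if the event happened at least once, else 0}."""
    result = defaultdict(int)
    for student, event in rows:
        result[student] = max(result[student], event)
    return dict(result)

print(ever_occurred(log))  # {'s1': 1, 's2': 0}
```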

Counts-so-far

(and total value for student)
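A running count is what a spreadsheet column such as =IF(A3=A2, C2+B3, B3) produces, assuming student IDs in column A, a 0/1 event in column B, the count in column C, and rows sorted by student and time (that layout is hypothetical). The Python sketch below computes the same running count, counting the current row, plus each student's total.

```python
from collections import defaultdict

# Hypothetical log, sorted by student then time: (student, event)
log = [("s1", 1), ("s1", 0), ("s1", 1), ("s2", 1), ("s2", 1)]

def counts_so_far(rows):
    """Running count of events per student, including the current row."""
    running, out = defaultdict(int), []
    for student, event in rows:
        running[student] += event
        out.append(running[student])
    return out

def totals(rows):
    """Total events per student (where each running count ends up)."""
    total = defaultdict(int)
    for student, event in rows:
        total[student] += event
    return dict(total)

print(counts_so_far(log))  # [1, 1, 2, 1, 2]
print(totals(log))         # {'s1': 2, 's2': 2}
```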

Counts-last-N-actions
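A sliding window over each student's most recent actions gives this feature. The sketch assumes a hypothetical (student, event) log sorted by student and time, and counts the current row as one of the last N.

```python
from collections import deque

# Hypothetical log, sorted by student then time: (student, event)
log = [("s1", 1), ("s1", 0), ("s1", 1), ("s1", 1), ("s2", 0), ("s2", 1)]

def counts_last_n(rows, n=3):
    """For each row, count events in that student's last n actions
    (current row included). Rows must be sorted by student, then time."""
    out, window, prev = [], deque(maxlen=n), None
    for student, event in rows:
        if student != prev:  # new student: start a fresh window
            window.clear()
            prev = student
        window.append(event)  # deque drops the oldest value beyond n
        out.append(sum(window))
    return out

print(counts_last_n(log, n=3))  # [1, 1, 2, 2, 0, 1]
```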

First attempts
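A first-attempt flag marks whether a row is the student's first try at a given problem step, so later features can look only at first attempts. This sketch assumes a hypothetical (student, step, correct) log in time order.

```python
# Hypothetical log: (student, step, correct), in time order
log = [
    ("s1", "A", 0), ("s1", "A", 1), ("s1", "B", 1),
    ("s2", "A", 1), ("s2", "B", 0), ("s2", "B", 1),
]

def first_attempt_flags(rows):
    """1 if this row is the student's first attempt at this step, else 0."""
    seen, flags = set(), []
    for student, step, _ in rows:
        key = (student, step)
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags

flags = first_attempt_flags(log)
# Correctness on first attempts only
first_correct = [c for (_, _, c), f in zip(log, flags) if f]
print(flags)          # [1, 0, 1, 1, 1, 0]
print(first_correct)  # [0, 1, 1, 0]
```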

Ratios between events of interest
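A ratio feature divides one per-student event count by another, for example help requests per error. The sketch below uses a hypothetical (student, action_type) log and returns None where the denominator is zero, the case where a spreadsheet division would show #DIV/0!.

```python
from collections import Counter

# Hypothetical log: (student, action_type)
log = [("s1", "help"), ("s1", "error"), ("s1", "error"),
       ("s1", "help"), ("s2", "error"), ("s2", "correct")]

def ratio(rows, numerator, denominator):
    """Per-student ratio of two event counts (e.g. help requests per error).
    Returns None for a student whose denominator count is zero."""
    counts = Counter((s, a) for s, a in rows)
    students = dict.fromkeys(s for s, _ in rows)  # keeps first-seen order
    return {s: (counts[(s, numerator)] / counts[(s, denominator)]
                if counts[(s, denominator)] else None)
            for s in students}

print(ratio(log, "help", "error"))  # {'s1': 1.0, 's2': 0.0}
```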

How many students had 3 (or 4, 5, 2, …) of an event
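Given per-student totals of some event, counting how many students reached each total is a histogram over the totals, and a single count corresponds to a COUNTIF over the totals column. The totals below are hypothetical.

```python
from collections import Counter

# Hypothetical per-student totals of some event (e.g. from a pivot table)
totals = {"s1": 3, "s2": 0, "s3": 3, "s4": 2, "s5": 5}

# How many students had exactly k events, for each k that occurs
students_per_count = Counter(totals.values())
print(sorted(students_per_count.items()))  # [(0, 1), (2, 1), (3, 2), (5, 1)]

# Same idea as COUNTIF over a column of totals: exactly 3, and 3 or more
exactly_three = sum(1 for v in totals.values() if v == 3)
three_or_more = sum(1 for v in totals.values() if v >= 3)
print(exactly_three, three_or_more)  # 2 3
```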

Times-so-far
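Times-so-far works like counts-so-far, but accumulates time instead of a 0/1 event. This sketch assumes a hypothetical (student, seconds) log in time order; because totals are kept per student, rows from different students may be interleaved.

```python
from collections import defaultdict

# Hypothetical log: (student, seconds spent on the action), in time order
log = [("s1", 10.0), ("s1", 4.5), ("s2", 7.0), ("s1", 20.0), ("s2", 3.0)]

def times_so_far(rows):
    """Cumulative time per student up to and including each row."""
    elapsed, out = defaultdict(float), []
    for student, seconds in rows:
        elapsed[student] += seconds
        out.append(elapsed[student])
    return out

print(times_so_far(log))  # [10.0, 14.5, 7.0, 34.5, 10.0]
```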

Cutoff-based features
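A cutoff-based feature turns a continuous value into a 0/1 flag, for instance "this action took under N seconds", and can then be aggregated per student. The 5-second cutoff and the (student, seconds) log below are hypothetical; the right cutoff depends on your data and is worth trying at several values.

```python
from collections import defaultdict

# Hypothetical log: (student, seconds)
log = [("s1", 1.5), ("s1", 12.0), ("s1", 3.0), ("s2", 25.0), ("s2", 4.0)]

FAST_CUTOFF = 5.0  # hypothetical threshold in seconds

def fast_flags(rows, cutoff=FAST_CUTOFF):
    """1 if the action took less than `cutoff` seconds, else 0."""
    return [1 if seconds < cutoff else 0 for _, seconds in rows]

def proportion_fast(rows, cutoff=FAST_CUTOFF):
    """Per-student share of actions faster than the cutoff."""
    n, fast = defaultdict(int), defaultdict(int)
    for (student, _), flag in zip(rows, fast_flags(rows, cutoff)):
        n[student] += 1
        fast[student] += flag
    return {s: fast[s] / n[s] for s in n}

print(fast_flags(log))       # [1, 0, 1, 0, 1]
print(proportion_fast(log))  # s1 -> 2/3, s2 -> 0.5
```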

Unitized actions (such as unitized time)
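The slide does not define "unitized"; one common reading, assumed here, is expressing each action's time as standard deviations above or below the mean time for the same problem step across all students (a z-score), so fast and slow are judged relative to the step. The sketch uses a hypothetical (student, step, seconds) log and the population SD (like Excel's STDEV.P); a sample SD would change the values slightly.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical log: (student, step, seconds)
log = [("s1", "A", 10.0), ("s2", "A", 20.0), ("s3", "A", 30.0),
       ("s1", "B", 5.0), ("s2", "B", 5.0)]

def unitized_times(rows):
    """Time as a z-score relative to all times on the same step.
    Steps whose times don't vary (SD of 0) get 0.0."""
    by_step = defaultdict(list)
    for _, step, seconds in rows:
        by_step[step].append(seconds)
    stats = {step: (mean(v), pstdev(v)) for step, v in by_step.items()}
    out = []
    for _, step, seconds in rows:
        mu, sd = stats[step]
        out.append((seconds - mu) / sd if sd > 0 else 0.0)
    return out

print([round(z, 2) for z in unitized_times(log)])  # [-1.22, 0.0, 1.22, 0.0, 0.0]
```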

Last 3 or 5 unitized
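Once each action has a unitized value (for example, time as a z-score per step), a window over the student's last 3 or 5 actions summarizes recent pace. This sketch sums the window (divide by its length for a mean); the unitized values are hypothetical and the rows are assumed sorted by student and time.

```python
from collections import deque

# Hypothetical rows: (student, unitized_time), sorted by student then time
rows = [("s1", -1.0), ("s1", 0.5), ("s1", 2.0), ("s1", -0.5), ("s2", 1.0)]

def last_n_sum(rows, n=3):
    """Sum of a student's last n unitized values (current row included)."""
    out, window, prev = [], deque(maxlen=n), None
    for student, value in rows:
        if student != prev:  # new student: reset the window
            window.clear()
            prev = student
        window.append(value)
        out.append(sum(window))
    return out

print(last_n_sum(rows, n=3))  # [-1.0, -0.5, 1.5, 2.0, 1.0]
```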

Comparing earlier behaviors to later behaviors through caching
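The slide doesn't spell out the caching technique; we read it here as storing ("caching") a summary of each student's early behavior so later behavior can be compared with it, and the course's actual spreadsheet method may differ. The sketch caches accuracy over each student's first k actions (k must be at least 1) and reports the change afterwards, using a hypothetical (student, correct) log.

```python
from collections import defaultdict

# Hypothetical log: (student, correct), in time order
log = [("s1", 0), ("s1", 0), ("s1", 1), ("s1", 1),
       ("s2", 1), ("s2", 1), ("s2", 0), ("s2", 0)]

def early_vs_late(rows, k=2):
    """Cache each student's accuracy over their first k actions (k >= 1),
    then report (early accuracy, later accuracy, later minus early)."""
    seq = defaultdict(list)
    for student, correct in rows:
        seq[student].append(correct)
    out = {}
    for student, vals in seq.items():
        early, late = vals[:k], vals[k:]
        early_acc = sum(early) / len(early)  # the cached early value
        late_acc = sum(late) / len(late) if late else None
        diff = None if late_acc is None else late_acc - early_acc
        out[student] = (early_acc, late_acc, diff)
    return out

print(early_vs_late(log))  # {'s1': (0.0, 1.0, 1.0), 's2': (1.0, 0.0, -1.0)}
```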

Counts-if
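A counts-if feature counts a student's rows that meet some condition, much as COUNTIFS does with one criterion on the student column and others on further columns. The log and the conditions below are hypothetical.

```python
# Hypothetical log: (student, action_type, seconds)
log = [("s1", "help", 3.0), ("s1", "attempt", 12.0), ("s1", "help", 8.0),
       ("s2", "attempt", 2.0), ("s2", "help", 1.0)]

def count_if(rows, student, predicate):
    """Count one student's rows for which `predicate(row)` is true."""
    return sum(1 for row in rows if row[0] == student and predicate(row))

# All help requests for s1
print(count_if(log, "s1", lambda r: r[1] == "help"))               # 2
# Help requests that took under 5 seconds (hypothetical criterion)
print(count_if(log, "s1", lambda r: r[1] == "help" and r[2] < 5))  # 1
```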

Percentages of action type
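Percentages of action type divide each type's count by the student's total number of actions. The sketch uses a hypothetical (student, action_type) log.

```python
from collections import Counter, defaultdict

# Hypothetical log: (student, action_type)
log = [("s1", "help"), ("s1", "attempt"), ("s1", "attempt"), ("s1", "hint"),
       ("s2", "attempt"), ("s2", "attempt")]

def action_type_percentages(rows):
    """Per-student percentage of actions of each type."""
    per_student = defaultdict(Counter)
    for student, action in rows:
        per_student[student][action] += 1
    return {s: {a: 100 * n / sum(c.values()) for a, n in c.items()}
            for s, c in per_student.items()}

pct = action_type_percentages(log)
print(pct)
# {'s1': {'help': 25.0, 'attempt': 50.0, 'hint': 25.0}, 's2': {'attempt': 100.0}}
```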

Percentages of time spent per action/location/KC/etc.
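The time version weights by seconds instead of counting rows: each category's time divided by the student's total time. In this hypothetical log the category is a knowledge component (KC), but an action type or screen location works the same way.

```python
from collections import defaultdict

# Hypothetical log: (student, category, seconds)
log = [("s1", "KC1", 30.0), ("s1", "KC2", 10.0), ("s1", "KC1", 10.0),
       ("s2", "KC2", 20.0)]

def time_percentages(rows):
    """Per-student percentage of total time spent in each category."""
    time = defaultdict(lambda: defaultdict(float))
    for student, category, seconds in rows:
        time[student][category] += seconds
    result = {}
    for student, cats in time.items():
        total = sum(cats.values())
        result[student] = {c: 100 * t / total if total else 0.0
                           for c, t in cats.items()}
    return result

print(time_percentages(log))
# {'s1': {'KC1': 80.0, 'KC2': 20.0}, 's2': {'KC2': 100.0}}
```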

Questions? Comments?

Other cool ideas?

Assignment 3

Feature Engineering 1

“Bring Me a Rock”

Get your data set

Open it in Excel

Create as many features as you feel inspired to create

Features should be created with the goal of predicting your ground truth variable

At least 12 separate features that are not just variations on a theme (e.g. “time for last 3 actions” and “time for last 4 actions” are variations on a theme, but “time for last 3 actions” and “total time between help requests and next action” are two separate features)

For each feature, write a 1-3 sentence “just so story” for why it might work

Test how good each feature is

Testing Feature Goodness

For this assignment, there are a bunch of ways to test feature goodness

Single-feature prediction models in a data mining or stats package, giving correlation or kappa (special session this Wednesday)

Compute correlation in Excel (want to see?)

You can do this with binary variables too, although it’s not really optimal

Compute t-test in Excel (want to see?)

Compute kappa in Excel (if you don’t know how, it’s easier to do in RapidMiner)
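For checking spreadsheet numbers, here is a rough Python version of the correlation and kappa tests listed above. It assumes a numeric feature, a binary (0/1) ground truth, and a hypothetical cutoff (feature >= 3) that turns the feature into a 0/1 prediction for kappa. `pearson` computes the same coefficient as Excel's CORREL; `cohens_kappa` is the standard two-category Cohen's kappa. Neither replaces the cross-validated single-feature models from the special session.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation (the quantity Excel's CORREL returns).
    Returns NaN if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def cohens_kappa(pred, actual):
    """Cohen's kappa for two binary (0/1) label lists.
    Undefined (NaN) when chance agreement is 1, e.g. all labels identical."""
    n = len(pred)
    observed = sum(p == a for p, a in zip(pred, actual)) / n
    p1, a1 = sum(pred) / n, sum(actual) / n
    expected = p1 * a1 + (1 - p1) * (1 - a1)  # agreement expected by chance
    if expected == 1:
        return float("nan")
    return (observed - expected) / (1 - expected)

# Hypothetical feature values and ground truth for six students
feature = [1, 2, 3, 4, 5, 6]
truth = [0, 0, 0, 1, 1, 1]
pred = [1 if x >= 3 else 0 for x in feature]  # hypothetical cutoff

r = pearson(feature, truth)
k = cohens_kappa(pred, truth)
print(round(r, 3), round(k, 3))  # 0.878 0.667
```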

Were you right?

Which of your “just so stories” seem to be correct?

Did any of your features correlate in the opposite direction from what you expected?

Assignment 3

Write a brief report for me

Email me an Excel sheet with your features

You don’t need to prepare a presentation

But be ready to discuss your features in class

Next Classes

9/25 Special Session

Using RapidMiner to Produce Prediction Models

Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool)

Statistical significance tests using linear regression don’t count…

9/30 Advanced Feature Distillation in Excel

Assignment 3 due

Online Equation Solver Tutorials should be in your INBOX

Upcoming Classes

10/2

Special session on prediction models

Come to this if you don’t know why student-level cross-validation is important, or if you don’t know what J48 is

10/7 Advanced Feature Distillation in Google Refine

10/9 Special session? TBD.