Slide1
Feature Engineering Studio
February 2, 2015
Slide2
Welcome to
Problem Proposal Day
Rules for Presenters
Rules for the Rest of the Class
Slide3
Rules for Presenters
Talk for 3 minutes on:
Data set
What variable will you predict?
What kind of variables will you use to predict it?
Why is this worth doing?
And please email me your slides (if any)
Slide4
Rules for Audience
After the presentation
Ask quick questions
Give quick suggestions
Slide5
Criteria
Everyone
Is the problem genuinely important? (usable or publishable)
Is there a good measure of ground truth?
Only if you know what you’re talking about
Is there rich enough data to distill meaningful features?
Is there enough data to be able to take advantage of data mining?
Slide6
Rules for Audience
Be polite!
No interrupting
No rambling
No being mean
Slide7
Presentations
Alphabetical Order Based on Last Name
Tie-Breaker: First Name
Slide8
For next week
Think about how to improve your problem proposal
Rewrite your problem proposal based on the feedback you got today
Then email it to me for further feedback and a “thumbs-up” before the next class
Slide9
Assignment 2
Data Familiarization
“Mucking Around”
Get your data set
Open it in Excel (or another tool you prefer)
Look at your ground truth label (if you have one)
Look at other key variables
What does each variable mean semantically?
If numerical, what are its max, min, average, stdev? Create histograms of key variables.
If categorical, what is the distribution of each value?
Slide10
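For anyone who prefers a scripting tool to Excel, the familiarization checks in Assignment 2 can be sketched roughly as follows. This is a minimal illustration in plain Python, not part of the original slides: the function names, the sample values, and the choice of equal-width histogram bins are all assumptions made for the example.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Return max, min, average, and sample standard deviation of a numeric column."""
    return {
        "max": max(values),
        "min": min(values),
        "average": statistics.mean(values),
        # Sample standard deviation (divides by n - 1), like Excel's STDEV.
        "stdev": statistics.stdev(values),
    }


def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # Every value is identical, so they all land in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value goes into the last bin instead of overflowing.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


def categorical_distribution(values):
    """Return the share of rows taken by each distinct value of a categorical column."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical columns standing in for a real data set.
example_scores = [2, 4, 4, 4, 5, 5, 7, 9]
example_labels = ["yes", "no", "yes", "yes"]

print(summarize_numeric(example_scores))
print(histogram_counts(example_scores, num_bins=4))
print(categorical_distribution(example_labels))
```

Running the same checks on the ground truth label first is a quick way to see how balanced it is before looking at the other key variables.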
Assignment 2
Data Familiarization
“Mucking Around”
Write a brief report for me
You don’t need to prepare a presentation
But be ready to discuss in class what you learn about your data
Slide11
What if you don’t have data yet?
Get your data
Slide12
What if you don’t have data yet?
Get your data
If you don’t have your data yet, email me at least 48 hours before the assignment is due and I’ll send you a practice data set
Slide13
How to compute in Excel
If numerical, what are its max, min, average, stdev?
If categorical, what is the distribution of each value?
Using Class2Data
Slide14
How to do a histogram in Excel
Using Class2Data
Slide15
Next Session
2/4 Lab Session: Using RapidMiner
If you don’t know how to build a prediction model in RapidMiner, you should attend this session
If you do know how to build a prediction model in RapidMiner, you don’t have to attend
Slide16
Next Class After That
2/16 Data Cleaning (Asgn. 2 due)
Do the assignment
Read the readings
Slide17
Note
2/9 No class
2/11 No class
Slide18
Questions? Comments? Concerns?