1
Machine Learning
Spring 2013
Rong Jin
2
CSE847 Machine Learning
Instructor: Rong Jin
Office Hour: Tuesday 4:00pm-5:00pm
TA: Qiaozi Gao, Thursday 4:00pm-5:00pm
Textbooks:
Machine Learning
The Elements of Statistical Learning
Pattern Recognition and Machine Learning
Many subjects are from papers
Web site: http://www.cse.msu.edu/~cse847
3
Requirements
~10 homework assignments
Course project
Topic: visual object recognition
Data: over one million images with extracted visual features
Objective: build a classifier that automatically identifies the class of objects in images
Midterm exam & final exam
4
Goal
Familiarize you with the state of the art in machine learning
Breadth: many different techniques
Depth: project
Hands-on experience
Develop a machine-learning way of thinking
Learn how to model real-world problems with machine learning techniques
Learn how to deal with practical issues
5
Course Outline
Theoretical Aspects
Information
Theory
Optimization Theory
Probability Theory
Learning Theory
Practical Aspects
Supervised Learning Algorithms
Unsupervised Learning Algorithms
Important Practical Issues
Applications
6
Today’s Topics
Why machine learning?
Example: learning to play backgammon
General issues in machine learning
7
Why Machine Learning?
Past: most computer programs were written entirely by hand
Future: computers should be able to program themselves through interaction with their environment
8
Recent Trends
Recent progress in algorithms and theory
Growing flood of online data
Increasing availability of computational power
Growing industry
Big Data Challenge
2.7 zettabytes (10^21 bytes) of data exist in the digital universe today.
Huge amounts of data are generated on the Internet every minute:
YouTube users upload 48 hours of video,
Facebook users share 684,478 pieces of content,
Instagram users share 3,600 new photos.
http://www.visualnews.com/2012/06/19/how-much-data-created-every-minute/
Big Data Challenge
High-dimensional data appears in many applications of machine learning
Fine-grained visual classification [1]
250,000 features
Why Data Size Matters?
Matrix completion: classification, clustering, recommender systems
Why Data Size Matters?
A rank-r matrix can be perfectly recovered provided the number of observed entries is at least O(rn log^2(n))
Why Data Size Matters?
The recovery error can be arbitrarily large if the number of observed entries is below O(rn log(n))
Why Data Size Matters?
[Figure: recovery error vs. number of observed entries, with thresholds at O(rn log(n)) and O(rn log^2(n)); the behavior between the two thresholds is unknown]
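The matrix-completion claim above can be illustrated numerically. The following is a minimal sketch (not from the course materials) that recovers a noiseless rank-2 matrix from roughly 2·rn·log2(n) randomly observed entries using alternating least squares; the solver, the regularization, and all parameter choices are our own illustrative assumptions, not the recovery method the cited theory analyzes (which is nuclear-norm minimization).

```python
import numpy as np

def als_complete(M, mask, r, n_iters=50, reg=1e-3):
    """Recover a low-rank matrix from the entries where mask is True,
    by alternating ridge-regression solves for the two factors."""
    n1, n2 = M.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n1, r))
    V = rng.standard_normal((n2, r))
    for _ in range(n_iters):
        for i in range(n1):             # fix V, solve for each row of U
            idx = mask[i]
            A = V[idx]
            U[i] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ M[i, idx])
        for j in range(n2):             # fix U, solve for each row of V
            idx = mask[:, j]
            A = U[idx]
            V[j] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ M[idx, j])
    return U @ V.T

# Demo: rank-2 100x100 matrix, observe about 2*r*n*log2(n) random entries.
rng = np.random.default_rng(1)
n, r = 100, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
m = int(2 * r * n * np.log2(n))
mask = np.zeros((n, n), dtype=bool)
mask.flat[rng.choice(n * n, size=m, replace=False)] = True
M_hat = als_complete(M, mask, r)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(f"relative recovery error: {rel_err:.3f}")
```

With roughly a quarter of the entries observed, the relative recovery error is typically very small, consistent with the O(rn log^2(n)) sufficiency claim on the slide.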
Alibaba Small and Micro Financial Services
It is difficult for small & medium businesses to access financing:
Minimum loan amounts
Tedious loan approval procedures
Low approval rates
Long cycles
Completely big-data driven: leverages e-commerce data to provide financial services
Shipping Insurance for Returned Products
Insurance contracts have a year-on-year growth rate of 100%:
Over 1 billion contracts in 2013
Over 100 million contracts in one day on November 11, 2013
Shipping Insurance for Returned Products
Actuarial approach:
Uniform 5% fixed rate
Pricing based solely on historical data and demographics
Pricing model based on a few parameters
Simple and easy to explain
Relatively accurate
Machine-learned model:
Data-driven, dynamic pricing
Millions of features, real-time pricing
Highly accurate
18
Three Niches for Machine Learning
Data mining: using historical data to improve decisions
Medical records → medical knowledge
Software applications that are too difficult to program by hand
Autonomous driving
Image classification
User modeling
Automatic recommender systems
19
Typical Data Mining Task
Given:
9,147 patient records, each describing a pregnancy and birth
Each record contains 215 features
Task:
Predict classes of future patients at high risk for emergency Cesarean section
20
Data Mining Results
One of 18 learned rules:
If no previous vaginal delivery,
and abnormal 2nd-trimester ultrasound,
and malpresentation at admission,
Then the probability of emergency C-section is 0.6
21
Credit Risk Analysis
Learned rules:
If Other-Delinquent-Account > 2
and Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no
If Other-Delinquent-Account = 0
and (Income > $30K or Years-of-Credit > 3)
Then Profitable-Customer? = yes
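Learned rules like these are directly executable. Below is a minimal sketch (our own, not part of the course materials) that encodes the two credit-risk rules above as a Python function; the dict-based record format is an assumption made for illustration.

```python
def profitable_customer(record):
    """Apply the two learned credit-risk rules to a customer record.
    Returns 'no', 'yes', or None when neither rule fires."""
    if (record["Other-Delinquent-Account"] > 2
            and record["Number-Delinquent-Billing-Cycles"] > 1):
        return "no"
    if (record["Other-Delinquent-Account"] == 0
            and (record["Income"] > 30_000 or record["Years-of-Credit"] > 3)):
        return "yes"
    return None  # neither learned rule covers this customer

print(profitable_customer({
    "Other-Delinquent-Account": 0,
    "Number-Delinquent-Billing-Cycles": 0,
    "Income": 45_000,
    "Years-of-Credit": 1,
}))  # -> yes
```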
22
Programs too Difficult to Program By Hand
ALVINN drives 70 mph on highways
23
Programs too Difficult to Program By Hand
ALVINN drives 70 mph on highways
24
Programs too Difficult to Program By Hand
Visual object recognition
25
Image Retrieval using Texts
26
Software that Models Users
History:
Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings. Rating: [shown on slide]
Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son. Rating: [shown on slide]
Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring. Rating: [shown on slide]
What to recommend?
Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour. Recommend: ?
Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis. Recommend: ?
[Answers shown on slide: No, Yes]
27
Netflix Contest
28
Relevant Disciplines
Artificial intelligence
Statistics (particularly Bayesian statistics)
Computational complexity theory
Information theory
Optimization theory
Philosophy
Psychology
…
29
Today’s Topics
Why machine learning?
Example: learning to play backgammon
General issues in machine learning
30
What is the Learning Problem?
Learning = improving with experience at some task:
Improve over task T,
with respect to performance measure P,
based on experience E.
Example: learning to play backgammon
T: play backgammon
P: % of games won in world tournaments
E: opportunity to play against itself
31
Backgammon
More than 10^20 states (boards)
The best human players see only a small fraction of all boards during a lifetime
Searching is hard because of the dice (branching factor > 100)
32
TD-Gammon by Tesauro (1995)
Trained by playing against itself
Now approximately equal to the best human players
33
Learn to Play Chess
Task T: Play chess
Performance P: Percent of games won in the world tournament
Experience E:
What experience?
How shall it be represented?
What exactly should be learned?
What specific algorithm should learn it?
34
Choose a Target Function
Goal: a policy π: B → M that maps each board b to a move m
Choice of value function: V: B × M → ℝ
(B = set of boards, M = set of moves, ℝ = real values)
35
Choose a Target Function
Goal: a policy π: B → M
Choice of value function:
V: B × M → ℝ, or simply
V: B → ℝ
(B = set of boards, ℝ = real values)
36
Value Function V(b): Example Definition
If b is a final board that is won: V(b) = 1
If b is a final board that is lost: V(b) = -1
If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached by playing optimally from b
37
Representation of Target Function V(b)
Same value for every board: no learning
Lookup table (one entry for each board): no generalization
Better: summarize experience into
Polynomials
Neural networks
38
Example: Linear Feature Representation
Features:
pb(b), pw(b) = number of black (white) pieces on board b
ub(b), uw(b) = number of unprotected black (white) pieces
tb(b), tw(b) = number of black (white) pieces threatened by the opponent
Linear function:
V(b) = w0·pb(b) + w1·pw(b) + w2·ub(b) + w3·uw(b) + w4·tb(b) + w5·tw(b)
Learning: estimation of the parameters w0, …, w5
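The linear value function above is a weighted sum of board features. The sketch below (our own, assuming a board is already summarized as its six feature values [pb, pw, ub, uw, tb, tw]; extracting these from a real backgammon position is omitted) shows the evaluation:

```python
import numpy as np

def features(board):
    """Feature vector [pb, pw, ub, uw, tb, tw] for a board.
    Here 'board' is assumed to already be that vector."""
    return np.asarray(board, dtype=float)

def V(board, w):
    """Linear value function: V(b) = sum_i w_i * f_i(b)."""
    return float(w @ features(board))

w = np.array([1.0, -1.0, -0.5, 0.5, -0.5, 0.5])  # illustrative weights
b = [7, 6, 1, 2, 0, 1]  # pb=7, pw=6, ub=1, uw=2, tb=0, tw=1
print(V(b, w))  # 7 - 6 - 0.5 + 1 - 0 + 0.5 = 2.0
```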
39
Tuning Weights: Gradient Descent Optimization
Given: board b, predicted value V(b), desired value V*(b)
Error: error(b) = V*(b) − V(b)
For each board feature fi, update:
wi ← wi + c · error(b) · fi(b)
This stochastically minimizes Σb (V*(b) − V(b))^2
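The update rule above is the classic LMS stochastic gradient step. A minimal sketch (our own; the synthetic data, step size c, and epoch count are illustrative assumptions) on a noiseless linear problem:

```python
import numpy as np

def lms_train(F, v_star, c=0.01, epochs=200):
    """Stochastic gradient descent on sum_b (V*(b) - V(b))^2.
    F: (num_boards, num_features) feature matrix; v_star: target values."""
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        for f, target in zip(F, v_star):
            error = target - w @ f      # error(b) = V*(b) - V(b)
            w += c * error * f          # wi <- wi + c * error(b) * fi(b)
    return w

# Synthetic check: targets generated by a known weight vector,
# so the learned weights should recover it.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])
F = rng.standard_normal((50, 3))
v_star = F @ w_true
w = lms_train(F, v_star)
print(np.round(w, 3))
```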
40
Obtain Boards
Random boards
Beginners' plays
Professionals' plays
41
Obtain Target Values
A person provides the value V(b)
Play until termination; if the outcome is a
Win: V(b) ← 1 for all boards in the game
Loss: V(b) ← −1 for all boards in the game
Draw: V(b) ← 0 for all boards in the game
Play one move: b → b′, set V(b) ← V(b′)
Play n moves: b → b′ → … → b(n), set V(b) ← V(b(n))
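The target-assignment rules above can be sketched in a few lines (our own illustration; the toy game trace and dictionary representation of V are invented for the example):

```python
def backup_final_outcome(boards, outcome):
    """Play to termination: assign the final outcome
    (win=1, loss=-1, draw=0) to every board in the game."""
    return {b: outcome for b in boards}

def backup_one_move(V, b, b_next):
    """Play one move: the target for board b is the current
    value of its successor b', i.e. V(b) <- V(b')."""
    V[b] = V.get(b_next, 0.0)
    return V

game = ["b0", "b1", "b2", "b3"]           # boards of one self-play game
targets = backup_final_outcome(game, 1)   # the game was won
V = {"b3": 1.0}
V = backup_one_move(V, "b2", "b3")
print(targets["b0"], V["b2"])  # 1 1.0
```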
42
A General Framework
MathematicalModeling
Finding Optimal Parameters
Statistics
Optimization
+
Machine LearningSlide43
43
Today’s Topics
Why machine learning?
Example: learning to play backgammon
General issues in machine learning
44
Important Issues in Machine Learning
Obtaining experience
How to obtain experience?
Supervised learning vs. unsupervised learning
How many examples are enough?
PAC learning theory
Learning algorithms
What algorithm can approximate the target function well, and when?
How does the complexity of a learning algorithm impact learning accuracy?
Is the target function learnable?
Representing inputs
How to represent the inputs?
How to remove irrelevant information from the input representation?
How to reduce redundancy in the input representation?