

Presentation Transcript

1. TOMER BEN MOSHE, May 2017 - MACHINE LEARNING

2. GOALS: Define the problem. Prove some basic bounds.

3. MACHINE LEARNING: What is machine learning?

4. CORE PROBLEM: S = {(x_1, y_1), ..., (x_n, y_n)} - the labeled training examples. Main goal: a classification rule h that predicts the labels of new examples.

5. DEFINITIONS: A distribution D over X. The sample S is drawn independently at random from D. Main goal: predict well on new points that are also drawn from D.

6. DEFINITIONS: c* - the target concept. h - a hypothesis. Main goal: produce a hypothesis h as close as possible to c* (with respect to D).

7. DEFINITIONS: err_D(h) = Prob_{x~D}[h(x) ≠ c*(x)] - the true error of h. Main goal: produce h with true error as low as possible. err_S(h) - the training error of h, i.e. the fraction of the sample S that h labels incorrectly. Overfitting: low training error does not by itself guarantee low true error.

8. Formalizing the problem: H - a hypothesis class over X. Main goal: given H and S, find the hypothesis in H that agrees most closely with c* over D. Assume H is finite.

9-18. EXAMPLE - Emails(X), the target concept c*, and two candidate hypotheses:

Emails(X)   Target Concept c*   Hypothesis h1   Hypothesis h2
(0,0,0)     0                   1               0
(0,0,1)     0                   1               0
(0,1,0)     0                   0               1
(0,1,1)     0                   1               0
(1,0,0)     1                   0               1
(1,0,1)     1                   0               1
(1,1,0)     1                   1               0
(1,1,1)     1                   0               1
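As a small sanity check of the definitions from slide 7, here is a sketch (mine, not from the slides; the labels h1 and h2 and the uniform choice of D are illustrative assumptions) that computes the true error of the two hypotheses in the table above by direct enumeration:

```python
# Illustrative sketch: true error err_D(h) by enumeration,
# assuming D is uniform over the eight emails of the table above.

X = [(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)]
c_star = dict(zip(X, [0, 0, 0, 0, 1, 1, 1, 1]))   # target concept c*
h1     = dict(zip(X, [1, 1, 0, 1, 0, 0, 1, 0]))   # first hypothesis column
h2     = dict(zip(X, [0, 0, 1, 0, 1, 1, 0, 1]))   # second hypothesis column

def true_error(h, c, points):
    """err_D(h) under the uniform distribution: fraction of points where h disagrees with c*."""
    return sum(h[x] != c[x] for x in points) / len(points)

print(true_error(h1, c_star, X))  # 0.75 - h1 disagrees with c* on 6 of the 8 emails
print(true_error(h2, c_star, X))  # 0.25 - h2 disagrees with c* on 2 of the 8 emails
```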


19. Overfitting: good training error but bad true error. How large should the sample be in order to guarantee good true error?

20. THEOREM 5.1: Let H be a hypothesis class and let ε, δ > 0. If S of size n ≥ (1/ε)(ln|H| + ln(1/δ)) is drawn from distribution D, then with probability ≥ 1-δ, every h ∈ H with err_D(h) ≥ ε has err_S(h) > 0 (equivalently, every h ∈ H that is consistent with S has err_D(h) < ε).
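A minimal sketch (my own, not part of the presentation) of how the bound in Theorem 5.1 is used: given ε, δ and |H|, compute the sample size the theorem requires. The function name and the numbers in the example are illustrative.

```python
from math import ceil, log

def pac_sample_size(eps, delta, H_size):
    """Sample size from Theorem 5.1: n >= (1/eps) * (ln|H| + ln(1/delta))."""
    return ceil((log(H_size) + log(1.0 / delta)) / eps)

# Example: |H| = 2**20 hypotheses, target error eps = 0.1, failure probability delta = 0.05.
print(pac_sample_size(0.1, 0.05, 2**20))  # 169 examples suffice
```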

21-23. EXAMPLE - Emails(X), the target concept c*, and a single hypothesis h:

Emails(X)   Target Concept c*   Hypothesis h
(0,0,0)     0                   1
(0,0,1)     0                   1
(0,1,0)     0                   0
(0,1,1)     0                   1
(1,0,0)     1                   0
(1,0,1)     1                   0
(1,1,0)     1                   1
(1,1,1)     1                   0


24. Overfitting: this is a PAC-learning guarantee - it guarantees a hypothesis that is Probably Approximately Correct. But the theorem only addresses hypotheses with err_S(h) = 0. What about hypotheses with small but nonzero err_S(h)?

25. OVERFITTING: If S is sufficiently large, then with high probability good performance on S will translate to good performance on D.

26. THEOREM 5.3: Let H be a hypothesis class and let ε, δ > 0. If S of size n ≥ (1/(2ε²))(ln|H| + ln(2/δ)) is drawn from distribution D, then with probability ≥ 1-δ, every h ∈ H satisfies |err_D(h) - err_S(h)| ≤ ε.
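A small sketch (mine; the numbers are arbitrary) showing both directions of Theorem 5.3: the sample size needed for a given gap ε, and the gap guaranteed for a given sample size.

```python
from math import ceil, log, sqrt

def uc_sample_size(eps, delta, H_size):
    """Sample size from Theorem 5.3: n >= (1/(2*eps^2)) * (ln|H| + ln(2/delta))."""
    return ceil((log(H_size) + log(2.0 / delta)) / (2 * eps ** 2))

def generalization_gap(n, delta, H_size):
    """The gap eps that Theorem 5.3 guarantees between err_S(h) and err_D(h) for a sample of size n."""
    return sqrt((log(H_size) + log(2.0 / delta)) / (2 * n))

print(uc_sample_size(0.1, 0.05, 2**20))      # about 878 examples for a gap of 0.1
print(generalization_gap(1000, 0.05, 2**20)) # about 0.094 with 1,000 examples
```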

27. Hoeffding bound: Let x_1, ..., x_n be independent {0,1}-valued random variables with probability p that x_i = 1. Let s = x_1 + ... + x_n. For any 0 ≤ α ≤ 1: Prob(s/n > p + α) ≤ e^(-2nα²) and Prob(s/n < p - α) ≤ e^(-2nα²).
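As an illustration (not part of the presentation), a quick simulation comparing the empirical deviation probability with the two-sided Hoeffding bound 2·e^(-2nα²); the parameter values are arbitrary.

```python
import random
from math import exp

n, p, alpha, trials = 200, 0.3, 0.1, 20000   # arbitrary illustrative parameters
random.seed(0)

deviations = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))   # number of 1s among x_1..x_n
    if abs(s / n - p) > alpha:
        deviations += 1

print("empirical Prob(|s/n - p| > alpha):", deviations / trials)
print("Hoeffding bound 2*exp(-2*n*alpha^2):", 2 * exp(-2 * n * alpha ** 2))
```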

28. Learning disjunctions: X = {0,1}^d and H is the class of all possible disjunctions over the d features. Theorem 5.1 bounds the sample size needed, but how can we efficiently build a consistent disjunction when one exists?

29. Simple disjunction learner: Given sample S, discard all features that are set to 1 in any negative example in S. Output the concept h that is the OR of all features that remain.
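The slide specifies the algorithm completely, so here is a direct implementation sketch (function and variable names are mine):

```python
def simple_disjunction_learner(samples):
    """samples: list of (x, y) pairs, x a tuple of 0/1 features, y the 0/1 label.
    Discard every feature that is set to 1 in some negative example;
    the learned hypothesis is the OR of the remaining features."""
    d = len(samples[0][0])
    kept = set(range(d))
    for x, y in samples:
        if y == 0:                                   # negative example
            kept -= {i for i in range(d) if x[i] == 1}
    return kept

def predict(kept, x):
    """Evaluate the learned disjunction: OR of the kept features."""
    return int(any(x[i] == 1 for i in kept))

# Toy usage: the target concept is the disjunction of features 0 and 2.
S = [((1, 0, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), 1), ((0, 1, 1), 1), ((0, 0, 0), 0)]
kept = simple_disjunction_learner(S)
print(kept)                                          # {0, 2}
print(all(predict(kept, x) == y for x, y in S))      # True: consistent with S
```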

30. Simple disjunction learner (Lemma 5.4): The simple disjunction learner produces a disjunction h that is consistent with the sample S (err_S(h) = 0) whenever the target concept is indeed a disjunction.

31. Simple disjunction learner: Combined with Theorem 5.1, this proves that the algorithm efficiently PAC-learns the class of disjunctions.

32. Occam’s razor: One should prefer simpler explanations over more complicated ones. But what does "simple" mean? Fewer bits.

33. Occam’s razor: What is the simplest rule? "Everything is in h" or "nothing is in h" - just one bit to describe, but bad performance on S. What is the most complicated rule? Explicitly specify every input that is in h - no training error, but bad true error.

34. Occam’s razor: Let H be the set of all rules h that can be described using at most b bits, so |H| ≤ 2^b, and plug that into Theorem 5.1.

35. Reminder - THEOREM 5.1: Let H be a hypothesis class and let ε, δ > 0. If S of size n ≥ (1/ε)(ln|H| + ln(1/δ)) is drawn from distribution D, then with probability ≥ 1-δ, every h ∈ H with err_D(h) ≥ ε has err_S(h) > 0.

36. Occam’s razor (Theorem 5.5): With probability at least 1-δ, any rule h consistent with S that can be described in this language using fewer than b bits will have err_D(h) ≤ ε, for |S| = (1/ε)(b ln 2 + ln(1/δ)). In other words, with probability at least 1-δ, all rules consistent with S that can be described in fewer than b bits will have err_D(h) ≤ (b ln 2 + ln(1/δ)) / |S|.
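A small numeric sketch (my own illustration; the parameter values are made up) of the two forms of the Occam bound: the sample size needed for a target error ε, and the error guaranteed for a given sample size.

```python
from math import ceil, log

def occam_sample_size(eps, delta, b):
    """|S| >= (1/eps) * (b*ln2 + ln(1/delta)), as in Theorem 5.5."""
    return ceil((b * log(2) + log(1.0 / delta)) / eps)

def occam_error_bound(n, delta, b):
    """err_D(h) <= (b*ln2 + ln(1/delta)) / |S| for rules consistent with S and described in < b bits."""
    return (b * log(2) + log(1.0 / delta)) / n

# Rules describable in fewer than b = 100 bits, delta = 0.05:
print(occam_sample_size(0.1, 0.05, 100))    # 724 examples suffice for error <= 0.1
print(occam_error_bound(10000, 0.05, 100))  # about 0.007 with 10,000 examples
```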

37. Occam’s razor: The theorem can be applied no matter which description language is used to encode rules into bits. This justifies the approach of picking simple rules that fit the data well.

38. DECISION TREE [figure: a small decision tree with branches labeled 1/0 and four leaves labeled +, -, +, +]

39. Learning decision trees: Finding the smallest consistent decision tree is NP-hard, but there are heuristic methods. Suppose we run such a method and get a tree with k nodes. O(k log d) bits suffice to describe such a tree, so by Theorem 5.5 we get good true error if we can find a consistent tree with fewer than roughly ε|S|/log d nodes.
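Continuing the slide's reasoning, a sketch (my own; it assumes roughly k·log2(d) bits describe a k-node tree, matching the slide's O(k log d), and the numbers are arbitrary) that plugs a consistent k-node tree over d binary features into the Theorem 5.5 error bound.

```python
from math import log

def tree_error_bound(n, delta, k, d):
    """Occam bound for a consistent decision tree with k nodes over d binary features,
    assuming about k*log2(d) bits suffice to describe the tree."""
    b = k * log(d, 2)                      # assumed description length in bits
    return (b * log(2) + log(1.0 / delta)) / n

# With n = 10,000 training examples, d = 1,000 features, delta = 0.05:
print(tree_error_bound(10_000, 0.05, k=50, d=1_000))   # about 0.035
print(tree_error_bound(10_000, 0.05, k=500, d=1_000))  # about 0.35 - too large a tree
```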

40. Regularization: Suppose we have a very simple rule with training error of 20%, or a complicated rule with training error of 10%. Which should we prefer? We need something called "regularization", also called "complexity penalization": in general, we should penalize complexity and trade it off against training error.

41. Regularization: Let H_i denote the hypotheses that can be described in i bits. Let δ_i = δ/2^i.

42. Reminder - THEOREM 5.3: Let H be a hypothesis class and let ε, δ > 0. If S of size n ≥ (1/(2ε²))(ln|H| + ln(2/δ)) is drawn from distribution D, then with probability ≥ 1-δ, every h ∈ H satisfies |err_D(h) - err_S(h)| ≤ ε.

43. Regularization: Remember that |H_i| ≤ 2^i and that ∑_i δ_i = ∑_i δ/2^i ≤ δ. Applying Theorem 5.3 to each H_i with failure probability δ_i, the union bound gives us the following corollary:

44. Regularization (Corollary 5.6): With probability ≥ 1-δ, all hypotheses h satisfy err_D(h) ≤ err_S(h) + sqrt( (size(h)·ln 4 + ln(2/δ)) / (2|S|) ), where size(h) is the number of bits needed to describe h. That gives us a good tradeoff between complexity and training error.
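To see the tradeoff on the example from slide 40, here is a sketch (the description lengths in bits and the sample sizes are hypothetical numbers I chose, not from the presentation):

```python
from math import log, sqrt

def penalized_bound(train_err, size_bits, n, delta=0.05):
    """Corollary 5.6: err_D(h) <= err_S(h) + sqrt((size(h)*ln4 + ln(2/delta)) / (2n))."""
    return train_err + sqrt((size_bits * log(4) + log(2.0 / delta)) / (2 * n))

n = 1000  # hypothetical sample size
print(penalized_bound(0.20, size_bits=20, n=n))        # simple rule:      about 0.33
print(penalized_bound(0.10, size_bits=500, n=n))       # complicated rule: about 0.69
print(penalized_bound(0.20, size_bits=20, n=50_000))   # simple rule, more data:      about 0.22
print(penalized_bound(0.10, size_bits=500, n=50_000))  # complicated rule, more data: about 0.18
```

With few samples the penalty favors the simple 20% rule; with enough data the complicated 10% rule gets the better bound.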

45. The END. Any questions?