Review of Probability Theory

Through this class we will be relying on concepts from probability theory for deriving machine learning algorithms. These notes attempt to cover the basics of probability theory at a level appropriate for CS 229.
1.1 Conditional probability and independence

Let $B$ be an event with non-zero probability. The conditional probability of any event $A$ given $B$ is defined as

$$P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}.$$

In other words, $P(A \mid B)$ is the probability measure of the event $A$ after observing the occurrence of event $B$. Two events are called independent if and only if $P(A \cap B) = P(A)P(B)$ (or equivalently, $P(A \mid B) = P(A)$). Therefore, independence is equivalent to saying that observing $B$ does not have any effect on the probability of $A$.

2 Random variables

Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space $\Omega$ are length-10 sequences of heads and tails. For example, we might have $\omega_0 = \langle H, H, T, H, T, H, H, T, T, T \rangle \in \Omega$. However, in practice, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails. These functions, under some technical conditions, are known as random variables.

More formally, a random variable $X$ is a function $X : \Omega \to \mathbb{R}$. (Technically speaking, not every function is acceptable as a random variable. From a measure-theoretic perspective, random variables must be Borel-measurable functions. Intuitively, this restriction ensures that given a random variable and its underlying outcome space, one can implicitly define each of the events of the event space as being sets of outcomes $\omega \in \Omega$ for which $X(\omega)$ satisfies some property, e.g., the event $\{\omega : X(\omega) \geq 3\}$.) Typically, we will denote random variables using upper case letters $X(\omega)$ or more simply $X$ (where the dependence on the random outcome $\omega$ is implied). We will denote the value that a random variable may take on using lower case letters $x$.

Example: In our experiment above, suppose that $X(\omega)$ is the number of heads which occur in the sequence of tosses $\omega$. Given that only 10 coins are tossed, $X(\omega)$ can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with a random variable $X$ taking on some specific value $k$ is

$$P(X = k) := P(\{\omega : X(\omega) = k\}).$$

Example: Suppose that $X(\omega)$ is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, $X(\omega)$ takes on an infinite number of possible values, so it is called a continuous random variable. We denote the probability that $X$ takes on a value between two real constants $a$ and $b$ (where $a \leq b$) as

$$P(a \leq X \leq b) := P(\{\omega : a \leq X(\omega) \leq b\}).$$

2.1 Cumulative distribution functions

In order to specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs) from which the probability measure governing an experiment immediately follows. In this section and the next two sections, we describe each of these types of functions in turn.

A cumulative distribution function (CDF) is a function $F_X : \mathbb{R} \to [0, 1]$ which specifies a probability measure as

$$F_X(x) \triangleq P(X \leq x). \tag{1}$$

By using this function one can calculate the probability of any event in $\mathcal{F}$. (This is a remarkable fact and is actually a theorem that is proved in more advanced courses.)

[Figure: a sample CDF.]

Properties:
- $0 \leq F_X(x) \leq 1$.
- $\lim_{x \to -\infty} F_X(x) = 0$.
- $\lim_{x \to \infty} F_X(x) = 1$.
- $x \leq y \implies F_X(x) \leq F_X(y)$.

Example: For the uniform random variable $X$ with PDF $f_X(x) = 1$ for $x \in [0, 1]$ (and $0$ elsewhere), $E[X] = \frac{1}{2}$, and

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^1 x^2\,dx = \frac{1}{3},$$
$$\mathrm{Var}[X] = E[X^2] - E[X]^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$

Example: Suppose that $g(x) = 1\{x \in A\}$ for some subset $A \subseteq \Omega$. What is $E[g(X)]$?

Discrete case:
$$E[g(X)] = \sum_{x \in \mathrm{Val}(X)} 1\{x \in A\}\, P_X(x) = \sum_{x \in A} P_X(x) = P(x \in A).$$

Continuous case:
$$E[g(X)] = \int_{-\infty}^{\infty} 1\{x \in A\}\, f_X(x)\,dx = \int_{x \in A} f_X(x)\,dx = P(x \in A).$$

2.6 Some common random variables

Discrete random variables

- $X \sim \mathrm{Bernoulli}(p)$ (where $0 \leq p \leq 1$): one if a coin with heads probability $p$ comes up heads, zero otherwise.
$$p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$
- $X \sim \mathrm{Binomial}(n, p)$ (where $0 \leq p \leq 1$): the number of heads in $n$ independent flips of a coin with heads probability $p$.
$$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
- $X \sim \mathrm{Geometric}(p)$ (where $p > 0$): the number of flips of a coin with heads probability $p$ until the first heads.
$$p(x) = p(1 - p)^{x - 1}$$
- $X \sim \mathrm{Poisson}(\lambda)$ (where $\lambda \geq 0$): a probability distribution over the nonnegative integers used for modeling the frequency of rare events.
$$p(x) = e^{-\lambda} \frac{\lambda^x}{x!}$$
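To make the 10-coin example concrete, here is a minimal Python sketch (not from the notes; the fair-coin choice $p = 0.5$ and the trial count are arbitrary) that estimates $P(X = k)$ by simulation and compares it against the closed-form Binomial pmf above.

```python
import math
import random

# Illustrative parameters: 10 coins, heads probability 0.5, 100k simulated experiments.
n, p, trials = 10, 0.5, 100_000

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact pmf of X ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# X(w) maps an outcome w (a length-10 sequence of flips) to its number of heads;
# we tally how often each value k occurs across simulated outcomes.
counts = [0] * (n + 1)
for _ in range(trials):
    heads = sum(random.random() < p for _ in range(n))
    counts[heads] += 1

for k in range(n + 1):
    print(f"P(X={k}): empirical {counts[k] / trials:.4f}, exact {binomial_pmf(k, n, p):.4f}")
```

The empirical frequencies should agree with the exact pmf to within Monte Carlo error, illustrating how the random variable $X$ compresses the $2^{10}$-element sample space down to 11 values of interest.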
Continuous random variables

- $X \sim \mathrm{Uniform}(a, b)$ (where $a < b$): equal probability density to every value between $a$ and $b$ on the real line.
$$f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$
- $X \sim \mathrm{Exponential}(\lambda)$ (where $\lambda > 0$): decaying probability density over the nonnegative reals.
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
- $X \sim \mathrm{Normal}(\mu, \sigma^2)$: also known as the Gaussian distribution.
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

3.4 Conditional distributions

Conditional distributions seek to answer the question: what is the probability distribution over $Y$, when we know that $X$ must take on a certain value $x$? In the discrete case, the conditional probability mass function of $Y$ given $X$ is simply

$$p_{Y \mid X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)},$$

assuming that $p_X(x) \neq 0$.

In the continuous case, the situation is technically a little more complicated because the probability that a continuous random variable $X$ takes on a specific value $x$ is equal to zero. (To get around this, a more reasonable way to calculate the conditional CDF is
$$F_{Y \mid X}(y, x) = \lim_{\Delta x \to 0^+} P(Y \leq y \mid x \leq X \leq x + \Delta x).$$
It can be easily seen that if $F(x, y)$ is differentiable in both $x$ and $y$, then
$$F_{Y \mid X}(y, x) = \int_{-\infty}^{y} \frac{f_{X,Y}(x, \alpha)}{f_X(x)}\, d\alpha,$$
and therefore we define the conditional PDF of $Y$ given $X = x$ as $f_{Y \mid X}(y \mid x) = f_{XY}(x, y) / f_X(x)$.) Ignoring this technical point, we simply define, by analogy to the discrete case, the conditional probability density of $Y$ given $X = x$ to be

$$f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)},$$

provided $f_X(x) \neq 0$.

3.5 Bayes's rule

A useful formula that often arises when trying to derive expressions for the conditional probability of one variable given another is Bayes's rule.

In the case of discrete random variables $X$ and $Y$,

$$P_{Y \mid X}(y \mid x) = \frac{P_{XY}(x, y)}{P_X(x)} = \frac{P_{X \mid Y}(x \mid y)\, P_Y(y)}{\sum_{y' \in \mathrm{Val}(Y)} P_{X \mid Y}(x \mid y')\, P_Y(y')}.$$

If the random variables $X$ and $Y$ are continuous,

$$f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_{X \mid Y}(x \mid y)\, f_Y(y)}{\int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y')\, f_Y(y')\, dy'}.$$

3.6 Independence

Two random variables $X$ and $Y$ are independent if $F_{XY}(x, y) = F_X(x) F_Y(y)$ for all values of $x$ and $y$. Equivalently,

- For discrete random variables, $p_{XY}(x, y) = p_X(x) p_Y(y)$ for all $x \in \mathrm{Val}(X)$, $y \in \mathrm{Val}(Y)$.
- For discrete random variables, $p_{Y \mid X}(y \mid x) = p_Y(y)$ whenever $p_X(x) \neq 0$, for all $y \in \mathrm{Val}(Y)$.
- For continuous random variables, $f_{XY}(x, y) = f_X(x) f_Y(y)$ for all $x, y \in \mathbb{R}$.
- For continuous random variables, $f_{Y \mid X}(y \mid x) = f_Y(y)$ whenever $f_X(x) \neq 0$, for all $y \in \mathbb{R}$.

4.1 Basic properties

We can define the joint distribution function of $X_1, X_2, \ldots, X_n$, the joint probability density function of $X_1, X_2, \ldots, X_n$, the marginal probability density function of $X_1$, and the conditional probability density function of $X_1$ given $X_2, \ldots, X_n$, as

$$F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n)$$
$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{\partial^n F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)}{\partial x_1 \cdots \partial x_n}$$
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)\, dx_2 \cdots dx_n$$
$$f_{X_1 \mid X_2, \ldots, X_n}(x_1 \mid x_2, \ldots, x_n) = \frac{f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)}{f_{X_2, \ldots, X_n}(x_2, \ldots, x_n)}$$
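Before moving on, the discrete definitions from Sections 3.4 and 3.5 can be checked mechanically on a toy joint pmf. The following Python sketch uses invented numbers; only the formulas themselves come from the notes.

```python
# Joint pmf p_{XY}(x, y) over Val(X) = Val(Y) = {0, 1}; the values are arbitrary
# illustrative choices that sum to 1.
p_XY = {(0, 0): 0.40, (0, 1): 0.10,
        (1, 0): 0.20, (1, 1): 0.30}

def marginal_X(x):
    """Marginalization: p_X(x) = sum_y p_{XY}(x, y)."""
    return sum(p for (xi, y), p in p_XY.items() if xi == x)

def marginal_Y(y):
    """Marginalization: p_Y(y) = sum_x p_{XY}(x, y)."""
    return sum(p for (x, yi), p in p_XY.items() if yi == y)

def cond_Y_given_X(y, x):
    """Direct definition: p_{Y|X}(y|x) = p_{XY}(x, y) / p_X(x), assuming p_X(x) != 0."""
    return p_XY[(x, y)] / marginal_X(x)

def bayes_Y_given_X(y, x):
    """Bayes's rule: p_{X|Y}(x|y) p_Y(y) / sum_{y'} p_{X|Y}(x|y') p_Y(y')."""
    def cond_X_given_Y(x, y):
        return p_XY[(x, y)] / marginal_Y(y)
    num = cond_X_given_Y(x, y) * marginal_Y(y)
    den = sum(cond_X_given_Y(x, yp) * marginal_Y(yp) for yp in (0, 1))
    return num / den

print(cond_Y_given_X(1, 0), bayes_Y_given_X(1, 0))  # both print 0.2 = 0.1 / 0.5
```

Running this prints the same value twice, confirming that the direct conditional and the Bayes's-rule form agree, as the algebra above guarantees.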
To calculate the probability of an event $A \subseteq \mathbb{R}^n$ we have

$$P((x_1, x_2, \ldots, x_n) \in A) = \int_{(x_1, x_2, \ldots, x_n) \in A} f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n. \tag{4}$$

Chain rule: From the definition of conditional probabilities for multiple random variables, one can show that

$$f(x_1, x_2, \ldots, x_n) = f(x_n \mid x_1, x_2, \ldots, x_{n-1})\, f(x_1, x_2, \ldots, x_{n-1})$$
$$= f(x_n \mid x_1, x_2, \ldots, x_{n-1})\, f(x_{n-1} \mid x_1, x_2, \ldots, x_{n-2})\, f(x_1, x_2, \ldots, x_{n-2})$$
$$= \cdots = f(x_1) \prod_{i=2}^{n} f(x_i \mid x_1, \ldots, x_{i-1}).$$

Independence: For multiple events $A_1, \ldots, A_k$, we say that $A_1, \ldots, A_k$ are mutually independent if for any subset $S \subseteq \{1, 2, \ldots, k\}$, we have

$$P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i).$$

Likewise, we say that random variables $X_1, \ldots, X_n$ are independent if

$$f(x_1, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n).$$

Here, the definition of mutual independence is simply the natural generalization of independence of two random variables to multiple random variables.

Independent random variables arise often in machine learning algorithms where we assume that the training examples belonging to the training set represent independent samples from some unknown probability distribution. To make the significance of independence clear, consider a "bad" training set in which we first sample a single training example $(x^{(1)}, y^{(1)})$ from some unknown distribution, and then add $m - 1$ copies of the exact same training example to the training set. In this case, we have (with some abuse of notation)

$$P\big((x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\big) \neq \prod_{i=1}^{m} P(x^{(i)}, y^{(i)}).$$

Despite the fact that the training set has size $m$, the examples are not independent! While clearly the procedure described here is not a sensible method for building a training set for a machine learning algorithm, it turns out that in practice, non-independence of samples does come up often, and it has the effect of reducing the effective size of the training set.

4.3 The multivariate Gaussian distribution

A random vector $X = [X_1, \ldots, X_n]^T$ with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma$ has a multivariate Gaussian distribution if its joint PDF is

$$f(x_1, \ldots, x_n; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).$$

We write this as $X \sim \mathcal{N}(\mu, \Sigma)$. Notice that in the case $n = 1$, this reduces to the regular definition of a normal distribution with mean parameter $\mu_1$ and variance $\Sigma_{11}$.

Generally speaking, Gaussian random variables are extremely useful in machine learning and statistics for two main reasons. First, they are extremely common when modeling "noise" in statistical algorithms. Quite often, noise can be considered to be the accumulation of a large number of small independent random perturbations affecting the measurement process; by the Central Limit Theorem, summations of independent random variables will tend to "look Gaussian." Second, Gaussian random variables are convenient for many analytical manipulations, because many of the integrals involving Gaussian distributions that arise in practice have simple closed-form solutions. We will encounter this later in the course.

5 Other resources

A good textbook on probability at the level needed for CS 229 is A First Course in Probability by Sheldon Ross.
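As a closing illustration of the independence discussion in Section 4, here is a small Python simulation (my own construction, not from the notes) of the "bad" training set above: it compares the spread of the sample mean computed from $m$ independent draws against $m$ copies of a single draw, showing how duplication shrinks the effective sample size to one.

```python
import random
import statistics

# m and the standard-normal sampling distribution are arbitrary choices
# made for illustration.
m, repeats = 100, 2_000

iid_means, copied_means = [], []
for _ in range(repeats):
    iid = [random.gauss(0.0, 1.0) for _ in range(m)]  # m independent draws
    copied = [random.gauss(0.0, 1.0)] * m             # one draw, m copies
    iid_means.append(statistics.fmean(iid))
    copied_means.append(statistics.fmean(copied))

# The mean of m iid draws concentrates (stdev ~ 1/sqrt(m)); the mean of
# m copies is just the single draw, so its stdev stays at ~1.
print("stdev of mean, iid:   ", statistics.stdev(iid_means))    # roughly 0.1
print("stdev of mean, copied:", statistics.stdev(copied_means)) # roughly 1.0
```

The factor-of-ten gap between the two printed values is exactly the $\sqrt{m} = 10$ concentration that genuine independence buys.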