lindy-dunigan
Uploaded On 2016-09-27

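Contents

1. Introduction
2. Basics
   Exercises
3. Sums of independent random variables
   3.1 Hoeffding's inequality
   3.2 Bernstein's inequality
   Exercises
4. The Efron-Stein inequality
   4.1 Functions with bounded differences
   4.2 Self-bounding functions
   4.3 Configuration functions
   Exercises
5. The entropy method
   5.1 Basic information theory
   5.2 Tensorization of the entropy
   5.3 Logarithmic Sobolev inequalities
   5.4 First example: bounded differences and more
   5.5 Exponential inequalities for self-bounding functions
   5.6 Combinatorial entropies
   5.7 Variations on the theme
   Exercises
6. Concentration of measure
   6.1 Bounded differences inequality revisited
   6.2 Convex distance inequality
   6.3 Examples
   Exercises
References

2 Basics

To make these notes self-contained, we first briefly introduce some of the basic inequalities of probability theory. First of all, recall that for any nonnegative random variable $X$,
\[ \mathbb{E}X = \int_0^\infty \mathbb{P}\{X \ge t\}\,dt. \]
This implies Markov's inequality: for any nonnegative random variable $X$, and $t > 0$,
\[ \mathbb{P}\{X \ge t\} \le \frac{\mathbb{E}X}{t}. \]
It follows from Markov's inequality that if $\phi$ is a strictly monotonically increasing nonnegative-valued function then for any random variable $X$ and real number $t$,
\[ \mathbb{P}\{X \ge t\} = \mathbb{P}\{\phi(X) \ge \phi(t)\} \le \frac{\mathbb{E}\phi(X)}{\phi(t)}. \]
An application of this with $\phi(x) = x^2$ is Chebyshev's inequality: if $X$ is an arbitrary random variable and $t > 0$, then
\[ \mathbb{P}\{|X - \mathbb{E}X| \ge t\} = \mathbb{P}\{|X - \mathbb{E}X|^2 \ge t^2\} \le \frac{\mathbb{E}\left[|X - \mathbb{E}X|^2\right]}{t^2} = \frac{\mathrm{Var}\{X\}}{t^2}. \]
More generally, taking $\phi(x) = x^q$ ($x \ge 0$), for any $q > 0$ we have
\[ \mathbb{P}\{|X - \mathbb{E}X| \ge t\} \le \frac{\mathbb{E}\left[|X - \mathbb{E}X|^q\right]}{t^q}. \]
In specific examples one may choose the value of $q$ to optimize the obtained upper bound. Such moment bounds often provide very sharp estimates of the tail probabilities. A related idea is at the basis of Chernoff's bounding method. Taking $\phi(x) = e^{sx}$ where $s$ is an arbitrary positive number, for any random variable $X$, and any $t \in \mathbb{R}$, we have
\[ \mathbb{P}\{X \ge t\} = \mathbb{P}\{e^{sX} \ge e^{st}\} \le \frac{\mathbb{E}e^{sX}}{e^{st}}. \]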
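To get a feeling for how the Markov, Chebyshev, and Chernoff bounds of this section compare, the following sketch evaluates all three for a binomial tail, where the exact probability can also be computed. The choice of a Binomial(100, 1/2) variable, the threshold t = 70, and the grid search over s are assumptions made only for this illustration; they are not taken from the notes.

```python
import math

def binomial_tail(n, p, t):
    """Exact P{X >= t} for X ~ Binomial(n, p) and integer t."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def markov_bound(n, p, t):
    """Markov: P{X >= t} <= EX / t."""
    return n * p / t

def chebyshev_bound(n, p, t):
    """Chebyshev applied to X - EX; meaningful only for t > EX."""
    mean, var = n * p, n * p * (1 - p)
    return var / (t - mean) ** 2

def chernoff_bound(n, p, t, s_max=5.0, grid=2000):
    """Chernoff: minimum over a grid of s > 0 of E[e^{sX}] / e^{st}."""
    best = 1.0
    for j in range(1, grid + 1):
        s = s_max * j / grid
        log_mgf = n * math.log(1 - p + p * math.exp(s))  # log E[e^{sX}]
        best = min(best, math.exp(log_mgf - s * t))
    return best

n, p, t = 100, 0.5, 70  # illustrative values, not from the notes
exact = binomial_tail(n, p, t)
markov = markov_bound(n, p, t)
chebyshev = chebyshev_bound(n, p, t)
chernoff = chernoff_bound(n, p, t)
print(f"exact={exact:.3e}  Markov={markov:.3f}  "
      f"Chebyshev={chebyshev:.4f}  Chernoff={chernoff:.3e}")
```

Every grid value of s gives a valid upper bound, so the grid minimum is valid as well, although it may be slightly larger than the true infimum over s.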
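Theorem 2 (Chebyshev's association inequality). Let $f$ and $g$ be nondecreasing real-valued functions defined on the real line. If $X$ is a real-valued random variable, then
\[ \mathbb{E}[f(X)g(X)] \ge \mathbb{E}[f(X)]\,\mathbb{E}[g(X)]. \]
If $f$ is nonincreasing and $g$ is nondecreasing then
\[ \mathbb{E}[f(X)g(X)] \le \mathbb{E}[f(X)]\,\mathbb{E}[g(X)]. \]

Proof. Let the random variable $Y$ be distributed as $X$ and independent of it. If $f$ and $g$ are nondecreasing, $(f(x) - f(y))(g(x) - g(y)) \ge 0$ so that
\[ \mathbb{E}[(f(X) - f(Y))(g(X) - g(Y))] \ge 0. \]
Expand this expectation to obtain the first inequality. The proof of the second is similar. $\Box$

An important generalization of Chebyshev's association inequality is described as follows. A real-valued function $f$ defined on $\mathbb{R}^n$ is said to be nondecreasing (nonincreasing) if it is nondecreasing (nonincreasing) in each variable while keeping all other variables fixed at any value.

Theorem 3 (Harris' inequality). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be nondecreasing functions. Let $X_1, \ldots, X_n$ be independent real-valued random variables and define the random vector $X = (X_1, \ldots, X_n)$ taking values in $\mathbb{R}^n$. Then
\[ \mathbb{E}[f(X)g(X)] \ge \mathbb{E}[f(X)]\,\mathbb{E}[g(X)]. \]
Similarly, if $f$ is nonincreasing and $g$ is nondecreasing then
\[ \mathbb{E}[f(X)g(X)] \le \mathbb{E}[f(X)]\,\mathbb{E}[g(X)]. \]

Proof. Again, it suffices to prove the first inequality. We proceed by induction. For $n = 1$ the statement is just Chebyshev's association inequality. Now suppose the statement is true for $m < n$. Then
\[ \mathbb{E}[f(X)g(X)] = \mathbb{E}\,\mathbb{E}[f(X)g(X) \mid X_1, \ldots, X_{n-1}] \ge \mathbb{E}\big[\mathbb{E}[f(X) \mid X_1, \ldots, X_{n-1}]\,\mathbb{E}[g(X) \mid X_1, \ldots, X_{n-1}]\big] \]

Exercise 4 (subgaussian moments, converse). Let $X$ be a random variable such that there exists a constant $c > 0$ such that
\[ \left(\mathbb{E}[X_+^q]\right)^{1/q} \le \sqrt{cq} \]
for every positive integer $q$. Show that $X$ is subgaussian. More precisely, show that for any $s > 0$,
\[ \mathbb{E}[e^{sX}] \le \sqrt{2}\,e^{1/6}\,e^{ces^2/2}. \]

Exercise 5 (subexponential moments). We say that a random variable $X$ has a subexponential distribution if there exists a constant $c > 0$ such that for all $0 < s < 1/c$, $\mathbb{E}[e^{sX}] \le 1/(1 - cs)$. Show that if $X$ is subexponential, then for every positive integer $q$,
\[ \left(\mathbb{E}[X_+^q]\right)^{1/q} \le \frac{4c}{e}\,q. \]

Exercise 6 (subexponential moments, converse). Let $X$ be a random variable such that there exists a constant $c > 0$ such that
\[ \left(\mathbb{E}[X_+^q]\right)^{1/q} \le cq \]
for every positive integer $q$. Show that $X$ is subexponential. More precisely, show that for any $0 < s < 1/(ec)$,
\[ \mathbb{E}[e^{sX}] \le \frac{1}{1 - ces}. \]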
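As a sanity check of the moment bound in Exercise 5 (not a proof), one can take an exponential random variable with mean $c$: its moment generating function equals $1/(1 - cs)$ for $0 < s < 1/c$, so it is subexponential in the sense above, and its moments are $\mathbb{E}[X^q] = q!\,c^q$. The sketch below compares $(q!)^{1/q} c$ with $(4c/e)q$ for a range of $q$. The value $c = 2$ and the range of $q$ are arbitrary choices made for this illustration.

```python
import math

def exponential_moment_root(q, c):
    """(E[X^q])^{1/q} for X exponential with mean c, using E[X^q] = q! c^q."""
    # lgamma(q + 1) = log(q!), which avoids overflow for large q
    return c * math.exp(math.lgamma(q + 1) / q)

def exercise5_bound(q, c):
    """Right-hand side (4c/e) q of the moment bound in Exercise 5."""
    return 4.0 * c * q / math.e

c = 2.0  # illustrative constant, not from the notes
rows = [(q, exponential_moment_root(q, c), exercise5_bound(q, c))
        for q in range(1, 51)]
bound_holds = all(lhs <= rhs for _, lhs, rhs in rows)
print("bound holds for q = 1..50:", bound_holds)
```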
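... variables. The reason is that since the expected value of a product of independent random variables equals the product of the expected values, Chernoff's bound becomes
\[ \mathbb{P}\{S_n - \mathbb{E}S_n \ge t\} \le e^{-st}\,\mathbb{E}\left[\exp\left(s \sum_{i=1}^n (X_i - \mathbb{E}X_i)\right)\right] = e^{-st} \prod_{i=1}^n \mathbb{E}\left[e^{s(X_i - \mathbb{E}X_i)}\right] \quad \text{(by independence).} \tag{2} \]
Now the problem of finding tight bounds comes down to finding a good upper bound for the moment generating function of the random variables $X_i - \mathbb{E}X_i$. There are many ways of doing this. For bounded random variables perhaps the most elegant version is due to Hoeffding [38]:

Lemma 1 (Hoeffding's inequality). Let $X$ be a random variable with $\mathbb{E}X = 0$, $a \le X \le b$. Then for $s > 0$,
\[ \mathbb{E}\left[e^{sX}\right] \le e^{s^2(b-a)^2/8}. \]

Proof. Note that by convexity of the exponential function
\[ e^{sx} \le \frac{x - a}{b - a}\,e^{sb} + \frac{b - x}{b - a}\,e^{sa} \quad \text{for } a \le x \le b. \]
Exploiting $\mathbb{E}X = 0$, and introducing the notation $p = -a/(b-a)$ we get
\[ \mathbb{E}e^{sX} \le \frac{b}{b-a}\,e^{sa} - \frac{a}{b-a}\,e^{sb} = \left(1 - p + p e^{s(b-a)}\right) e^{-ps(b-a)} \stackrel{\text{def}}{=} e^{\phi(u)}, \]
where $u = s(b-a)$, and $\phi(u) = -pu + \log(1 - p + pe^u)$. But by straightforward calculation it is easy to see that the derivative of $\phi$ is
\[ \phi'(u) = -p + \frac{p}{p + (1-p)e^{-u}}, \]

3.2 Bernstein's inequality

Assume now without loss of generality that $\mathbb{E}X_i = 0$ for all $i = 1, \ldots, n$. Our starting point is again (2), that is, we need bounds for $\mathbb{E}\left[e^{sX_i}\right]$. Introduce $\sigma_i^2 = \mathbb{E}[X_i^2]$, and
\[ F_i = \sum_{r=2}^\infty \frac{s^{r-2}\,\mathbb{E}[X_i^r]}{r!\,\sigma_i^2}. \]
Since $e^{sx} = 1 + sx + \sum_{r=2}^\infty s^r x^r / r!$, we may write
\[ \mathbb{E}\left[e^{sX_i}\right] = 1 + s\mathbb{E}[X_i] + \sum_{r=2}^\infty \frac{s^r\,\mathbb{E}[X_i^r]}{r!} = 1 + s^2\sigma_i^2 F_i \ \ (\text{since } \mathbb{E}[X_i] = 0) \ \le e^{s^2\sigma_i^2 F_i}. \]
Now assume that the $X_i$'s are bounded such that $|X_i| \le c$. Then for each $r \ge 2$,
\[ \mathbb{E}[X_i^r] \le c^{r-2}\sigma_i^2. \]
Thus,
\[ F_i \le \sum_{r=2}^\infty \frac{s^{r-2} c^{r-2} \sigma_i^2}{r!\,\sigma_i^2} = \frac{1}{(sc)^2} \sum_{r=2}^\infty \frac{(sc)^r}{r!} = \frac{e^{sc} - 1 - sc}{(sc)^2}. \]
Thus, we have obtained
\[ \mathbb{E}\left[e^{sX_i}\right] \le \exp\left(s^2\sigma_i^2\,\frac{e^{sc} - 1 - sc}{(sc)^2}\right). \]
Returning to (2) and using the notation $\sigma^2 = (1/n)\sum_{i=1}^n \sigma_i^2$, we get
\[ \mathbb{P}\left\{\sum_{i=1}^n X_i > t\right\} \le e^{n\sigma^2(e^{sc} - 1 - sc)/c^2 - st}. \]
Now we are free to choose $s$. The upper bound is minimized for
\[ s = \frac{1}{c}\log\left(1 + \frac{tc}{n\sigma^2}\right). \]
Resubstituting this value, we obtain Bennett's inequality [9]:

Exercises

Exercise 7. Let $X_1, \ldots, X_n$ be independent random variables, taking their values from $[0,1]$. Denoting $m = \mathbb{E}S_n$, show that for any $t \ge m$,
\[ \mathbb{P}\{S_n \ge t\} \le \left(\frac{m}{t}\right)^t \left(\frac{n - m}{n - t}\right)^{n-t}. \]
Hint: Proceed by Chernoff's bounding.

Exercise 8 (continuation). Use the previous exercise to show that
\[ \mathbb{P}\{S_n \ge t\} \le \left(\frac{m}{t}\right)^t e^{t-m}, \]
and for all $\epsilon > 0$,
\[ \mathbb{P}\{S_n \ge m(1+\epsilon)\} \le e^{-mh(\epsilon)}, \]
where $h$ is the function defined in Bennett's inequality. Finally,
\[ \mathbb{P}\{S_n \le m(1-\epsilon)\} \le e^{-m\epsilon^2/2} \]
(see, e.g., Karp [40], Hagerup and Rüb [35]).

Exercise 9. Compare the first bound of the previous exercise with the best Chernoff bound for the tail of a Poisson random variable: let $Y$ be a Poisson($m$) random variable. Show that
\[ \mathbb{P}\{Y \ge t\} \le \inf_{s > 0} \frac{\mathbb{E}\left[e^{sY}\right]}{e^{st}} = \left(\frac{m}{t}\right)^t e^{t-m}. \]
Use Stirling's formula to show that
\[ \mathbb{P}\{Y \ge t\} \ge \mathbb{P}\{Y = t\} \ge \left(\frac{m}{t}\right)^t e^{t-m}\,\frac{1}{\sqrt{2\pi t}}\,e^{-1/(12t+1)}. \]

Exercise 10 (sampling without replacement). Let $\mathcal{X}$ be a finite set with $N$ elements, and let $X_1, \ldots, X_n$ be a random sample without replacement from $\mathcal{X}$ and $Y_1, \ldots, Y_n$ a random sample with replacement from $\mathcal{X}$. Show that for any convex real-valued function $f$,
\[ \mathbb{E}f\left(\sum_{i=1}^n X_i\right) \le \mathbb{E}f\left(\sum_{i=1}^n Y_i\right). \]

4 The Efron-Stein inequality

The main purpose of these notes is to show how many of the tail inequalities for sums of independent random variables can be extended to general functions of independent random variables. The simplest, yet surprisingly powerful inequality of this kind is known as the Efron-Stein inequality. It bounds the variance of a general function. To obtain tail inequalities, one may simply use Chebyshev's inequality.

Let $\mathcal{X}$ be some set, and let $g : \mathcal{X}^n \to \mathbb{R}$ be a measurable function of $n$ variables. We derive inequalities for the difference between the random variable $Z = g(X_1, \ldots, X_n)$ and its expected value $\mathbb{E}Z$ when $X_1, \ldots, X_n$ are arbitrary independent (not necessarily identically distributed!) random variables taking values in $\mathcal{X}$.

The main inequalities of this section follow from the next simple result. To simplify notation, we write $\mathbb{E}_i$ for the expected value with respect to the variable $X_i$, that is, $\mathbb{E}_i Z = \mathbb{E}[Z \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n]$.

Theorem 7.
\[ \mathrm{Var}(Z) \le \sum_{i=1}^n \mathbb{E}\left[(Z - \mathbb{E}_i Z)^2\right]. \]

Proof. The proof is based on elementary properties of conditional expectation. Recall that if $X$ and $Y$ are arbitrary bounded random variables, then $\mathbb{E}[XY] = \mathbb{E}[\mathbb{E}[XY \mid Y]] = \mathbb{E}[Y\,\mathbb{E}[X \mid Y]]$. Introduce the notation $V = Z - \mathbb{E}Z$, and define
\[ V_i = \mathbb{E}[Z \mid X_1, \ldots, X_i] - \mathbb{E}[Z \mid X_1, \ldots, X_{i-1}], \quad i = 1, \ldots, n. \]
Clearly, $V = \sum_{i=1}^n V_i$. (Thus, $V$ is written as a sum of martingale differences ...

Proof. The statement follows by Theorem 7 simply by using (conditionally) the elementary fact that if $X$ and $Y$ are independent and identically distributed random variables, then $\mathrm{Var}(X) = (1/2)\mathbb{E}[(X - Y)^2]$, and therefore
\[ \mathbb{E}_i\left[(Z - \mathbb{E}_i Z)^2\right] = \frac{1}{2}\,\mathbb{E}_i\left[(Z - Z_i')^2\right]. \quad \Box \]

Remark. Observe that in the case when $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables (of finite variance) then the inequality in Theorem 8 becomes an equality. Thus, the bound in the Efron-Stein inequality is, in a sense, not improvable. This example also shows that, among all functions of independent random variables, sums, in some sense, are the least concentrated. Below we will see further evidence for this extremal property of sums.

Another useful corollary of Theorem 7 is obtained by recalling that, for any random variable $X$, $\mathrm{Var}(X) \le \mathbb{E}[(X - a)^2]$ for any constant $a \in \mathbb{R}$. Using this fact conditionally, we have, for every $i = 1, \ldots, n$,
\[ \mathbb{E}_i\left[(Z - \mathbb{E}_i Z)^2\right] \le \mathbb{E}_i\left[(Z - Z_i)^2\right] \]
where $Z_i = g_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$ for arbitrary measurable functions $g_i : \mathcal{X}^{n-1} \to \mathbb{R}$ of $n - 1$ variables. Taking expected values and using Theorem 7 we have the following.

Theorem 9.
\[ \mathrm{Var}(Z) \le \sum_{i=1}^n \mathbb{E}\left[(Z - Z_i)^2\right]. \]

In the next two sections we specialize the Efron-Stein inequality and its variant Theorem 9 to functions which satisfy some simple easy-to-verify properties.

4.1 Functions with bounded differences

We say that a function $g : \mathcal{X}^n \to \mathbb{R}$ has the bounded differences property if for some nonnegative constants $c_1, \ldots, c_n$,
\[ \sup_{x_1, \ldots, x_n,\, x_i' \in \mathcal{X}} |g(x_1, \ldots, x_n) - g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)| \le c_i, \quad 1 \le i \le n. \]

The behavior of $\mathbb{E}Z$ has been investigated in many papers. It is known that $\mathbb{E}[Z]/n$ converges to some number $\gamma$, whose value is unknown. It is conjectured to be $2/(1 + \sqrt{2})$, and it is known to fall between 0.75796 and 0.83763. Here we are concerned with the concentration of $Z$. A moment's thought reveals that changing one bit can't change the length of the longest common subsequence by more than one, so $Z$ satisfies the bounded differences property with $c_i = 1$. Consequently,
\[ \mathrm{Var}\{Z\} \le n \]
(see Steele [73]). Thus, by Chebyshev's inequality, with large probability, $Z$ is within a constant times $\sqrt{n}$ of its expected value. In other words, it is strongly concentrated around the mean, which means that results about $\mathbb{E}Z$ really tell us about the behavior of the longest common subsequence of two random strings.

Example (uniform deviations). One of the central quantities of statistical learning theory and empirical process theory is the following: let $X_1, \ldots, X_n$ be i.i.d. random variables taking their values in some set $\mathcal{X}$, and let $\mathcal{A}$ be a collection of subsets of $\mathcal{X}$. Let $\mu$ denote the distribution of $X_1$, that is, $\mu(A) = \mathbb{P}\{X_1 \in A\}$, and let $\mu_n$ denote the empirical distribution:
\[ \mu_n(A) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\{X_i \in A\}}. \]
The quantity of interest is
\[ Z = \sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)|. \]
If $\lim_{n \to \infty} \mathbb{E}Z = 0$ for every distribution of the $X_i$'s, then $\mathcal{A}$ is called a uniform Glivenko-Cantelli class, and Vapnik and Chervonenkis [81] gave a beautiful combinatorial characterization of such classes. But regardless of what $\mathcal{A}$ is, by changing one $X_i$, $Z$ can change by at most $1/n$, so regardless of the behavior of $\mathbb{E}Z$, we always have
\[ \mathrm{Var}(Z) \le \frac{1}{2n}. \]

Example (first passage time in oriented percolation). Consider a directed graph such that a weight $X_i$ is assigned to each edge $e_i$ such that the $X_i$ are nonnegative independent random variables with second moment $\mathbb{E}X_i^2 = \sigma^2$. Let $v_1$ and $v_2$ be fixed vertices of the graph. We are interested in the total weight of the path from $v_1$ to $v_2$ with minimum weight. Thus,
\[ Z = \min_P \sum_{e_i \in P} X_i \]
where the minimum is taken over all paths $P$ from $v_1$ to $v_2$. Denote the optimal path by $P^*$. By replacing $X_i$ with $X_i'$, the total minimum weight can only increase if the edge $e_i$ is on $P^*$, and therefore
\[ (Z - Z_i')^2\,\mathbb{1}_{Z_i' > Z} \le (X_i' - X_i)^2\,\mathbb{1}_{e_i \in P^*} \le X_i'^2\,\mathbb{1}_{e_i \in P^*}. \]
Thus,
\[ \mathrm{Var}(Z) \le \mathbb{E}\sum_i X_i'^2\,\mathbb{1}_{e_i \in P^*} = \sigma^2\,\mathbb{E}\sum_i \mathbb{1}_{e_i \in P^*} \le \sigma^2 L \]
where $L$ is the length of the longest path between $v_1$ and $v_2$.

Example (minimum of the empirical loss). Concentration inequalities have been used as a key tool in recent developments of model selection methods in statistical learning theory. For the background we refer to the recent work of Koltchinskii and Panchenko [42], Massart [57], Bartlett, Boucheron, and Lugosi [5], Lugosi and Wegkamp [51], Bousquet [17].

Let $\mathcal{F}$ denote a class of $\{0,1\}$-valued functions on some space $\mathcal{X}$. For simplicity of the exposition we assume that $\mathcal{F}$ is finite. The results remain true for general classes as long as the measurability issues are taken care of. Given an i.i.d. sample $D_n = (\langle X_i, Y_i \rangle)_{i \le n}$ of $n$ pairs of random variables $\langle X_i, Y_i \rangle$ taking values in $\mathcal{X} \times \{0,1\}$, for each $f \in \mathcal{F}$ we define the empirical loss
\[ L_n(f) = \frac{1}{n}\sum_{i=1}^n \ell(f(X_i), Y_i) \]
where the loss function $\ell$ is defined on $\{0,1\}^2$ by
\[ \ell(y, y') = \mathbb{1}_{y \ne y'}. \]

This is a significant improvement over the bound $1/(2n)$ whenever $\mathbb{E}L(f^*)$ is much smaller than $1/2$. This is very often the case. For example, we have
\[ L(f^*) = \widehat{L} - (L_n(f^*) - L(f^*)) \le \frac{Z}{n} + \sup_{f \in \mathcal{F}} (L(f) - L_n(f)) \]
so that we obtain
\[ \mathrm{Var}(\widehat{L}) \le \frac{\mathbb{E}\widehat{L}}{n} + \frac{\mathbb{E}\sup_{f \in \mathcal{F}} (L(f) - L_n(f))}{n}. \]
In most cases of interest, $\mathbb{E}\sup_{f \in \mathcal{F}} (L(f) - L_n(f))$ may be bounded by a constant (depending on $\mathcal{F}$) times $n^{-1/2}$ (see, e.g., Lugosi [50]) and then the second term on the right-hand side is of the order of $n^{-3/2}$. For exponential concentration inequalities for $\widehat{L}$ we refer to Boucheron, Lugosi, and Massart [15].

Example (kernel density estimation). Let $X_1, \ldots, X_n$ be i.i.d. samples drawn according to some (unknown) density $f$ on the real line. The density is estimated by the kernel estimate
\[ f_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right), \]
where $h > 0$ is a smoothing parameter, and $K$ is a nonnegative function with $\int K = 1$. The performance of the estimate is measured by the $L_1$ error
\[ Z = g(X_1, \ldots, X_n) = \int |f(x) - f_n(x)|\,dx. \]
It is easy to see that
\[ |g(x_1, \ldots, x_n) - g(x_1, \ldots, x_i', \ldots, x_n)| \le \frac{1}{nh}\int \left|K\left(\frac{x - x_i}{h}\right) - K\left(\frac{x - x_i'}{h}\right)\right| dx \le \frac{2}{n}, \]
so without further work we get
\[ \mathrm{Var}(Z) \le \frac{2}{n}. \]

Next we mention some applications of this simple corollary. It turns out that in many cases the obtained bound is a significant improvement over what we would obtain by using simply Corollary 1.

Remark (relative stability). Bounding the variance of $Z$ by its expected value implies, in many cases, the relative stability of $Z$. A sequence of nonnegative random variables $(Z_n)$ is said to be relatively stable if $Z_n/\mathbb{E}Z_n \to 1$ in probability. This property guarantees that the random fluctuations of $Z_n$ around its expectation are of negligible size when compared to the expectation, and therefore most information about the size of $Z_n$ is given by $\mathbb{E}Z_n$. If $Z_n$ has the self-bounding property, then, by Chebyshev's inequality, for all $\epsilon > 0$,
\[ \mathbb{P}\left\{\left|\frac{Z_n}{\mathbb{E}Z_n} - 1\right| > \epsilon\right\} \le \frac{\mathrm{Var}(Z_n)}{\epsilon^2(\mathbb{E}Z_n)^2} \le \frac{1}{\epsilon^2\,\mathbb{E}Z_n}. \]
Thus, for relative stability, it suffices to have $\mathbb{E}Z_n \to \infty$.

Example (empirical processes). A typical example of self-bounding functions is the supremum of nonnegative empirical processes. Let $\mathcal{F}$ be a class of functions taking values in the interval $[0,1]$ and consider $Z = g(X_1, \ldots, X_n) = \sup_{f \in \mathcal{F}} \sum_{j=1}^n f(X_j)$. (A special case of this is mentioned above in the example of uniform deviations.) Defining $g_i = g'$ for $i = 1, \ldots, n$ with $g'(x_1, \ldots, x_{n-1}) = \sup_{f \in \mathcal{F}} \sum_{j=1}^{n-1} f(x_j)$ (so that $Z_i = \sup_{f \in \mathcal{F}} \sum_{j=1, j \ne i}^n f(X_j)$) and letting $f^* \in \mathcal{F}$ be a function for which $Z = \sum_{j=1}^n f^*(X_j)$, one obviously has
\[ 0 \le Z - Z_i \le f^*(X_i) \le 1 \]
and therefore
\[ \sum_{i=1}^n (Z - Z_i) \le \sum_{i=1}^n f^*(X_i) = Z. \]
(Here we have assumed that the supremum is always achieved. The modification of the argument for the general case is straightforward.) Thus, by Corollary 2 we obtain $\mathrm{Var}(Z) \le \mathbb{E}Z$. Note that Corollary 1 implies $\mathrm{Var}(Z) \le n/2$. In some important applications $\mathbb{E}Z$ may be significantly smaller than $n/2$ and the improvement is essential.

Assume that we have a property $P$ defined over the union of finite products of a set $\mathcal{X}$, that is, a sequence of sets $P_1 \subset \mathcal{X}$, $P_2 \subset \mathcal{X} \times \mathcal{X}$, ..., $P_n \subset \mathcal{X}^n$. We say that $(x_1, \ldots, x_m) \in \mathcal{X}^m$ satisfies the property $P$ if $(x_1, \ldots, x_m) \in P_m$. We assume that $P$ is hereditary in the sense that if $(x_1, \ldots, x_m)$ satisfies $P$ then so does any subsequence $(x_{i_1}, \ldots, x_{i_k})$ of $(x_1, \ldots, x_m)$. The function $g_n$ that maps any tuple $(x_1, \ldots, x_n)$ to the size of a largest subsequence satisfying $P$ is the configuration function associated with property $P$.

Corollary 2 implies the following result:

Corollary 3. Let $g_n$ be a configuration function, and let $Z = g_n(X_1, \ldots, X_n)$, where $X_1, \ldots, X_n$ are independent random variables. Then
\[ \mathrm{Var}(Z) \le \mathbb{E}Z. \]

Proof. By Corollary 2 it suffices to show that any configuration function is self-bounding. Let $Z_i = g_{n-1}(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. The condition $0 \le Z - Z_i \le 1$ is trivially satisfied. On the other hand, assume that $Z = k$ and let $\{X_{i_1}, \ldots, X_{i_k}\} \subset \{X_1, \ldots, X_n\}$ be a subsequence of cardinality $k$ such that $g_k(X_{i_1}, \ldots, X_{i_k}) = k$. (Note that by the definition of a configuration function such a subsequence exists.) Clearly, if the index $i$ is such that $i \notin \{i_1, \ldots, i_k\}$ then $Z = Z_i$, and therefore
\[ \sum_{i=1}^n (Z - Z_i) \le Z \]
is also satisfied, which concludes the proof. $\Box$

To illustrate the fact that configuration functions appear rather naturally in various applications, we describe some examples originating from different fields.

Example (number of distinct values in a discrete sample). Let $X_1, \ldots, X_n$ be independent, identically distributed random variables taking their values on the set of positive integers such that $\mathbb{P}\{X_1 = k\} = p_k$, and let $Z$ denote the number of distinct values taken by these $n$ random variables. Then we may write
\[ Z = \sum_{i=1}^n \mathbb{1}_{\{X_i \ne X_1, \ldots, X_i \ne X_{i-1}\}}, \]

Example (increasing subsequences). Consider a vector $x_1^n = (x_1, \ldots, x_n)$ of $n$ different numbers in $[0,1]$. The positive integers $i_1 < i_2 < \cdots < i_m$ form an increasing subsequence if $x_{i_1} < x_{i_2} < \cdots < x_{i_m}$ (where $i_1 \ge 1$ and $i_m \le n$). Let $L(x_1^n)$ denote the length of a longest increasing subsequence. $g_n(x_1^n) = L(x_1^n)$ is clearly a configuration function (associated with the increasing sequence property), and therefore if $X_1, \ldots, X_n$ are independent random variables such that they are different with probability one (it suffices if every $X_i$ has an absolutely continuous distribution) then $\mathrm{Var}(L(X_1^n)) \le \mathbb{E}L(X_1^n)$. If the $X_i$'s are uniformly distributed in $[0,1]$ then it is known that $\mathbb{E}L(X_1^n) \sim 2\sqrt{n}$, see Logan and Shepp [48], Groeneboom [34]. The obtained bound for the variance is apparently loose. A difficult result of Baik, Deift, and Johansson [4] implies that $\mathrm{Var}(L(X_1^n)) = O(n^{1/3})$. For early work on the concentration of $L(X)$ we refer to Frieze [33], Bollobás and Brightwell [13], and Talagrand [75].

Exercises

Exercise 11. Assume that the random variables $X_1, \ldots, X_n$ are independent and binary $\{0,1\}$-valued with $\mathbb{P}\{X_i = 1\} = p_i$ and that $g$ has the bounded differences property with constants $c_1, \ldots, c_n$. Show that
\[ \mathrm{Var}(Z) \le \sum_{i=1}^n c_i^2\,p_i(1 - p_i). \]

Exercise 12. Complete the proof of the fact that the conditional Rademacher average has the self-bounding property.

Exercise 13. Consider the example of the number of distinct values in a discrete sample described in the text. Show that $\mathbb{E}[Z]/n \to 0$ as $n \to \infty$. Calculate explicitly $\mathrm{Var}(Z)$ and compare it with the upper bound obtained by Theorem 9.

Exercise 14. Let $Z$ be the number of triangles in a random graph $G(n,p)$. Calculate the variance of $Z$ and compare it with what you get by using the Efron-Stein inequality to estimate it. (In the $G(n,p)$ model for random graphs, the random graph $G = (V, E)$ with vertex set $V$ ($|V| = n$) and edge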
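The Efron-Stein bound of Theorem 7 can be checked exactly on very small examples by enumerating all outcomes. The sketch below assumes i.i.d. variables that are uniform on a small finite support (an assumption made only to keep the enumeration short; the theorem itself needs independence only). It computes $\mathrm{Var}(Z)$ and $\sum_i \mathbb{E}[(Z - \mathbb{E}_i Z)^2]$ with exact rational arithmetic for two choices of $g$: the number of distinct values, and the plain sum, for which the remark in Section 4 says the bound is attained. The helper name `efron_stein_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def efron_stein_check(g, support, n):
    """Return (Var(Z), sum_i E[(Z - E_i Z)^2]) for Z = g(X_1, ..., X_n),
    where the X_i are i.i.d. uniform on `support` (illustrative assumption)."""
    points = list(product(support, repeat=n))
    w = Fraction(1, len(points))  # probability of each outcome
    mean = sum(w * g(x) for x in points)
    var = sum(w * (g(x) - mean) ** 2 for x in points)
    bound = Fraction(0)
    for i in range(n):
        for x in points:
            # E_i Z: average over the i-th coordinate, all others held fixed
            cond = sum(Fraction(g(x[:i] + (v,) + x[i + 1:]))
                       for v in support) / len(support)
            bound += w * (g(x) - cond) ** 2
    return var, bound

support, n = (0, 1, 2), 3
var_distinct, bound_distinct = efron_stein_check(lambda x: len(set(x)), support, n)
var_sum, bound_sum = efron_stein_check(sum, support, n)
print("distinct values:", var_distinct, "<=", bound_distinct)
print("sum:            ", var_sum, "==", bound_sum)
```

Exact fractions are used instead of floating point so that the equality case for sums can be verified without rounding error.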
5 The entropy method

In the previous section we saw that the Efron-Stein inequality serves as a powerful tool for bounding the variance of general functions of independent random variables. Then, via Chebyshev's inequality, one may easily bound the tail probabilities of such functions. However, just as in the case of sums of independent random variables, tail bounds based on inequalities for the variance are often not satisfactory, and essential improvements are possible. The purpose of this section is to present a methodology which allows one to obtain exponential tail inequalities in many cases. The pursuit of such inequalities has been an important topic in probability theory in the last few decades. Originally, martingale methods dominated the research (see, e.g., McDiarmid [58], [59], Rhee and Talagrand [67], Shamir and Spencer [70]) but independently information-theoretic methods were also used with success (see Ahlswede, Gács, and Körner [1], Marton [52], [53], [54], Dembo [24], Massart [55], Rio [68], and Samson [69]). Talagrand's induction method [77], [75], [76] caused an important breakthrough both in the theory and applications of exponential concentration inequalities. In this section we focus on the so-called entropy method, based on logarithmic Sobolev inequalities developed by Ledoux [45], [44], see also Bobkov and Ledoux [12], Massart [56], Rio [68], Boucheron, Lugosi, and Massart [14], [15], and Bousquet [16]. This method makes it possible to derive exponential analogues of the Efron-Stein inequality in perhaps the simplest way.

The method is based on an appropriate modification of the tensorization inequality Theorem 7. In order to prove this modification, we need to recall some of the basic notions of information theory. To keep the material at an elementary level, we prove the modified tensorization inequality for discrete random variables only. The extension to arbitrary distributions is straightforward.

5.1 Basic information theory

In this section we summarize some basic properties of the entropy of a discrete-valued random variable. For a good introductory book on information theory we refer to Cover and Thomas [21].
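so that the relative entropy is always nonnegative, and equals zero if and only if $P = Q$. This simple fact has some interesting consequences. For example, if $\mathcal{X}$ is a finite set with $N$ elements and $X$ is a random variable with distribution $P$ and we take $Q$ to be the uniform distribution over $\mathcal{X}$ then $D(P\|Q) = \log N - H(X)$ and therefore the entropy of $X$ never exceeds the logarithm of the cardinality of its range.

Consider a pair of random variables $X, Y$ with joint distribution $P_{X,Y}$ and marginal distributions $P_X$ and $P_Y$. Noting that $D(P_{X,Y}\|P_X \times P_Y) = H(X) - H(X \mid Y)$, the nonnegativity of the relative entropy implies that $H(X) \ge H(X \mid Y)$, that is, conditioning reduces entropy. It is similarly easy to see that this fact remains true for conditional entropies as well, that is,
\[ H(X \mid Y) \ge H(X \mid Y, Z). \]
Now we may prove the following inequality of Han [37].

Theorem 10 (Han's inequality). Let $X_1, \ldots, X_n$ be discrete random variables. Then
\[ H(X_1, \ldots, X_n) \le \frac{1}{n-1}\sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n). \]

Proof. For any $i = 1, \ldots, n$, by the definition of the conditional entropy and the fact that conditioning reduces entropy,
\[ H(X_1, \ldots, X_n) = H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) + H(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \]
\[ \le H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) + H(X_i \mid X_1, \ldots, X_{i-1}), \quad i = 1, \ldots, n. \]
Summing these $n$ inequalities and using the chain rule for entropy, we get
\[ nH(X_1, \ldots, X_n) \le \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) + H(X_1, \ldots, X_n) \]
which is what we wanted to prove. $\Box$

We finish this section by an inequality which may be regarded as a version of Han's inequality for relative entropies. As it was pointed out by

and
\[ D(Q^{(i)}\|P^{(i)}) = \sum_{x^{(i)} \in \mathcal{X}^{n-1}} \left( Q^{(i)}(x^{(i)}) \log Q^{(i)}(x^{(i)}) - Q^{(i)}(x^{(i)}) \log P^{(i)}(x^{(i)}) \right), \]
it suffices to show that
\[ \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P(x_1^n) = \frac{1}{n-1}\sum_{i=1}^n \sum_{x^{(i)} \in \mathcal{X}^{n-1}} Q^{(i)}(x^{(i)}) \log P^{(i)}(x^{(i)}). \]
This may be seen easily by noting that by the product property of $P$, we have $P(x_1^n) = P^{(i)}(x^{(i)}) P_i(x_i)$ for all $i$, and also $P(x_1^n) = \prod_{i=1}^n P_i(x_i)$, and therefore
\[ \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P(x_1^n) = \frac{1}{n}\sum_{i=1}^n \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n)\left( \log P^{(i)}(x^{(i)}) + \log P_i(x_i) \right) \]
\[ = \frac{1}{n}\sum_{i=1}^n \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P^{(i)}(x^{(i)}) + \frac{1}{n}\sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P(x_1^n). \]
Rearranging, we obtain
\[ \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P(x_1^n) = \frac{1}{n-1}\sum_{i=1}^n \sum_{x_1^n \in \mathcal{X}^n} Q(x_1^n) \log P^{(i)}(x^{(i)}) = \frac{1}{n-1}\sum_{i=1}^n \sum_{x^{(i)} \in \mathcal{X}^{n-1}} Q^{(i)}(x^{(i)}) \log P^{(i)}(x^{(i)}) \]
where we used the defining property of $Q^{(i)}$. $\Box$

5.2 Tensorization of the entropy

We are now prepared to prove the main exponential concentration inequalities of these notes. Just as in Section 4, we let $X_1, \ldots, X_n$ be independent random variables, and investigate concentration properties of $Z = g(X_1, \ldots, X_n)$. The basis of Ledoux's entropy method is a powerful extension of Theorem 7. Note that Theorem 7 may be rewritten as
\[ \mathrm{Var}(Z) \le \sum_{i=1}^n \mathbb{E}\left[\mathbb{E}_i(Z^2) - (\mathbb{E}_i(Z))^2\right] \]

and the statement follows. $\Box$

The main idea in Ledoux's entropy method for proving concentration inequalities is to apply Theorem 12 to the positive random variable $Y = e^{sZ}$. Then, denoting the moment generating function of $Z$ by $F(s) = \mathbb{E}[e^{sZ}]$, the left-hand side of the inequality in Theorem 12 becomes
\[ s\,\mathbb{E}\left[Ze^{sZ}\right] - \mathbb{E}\left[e^{sZ}\right] \log \mathbb{E}\left[e^{sZ}\right] = sF'(s) - F(s)\log F(s). \]
Our strategy, then, is to derive upper bounds for the derivative of $F(s)$ and derive tail bounds via Chernoff's bounding. To do this in a convenient way, we need some further bounds for the right-hand side of the inequality in Theorem 12. This is the purpose of the next section.

5.3 Logarithmic Sobolev inequalities

Recall from Section 4 that we denote $Z_i = g_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$ where $g_i$ is some function over $\mathcal{X}^{n-1}$. Below we further develop the right-hand side of Theorem 12 to obtain important inequalities which serve as the basis in deriving exponential concentration inequalities. These inequalities are closely related to the so-called logarithmic Sobolev inequalities of analysis, see Ledoux [45, 46, 47], Massart [56].

First we need the following technical lemma:

Lemma 2. Let $Y$ denote a positive random variable. Then for any $u > 0$,
\[ \mathbb{E}[Y \log Y] - (\mathbb{E}Y)\log(\mathbb{E}Y) \le \mathbb{E}[Y \log Y - Y \log u - (Y - u)]. \]

Proof. As for any $x > 0$, $\log x \le x - 1$, we have
\[ \log\frac{u}{\mathbb{E}Y} \le \frac{u}{\mathbb{E}Y} - 1, \]
hence
\[ \mathbb{E}Y \log\frac{u}{\mathbb{E}Y} \le u - \mathbb{E}Y \]
which is equivalent to the statement. $\Box$

Proof. The first inequality is proved exactly as Theorem 13, just by noting that, just like $Z_i$, $Z_i'$ is also independent of $X_i$. To prove the second and third inequalities, write
\[ e^{sZ}\psi(-s(Z - Z_i')) = e^{sZ}\psi(-s(Z - Z_i'))\,\mathbb{1}_{Z > Z_i'} + e^{sZ}\psi(s(Z_i' - Z))\,\mathbb{1}_{Z < Z_i'}. \]
By symmetry, the conditional expectation of the second term may be written as
\[ \mathbb{E}_i\left[e^{sZ}\psi(s(Z_i' - Z))\,\mathbb{1}_{Z < Z_i'}\right] = \mathbb{E}_i\left[e^{sZ_i'}\psi(s(Z - Z_i'))\,\mathbb{1}_{Z > Z_i'}\right] = \mathbb{E}_i\left[e^{sZ}e^{-s(Z - Z_i')}\psi(s(Z - Z_i'))\,\mathbb{1}_{Z > Z_i'}\right]. \]
Summarizing, we have
\[ \mathbb{E}_i\left[e^{sZ}\psi(-s(Z - Z_i'))\right] = \mathbb{E}_i\left[\left(\psi(-s(Z - Z_i')) + e^{-s(Z - Z_i')}\psi(s(Z - Z_i'))\right)e^{sZ}\,\mathbb{1}_{Z > Z_i'}\right]. \]
The second inequality of the theorem follows simply by noting that $\psi(x) + e^x\psi(-x) = x(e^x - 1) = \tau(x)$. The last inequality follows similarly. $\Box$

5.4 First example: bounded differences and more

The purpose of this section is to illustrate how the logarithmic Sobolev inequalities shown in the previous section may be used to obtain powerful exponential concentration inequalities. The first result is rather easy to obtain, yet it turns out to be very useful. Also, its proof is prototypical in the sense that it shows, in a transparent way, the main ideas.

Theorem 15. Assume that there exists a positive constant $C$ such that, almost surely,
\[ \sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z > Z_i'} \le C. \]
Then for all $t > 0$,
\[ \mathbb{P}[Z - \mathbb{E}Z > t] \le e^{-t^2/4C}. \]

Corollary 4 (bounded differences inequality). Assume the function $g$ satisfies the bounded differences assumption with constants $c_1, \ldots, c_n$. Then
\[ \mathbb{P}[|Z - \mathbb{E}Z| > t] \le 2e^{-t^2/4C} \]
where $C = \sum_{i=1}^n c_i^2$.

We remark here that the constant appearing in this corollary may be improved. Indeed, using the martingale method, McDiarmid [58] showed that under the conditions of Corollary 4,
\[ \mathbb{P}[|Z - \mathbb{E}Z| > t] \le 2e^{-2t^2/C} \]
(see the exercises). Thus, we have been able to extend Corollary 1 to an exponential concentration inequality. Note that by combining the variance bound of Corollary 1 with Chebyshev's inequality, we only obtained
\[ \mathbb{P}[|Z - \mathbb{E}Z| > t] \le \frac{C}{2t^2} \]
and therefore the improvement is essential. Thus the applications of Corollary 1 in all the examples shown in Section 4.1 are now improved in an essential way without further work.

Example (Hoeffding's inequality in Hilbert space). As a simple illustration of the power of the bounded differences inequality, we derive a Hoeffding-type inequality for sums of random variables taking values in a Hilbert space. In particular, we show that if $X_1, \ldots, X_n$ are independent zero-mean random variables taking values in a separable Hilbert space such that $\|X_i\| \le c_i/2$ with probability one, then for all $t \ge 2\sqrt{C}$,
\[ \mathbb{P}\left[\left\|\sum_{i=1}^n X_i\right\| > t\right] \le e^{-t^2/2C} \]
where $C = \sum_{i=1}^n c_i^2$. This follows simply by observing that, by the triangle inequality, $Z = \|\sum_{i=1}^n X_i\|$ satisfies the bounded differences property with

To use Theorem 15, consider the symmetric matrix $A_{i,j}'$ obtained by replacing $X_{i,j}$ in $A$ by the independent copy $X_{i,j}'$, while keeping all other variables fixed. Let $Z_{i,j}'$ denote the largest eigenvalue of the obtained matrix. Then by the above-mentioned property of the largest eigenvalue,
\[ (Z - Z_{i,j}')\,\mathbb{1}_{Z > Z_{i,j}'} \le \left(v^T A v - v^T A_{i,j}' v\right)\mathbb{1}_{Z > Z_{i,j}'} = \left(v^T (A - A_{i,j}') v\right)\mathbb{1}_{Z > Z_{i,j}'} = \left(v_i v_j (X_{i,j} - X_{i,j}')\right)_+ \le 2|v_i v_j|. \]
Therefore,
\[ \sum_{1 \le i \le j \le n} (Z - Z_{i,j}')^2\,\mathbb{1}_{Z > Z_{i,j}'} \le \sum_{1 \le i \le j \le n} 4|v_i v_j|^2 \le 4\left(\sum_{i=1}^n v_i^2\right)^2 = 4. \]
The result now follows from Theorem 15. Note that by the Efron-Stein inequality we also have $\mathrm{Var}(Z) \le 4$. A similar exponential inequality, though with a somewhat worse constant in the exponent, can also be derived for the lower tail. In particular, Theorem 20 below implies, for $t > 0$,
\[ \mathbb{P}[Z < \mathbb{E}Z - t] \le e^{-t^2/16(e-1)}. \]
Also notice that the same proof works for the smallest eigenvalue as well. Alon, Krivelevich, and Vu [2] show, with a simple extension of the argument, that if $Z$ is the $k$-th largest (or $k$-th smallest) eigenvalue then the upper bounds become $e^{-t^2/(16k^2)}$, though it is not clear whether the factor $k^{-2}$ in the exponent is necessary.

5.5 Exponential inequalities for self-bounding functions

In this section we prove exponential concentration inequalities for self-bounding functions discussed in Section 4.2. Recall that a variant of the Efron-Stein inequality (Theorem 2) implies that for self-bounding functions $\mathrm{Var}(Z) \le \mathbb{E}(Z)$. Based on the logarithmic Sobolev inequality of Theorem 13 we may now obtain exponential concentration bounds. The theorem appears in Boucheron, Lugosi, and Massart [14] and builds on techniques developed by Massart [56].

Introduce $\tilde{Z} = Z - \mathbb{E}[Z]$ and define, for any $s$, $\tilde{F}(s) = \mathbb{E}\left[e^{s\tilde{Z}}\right]$. Then the inequality above becomes
\[ [s - \psi(-s)]\,\frac{\tilde{F}'(s)}{\tilde{F}(s)} - \log\tilde{F}(s) \le \mathbb{E}Z\,\psi(-s), \]
which, writing $G(s) = \log\tilde{F}(s)$, implies
\[ (1 - e^{-s})\,G'(s) - G(s) \le \mathbb{E}Z\,\psi(-s). \]
Now observe that the function $G_0 = \mathbb{E}Z\,\psi$ is a solution of the ordinary differential equation $(1 - e^{-s})G'(s) - G(s) = \mathbb{E}Z\,\psi(-s)$. We want to show that $G \le G_0$. In fact, if $G_1 = G - G_0$, then
\[ (1 - e^{-s})\,G_1'(s) - G_1(s) \le 0. \tag{3} \]
Hence, defining $\tilde{G}(s) = G_1(s)/(e^s - 1)$, we have
\[ (1 - e^{-s})(e^s - 1)\,\tilde{G}'(s) \le 0. \]
Hence $\tilde{G}'$ is non-positive and therefore $\tilde{G}$ is non-increasing. Now, since $\tilde{Z}$ is centered, $G_1'(0) = 0$. Using the fact that $s(e^s - 1)^{-1}$ tends to 1 as $s$ goes to 0, we conclude that $\tilde{G}(s)$ tends to 0 as $s$ goes to 0. This shows that $\tilde{G}$ is non-positive on $(0, \infty)$ and non-negative over $(-\infty, 0)$, hence $G_1$ is everywhere non-positive, therefore $G \le G_0$ and we have proved the first inequality of the theorem. The proof of inequalities for the tail probabilities may be completed by Chernoff's bounding:
\[ \mathbb{P}[Z - \mathbb{E}[Z] \ge t] \le \exp\left[-\sup_{s > 0}\,(ts - \mathbb{E}Z\,\psi(s))\right] \]
and
\[ \mathbb{P}[Z - \mathbb{E}[Z] \le -t] \le \exp\left[-\sup_{s < 0}\,(-ts - \mathbb{E}Z\,\psi(s))\right]. \]
The proof is now completed by using the easy-to-check (and well-known) relations
\[ \sup_{s > 0}\,[ts - \mathbb{E}Z\,\psi(s)] = \mathbb{E}Z\,h(t/\mathbb{E}Z) \quad \text{for } t > 0, \]
\[ \sup_{s < 0}\,[-ts - \mathbb{E}Z\,\psi(s)] = \mathbb{E}Z\,h(-t/\mathbb{E}Z) \quad \text{for } 0 < t \le \mathbb{E}Z. \quad \Box \]

where $H(Y_1, \ldots, Y_n)$ is the (joint) entropy of $Y_1, \ldots, Y_n$. Since the uniform distribution maximizes the entropy, we also have, for all $i \le n$, that
\[ h'(x^{(i)}) \ge \frac{1}{\ln 2}\,H(Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n). \]
Since by Han's inequality
\[ H(Y_1, \ldots, Y_n) \le \frac{1}{n-1}\sum_{i=1}^n H(Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n), \]
we have
\[ \sum_{i=1}^n \left(h(x_1^n) - h'(x^{(i)})\right) \le h(x_1^n) \]
as desired. $\Box$

The above lemma, together with Theorems 2 and 15, immediately implies the following:

Corollary 5. Let $X_1, \ldots, X_n$ be independent random variables taking their values in $\mathcal{X}$ and let $Z = h(X_1^n)$ denote the random VC entropy. Then $\mathrm{Var}(Z) \le \mathbb{E}[Z]$, for $t > 0$
\[ \mathbb{P}[Z \ge \mathbb{E}Z + t] \le \exp\left[-\frac{t^2}{2\mathbb{E}Z + 2t/3}\right], \]
and for every $0 < t \le \mathbb{E}Z$,
\[ \mathbb{P}[Z \le \mathbb{E}Z - t] \le \exp\left[-\frac{t^2}{2\mathbb{E}Z}\right]. \]
Moreover, for the random shatter coefficient $T(X_1^n)$, we have
\[ \mathbb{E}\log_2 T(X_1^n) \le \log_2 \mathbb{E}T(X_1^n) \le \log_2 e \cdot \mathbb{E}\log_2 T(X_1^n). \]

Note that the left-hand side of the last statement follows from Jensen's inequality, while the right-hand side follows by taking $s = \ln 2$ in the first inequality of Theorem 16. This last statement shows that the expected VC entropy $\mathbb{E}\log_2 T(X_1^n)$ and the annealed VC entropy are tightly connected, regardless

Moreover,
\[ \mathbb{E}[\log_b |\mathrm{tr}(X_1^n)|] \le \log_b \mathbb{E}[|\mathrm{tr}(X_1^n)|] \le \frac{b-1}{\log b}\,\mathbb{E}[\log_b |\mathrm{tr}(X_1^n)|]. \]

Example (increasing subsequences). Recall the setup of the example of increasing subsequences of Section 4.2, and let $N(x_1^n)$ denote the number of different increasing subsequences of $x_1^n$. Observe that $\log_2 N(x_1^n)$ is a combinatorial entropy. This is easy to see by considering $\mathcal{Y} = \{0,1\}$, and by assigning, to each increasing subsequence $i_1 < i_2 < \cdots < i_m$ of $x_1^n$, a binary $n$-vector $y_1^n = (y_1, \ldots, y_n)$ such that $y_j = 1$ if and only if $j = i_k$ for some $k = 1, \ldots, m$ (i.e., the indices appearing in the increasing sequence are marked by 1). Now the conditions of Theorem 17 are obviously met, and therefore $Z = \log_2 N(X_1^n)$ satisfies all three inequalities of Theorem 17. This result significantly improves a concentration inequality obtained by Frieze [33] for $\log_2 N(X_1^n)$.

5.7 Variations on the theme

In this section we show how the techniques of the entropy method for proving concentration inequalities may be used in various situations not considered so far. The versions differ in the assumptions on how $\sum_{i=1}^n (Z - Z_i')^2$ is controlled by different functions of $Z$. For various other versions with applications we refer to Boucheron, Lugosi, and Massart [15]. In all cases the upper bound is roughly of the form $e^{-t^2/\sigma^2}$ where $\sigma^2$ is the corresponding Efron-Stein upper bound on $\mathrm{Var}(Z)$. The first inequality may be regarded as a generalization of the upper tail inequality in Theorem 16.

Theorem 18. Assume that there exist positive constants $a$ and $b$ such that
\[ \sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z > Z_i'} \le aZ + b. \]
Then for $s \in (0, 1/a)$,
\[ \log\mathbb{E}[\exp(s(Z - \mathbb{E}[Z]))] \le \frac{s^2}{1 - as}\,(a\,\mathbb{E}Z + b) \]

Theorem 19. Assume that for some nondecreasing function $g$,
\[ \sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z < Z_i'} \le g(Z). \]
Then for all $t > 0$,
\[ \mathbb{P}[Z < \mathbb{E}Z - t] \le \exp\left(-\frac{t^2}{4\,\mathbb{E}[g(Z)]}\right). \]

Proof. To prove lower-tail inequalities we obtain upper bounds for $F(s) = \mathbb{E}[\exp(sZ)]$ with $s < 0$. By the third inequality of Theorem 14,
\[ s\,\mathbb{E}\left[Ze^{sZ}\right] - \mathbb{E}\left[e^{sZ}\right]\log\mathbb{E}\left[e^{sZ}\right] \le \sum_{i=1}^n \mathbb{E}\left[e^{sZ}\tau(s(Z_i' - Z))\,\mathbb{1}_{Z < Z_i'}\right] \]
\[ \le \sum_{i=1}^n \mathbb{E}\left[e^{sZ}s^2(Z_i' - Z)^2\,\mathbb{1}_{Z < Z_i'}\right] \quad (\text{using } s < 0 \text{ and that } \tau(-x) \le x^2 \text{ for } x > 0) \]
\[ = s^2\,\mathbb{E}\left[e^{sZ}\sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z < Z_i'}\right] \le s^2\,\mathbb{E}\left[e^{sZ}g(Z)\right]. \]
Since $g(Z)$ is a nondecreasing and $e^{sZ}$ is a decreasing function of $Z$, Chebyshev's association inequality (Theorem 2) implies that
\[ \mathbb{E}\left[e^{sZ}g(Z)\right] \le \mathbb{E}\left[e^{sZ}\right]\mathbb{E}[g(Z)]. \]
Thus, dividing both sides of the obtained inequality by $s^2F(s)$ and writing $H(s) = (1/s)\log F(s)$, we obtain
\[ H'(s) \le \mathbb{E}[g(Z)]. \]
Integrating the inequality in the interval $[s, 0)$ we obtain
\[ F(s) \le \exp\left(s^2\,\mathbb{E}[g(Z)] + s\,\mathbb{E}[Z]\right). \]

Exercises

Exercise 15. Relax the condition of Theorem 15 in the following way. Show that if
\[ \mathbb{E}\left[\sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z > Z_i'}\ \Big|\ X_1^n\right] \le c \]
then for all $t > 0$,
\[ \mathbb{P}[Z > \mathbb{E}Z + t] \le e^{-t^2/4c} \]
and if
\[ \mathbb{E}\left[\sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z_i' > Z}\ \Big|\ X_1^n\right] \le c, \]
then
\[ \mathbb{P}[Z < \mathbb{E}Z - t] \le e^{-t^2/4c}. \]

Exercise 16 (McDiarmid's bounded differences inequality). Prove that under the conditions of Corollary 4, the following improvement holds:
\[ \mathbb{P}[|Z - \mathbb{E}Z| > t] \le 2e^{-2t^2/C} \]
(McDiarmid [58]). Hint: Write $Z$ as a sum of martingale differences as in the proof of Theorem 7. Use Chernoff's bounding and proceed as in the proof of Hoeffding's inequality, noting that the argument works for sums of martingale differences.

Exercise 17. Let $C$ and $a$ denote two positive real numbers and denote $h_1(x) = 1 + x - \sqrt{1 + 2x}$. Show that
\[ \sup_{\lambda \in [0, 1/a)} \left(\lambda t - \frac{C\lambda^2}{1 - a\lambda}\right) = \frac{2C}{a^2}\,h_1\left(\frac{at}{2C}\right) \ge \frac{t^2}{2(2C + at)} \]
and that the supremum is attained at
\[ \lambda = \frac{1}{a}\left(1 - \left(1 + \frac{at}{C}\right)^{-1/2}\right). \]
Also,
\[ \sup_{\lambda \in [0, \infty)} \left(\lambda t - \frac{C\lambda^2}{1 + a\lambda}\right) = \frac{2C}{a^2}\,h_1\left(\frac{-at}{2C}\right) \ge \frac{t^2}{4C} \]

6 Concentration of measure

In this section we address the isoperimetric approach to concentration inequalities, promoted and developed, in large part, by Talagrand [75, 76, 77]. First we give an equivalent formulation of the bounded differences inequality (Corollary 4) which shows that any not too small set in a product probability space has the property that the probability of those points whose Hamming distance from the set is much larger than $\sqrt{n}$ is exponentially small. Then, using the full power of Theorem 15, we provide a significant improvement of this concentration-of-measure result, known as Talagrand's convex distance inequality.

6.1 Bounded differences inequality revisited

Consider independent random variables $X_1, \ldots, X_n$ taking their values in a (measurable) set $\mathcal{X}$ and denote the vector of these variables by $X_1^n = (X_1, \ldots, X_n)$ taking its value in $\mathcal{X}^n$.

Let $A \subset \mathcal{X}^n$ be an arbitrary (measurable) set and write $\mathbb{P}[A] = \mathbb{P}[X_1^n \in A]$. The Hamming distance $d(x_1^n, y_1^n)$ between the vectors $x_1^n, y_1^n \in \mathcal{X}^n$ is defined as the number of coordinates in which $x_1^n$ and $y_1^n$ differ. Introduce
\[ d(x_1^n, A) = \min_{y_1^n \in A} d(x_1^n, y_1^n), \]
the Hamming distance between the set $A$ and the point $x_1^n$. The basic result is the following:

Theorem 21. For any $t > 0$,
\[ \mathbb{P}\left[d(X_1^n, A) \ge t + \sqrt{\frac{n}{2}\log\frac{1}{\mathbb{P}[A]}}\right] \le e^{-2t^2/n}. \]

Observe that on the left-hand side we have the measure of the complement of the $t$-blowup of the set $A$, that is, the measure of the set of points whose Hamming distance from $A$ is at least $t$. If we consider a set, say, with $\mathbb{P}[A] = 1/10^6$, we see something very surprising: the measure of the set of points whose Hamming distance to $A$ is more than $10\sqrt{n}$ is

6.2 Convex distance inequality

In a remarkable series of papers (see [77], [75], [76]), Talagrand developed an induction method to prove powerful concentration results in many cases when the bounded differences inequality fails. Perhaps the most widely used of these is the so-called convex-distance inequality, see also Steele [74], McDiarmid [59] for surveys with several interesting applications. Here we use Theorem 15 to derive a version of the convex distance inequality. For several extensions and variations we refer to Talagrand [77], [75], [76].

To understand Talagrand's inequality, we borrow a simple argument from [59]. First observe that Theorem 21 may be easily generalized by allowing the distance of the point $X_1^n$ from the set $A$ to be measured by a weighted Hamming distance
\[ d_\alpha(x_1^n, A) = \inf_{y_1^n \in A} d_\alpha(x_1^n, y_1^n) = \inf_{y_1^n \in A} \sum_{i : x_i \ne y_i} |\alpha_i| \]
where $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a vector of nonnegative numbers. Repeating the argument of the proof of Theorem 21, we obtain, for all $\alpha$,
\[ \mathbb{P}\left[d_\alpha(X_1^n, A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2}\log\frac{1}{\mathbb{P}[A]}}\right] \le e^{-2t^2/\|\alpha\|^2}, \]
where $\|\alpha\| = \sqrt{\sum_{i=1}^n \alpha_i^2}$ denotes the euclidean norm of $\alpha$. Thus, for example, for all vectors $\alpha$ with unit norm $\|\alpha\| = 1$,
\[ \mathbb{P}\left[d_\alpha(X_1^n, A) \ge t + \sqrt{\frac{1}{2}\log\frac{1}{\mathbb{P}[A]}}\right] \le e^{-2t^2}. \]
Thus, denoting $u = \sqrt{\frac{1}{2}\log\frac{1}{\mathbb{P}[A]}}$, for any $t \ge u$,
\[ \mathbb{P}[d_\alpha(X_1^n, A) \ge t] \le e^{-2(t-u)^2}. \]
On the one hand, if $t \le \sqrt{-2\log\mathbb{P}[A]}$, then $\mathbb{P}[A] \le e^{-t^2/2}$. On the other hand, since $(t - u)^2 \ge t^2/4$ for $t \ge 2u$, for any $t \ge \sqrt{2\log\frac{1}{\mathbb{P}[A]}}$ the inequality above implies $\mathbb{P}[d_\alpha(X_1^n, A) \ge t] \le e^{-t^2/2}$. Thus, for all $t > 0$, we have
\[ \sup_{\alpha : \|\alpha\| = 1} \mathbb{P}[A]\cdot\mathbb{P}[d_\alpha(X_1^n, A) \ge t] \le \sup_{\alpha : \|\alpha\| = 1} \min\left(\mathbb{P}[A],\ \mathbb{P}[d_\alpha(X_1^n, A) \ge t]\right) \le e^{-t^2/2}. \]

upper-semi-continuous with respect to $y$, where $\mathcal{X}$ is convex and compact, then
\[ \inf_x \sup_y f(x, y) = \sup_y \inf_x f(x, y). \]
(We omit the details of checking the conditions of Sion's theorem, see [15].) Let now $(\hat\nu, \hat\alpha)$ be a saddle point for $x_1^n$. We have
\[ Z_i' = \inf_{\nu \in \mathcal{M}(A)} \sup_\alpha \sum_j \alpha_j\,\mathbb{E}_\nu\left[\mathbb{1}_{x_j^{(i)} \ne Y_j}\right] \ge \inf_{\nu \in \mathcal{M}(A)} \sum_j \hat\alpha_j\,\mathbb{E}_\nu\left[\mathbb{1}_{x_j^{(i)} \ne Y_j}\right] \]
where $x_j^{(i)} = x_j$ if $j \ne i$ and $x_i^{(i)} = x_i'$. Let $\tilde\nu$ denote the distribution on $A$ that achieves the infimum in the latter expression. Now we have
\[ Z = \inf_\nu \sum_j \hat\alpha_j\,\mathbb{E}_\nu\left[\mathbb{1}_{x_j \ne Y_j}\right] \le \sum_j \hat\alpha_j\,\mathbb{E}_{\tilde\nu}\left[\mathbb{1}_{x_j \ne Y_j}\right]. \]
Hence we get
\[ Z - Z_i' \le \sum_j \hat\alpha_j\,\mathbb{E}_{\tilde\nu}\left[\mathbb{1}_{x_j \ne Y_j} - \mathbb{1}_{x_j^{(i)} \ne Y_j}\right] = \hat\alpha_i\,\mathbb{E}_{\tilde\nu}\left[\mathbb{1}_{x_i \ne Y_i} - \mathbb{1}_{x_i^{(i)} \ne Y_i}\right] \le \hat\alpha_i. \]
Therefore $\sum_{i=1}^n (Z - Z_i')^2\,\mathbb{1}_{Z > Z_i'} \le \sum_i \hat\alpha_i^2 = 1$. Thus by Theorem 15 (more precisely, by its generalization in Exercise 15), for any $t > 0$,
\[ \mathbb{P}[d_T(X_1^n, A) - \mathbb{E}d_T(X_1^n, A) \ge t] \le e^{-t^2/4}. \]
Similarly, by Theorem 20 we get
\[ \mathbb{P}[d_T(X_1^n, A) - \mathbb{E}d_T(X_1^n, A) \le -t] \le e^{-t^2/(4(e-1))} \]
which, by taking $t = \mathbb{E}d_T(X_1^n, A)$, implies
\[ \mathbb{E}d_T(X_1^n, A) \le \sqrt{4(e-1)\log\frac{1}{\mathbb{P}[A]}}. \]
Hence,
\[ \mathbb{P}\left[d_T(X_1^n, A) - \sqrt{4(e-1)\log\frac{1}{\mathbb{P}[A]}} \ge t\right] \le e^{-t^2/4}. \]

To see this it suffices to show that the $x_i$ for which $x_i \ne y_i$ can be packed into at most $\lfloor 2\sum_{i : x_i \ne y_i} x_i \rfloor + 1$ bins. For this it is enough to find a packing such that at most one bin is less than half full. But such a packing must exist because we can always pack the contents of two half-empty bins into one.

Denoting by $\alpha = \alpha(x_1^n) \in [0, \infty)^n$ the unit vector $x_1^n/\|x_1^n\|$, we clearly have
\[ \sum_{i : x_i \ne y_i} x_i = \|x_1^n\| \sum_{i : x_i \ne y_i} \alpha_i = \|x_1^n\|\,d_\alpha(x_1^n, y_1^n). \]
Let $a$ be a positive number and define the set $A_a = \{y_1^n : g(y_1^n) \le a\}$. Then, by the argument above and by the definition of the convex distance, for each $x_1^n \in [0,1]^n$ there exists $y_1^n \in A_a$ such that
\[ g(x_1^n) \le g(y_1^n) + 2\sum_{i : x_i \ne y_i} x_i + 1 \le a + 2\|x_1^n\|\,d_T(x_1^n, A_a) + 1 \]
from which we conclude that for each $a > 0$, $Z \le a + 2\|X_1^n\|\,d_T(X_1^n, A_a) + 1$. Thus, writing $\Sigma = \sqrt{\mathbb{E}\sum_{i=1}^n X_i^2}$, for any $t \ge 0$,
\[ \mathbb{P}[Z \ge a + 1 + t] \le \mathbb{P}\left[Z \ge a + 1 + t\,\frac{2\|X_1^n\|}{2\sqrt{2\Sigma^2 + t}}\right] + \mathbb{P}\left[\|X_1^n\| \ge \sqrt{2\Sigma^2 + t}\right] \]
\[ \le \mathbb{P}\left[d_T(X_1^n, A_a) \ge \frac{t}{2\sqrt{2\Sigma^2 + t}}\right] + e^{-(3/8)(\Sigma^2 + t)} \]
where the bound on the second term follows by a simple application of Bernstein's inequality, see Exercise 21.

To obtain the desired inequality, we use the obtained bound with two different choices of $a$. To derive a bound for the upper tail of $Z$, we take $a = \mathbb{M}Z$. Then $\mathbb{P}[A_a] \ge 1/2$ and the convex distance inequality yields
\[ \mathbb{P}[Z \ge \mathbb{M}Z + 1 + t] \le 2e^{-t^2/(16(2\Sigma^2 + t))} + e^{-(3/8)(\Sigma^2 + t)} \le 4e^{-t^2/(16(2\Sigma^2 + t))}. \]
We obtain a similar inequality in the same way for $\mathbb{P}[Z \le \mathbb{M}Z - 1 - t]$ by taking $a = \mathbb{M}Z - t - 1$. $\Box$

[6] P. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. In Proceedings of the 15th Annual Conference on Computational Learning Theory, pages 44-48, 2002.
[7] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[8] W. Beckner. A generalized Poincaré inequality for Gaussian measures. Proceedings of the American Mathematical Society, 105:397-400, 1989.
[9] G. Bennett. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57:33-45, 1962.
[10] S. N. Bernstein. The Theory of Probabilities. Gastehizdat Publishing House, Moscow, 1946.
[11] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36:929-965, 1989.
[12] S. Bobkov and M. Ledoux. Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential distribution. Probability Theory and Related Fields, 107:383-400, 1997.
[13] B. Bollobás and G. Brightwell. The height of a random partial order: Concentration of measure. Annals of Applied Probability, 2:1009-1018, 1992.
[14] S. Boucheron, G. Lugosi, and P. Massart. A sharp concentration inequality with applications. Random Structures and Algorithms, 16:277-292, 2000.
[15] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities using the entropy method. The Annals of Probability, 31:1583-1614, 2003.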
[27] L. Devroye and L. Györfi. Nonparametric Density Estimation: The L1 View. John Wiley, New York, 1985.
[28] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.
[29] L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer-Verlag, New York, 2000.
[30] D. Dubhashi and D. Ranjan. Balls and bins: a study in negative dependence. Random Structures and Algorithms, pages 99-124, 1998.
[31] R. M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, Cambridge, 1999.
[32] B. Efron and C. Stein. The jackknife estimate of variance. Annals of Statistics, 9:586-596, 1981.
[33] A. M. Frieze. On the length of the longest monotone subsequence in a random permutation. Annals of Applied Probability, 1:301-305, 1991.
[34] P. Groeneboom. Hydrodynamical methods for analyzing longest increasing subsequences. Probabilistic methods in combinatorics and combinatorial optimization. Journal of Computational and Applied Mathematics, 142:83-105, 2002.
[35] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Information Processing Letters, 33:305-308, 1990.
[36] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, London, 1952.
[37] T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36, 1978.
[38] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
[50] G. Lugosi. Pattern classification and learning theory. In L. Györfi, editor, Principles of Nonparametric Learning, pages 5-62. Springer, Wien, 2002.
[51] G. Lugosi and M. Wegkamp. Complexity regularization via localized random penalties. Annals of Statistics, 32:1679-1697, 2004.
[52] K. Marton. A simple proof of the blowing-up lemma. IEEE Transactions on Information Theory, 32:445-446, 1986.
[53] K. Marton. Bounding $\bar{d}$-distance by informational divergence: a way to prove measure concentration. Annals of Probability, 24:857-866, 1996.
[54] K. Marton. A measure concentration inequality for contracting Markov chains. Geometric and Functional Analysis, 6:556-571, 1996. Erratum: 7:609-613, 1997.
[55] P. Massart. Optimal constants for Hoeffding type inequalities. Technical report, Mathematiques, Université de Paris-Sud, Report 98.86, 1998.
[56] P. Massart. About the constants in Talagrand's concentration inequalities for empirical processes. Annals of Probability, 28:863-884, 2000.
[57] P. Massart. Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse, IX:245-303, 2000.
[58] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics 1989, pages 148-188. Cambridge University Press, Cambridge, 1989.
[59] C. McDiarmid. Concentration. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics, pages 195-248. Springer, New York, 1998.
[72] J. M. Steele. Long common subsequences and the proximity of two random strings. SIAM Journal of Applied Mathematics, 42:731-737, 1982.
[73] J. M. Steele. An Efron-Stein inequality for nonsymmetric statistics. Annals of Statistics, 14:753-758, 1986.
[74] J. M. Steele. Probability Theory and Combinatorial Optimization. SIAM, CBMS-NSF Regional Conference Series in Applied Mathematics 69, 3600 University City Science Center, Phila, PA 19104, 1996.
[75] M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S., 81:73-205, 1995.
[76] M. Talagrand. New concentration inequalities in product spaces. Inventiones Mathematicae, 126:505-563, 1996.
[77] M. Talagrand. A new look at independence. Annals of Probability, 24:1-34, 1996. (Special Invited Paper).
[78] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York, 1996.
[79] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[80] V. N. Vapnik. Statistical Learning Theory. John Wiley, New York, 1998.
[81] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264-280, 1971.
[82] V. N. Vapnik and A. Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin, 1979.
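
As a numerical companion to Theorem 21 of Section 6.1, the following Python sketch evaluates the exponent of the bound for the example $P[A] = 10^{-6}$ and distance $10\sqrt{n}$, and then checks the theorem exactly on one simple set. The function names, the choice of fair coin flips on $\{0,1\}^n$, and the set $A = \{x : \sum_i x_i \le k\}$ are illustrative assumptions and are not taken from the text. For this particular set the Hamming distance from $x$ to $A$ is $\max(0, \sum_i x_i - k)$, so all probabilities reduce to binomial tails.

```python
import math


def log_blowup_bound(n, p_a, dist):
    """Natural log of the Theorem 21 bound on P[d(X, A) >= dist].

    Writes dist = t + sqrt((n/2) * log(1/P[A])) and returns -2 t^2 / n.
    Returns None when t <= 0, where the theorem gives no bound.
    """
    shift = math.sqrt(0.5 * n * math.log(1.0 / p_a))
    t = dist - shift
    if t <= 0:
        return None
    return -2.0 * t * t / n


def binom_prob_range(n, lo, hi):
    """Exact P[lo <= S <= hi] for S ~ Binomial(n, 1/2)."""
    lo, hi = max(lo, 0), min(hi, n)
    return sum(math.comb(n, j) for j in range(lo, hi + 1)) / 2 ** n


def check_threshold_set(n, k):
    """Exact tails versus the bound for A = {x : sum(x) <= k} (illustrative set)."""
    p_a = binom_prob_range(n, 0, k)
    rows = []
    for r in range(1, n - k + 1):
        log_bound = log_blowup_bound(n, p_a, r)
        if log_bound is None:
            continue  # r is below the shift term, so the theorem is silent
        exact = binom_prob_range(n, k + r, n)  # P[d(X, A) >= r]
        rows.append((r, exact, math.exp(log_bound)))
    return rows


if __name__ == "__main__":
    n = 400
    # Exponent for P[A] = 1e-6 and distance 10 * sqrt(n); it does not depend on n.
    print(log_blowup_bound(n, 1e-6, 10 * math.sqrt(n)))  # about -108.7
    for r, exact, bound in check_threshold_set(100, 40):
        assert exact <= bound
```

Working with the logarithm of the bound avoids underflow for large $t$, and the exact binomial comparison is only a sanity check on one family of sets, not a substitute for the proof.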