Who's Afraid of George Kingsley Zipf?
Charles Yang, Department of Linguistics Comp


yang@ling.upenn.edu, June 2010

Abstract

We explore the implications of Zipf's law for the understanding of linguistic productivity. Focusing on language acquisition, we show that the item/usage-based approach has not been supported by adequate statistical evidence. By contrast, the quantitative properties of a productive grammar can be precisely formulated, and are consistent with even very young children's language. Moreover, drawing from research in computational linguistics, the statistical properties of natural language strongly suggest that the theory of grammar be composed of general principles with an overarching range of applications rather than a collection of item- and construction-specific expressions.

syntactic competence is comprised totally of verb-specific constructions with open nominal slots”, rather than abstract and productive syntactic rules under which presumably a broader range of combinations is expected.

Limited morphological inflection. According to a study of child Italian (Pizutto & Caselli 1994), 47% of all verbs used by 3 young children (1;6 to 3;0) were used in 1 person-number agreement form, and an additional 40% were used with 2 or 3 forms, where six forms are possible (3 persons × 2 numbers). Only 13% of all verbs appeared in 4 or more forms. Again, the low level of usage diversity is taken to show the limitedness of generalization characteristic of item-based learning.

Unbalanced determiner usage. Citing Pine & Lieven (1997) and other similar studies, it is found that when children began to use the determiners a and the with nouns, “there was almost no overlap in the sets of nouns used with the two determiners, suggesting that the children at this age did not have any kind of abstract category of Determiners that included both of these lexical items”. This finding is held to contradict the earliest study (Valian 1986), which maintains that child determiner use is productive and accurate like that of adults by the age of 2;0.

So far as we can tell, however, this evidence in support of item-based learning has been presented, and accepted, on the basis of intuitive inspections rather than formal empirical tests. For instance, among the numerous examples from child language, no statistical test was given in the major treatment (Tomasello 1992) where the Verb Island Hypothesis and related ideas about item-based learning are put forward. Specifically, no test has been given to show that the observations above are statistically inconsistent with the expectation of a fully productive grammar, the position that item-based learning opposes. Nor, for that matter, are these observations shown to be consistent with item-based learning, which, as we shall see, has not been clearly enough articulated to facilitate quantitative evaluation. In this paper, we provide statistical analysis to fill these gaps. We demonstrate that children's language use actually shows the opposite of the item-based view; the productivity of children's grammar is in fact confirmed. More broadly, we aim to direct researchers to certain statistical properties of natural language and the challenges they pose for the theory of language and language learning. Our point of departure is a name that has tormented, and will continue to torment, every student of language: George Kingsley Zipf.

2 Zipfian Presence

2.1 Zipfian Words

Under the so-called Zipf's law (Zipf 1949), the empirical distributions of words follow a curious pattern: relatively few words are used frequently (very frequently), while most words occur rarely, with many occurring only once in even large samples of texts. More precisely, the frequency of a word tends to be approximately inversely proportional to its rank in frequency. Let f be the frequency of the word with rank r in a set of N words; then:

$$f = \frac{C}{r}, \quad \text{where } C \text{ is some constant} \qquad (1)$$

In the Brown corpus (Kucera & Francis 1967), for instance, the word with rank 1 is “the”, which has a frequency of about 70,000, and the word with rank 2 is “of”, with a frequency of about 36,000: almost exactly as Zipf's law entails (i.e., $70000 \times 1 \approx 36000 \times 2$). The Zipfian characterization of word frequency

2.2 Zipfian Combinatorics

The “long tail” of Zipf's law, which is occupied by low frequency words, becomes even more pronounced when we consider combinatorial linguistic units. Take, for instance, n-grams, the simplest linguistic combination, which consists of n consecutive words in a text.² Since there are a lot more bigrams and trigrams than words, there are consequently a lot more low frequency bigrams and trigrams in a linguistic sample, as Figure 2 illustrates from the Brown corpus (for related studies, see Teahan 1997, Ha et al. 2002):

[Figure 2 plot: cumulative % of types (y-axis) against frequency (x-axis), with separate curves for words, bigrams, and trigrams.]

Figure 2. The vast majority of n-grams are rare events. The x-axis denotes the frequency of the gram, and the y-axis denotes the cumulative % of the grams that appear at that frequency or lower. For instance, about 43% of words occur only once, about 58% of words occur 1-2 times, 68% of words occur 1-3 times, etc. The % of units that occur multiple times decreases rapidly, especially for bigrams and trigrams: approximately 91% of distinct trigram types in the Brown corpus occur only once, and 96% occur once or twice.

The range of linguistic forms is so vast that no sample is large enough to capture all of its varieties even when we make a certain number of abstractions. Figure 3 plots the rank and frequency distributions of syntactic rules of modern English from the Penn Treebank (Marcus et al. 1993). Since the corpus has been manually annotated with syntactic structures, it is straightforward to extract rules and tally their frequencies.³ The most frequent rule is “PP → P NP”, followed by “S → NP VP”: again, the Zipf-like pattern can be seen in the close approximation by a straight line on the log-log scale.
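The kind of tally summarized in Figure 2 can be sketched in a few lines of Python. This is an illustration only, not the procedure used for the Brown corpus: the function names `ngrams` and `cumulative_type_share` are hypothetical, and the toy sentence from footnote 2 stands in for a real corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cumulative_type_share(counts, max_freq):
    """Fraction of distinct types whose frequency is at most max_freq."""
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= max_freq)
    return rare / len(counts)


# Toy input (assumption): the example sentence from footnote 2.
tokens = "the cat chases the mouse".split()

word_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

for name, counts in [("words", word_counts),
                     ("bigrams", bigram_counts),
                     ("trigrams", trigram_counts)]:
    share = cumulative_type_share(counts, 1)
    print(f"{name}: {len(counts)} types, {share:.0%} occur only once")
```

Run on a full tokenized corpus, the same two functions would give the cumulative share of types at each frequency threshold; the percentages quoted in the Figure 2 caption come from the paper, not from this sketch.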
² For example, given the sentence “the cat chases the mouse”, the bigrams (n = 2) are “the cat”, “cat chases”, “chases the”, and “the mouse”, and the trigrams (n = 3) are “the cat chases”, “cat chases the”, and “chases the mouse”. When n = 1, we are just dealing with words.

³ Certain rules have been collapsed together, as the Treebank frequently annotates rules involving distinct functional heads as separate rules.

Suppose a linguistic sample contains S determiner-noun pairs, which consist of D unique determiners and N unique nouns. (In the present case D = 2, for “a” and “the”.) The full productivity of the DP rule, by definition, means that the two categories combine independently. Two observations, one obvious and the other novel, can be made about the distributions of the two categories and their combinations.

First, nouns (and open class words in general) will follow Zipf's law. For instance, the singular nouns that appear in the form of “DP → D N” in the Brown corpus show a log-log slope of -0.97. In the CHILDES (MacWhinney 2000) speech transcripts of six children (see section 3.2 for details), the average value of the log-log slope is -0.98. This means that in a linguistic sample, relatively few nouns occur often but many will occur only once, and these of course cannot overlap with more than one determiner.

Second, while the combination of D and N is syntactically interchangeable, N's tend to favor one of the two determiners, a consequence of pragmatics and indeed non-linguistic factors. For instance, we say “the bathroom” more often than “a bathroom” but “a bath” more often than “the bath”, even though all four DPs are perfectly grammatical. The reason for such asymmetries is not a matter of linguistic interest: “the bathroom” is more frequent than “a bathroom” only because bodily functions are a more constant theme of life than real estate matters.

We can place these combinatorial asymmetries in a quantitative context. As noted earlier, about 75% of distinct nouns in the Brown corpus occur exclusively with “the” or with “a” but not both. Even the remaining 25%, which do occur with both, tend to have favorites: only a further 25% (i.e. 6.25% of all nouns) are used with “a” and “the” equally frequently, and the remaining 75% are unbalanced. Overall, for nouns that appear with both determiners at least once (i.e. 25% of all nouns), the frequency ratio of the more favored to the less favored determiner is 2.86:1. (Of course, some nouns favor “the” while others favor “a”, as the “bathroom” and “bath” examples above illustrate.) These general patterns hold for child and adult speech data as well. In the six children's transcripts (section 3.2), the average percentage of balanced nouns among those that appear with both “the” and “a” is 22.8%, and the more favored vs. less favored determiner has an average frequency ratio of 2.54:1. Even though these ratios deviate from the perfect 2:1 ratio under the strict version of Zipf's law (the more favored is even more dominant over the less favored), they clearly point to the considerable asymmetry in category combination usage. As a result, even when a noun appears several times in a sample, there is still a significant chance that it has been paired with a single determiner in all instances.

Together, Zipfian distributions of atomic linguistic units (words; Figure 1) and their combinations (n-grams, Figure 2; phrases, Figure 3) ensure that the determiner-noun overlap must be relatively low unless the sample size S is very large. In section 4, we examine the usage patterns of verbal syntax and morphology, and discover similar patterns. For the moment, we develop a precise mathematical treatment and contrast it with the item-based learning approach in the context of language acquisition.

3 Quantifying Productivity

3.1 Theoretical analysis

Consider a sample (N, D, S), which consists of N unique nouns, D unique determiners, and S determiner-noun pairs. Here D = 2 for “the” and “a”, though we consider the general case. The nouns that have appeared with more than one (i.e. two) determiners will have an overlap value of 1; otherwise, they have an overlap value of 0. The overlap value for the entire sample will be the number of 1's divided by N. Our analysis calculates the expected value of the overlap value for the sample (N, D, S) under the

[Figure 4 plot: expected overlap (y-axis, 0 to 1) against rank (x-axis, 0 to 100).]

Figure 4. Expected overlap values for nouns ordered by rank, for N = 100 nouns in a sample size of S = 200 with D = 2 determiners. Word frequencies are assumed to follow the Zipfian distribution. As can be seen, few nouns have high probabilities of occurring with both determiners, but most are (far) below chance. The average overlap is 21.1%.

Under Zipfian distribution of categories and their productive combinations, low overlap values are a mathematical necessity. As we shall see, the theoretical formulation here nearly perfectly matches the distributional patterns in child language, to which we turn presently.

3.2 Determiners and productivity

Methods. To study the determiner system in child language, we consider the data from six children: Adam, Eve, Sarah, Naomi, Nina, and Peter. These are all and only the children in the CHILDES database (MacWhinney 2000) with substantial longitudinal data that starts at the very beginning of syntactic development (i.e., the one or two word stage), so that the item-based stage, if it exists, could be observed. For comparison, we also consider the overlap measure of the Brown corpus (Kucera & Francis 1967), for which productivity is not in doubt.

We first removed the extraneous annotations from the child text and then applied an open source implementation of a rule-based part-of-speech tagger (Brill 1995):⁵ words are now associated with their part-of-speech (e.g., preposition, singular noun, past tense verb, etc.). For languages such as English, which has relatively salient cues for part-of-speech (e.g., rigid word order, low degree of morphological syncretism), such taggers can achieve high accuracy, at over 97%. This already low error rate causes even less concern for our study, since the determiners “a” and “the” are not ambiguous and are always correctly tagged, which reliably contributes to the tagging of the words that follow them. The Brown Corpus is available with manually assigned part-of-speech tags, so no computational tagging is necessary.

With tagged datasets, we extracted adjacent determiner-noun pairs for which D is either “a” or “the”, and N has been tagged as a singular noun. Words that are marked as unknown, largely unintelligible

⁵ Available at http://gposttl.sourceforge.net/.

[Figure 5 plot: empirical against predicted overlap values; legend: identity, r = 1.06.]

Figure 5. The solid line represents the linear regression fit of the expected vs. empirical values of overlap in Table 1 columns 5 and 6 (r = 1.08, adjusted R² = 0.9716). The dotted line indicates a perfect fit (i.e., the identity function y = x).

Therefore, we conclude that the determiner usage data from child language is consistent with the productive rule “DP → D N”.

The empirical studies also reveal considerable individual variation in the overlap values, and it is instructive to understand why. As the Brown corpus result shows (Table 1, last row), sample size S, the number of nouns N, or the language user's age alone is not predictive of the overlap value. The variation can be roughly analyzed as follows; see Valian et al. (2009) for a related proposal. Given N unique nouns in a sample of S, a greater overlap value can be obtained if more nouns occur more than once. That is, words whose probabilities are greater than $1/S$ can increase the overlap value. Zipf's law (2) allows us to express this cutoff in terms of ranks, as the noun $n_r$ with rank r has the probability $1/(r H_N)$. The derivation below uses the fact that the Nth Harmonic Number $H_N = \sum_{i=1}^{N} 1/i$ can be approximated by $\ln N$.

$$S \cdot \frac{1}{r H_N} = 1 \;\Rightarrow\; r = \frac{S}{H_N} \approx \frac{S}{\ln N} \qquad (5)$$

That is, only nouns whose ranks are lower than $S/\ln N$ can be expected to have non-zero overlap. The total overlap is thus a monotonically increasing function of $S/(N \ln N)$ which, given the slow growth of $\ln N$, is approximately $S/N$, a term that must be positively correlated with overlap measures. This result is confirmed in the strongest terms: $S/N$ is a near perfect predictor of the empirical values of overlap (last two columns of Table 1): r = 0.986, p < 0.00001.
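The expected overlap of section 3.1 can be explored numerically. The sketch below is a reconstruction under stated assumptions, not a transcription of the paper's own equations: it assumes that nouns and determiners are drawn independently, that both follow a strict Zipfian distribution with probability 1/(r H_N) for rank r, and that a noun counts as overlapping once it has been sampled with at least two different determiners. The function names are hypothetical.

```python
def zipf_probabilities(n):
    """Strict Zipfian probabilities 1 / (rank * H_n) for ranks 1..n."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (rank * harmonic) for rank in range(1, n + 1)]


def expected_overlap_by_rank(n_nouns, n_dets, sample_size):
    """Probability that each noun is seen with two or more determiners."""
    det_probs = zipf_probabilities(n_dets)
    overlaps = []
    for p in zipf_probabilities(n_nouns):
        # The noun is never drawn in any of the S pairs.
        never_seen = (1.0 - p) ** sample_size
        # The noun is drawn at least once, but always with the same determiner:
        # every draw is either another noun or this noun with determiner d.
        one_det_only = sum(
            (1.0 - p + d * p) ** sample_size - never_seen for d in det_probs
        )
        overlaps.append(1.0 - never_seen - one_det_only)
    return overlaps


def expected_sample_overlap(n_nouns, n_dets, sample_size):
    """Average of the per-noun overlap probabilities."""
    by_rank = expected_overlap_by_rank(n_nouns, n_dets, sample_size)
    return sum(by_rank) / len(by_rank)


# Setting of Figure 4: N = 100 nouns, D = 2 determiners, S = 200 pairs.
by_rank = expected_overlap_by_rank(100, 2, 200)
average = expected_sample_overlap(100, 2, 200)
print(f"rank 1: {by_rank[0]:.3f}  rank 100: {by_rank[-1]:.3f}")
print(f"average expected overlap: {average:.3f}")
```

The printed average can be compared with the 21.1% reported for Figure 4; any difference would indicate that the assumptions above depart from the paper's formulation. Varying `sample_size` while holding `n_nouns` fixed also makes it easy to see the dependence of overlap on S/N discussed above.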
frequencies from that child's input (local memory learner) and the determiner-noun pairs along with their frequencies in the entire 1.1 million utterances of adult speech (global memory learner). For each child with a sample size of S (see Table 1, column 2), and for each variant of the memory model, we use Monte Carlo simulation to randomly draw S pairs from the two sets of data that correspond to the local and global memory learning models. The probability with which a pair is drawn is proportional to its frequency in the two sets of data. Thus, a more frequently used pair in the input will have a higher chance of being drawn, which reflects the frequency effects in learning often emphasized in the item/usage-based approach (e.g., Tomasello 2001, 2003, Matthews et al. 2005, Bybee & Hopper 2001, among others). Each sample, then, consists of a list of determiner-noun pairs with varying occurrence counts. We calculate the value of overlap from this list, that is, the percentage of nouns that appear with both “a” and “the” over the total number of nouns. The results are averaged over 1000 draws. These results are given in Table 2.

| Child | Sample Size (S) | Overlap (global memory) | Overlap (local memory) | Overlap (empirical) |
|---|---|---|---|---|
| Eve | 831 | 16.0 | 17.8 | 21.6 |
| Naomi | 884 | 16.6 | 18.9 | 19.8 |
| Sarah | 2453 | 24.5 | 27.0 | 29.2 |
| Peter | 2873 | 25.6 | 28.8 | 40.4 |
| Adam | 3729 | 27.5 | 28.5 | 32.3 |
| Nina | 4542 | 28.6 | 41.1 | 46.7 |
| First 100 | 600 | 13.7 | 17.2 | 21.8 |
| First 300 | 1800 | 22.1 | 25.6 | 29.1 |
| First 500 | 3000 | 25.9 | 30.2 | 34.2 |

Table 2. The comparison of determiner-noun overlap (in %) between two variants of item-based learning and empirical results.

Both sets of overlap values from the two variants of item-based learning (columns 3 and 4) differ significantly from the empirical measures (column 5): p < 0.005 for both the paired t-test and the paired Wilcoxon test. This suggests that children's use of determiners does not follow the predictions of the item-based learning approach; it certainly does not seem to be the result of the child retrieving jointly stored determiner-noun pairs from the input in a frequency sensitive fashion. Naturally, our evaluation here is tentative, since the proper test can be carried out only when the theoretical predictions of item-based learning are made clear. And that is exactly the point: the advocates of item-based learning not only rejected the alternative hypothesis without adequate statistical tests, but also accepted the favored hypothesis without adequate statistical tests.

4 An Itemized Look at Verbs

The formal analysis in section 3 can be generalized to child verb syntax and morphology, which are among the main supporting cases for item-based learning. Unfortunately, the acquisition data in support of the Verb Island Hypothesis (Tomasello 1992) and the item-based nature of early morphology (Pizutto & Caselli 1994) cited in section 1 have not been made available in the public domain. But the Zipfian reality is inherent: the combinatorics of verbs and their morphological and syntactic associates are as lopsided in usage distribution as those of the determiners. We now turn to examine the statistical distributions of verbal morphology and syntax.

4.2 All verbs are islands

We now study the distributional properties of verbal syntax that have been attributed to the Verb Island Hypothesis. We focus on constructions that involve a transitive verb and its nominal objects, including pronouns and noun phrases. Following the definition of “sentence frame” in Tomasello's original Verb Island study (1992, p242), each unique lexical item in the object position counts as a unique construction for the verb.

Figure 6 shows the construction frequencies of the top 15 transitive verbs in 1.1 million child-directed utterances. Processing methods are as described in section 3.2, except that here we extract adjacent verb-nominal pairs in part-of-speech tagged texts. For each verb, we count its top 10 most frequent constructions, which are defined as the verb followed by a unique lexical item in the object position (e.g., “ask him” and “ask John” are different constructions). For each of the 10 ranks, we tallied the construction frequencies for all 15 verbs.⁸

[Figure 6 plot: log(freq) (y-axis) against log(rank) (x-axis).]

Figure 6. Rank and frequency of verb-object constructions based on 1.1 million child-directed utterances.

The verb construction frequencies thus also follow a Zipf-like pattern: even for large corpora, a verb appears in few constructions frequently and in most constructions infrequently if at all. The observation of Verb Islands, that verbs tend to combine with one or few elements out of a large range, is in fact characteristic of a fully productive verbal syntax system.

As far as we know, the quantitative predictions of the Verb Island Hypothesis have not been spelled out, but we may estimate the amount of language sample that would be necessary to mask these island effects. The appeal to unevenness of verbal construction frequencies seems to reflect the expectation that under full productivity, most verbs ought to appear with most of the possible range of arguments. Substituting verbs and nominals for nouns and determiners, the formal analysis could be carried out for the verbal syntactic system. Instead of calculating the expected number of determiners that a noun appears with, one would calculate the expected number of objects a verb appears with.
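The Zipf-like pattern in Figure 6 can be checked with an ordinary least-squares fit of log frequency against log rank. The sketch below is illustrative: `loglog_slope` is a hypothetical helper, natural logarithms are an assumption suggested by the axis ranges of Figure 6, and the ten tallies are the ones reported in footnote 8.

```python
import math


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) on log(rank), ranks from 1."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Construction tallies for ranks 1..10, summed over 15 verbs (footnote 8).
tallies = [1904, 838, 501, 301, 252, 189, 137, 109, 88, 75]
slope = loglog_slope(tallies)

# Reference case: frequencies exactly proportional to 1/rank.
strict_zipf = [1000.0 / rank for rank in range(1, 11)]
reference = loglog_slope(strict_zipf)

print(f"slope for verb-object tallies: {slope:.2f}")
print(f"slope for strict Zipf reference: {reference:.2f}")
```

A strictly Zipfian series gives a slope of -1 by construction, so the reference line shows how far the fitted slope for the tallies departs from the strict form of the law. With only ten pooled ranks, the fit should be read as a rough check rather than an estimate.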
⁸ These verbs are: put, tell, see, want, let, give, take, show, got, ask, make, eat, like, bring and hear. The frequency tallies of the top 10 most frequent constructions are 1904, 838, 501, 301, 252, 189, 137, 109, 88, and 75.

(Jelinek 1998); the n-gram and rule distributions discussed in section 2.2 make these points very clearly.

For the linguist, the Zipfian nature of language raises important questions for the development of linguistic theories. First, Zipf's law hints at the inherent limitations in approaches that stress the storage of construction-specific rules or processes (e.g., Goldberg 2003, Culicover & Jackendoff 2005). For instance, the central tenets of Construction Grammar view constructions as “stored pairings of form and function, including morphemes, words, idioms, partially lexically filled and fully general linguistic patterns” and hold that “the totality of our knowledge of language is captured by a network of constructions” (Goldberg 2003, p219). Yet the Zipfian distribution of linguistic combinations, as illustrated in Figure 3 for the Wall Street Journal and Figure 6 for child-directed speech, ensures that most “pairings of form and function” simply will never be heard, never mind stored, and those that do appear may do so with such low frequency that no reliable storage and use is possible.

Second, and more generally, Zipf's law challenges the conventional wisdom in current syntactic theorizing that makes use of a highly detailed lexical component; there have been suggestions that all matters of language variation are in the lexicon, which in any case needs to be acquired for individual languages. Yet the effectiveness of lexicalization in grammar has not been fully investigated in large scale studies. However, useful inferences can be drawn from the research on statistical induction of grammar in computational linguistics (Charniak 1993, Collins 2003). These tasks typically take a large set of grammatical rules (e.g., a probabilistic context free grammar) and find appropriate parameter values (e.g., expansion probabilities in a probabilistic context free grammar) on the basis of annotated training data such as the Treebank, where sentences have been manually parsed into phrase structures. The performance of the trained grammar is evaluated by measuring parsing accuracy on a new set of unanalyzed sentences, thereby obtaining some measure of the generalization power of the grammar.

Obviously, inducing a grammar on a computer is hardly the same thing as constructing a theory of grammar by the linguist. Nevertheless, statistical grammar induction can be viewed as a tool that explores what type of grammatical information is in principle available in and attainable from the data, which in turn can guide the linguist in making theoretical decisions.

Contemporary work on statistical grammar induction makes use of a wide range of potentially useful linguistic information in the grammar formalism. For instance, a phrase such as “drink water” may be represented in multiple forms:

(a) VP → V NP
(b) VP → V_drink NP
(c) VP → V_drink NP_water

(a) is the most general type of context free grammar rule, whereas both (b) and (c) include additional lexical information: (b) provides a lexically specific expansion rule concerning the head verb “drink”, and the bilexical rule in (c) encodes the item-specific pairing of “drink” and “water”, which corresponds to the notion of sentence frame in Tomasello's Verb Island hypothesis (1992; see section 4.2).

By including or excluding the rules of the types above in the grammatical formalism, and evaluating the parsing accuracy of the grammar thus trained, we can obtain some quantitative measure of how much each type of rule, from general to specific, contributes to the grammar's ability to generalize to novel data. Bikel (2004) provides the most comprehensive study of this nature. Bilexical rules (c), similar to the notion of sentence frames and constructions, turn out to provide virtually no gain over simpler models that only use rules of types (a) and (b). Furthermore, lexicalized rules (b) offer only modest improvement over general categorical rules (a) alone, in which almost all of the grammar's generalization power lies. These findings are not surprising given the Zipfian nature of linguistic productivity:
Chang, F., Lieven, E., & Tomasello, M. (2006). Using child utterances to evaluate syntax acquisition algorithms. In Proceedings of the 28th Annual Conference of the Cognitive Science Society. Vancouver, Canada.
Chomsky, N. (1958). Review of Langage des machines et langage humain by Par Vitold Belevitch. Language, 34(1), 99-105.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1975). Reflections on language. New York: Pantheon.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.
Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences, 14, 597-650.
Culicover, P. & Jackendoff, R. (2005). Simpler syntax. New York: Oxford University Press.
Freudenthal, D., Pine, J. M., Aguado-Orea, J. & Gobet, F. (2007). Modelling the developmental patterning of finiteness marking in English, Dutch, German and Spanish using MOSAIC. Cognitive Science, 31, 311-341.
Freudenthal, D., Pine, J. M. & Gobet, F. (2009). Simulating the referential properties of Dutch, German and English root infinitives. Language Learning and Development, 5, 1-29.
Gabaix, X. (1999). Zipf's Law for Cities: An Explanation. The Quarterly Journal of Economics, 114, 739-767.
Goldberg, E. (2003). Constructions. Trends in Cognitive Science, 7, 219-224.
Ha, Le Quan, Sicilia-Garcia, E. I., Ming, Ji. & Smith, F. J. (2002). Extension of Zipf's law to words and phrases. Proceedings of the 19th International Conference on Computational Linguistics. 315-320.
Hay, J. & Baayen, H. (2005). Shifting paradigms: gradient structure in morphology. Trends in Cognitive Sciences, 9, 342-348.
Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge, MA: MIT Press.
Kucera, H. & Francis, N. (1967). Computational analysis of present-day English. Providence, RI: Brown University Press.
Legate, J. A. & Yang, C. (2002). Empirical reassessments of poverty stimulus arguments. Linguistic Review, 19, 151-162.
Li, W. (1992). Random texts exhibit Zipf's law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6), 1842-1845.
MacWhinney, B. (2000). The CHILDES Project. Lawrence Erlbaum.
Mandelbrot, B. (1954). Structure formelle des textes et communication: Deux études. Words, 10, 1-27.
Matthews, D., Lieven, E., Theakston, A. & Tomasello, M. (2005). The role of frequency in the acquisition of English word order. Cognitive Development, 20, 121-136.
McNeill, D. (1963). The creation of language by children. In Lyons, J. & Wales, Roger. (Eds.) Psycholinguistic Papers. Edinburgh: Edinburgh University Press. 99-132.
Miller, G. A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70, 2, 311-314.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley.