
Proceedings of the th Annual Meeting of the Association for Computational Linguistics pages Jeju Republic of Korea July

2012 Association for Computational Linguistics You Had Me at Hello How Phrasing Affects Memorability Cristian DanescuNiculescuMizil Justin Cheng Jon Kleinberg Lillian Lee Department of Computer Science Cornell University cristiancscornelledu jc882co




independent of these (obviously very crucial) factors? To investigate the question, one needs a way of controlling, as much as possible, for the role that the surrounding context of the language plays.

The present work (i): Evaluating language-based memorability. Defining what makes an utterance memorable is subtle, and scholars in several domains have written about this question. There is a rough consensus that an appropriate definition involves elements of both recognition (people should be able to retain the quote and recognize it when they hear it invoked) and production (people should be motivated to refer to it in relevant situations) [15]. One suggested reason for why some memes succeed is their ability to provoke emotions [16]. Alternatively, memorable quotes can be good for expressing the feelings, mood, or situation of an individual, a group, or a culture (the zeitgeist): “Certain quotes exquisitely capture the mood or feeling we wish to communicate to someone. We hear them ... and store them away for future use” [10].

None of these observations, however, serve as definitions, and indeed, we believe it desirable to not pre-commit to an abstract definition, but rather to adopt an operational formulation based on external human judgments. In designing our study, we focus on a domain in which (i) there is rich use of language, some of which has achieved deep cultural penetration; (ii) there already exist a large number of external human judgments, perhaps implicit, but in a form we can extract; and (iii) we can control for the setting in which the text was used.

Specifically, we use the complete scripts of roughly 1000 movies, representing diverse genres, eras, and levels of popularity, and consider which lines are the most “memorable”. To acquire memorability labels, for each sentence in each script, we determine whether it has been listed as a “memorable quote” by users of the widely-known IMDb (the Internet Movie Database), and also estimate the number of times it appears on the Web. Both of these serve as memorability metrics for our purposes.

When we evaluate properties of memorable quotes, we compare them with quotes that are not assessed as memorable, but were spoken by the same character, at approximately the same point in the same movie. This enables us to control in a fairly fine-grained way for the confounding effects of context discussed
above: we can observe differences that persist even after taking into account both the speaker and the setting.

In a pilot validation study, we find that human subjects are effective at recognizing the more IMDb-memorable of two quotes, even for movies they have not seen. This motivates a search for features intrinsic to the text of quotes that signal memorability. In fact, comments provided by the human subjects as part of the task suggested two basic forms that such textual signals could take: subjects felt that (i) memorable quotes often involve a distinctive turn of phrase; and (ii) memorable quotes tend to invoke general themes that aren't tied to the specific setting they came from, and hence can be more easily invoked for future (out of context) uses. We test both of these principles in our analysis of the data.

The present work (ii): What distinguishes memorable quotes. Under the controlled-comparison setting sketched above, we find that memorable quotes exhibit significant differences from non-memorable quotes in several fundamental respects, and these differences in the data reinforce the two main principles from the human pilot study. First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts. Interestingly, this distinctiveness takes place at the level of words, but not at the level of other syntactic features: the part-of-speech composition of memorable quotes is in fact more likely with respect to newswire. Thus, we can think of memorable quotes as consisting, in an aggregate sense, of unusual word choices built on a scaffolding of common part-of-speech patterns.

We also identify a number of ways in which memorable quotes convey greater generality. In their patterns of verb tenses, personal pronouns, and determiners, memorable quotes are structured so as to be more “free-standing,” containing fewer markers that indicate references to nearby text.

Memorable quotes differ in other interesting aspects as well, such as sound distributions.

Our analysis of memorable movie quotes suggests
a framework by which the memorability of text in a range of different domains could be investigated. We provide evidence that such cross-domain properties may hold, guided by one of our motivating applications in marketing. In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes. This suggests that some of the principles underlying memorable text have the potential to apply across different areas.

Roadmap. §2 lays the empirical foundations of our work: the design and creation of our movie-quotes dataset, which we make publicly available (§2.1), a pilot study with human subjects validating IMDb-based memorability labels (§2.2), and further study of incorporating search-engine counts (§2.3). §3 details our analysis and prediction experiments, using both movie-quotes data and, as an exploration of cross-domain applicability, slogans data. §4 surveys related work across a variety of fields. §5 briefly summarizes and indicates some future directions.

2 I'm ready for my close-up.

2.1 Data

To study the properties of memorable movie quotes, we need a source of movie lines and a designation of memorability. Following [8], we constructed a corpus consisting of all lines from roughly 1000 movies, varying in genre, era, and popularity; for each movie, we then extracted the list of quotes from IMDb's Memorable Quotes page corresponding to the movie.^1

^1 This extraction involved some edit-distance-based alignment, since the exact form of the line in the script can exhibit minor differences from the version typed into IMDb.

A memorable quote in IMDb can appear either as an individual sentence spoken by one character, or as a multi-sentence line, or as a block of dialogue involving multiple characters. In the latter two cases, it can be hard to determine which particular portion is viewed as memorable (some involve a build-up to a punchline; others involve the follow-through after a well-phrased opening sentence), and so we focus
in our comparisons on those memorable quotes that appear as a single sentence rather than a multi-line block.^2

^2 We also ran experiments relaxing the single-sentence assumption, which allows for stricter scene control and a larger dataset but complicates comparisons involving syntax. The non-syntax results were in line with those reported here.

Figure 1: Location of memorable quotes in each decile of movie scripts (the first 10th, the second 10th, etc.), summed over all movies. The same qualitative results hold if we discard each movie's very first and last line, which might have privileged status.

We now formulate a task that we can use to evaluate the features of memorable quotes. Recall that our goal is to identify effects based in the language of the quotes themselves, beyond any factors arising from the speaker or context. Thus, for each (single-sentence) memorable quote M, we identify a non-memorable quote that is as similar as possible to M in all characteristics but the choice of words. This means we want it to be spoken by the same character in the same movie. It also means that we want it to have the same length: controlling for length is important because we expect that on average, shorter quotes will be easier to remember than long quotes, and that wouldn't be an interesting textual effect to report. Moreover, we also want to control for the fact that a quote's position in a movie can affect memorability: certain scenes produce more memorable dialogue, and as Figure 1 demonstrates, in aggregate memorable quotes also occur disproportionately near the beginnings and especially the ends of movies.

In summary, then, for each M, we pick a contrasting (single-sentence) quote N from the same movie that is as close in the script as possible to M (either before or after it), subject to the conditions that (i) M and N are uttered by the same speaker, (ii) M and N have the same number of words, and (iii) N does not occur in the IMDb list of memorable quotes for the movie (either as a single line or as part of a larger block).

Jackie Brown
    First quote:  Half a million dollars will always be missed.
    Second quote: I know the type, trust me on this.
Star Trek: Nemesis
    First quote:  I think it's time to try some unsafe velocities.
    Second quote: No cold feet, or any other parts of our anatomy.
Ordinary People
    First quote:  A little advice about feelings kiddo; don't expect it always to tickle.
    Second quote: I mean there's someone besides your mother you've got to forgive.

Table 1: Three example pairs of movie quotes. Each pair satisfies our criteria: the two component quotes are spoken close together in the movie by the same character, have the same length, and one is labeled memorable by the IMDb while the other is not. (Contractions such as “it's” count as two words.)

Given such pairs, we formulate a pairwise comparison task: given M and N, determine which is the memorable quote. Psychological research on subjective evaluation [35], as well as initial experiments using ourselves as subjects, indicated that this pairwise set-up is easier to work with than simply presenting a single sentence and asking whether it is memorable or not; the latter requires agreement on an “absolute” criterion for memorability that is very hard to impose consistently, whereas the former simply requires a judgment that one quote is more memorable than another.

Our main dataset, available at http://www.cs.cornell.edu/cristian/memorability.html,^3 thus consists of approximately 2200 such (M, N) pairs, separated by a median of 5 same-character lines in the script. The reader can get a sense for the nature of the data from the three examples in Table 1.

^3 Also available there: other examples and factoids.

We now discuss two further aspects to the formulation of the experiment: a preliminary pilot study involving human subjects, and the incorporation of search engine counts into the data.

2.2 Pilot study: Human performance

As a preliminary consideration, we did a small pilot study to see if humans can distinguish memorable from non-memorable quotes, assuming our IMDb-induced labels as gold standard. Six subjects, all native speakers of English and none an author of this paper, were presented with 11 or 12 pairs of memorable vs. non-memorable quotes; again, we controlled for extra-textual effects by ensuring that in each pair the two quotes come from the same movie, are by the same character, have the same length, and
appear as nearly as possible in the same scene.^4 The order of quotes within pairs was randomized. Importantly, because we wanted to understand whether the language of the quotes by itself contains signals about memorability, we chose quotes from movies that the subjects said they had not seen. (This means that each subject saw a different set of quotes.) Moreover, the subjects were requested not to consult any external sources of information.^5 The reader is welcome to try a demo version of the task at http://www.cs.cornell.edu/cristian/memorability.html.

^4 In this pilot study, we allowed multi-sentence quotes.
^5 We did not use crowd-sourcing because we saw no way to ensure that this condition would be obeyed by arbitrary subjects. We do note, though, that after our research was completed and as of Apr. 26, 2012, 11,300 people completed the online test: average accuracy: 72%, mode number correct: 9/12.

subject      number of matches with IMDb-induced annotation
A            11/11 = 100%
B            11/12 = 92%
C             9/11 = 82%
D             8/11 = 73%
E             7/11 = 64%
F             7/12 = 58%
macro avg     78%

Table 2: Human pilot study: number of matches to IMDb-induced annotation, ordered by decreasing match percentage. For the null hypothesis of random guessing, these results are statistically significant, p = 2^-6 ≈ .016.

Table 2 shows that all the subjects performed (sometimes much) better than chance, and against the null hypothesis that all subjects are guessing randomly, the results are statistically significant, p = 2^-6 ≈ .016. These preliminary findings provide evidence for the validity of our task: despite the apparent difficulty of the job, even humans who haven't seen the movie in question can recover our IMDb-
induced labels with some reliability.^6

^6 The average accuracy being below 100% reinforces that context is very important, too.

2.3 Incorporating search engine counts

Thus far we have discussed a dataset in which memorability is determined through an explicit labeling drawn from the IMDb. Given the “production” aspect of memorability discussed in §1, we should also expect that memorable quotes will tend to appear more extensively on Web pages than non-memorable quotes; note that incorporating this insight makes it possible to use the (implicit) judgments of a much larger number of people than are represented by the IMDb database. It therefore makes sense to try using search-engine result counts as a second indication of memorability.

We experimented with several ways of constructing memorability information from search-engine counts, but this proved challenging. Searching for a quote as a stand-alone phrase runs into the problem that a number of quotes are also sentences that people use without the movie in mind, and so high counts for such quotes do not testify to the phrase's status as a memorable quote from the movie. On the other hand, searching for the quote in a Boolean conjunction with the movie's title discards most of these uses, but also eliminates a large fraction of the appearances on the Web that we want to find: precisely because memorable quotes tend to have widespread cultural usage, people generally don't feel the need to include the movie's title when invoking them. Finally, since we are dealing with roughly 1000 movies, the result counts vary over an enormous range, from recent blockbusters to movies with relatively small fan bases.

In the end, we found that it was more effective to use the result counts in conjunction with the IMDb labels, so that the counts played the role of an additional filter rather than a free-standing numerical value. Thus, for each pair (M, N) produced using the IMDb methodology above, we searched for each of M and N as quoted expressions in a Boolean conjunction with the title of the movie. We then kept only those pairs for which M (i) produced more than five results in our (quoted, conjoined) search, and (ii) produced at least twice as many results as the corresponding search for N. We created a version of this filtered dataset using each of Google and Bing, and all the main findings were consistent with the results on the IMDb-only dataset. Thus, in what follows, we will focus on the main IMDb-only dataset, discussing the relationship to the dataset filtered by search engine counts where relevant (in which case we will refer to the +Google dataset).

3 Never send a human to do a machine's job.

We now discuss experiments that investigate the hypotheses discussed in §1. In particular, we devise methods that can assess the distinctiveness and generality hypotheses and test whether there exists a notion of “memorable language” that operates across domains. In addition, we evaluate and compare the predictive power of these hypotheses.

3.1 Distinctiveness

One of the hypotheses we examine is whether the use of language in memorable quotes is to some extent unusual. In order to quantify the level of distinctiveness of a quote, we take a language-model approach: we model “common language” using the newswire sections of the Brown corpus [21]^7, and evaluate how distinctive a quote is by evaluating its likelihood with respect to this model: the lower the likelihood, the more distinctive. In order to assess different levels of lexical and syntactic distinctiveness, we employ a total of six Laplace-smoothed^8 language models: 1-gram, 2-gram, and 3-gram word LMs and 1-gram, 2-gram and 3-gram part-of-speech^9 LMs.

^7 Results were qualitatively similar if we used the fiction portions. The age of the Brown corpus makes it less likely to contain modern movie quotes.
^8 We employ Laplace (additive) smoothing with a smoothing parameter of 0.2. The language models' vocabulary was that of the entire training corpus.
^9 Throughout we obtain part-of-speech tags by using the NLTK maximum entropy tagger with default parameters.

We find strong evidence that from a lexical perspective, memorable quotes are more distinctive than their non-memorable counterparts. As indicated in Table 3, for each of our lexical “common language” models, in about 60% of the quote pairs, the memorable quote is more distinctive.

Interestingly, the reverse is true when it comes to
syntax: memorable quotes appear to follow the syntactic patterns of “common language” as closely as or more closely than non-memorable quotes. Together, these results suggest that memorable quotes consist of unusual word sequences built on common syntactic scaffolding.

“common language” model    IMDb-only    +Google
lexical     1-gram         61.13%       59.21%
            2-gram         59.22%       57.03%
            3-gram         59.81%       58.32%
syntactic   1-gram         43.60%       44.77%
            2-gram         48.31%       47.84%
            3-gram         50.91%       50.92%

Table 3: Distinctiveness: percentage of quote pairs in which the memorable quote is more distinctive than the non-memorable one according to the respective “common language” model. Significance according to a two-tailed sign test is indicated using *-notation (“p<.001”).

3.2 Generality

Another of our hypotheses is that memorable quotes are easier to use outside the specific context in which they were uttered (that is, more “portable”) and therefore exhibit fewer terms that refer to those settings. We use the following syntactic properties as proxies for the generality of a quote:

- Fewer 3rd-person pronouns, since these commonly refer to a person or object that was introduced earlier in the discourse. Utterances that employ fewer such pronouns are easier to adapt to new contexts, and so will be considered more general.
- More indefinite articles like a and an, since they are more likely to refer to general concepts than definite articles. Quotes with more indefinite articles will be considered more general.
- Fewer past tense verbs and more present tense verbs, since the former are more likely to refer to specific previous events. Therefore utterances that employ fewer past tense verbs (and more present tense verbs) will be considered more general.

Table 4 gives the results for each of these four metrics; in each case, we show the percentage of quote pairs for which the memorable quote scores better on the generality metric.

Generality metric           IMDb-only    +Google
fewer 3rd pers. pronouns    64.37%       62.93%
more indef. article         57.21%       58.23%
less past tense             57.91%       59.74%
more present tense          54.60%       55.86%

Table 4: Generality: percentage of quote pairs in which the memorable quote is more general than the non-memorable one according to the respective metric. Pairs where the metric does not distinguish between the quotes are not considered.

Note that because the issue of generality is a complex one for which there is no straightforward single metric, our approach here is based on several proxies for generality, considered independently; yet, as the results show, all of these point in a consistent direction. It is an interesting open question to develop richer ways of assessing whether a quote has greater generality, in the sense that people intuitively attribute to memorable quotes.

3.3 “Memorable” language beyond movies

One of the motivating questions in our analysis is whether there are general principles underlying “memorable language.” The results thus far suggest potential families of such principles. A further question in this direction is whether the notion of memorability can be extended across different domains, and for this we collected (and distribute on our website) 431 phrases that were explicitly designed to be memorable: advertising slogans (e.g., “Quality never goes out of style.”). The focus on slogans is also in keeping with one of the initial motivations in studying memorability, namely, marketing applications: in other words, assessing whether a proposed slogan has features that are consistent with memorable text.

The fact that it's not clear how to construct a collection of “non-memorable” counterparts to slogans appears to pose a technical challenge. However, we can still use a language-modeling approach to assess whether the textual properties of the slogans are closer to the memorable movie quotes (as one would conjecture) or to the non-memorable movie quotes. Specifically, we train one language model on memorable quotes and another on non-memorable quotes and compare how likely each slogan is to be produced according to these two models. As shown in the middle column of Table 5, we find that slogans are better predicted both lexically and syntactically by the former model. This result thus offers evidence for a concept of “memorable language” that can be applied beyond a single domain.

(Non)memorable language models    Slogans    Newswire
lexical     1-gram                56.15%     33.77%
            2-gram                51.51%     25.15%
            3-gram                52.44%     28.89%
syntactic   1-gram                73.09%     68.27%
            2-gram                64.04%     50.21%
            3-gram                62.88%     55.09%

Table 5: Cross-domain concept of “memorable” language: percentage of slogans that have higher likelihood under the memorable language model than under the non-memorable one (for each of the six language models considered). Rightmost column: for reference, the percentage of newswire sentences that have higher likelihood under the memorable language model than under the non-memorable one.

We also note that the higher likelihood of slogans under a “memorable language” model is not simply occurring for the trivial reason that this model predicts all other large bodies of text better. In particular, the newswire section of the Brown corpus is predicted better at the lexical level by the language model trained on non-memorable quotes.

Finally, Table 6 shows that slogans employ general language, in the sense that for each of our generality metrics, we see a slogans/memorable-quotes/non-memorable quotes spectrum.

Generality metric         slogans    mem.      n-mem.
% 3rd pers. pronouns      2.14%      2.16%     3.41%
% indefinite articles     2.68%      2.63%     2.06%
% past tense              14.60%     21.13%    26.69%

Table 6: Slogans are most general when compared to memorable and non-memorable quotes. (%s of 3rd pers. pronouns and indefinite articles are relative to all tokens, %s of past tense are relative to all past and present verbs.)

3.4 Prediction task

We now show how the principles discussed above can provide features for a basic prediction task, corresponding to the task in our human pilot study: given a pair of quotes, identify the memorable one.

Our first formulation of the prediction task uses a standard bag-of-words model^10. If there were no information in the textual content of a quote to determine whether it were memorable, then an SVM employing bag-of-words features should perform no better than chance. Instead, though, it obtains 59.67% (10-fold cross-validation) accuracy, as shown in Table 7. We then develop models using features based on the measures formulated earlier in this section: generality measures (the four listed in Table 4); distinctiveness measures (likelihood according to 1-, 2-, and 3-gram “common language” models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them); and similarity-to-slogans measures (likelihood according to 1-, 2-, and 3-gram slogan-language models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them).

^10 We discarded terms appearing fewer than 10 times.

Even a relatively small number of distinctiveness features, on their own, improve significantly over the much larger bag-of-words model. When we include additional features based on generality and language-model features measuring similarity to slogans, the performance improves further (last line of Table 7).

Feature set                 # feats    Accuracy
bag of words                962        59.67%
distinctiveness             24         62.05%
generality                  4          56.70%
slogan sim.                 24         58.30%
all three types together    52         64.27%

Table 7: Prediction: SVM 10-fold cross validation results using the respective feature sets. Random baseline accuracy is 50%. Accuracies statistically significantly greater than bag-of-words according to a two-tailed t-test are indicated with * (p<.05) and ** (p<.01).

Thus, the main conclusion from these prediction tasks is that abstracting notions such as distinctiveness and generality can produce relatively streamlined models that outperform much heavier-weight bag-of-words models, and can suggest steps toward approaching the performance of human judges who, very much unlike our system, have the full cultural context in which movies occur at their disposal.

3.5 Other characteristics

We also made some auxiliary observations that may be of interest. Specifically, we find differences in letter and sound distribution (e.g., memorable quotes, after curse-word removal, use significantly more “front sounds” (labials or front vowels such as represented by the letter i) and significantly fewer “back sounds” such as the one represented by u),^11 word complexity (e.g., memorable quotes use words with significantly more syllables) and phrase complexity (e.g., memorable quotes use fewer coordinating conjunctions). The latter two are in line with our distinctiveness hypothesis.

^11 These findings may relate to marketing research on sound symbolism [7, 19, 40].

4 A long time ago, in a galaxy far, far away

How an item's linguistic form affects the reaction it generates has been studied in several contexts, including evaluations of product reviews [9], political speeches [12], on-line posts [13], scientific papers [14], and retweeting of Twitter posts [36]. We use a different set of features, abstracting the notions of distinctiveness and generality, in order to focus on these higher-level aspects of phrasing rather than on particular lower-level features.

Related to our interest in distinctiveness, work in advertising research has studied the effect of syntactic complexity on recognition and recall of slogans [5, 6, 24]. There may also be connections to Von Restorff's isolation effect (Hunt [17]), which asserts that when all but one item in a list are similar in some way, memory for the different item is enhanced.

Related to our interest in generality, Knapp et al. [20] surveyed subjects regarding memorable messages or pieces of advice they had received, finding that the ability to be applied to multiple concrete situations was an important factor.

Memorability, although distinct from “memorizability”, relates to short- and long-term recall. Thorn and Page [34] survey sub-lexical, lexical, and semantic attributes affecting short-term memorability of lexical items. Studies of verbatim recall have also considered the task of distinguishing an exact quote from close paraphrases [3]. Investigations of long-term recall have included studies of culturally significant passages of text [29] and findings regarding the effect of rhetorical devices such as alliteration [4] and “rhythmic, poetic, and thematic constraints” [18, 26].

Finally, there are complex connections between humor and memory [32], which may lead to interactions with computational humor recognition [25].

5 I think this is the beginning of a beautiful friendship.

Motivated by the broad question of what kinds of information achieve widespread public awareness, we studied the effect of phrasing on a quote's memorability. A challenge is that quotes differ not only in how they are worded, but also in who said them and under what circumstances; to deal with this difficulty, we constructed a controlled corpus of movie quotes in which lines deemed memorable are paired with non-memorable lines spoken by the same character at approximately the same point in the same movie. After controlling for context and situation, memorable quotes were still found to exhibit, on average (there will always be individual exceptions), significant differences from non-memorable quotes in several important respects, including measures capturing distinctiveness and generality. Our experiments with slogans show how the principles we identify can extend to a different domain.

Future work may lead to applications in marketing, advertising and education [4]. Moreover, the subtle nature of memorability, and its connection to research in psychology, suggests a range of further research directions. We believe that the framework developed here can serve as the basis for further computational studies of the process by which information takes hold in the public consciousness, and the role that language effects play in this process.

My mother thanks you. My father thanks you. My sister thanks you. And I thank you: Rebecca Hwa, Evie Kleinberg, Diana Minculescu, Alex Niculescu-Mizil, Jennifer Smith, Benjamin Zimmer, and the anonymous reviewers for helpful discussions and comments; our annotators Steven An, Lars Backstrom, Eric Baumer, Jeff Chadwick, Evie Kleinberg, and Myle Ott; and the makers of Cepacol, Robitussin, and Sudafed, whose products got us through the submission deadline. This paper is based upon work supported in part by NSF grants IIS-0910664, IIS-1016099, Google, and Yahoo!
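
As an illustration of the distinctiveness comparison described in §3.1, the following is a minimal sketch of an additively smoothed n-gram language model with smoothing parameter 0.2, used to decide which of two equal-length quotes has the lower likelihood under a "common language" model. It is not the authors' code. The class and function names, the toy training sentences (standing in for Brown newswire), the sentence padding symbols, and the mapping of unseen words to a single `<unk>` type are all assumptions made for this example.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumption: every out-of-vocabulary token maps to this one type
BOS, EOS = "<s>", "</s>"  # assumption: sentences are padded with these symbols


class LaplaceNgramLM:
    """Additively smoothed n-gram language model (illustrative sketch only)."""

    def __init__(self, sentences, n=1, k=0.2):
        self.n, self.k = n, k
        # Vocabulary = all training words, plus the unknown and end-of-sentence types.
        self.vocab = {w for s in sentences for w in s} | {UNK, EOS}
        self.ngrams, self.contexts = Counter(), Counter()
        for s in sentences:
            for gram in self._grams(s):
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def _grams(self, tokens):
        toks = [t if t in self.vocab else UNK for t in tokens]
        padded = [BOS] * (self.n - 1) + toks + [EOS]
        return [tuple(padded[i:i + self.n])
                for i in range(len(padded) - self.n + 1)]

    def log_likelihood(self, tokens):
        v = len(self.vocab)
        return sum(
            math.log((self.ngrams[g] + self.k)
                     / (self.contexts[g[:-1]] + self.k * v))
            for g in self._grams(tokens)
        )


def more_distinctive(lm, quote_a, quote_b):
    """Return the quote with the lower likelihood under the given model."""
    if lm.log_likelihood(quote_a) < lm.log_likelihood(quote_b):
        return quote_a
    return quote_b


# Hypothetical toy "common language" corpus; the paper uses Brown newswire.
common = [
    "the council met on monday".split(),
    "the mayor said the budget was approved".split(),
    "the council said the vote was on monday".split(),
]
lm = LaplaceNgramLM(common, n=1, k=0.2)
ordinary = "the council said the budget".split()
unusual = "the velocities said unsafe budget".split()
winner = more_distinctive(lm, ordinary, unusual)
```

Because both quotes in a pair have the same number of words, their log-likelihoods can be compared directly without length normalization. Applying the same class to sequences of part-of-speech tags instead of words would give the syntactic variant of the comparison.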
References

[1] Eytan Adar, Li Zhang, Lada A. Adamic, and Rajan M. Lukose. Implicit structure and the dynamics of blogspace. In Workshop on the Weblogging Ecosystem, 2004.
[2] Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of KDD, 2006.
[3] Elizabeth Bates, Walter Kintsch, Charles R. Fletcher, and Vittoria Giuliani. The role of pronominalization and ellipsis in texts: Some memory experiments. Journal of Experimental Psychology: Human Learning and Memory, 6(6):676–691, 1980.
[4] Frank Boers and Seth Lindstromberg. Finding ways to make phrase-learning feasible: The mnemonic effect of alliteration. System, 33(2):225–238, 2005.
[5] Samuel D. Bradley and Robert Meeds. Surface-structure transformations and advertising slogans: The case for moderate syntactic complexity. Psychology and Marketing, 19:595–619, 2002.
[6] Robert Chamblee, Robert Gilmore, Gloria Thomas, and Gary Soldow. When copy complexity can help ad readership. Journal of Advertising Research, 33(3):23–23, 1993.
[7] John Colapinto. Famous names. The New Yorker, pages 38–43, 2011.
[8] Cristian Danescu-Niculescu-Mizil and Lillian Lee. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 2011.
[9] Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. In Proceedings of WWW, pages 141–150, 2009.
[10] Stuart Fischoff, Esmeralda Cardenas, Angela Hernandez, Korey Wyatt, Jared Young, and Rachel Gordon. Popular movie quotes: Reflections of a people and a culture. In Annual Convention of the American Psychological Association, 2000.
[11] Daniel Gruhl, R. Guha, David Liben-Nowell, and Andrew Tomkins. Information diffusion through blogspace. Proceedings of WWW, pages 491–501, 2004.
[12] Marco Guerini, Carlo Strapparava, and Oliviero Stock. Trusting politicians' words (for persuasive NLP). In Proceedings of CICLing, pages 263–274, 2008.
[13] Marco Guerini, Carlo Strapparava, and Gözde Özbal. Exploring text virality in social networks. In Proceedings of ICWSM (poster), 2011.
[14] Marco Guerini, Alberto Pepe, and Bruno Lepri. Do linguistic style and readability of scientific abstracts affect their virality? In Proceedings of ICWSM, 2012.
[15] Richard Jackson Harris, Abigail J. Werth, Kyle E. Bures, and Chelsea M. Bartel. Social movie quoting: What, why, and how? Ciencias Psicologicas, 2(1):35–45, 2008.
[16] Chip Heath, Chris Bell, and Emily Steinberg. Emotional selection in memes: The case of urban legends. Journal of Personality, 81(6):1028–1041, 2001.
[17] R. Reed Hunt. The subtlety of distinctiveness: What von Restorff really did. Psychonomic Bulletin & Review, 2(1):105–112, 1995.
[18] Ira E. Hyman Jr. and David C. Rubin. Memorabeatlia: A naturalistic study of long-term memory. Memory & Cognition, 18(2):205–214, 1990.
[19] Richard R. Klink. Creating brand names with meaning: The use of sound symbolism. Marketing Letters, 11(1):5–20, 2000.
[20] Mark L. Knapp, Cynthia Stohl, and Kathleen K. Reardon. “Memorable” messages. Journal of Communication, 31(4):27–41, 1981.
[21] Henry Kucera and W. Nelson Francis. Computational analysis of present-day American English. Dartmouth Publishing Group, 1967.
[22] Jure Leskovec, Lada Adamic, and Bernardo Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1), May 2007.
[23] Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of KDD, pages 497–506, 2009.
[24] Tina M. Lowrey. The relation between script complexity and commercial memorability. Journal of Advertising, 35(3):7–15, 2006.
[25] Rada Mihalcea and Carlo Strapparava. Learning to laugh (automatically): Computational models for humor recognition. Computational Intelligence, 22(2):126–142, 2006.
[26] Milman Parry and Adam Parry. The making of Homeric verse: The collected papers of Milman Parry. Clarendon Press, Oxford, 1971.
[27] Everett Rogers. Diffusion of Innovations. Free Press, fourth edition, 1995.
[28] Daniel M. Romero, Brendan Meeder, and Jon Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. Proceedings of WWW, pages 695–704, 2011.
[29] David C. Rubin. Very long-term memory for prose and verse. Journal of Verbal Learning and Verbal Behavior, 16(5):611–621, 1977.
[30] Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederick L. Crabbe, and Noah A. Smith. Visualizing topical quotations over time to understand news discourse. Technical Report CMU-LTI-01-103, CMU, 2010.
[31] David Strang and Sarah Soule. Diffusion in organizations and social movements: From hybrid corn to poison pills. Annual Review of Sociology, 24:265–290, 1998.
[32] Hannah Summerfelt, Louis Lippman, and Ira E. Hyman Jr. The effect of humor on memory: Constrained by the pun. The Journal of General Psychology, 137(4), 2010.
[33] Eric Sun, Itamar Rosenn, Cameron Marlow, and Thomas M. Lento. Gesundheit! Modeling contagion through Facebook News Feed. In Proceedings of ICWSM, 2009.
[34] Annabel Thorn and Mike Page. Interactions Between Short-Term and Long-Term Memory in the Verbal Domain. Psychology Press, 2009.
[35] Louis L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273–286, 1927.
[36] Oren Tsur and Ari Rappoport. What's in a Hashtag? Content based prediction of the spread of ideas in microblogging communities. In Proceedings of WSDM, 2012.
[37] Fang Wu, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A: Statistical and Theoretical Physics, 337(1-2):327–335, 2004.
[38] Shaomei Wu, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. Who says what to whom on Twitter. In Proceedings of WWW, 2011.
[39] Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. In Proceedings of WSDM, 2011.
[40] Eric Yorkston and Geeta Menon. A sound idea: Phonetic effects of brand names on consumer judgments. Journal of Consumer Research, 31(1):43–51, 2004.

You had me at hello: How phrasing affects memorability

Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, Lillian Lee
Department of Computer Science, Cornell University
cristian@cs.cornell.edu, jc882@cornell.edu, kleinber@cs.cornell.edu, llee@cs.cornell.edu

Abstract

Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased (the choice of words and sentence structure) can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns. Another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts (that is, more portable). We also show how the concept of “memorable language” can be extended across domains.

To appear at ACL 2012; final version
arXiv:1203.6360v2 [cs.CL] 30 Apr 2012

1 Hello. My name is Inigo Montoya.

Understanding what items will be retained in the public consciousness, and why, is a question of fundamental interest in many domains, including marketing, politics, entertainment, and social media; as we all know, many items barely register, whereas others catch on and take hold in many people's minds.

An active line of recent computational work has employed a variety of perspectives on this question. Building on a foundation in the sociology of diffusion [27, 31], researchers have explored the ways in which network structure affects the way information spreads, with domains of interest including blogs [1, 11], email [37], on-line commerce [22], and social media [2, 28, 33, 38]. There has also been recent research addressing temporal aspects of how different media sources convey information [23, 30, 39] and ways in which people react differently to information on different topics [28, 36].

Beyond all these factors, however, one's everyday experience with these domains suggests that the way in which a piece of information is expressed (the choice of words, the way it is phrased) might also have a fundamental effect on the extent to which it takes hold in people's minds. Concepts that attain wide reach are often carried in messages such as political slogans, marketing phrases, or aphorisms whose language seems intuitively to be memorable, “catchy,” or otherwise compelling.

Our first challenge in exploring this hypothesis is to develop a notion of “successful” language that is precise enough to allow for quantitative evaluation. We also face the challenge of devising an evaluation setting that separates the phrasing of a message from the conditions in which it was delivered: highly-cited quotes tend to have been delivered under compelling circumstances or fit an existing cultural, political, or social narrative, and potentially what appeals to us about the quote is really just its invocation of these extra-linguistic contexts. Is the form of the language adding an effect beyond or indepen-