The spillover effects of retractions on the evolution of research fields are particularly important given the broader welfare implications that arise from scientists shifting their position in "intellectual space" (Aghion et al. 2008, Acemoglu 2012, Borjas and Doran 2012). However, evidence is currently limited. As a starting point, systematic data on journal article retractions show a strong upward trend in frequency, but as in the case of criminal activity, the underlying magnitude of scientific mistakes and misdeeds remains poorly established (Martinson, Anderson, and de Vries 2005). In addition, a recent analysis shows that the majority of retractions are caused by misconduct (Fang et al. 2012). More salient for the evolution of fields, Furman, Jensen, and Murray (2012) provide evidence that retraction notices are effective in alerting follow-on researchers to the shaky foundations of a particular paper. Citations to retracted papers decline by over 60% in the post-retraction period relative to carefully matched controls. Their analysis, however, focuses on the fate of the retracted papers themselves, not whether and to what extent retractions influence the evolution of adjacent research areas. It also does not distinguish between different types of false science associated with retracted events, although this heterogeneity is of primary importance since the information that retraction provides regarding the veracity of associated knowledge can vary widely. Thus, the challenge for our paper is to elucidate the impact of different types of retractions on related research lines and the magnitude of spillovers to research in proximate intellectual space.

Our conceptual approach follows Acemoglu (2012), Aghion and co-authors (2009), and others in understanding research as arising through a cumulative process along and across research lines that can be traced out empirically through citations from one publication to another (e.g., Murray and Stern 2007). This approach is grounded in the assumption that knowledge accumulates as researchers take the knowledge in a particular publication and use it as a stepping stone for their follow-on investigations (Mokyr 2002). Although it is a commonplace insight that the process of knowledge accumulation unfolds within an intellectual space (e.g., Hull 1988), it has proven surprisingly difficult for social scientists to gain empirical traction on this concept (see Azoulay, Graff Zivin, and Wang [2010] and Borjas and Doran [2012] for rare exceptions). We conceptualize retraction events as "shocks" to the structure of the intellectual neighborhoods around the retracted papers, and implement a procedure to delineate the boundaries of this space in terms of related publications in a way that is scalable and transparent, and with scant reliance on human judgement. We are then interested in studying whether researchers increase or decrease their reliance on related papers following the retraction event. We differentiate this cumulative response across three [...]

[...] contemporaneously with the ultimately retracted articles, and "crowded" fields, in which the most-related articles achieve particularly high PubMed relatedness rankings. These results suggest that the degree of scientific competition within a field impacts the way in which negative shocks affect knowledge accumulation.

We conclude our analysis by examining the proximate causes and potential underlying mechanisms behind the observed citation decline. We find evidence that publication rates in the fields affected by a retraction markedly decrease following retraction, relative to control fields. Similarly, we find that funding by the National Institutes of Health (NIH) in these fields declines in an even sharper fashion.

We consider two mechanisms that may lie behind these effects. First, we examine evidence regarding the strength of a learning interpretation relative to one based on status concerns. On the one hand, we might simply be observing that retraction events enable scientists to discover that a particular field offers fewer prospects of important findings than was previously believed, leaving them to substitute away from that field onto lines of research that are not directly adjacent to the retracted knowledge. Alternatively, scientists in the affected fields might believe that their reputation will be besmirched if they tie their scientific agenda too tightly to a field that has been "contaminated" by a retraction. Status concerns of this kind would just as surely drive away previous (or potential) participants in the field, but such shifts would this time be construed as constituting under-investment in the affected areas from a welfare standpoint. We find suggestive evidence that the status interpretation accounts for at least part of the damage suffered by retraction-afflicted fields. First, we document that, even in the set of articles related to retractions offering entirely absent shoulders to follow-on researchers, intent matters in modulating the observed citation responses: the penalty suffered by related articles is much more severe when the associated source article was retracted because of fraud or misconduct, relative to cases where the retraction occurred because of "honest mistakes." Second, starting from the premise that status considerations are less likely to drive the citing behavior of scientists employed in industry, relative to that of academic citers, we show that the former are much less responsive to the retraction event than the latter. While a learning story suggests strengthening the retraction system in its current incarnation, the evidence for a status explanation suggests that researchers overreact to retraction notices under the current system.

In the remainder of the paper, we examine the institutional context for retractions as the central approach to governing scientific mistakes and misconduct and lay out our broad empirical strategy. We then turn to data, methods and a detailed presentation of our results.
2.1 Institutional Context

Very few practices or systems exist to identify and signal research misconduct or error. In the United States, key public funders have created an Office of Research Integrity (ORI) to investigate allegations of fraud or misconduct (Pozzi and David 2007). More broadly applicable is the system of retractions used by journals themselves to alert readers when a research publication is stricken from the scientific literature. Retractions can be made by all or some of the authors of a publication, or by the journal's editor, directly or at the request of the authors' employer. These events can occur for a variety of reasons, as we describe below. Retraction events remain very rare, with the unconditional odds of retraction standing at about one per ten thousand, regardless of the data source used to calculate these odds (see Lu et al. 2013 for tabulations stemming from Thomson-Reuters' Web of Science database). Figure A of Section I in the online appendix documents secular increases in the incidence of retractions in PubMed, where this incidence is measured both as a raw frequency and as a proportion relative to the total size of the PubMed universe.4

As a matter of institutional design, the system of retractions treads a treacherous middle ground in managing the integrity of scientific knowledge. At one end of the spectrum, scientific societies and journals could make significant investments in replicating and verifying all studies prior to publication, while at the other end, a knowledge registration system with no filtering mechanism could require researchers to expend considerable time and energy on replication and validation. The actual system in existence today relies heavily upon peer review but provides only a limited guarantee that published knowledge is of high fidelity. As a result, reputational incentives play an essential role in ensuring the integrity of the scientific enterprise (Merton 1973).

In practice, retraction notices are idiosyncratic and vary widely in the amount of information they provide, ranging from a one-line sentence to a more elaborate statement of the rationale behind the retraction event. Understanding their impact on the scientific community is of central importance to the process of cumulative knowledge production and in deriving implications for the allocation of resources, human and financial, within and across scientific fields.

4 While this paper is not focused on the determinants of false science but rather its impact, it is worth noting that the rise in instances of false science (or at least the increase in its documentation via retraction notices) may be linked to a range of factors including the increasingly complex and collaborative organization of the scientific enterprise (Wuchty, Jones and Uzzi 2007) and the growing competition for resources in science. Lacetera and Zirulia (2009) note that competition has ambiguous effects on the incidence of scientific misconduct since scientists can also gain prominence by detecting instances of false science.

[...] of Intentional Deception" to code cases where the authors did not intend to deceive, such as instances of miscommunication, contamination of research materials, or coding error. "Uncertain Intent" applies where fraud is not firmly established, but negligence or unsubstantiated claims raise questions about the authors' motives. The "Intentional Deception" code is reserved for cases where falsification, misconduct, or willful acts of plagiarism and self-plagiarism appear to have occurred and were verified by author admissions or independent reviews of misconduct.

Delineating research fields. To delineate the boundaries of the research fields affected by retracted articles, we develop an approach based on topic similarity as inferred by the overlap in keywords between each retracted article and the rest of the (unretracted) scientific literature. Specifically, we use the PubMed Related Citations Algorithm (PMRA), which relies heavily on Medical Subject Headings (MeSH). MeSH terms constitute a controlled vocabulary maintained by the National Library of Medicine that provides a very fine-grained partition of the intellectual space spanned by the biomedical research literature. Importantly for our purposes, MeSH keywords are assigned to each scientific publication by professional indexers and not by the authors themselves; the assignment is made without reference to the literature cited in the article. We then use the "Related Articles" function in PubMed to harvest journal articles that are proximate to the retracted articles, implicitly defining a scientific field as the set of articles whose MeSH keywords overlap with those tagging the ultimately retracted article. As a byproduct, PMRA provides us with both an ordinal and a cardinal dyadic measure of intellectual proximity between each related article and its associated retraction.

For the purposes of our main analysis, we only consider related articles published prior to the retraction date. We distinguish those published prior to the retracted article and those published in the window between the retracted article's publication date and the retraction event itself. Further, we exclude related articles with any co-authors in common with the retracted article in order to strip bare our measure of intellectual proximity from any "associational baggage" stemming from collaboration linkages. Finally, we build a set of control articles by selecting the "nearest neighbors" of the related articles, i.e., the articles appearing immediately before or immediately after in the same journal and issue, as in Furman and Stern (2011) and Furman et al. (2012a).6

6 We select the nearest neighbors as controls on the premise that the ordering of papers in journal issues is random or close to random. To validate this premise, in analyses available from the author, we replicate the results in Table 8 with an alternative control group where one control is selected from each journal issue literally at random. The results do not differ substantially.
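The nearest-neighbor control selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the issue's table of contents is assumed to be available as an ordered list of article identifiers, and the function name `nearest_neighbors` is hypothetical rather than taken from the authors' code.

```python
def nearest_neighbors(issue_order, treated_id):
    """Return the articles printed immediately before and immediately after
    `treated_id` within a single journal issue.

    `issue_order` is an assumed input: article identifiers listed in the
    order in which they appear in the issue.
    """
    pos = issue_order.index(treated_id)
    neighbors = []
    if pos > 0:                        # an article precedes the treated one
        neighbors.append(issue_order[pos - 1])
    if pos + 1 < len(issue_order):     # an article follows the treated one
        neighbors.append(issue_order[pos + 1])
    return neighbors


# Toy issue with four articles; "a2" is the related (treated) article.
controls = nearest_neighbors(["a1", "a2", "a3", "a4"], "a2")
```

An article at the very start or end of an issue yields a single control under this sketch; how the paper treats such edge cases is not stated in the text.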
After purging from the list a few odd observations,7 we are left with a sample of 1,104 articles.8 As detailed in Section II of the online appendix, we develop an exhaustive category scheme to code the reasons that explain the retraction event. These reasons are tabulated in Table 1.9 In our next step, we classify each retraction into one of three categories that denote whether the results contained in the source article can be relied upon for follow-on research. The "strong shoulders" subsample comprises 202 articles retracted for reasons that do not cast any aspersion on the validity of the results contained therein. In contrast, we classify 589 retractions (53.4%) as providing "absent shoulders" for follow-on scientists to stand on, often because of fraudulent data or other types of misconduct. Finally, the "shaky shoulders" category (289 events or 26.2% of the cases) groups those retraction events for which the validity of the results remains shrouded in uncertainty.

Most of our analyses focus on the 589 observations belonging to the "absent shoulders" subsample (Table 2). The papers in this subsample were published between 1973 and 2007 and took an average time of three years to be retracted, though many of the more recent articles were retracted within one year, perhaps because of a higher probability of detection since the dawn of the electronic publishing era. Although this subsample is dominated by instances of fraud or other types of misconduct, 31% of the events appear to be the results of honest mistakes on the part of the investigators involved, with a further 8% for which it is unclear whether the scientists actively subverted the scientific process in the course of performing the research and reporting its results.10

Regardless of intent, however, it would be a mistake to consider each observation as completely independent from all the others in the sample. Close to sixty percent of the observations can be grouped into cases involving more than one retraction event, for example because the same rogue investigator committed fraud in multiple papers, or because the same contaminated research materials were used in multiple published articles. Figure B [...]
7 These include an article retracted and subsequently unretracted, an erratum that was retracted because of disagreement within the authorship team about whether the original article indeed contained an error, along with a few others.

8 In comparison, Lu et al. (2013) extract 1,465 retraction events from Thomson Reuters' Web of Science over the same period. The Web of Science covers a wider cross-section of scientific fields (including the social sciences and engineering), but has shallower coverage than PubMed in the life sciences. By combining the events corresponding to life sciences journals as well as multidisciplinary journals (such as Science, PNAS, or Nature), it appears that the life sciences account for between 60% and 70% of the total number of retractions in the Lu et al. sample.

9 Despite extensive efforts, we were unable to locate a retraction notice in 24 (2.17%) cases.

10 This represents an inversion of the relative prevalence of fraud and mistakes, compared to an earlier analysis performed by Nath et al. (2006), but it is in line with the recent results reported by Fang et al. (2012).

[...] space that includes 100 related records.12 Given our set of source articles, we delineate the scientific fields to which they belong by focusing on the set of articles returned by PMRA that satisfy five additional constraints: (i) they are original articles (as opposed to editorials, comments, reviews, etc.); (ii) they were published up to the year that precedes the calendar year of the underlying retraction event; (iii) they appear in journals indexed by the Web of Science (so that follow-on citation information can be collected); (iv) they do not share any author with the source; and (v) they are cited at least once by another article indexed by the Web of Science in the period between their publication year and 2011. Figure C of Section I in the online appendix runs through a specific example in the sample to illustrate the use of PMRA.13 Section III of the online appendix illustrates through an example how PMRA processes MeSH keyword information to delineate the boundaries of research fields.

For the set of 589 retractions with absent shoulders, the final dataset comprises 32,699 related articles that can be ordered by relatedness using both an ordinal measure (the rank returned by PMRA) and a cardinal measure, which we normalize such that a score of 100% corresponds to the first "non-trivial" related record.14

As a result of these computational and design choices, the boundaries of the fields we delineate are derived from semantic linkages to the exclusion of other considerations such as backward and forward citation relationships, or coauthorships. Judgement and subjectivity are confined to the initial indexing task which assigns keywords to individual articles. The individuals performing these tasks are trained in a consistent way, draw the keywords from a controlled vocabulary which evolves only slowly over time, and do not have any incentives to "window-dress" the articles they index with terms currently in vogue in order to curry attention from referees, editors, or members of funding panels. Of course, the cost of this approach is that it may result in boundaries between fields that might only imperfectly dovetail with the contours of the scientific communities with which the authors in our sample would self-identify. The main benefit, however, is that it makes it sensible to use citation information to evaluate whether the narrow fields around each retracted article atrophy or expand following each retraction event.

12 However, the algorithm embodies a transitivity rule as well as a minimum distance cutoff rule, such that the effective number of related articles returned by PMRA varies between 4 and 2,642 in the larger sample of 1,104 retractions, with a mean of 172 records and a median of 121.

13 To facilitate the harvesting of PubMed-related records on a large scale, we have developed an open-source software tool that queries PubMed and PMRA and stores the retrieved data in a MySQL database. The software is available for download at http://www.stellman-greene.com/FindRelated/.

14 A source article is always trivially related to itself. The relatedness measures are based on the raw data returned by PMRA, and ignore the filters applied to generate the final analysis dataset, e.g., eliminating reviews, etc.
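The five constraints and the normalization of the cardinal relatedness measure lend themselves to a compact illustration. The sketch below is not the authors' tool. The record layout (`RelatedRecord` and its field names), the way authors are keyed, and the assumption that raw PMRA scores arrive ordered by rank with the source article itself in first position are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedRecord:
    # Hypothetical record layout; none of these field names come from the paper.
    is_original_article: bool         # (i) not an editorial, comment, review, ...
    pub_year: int                     # used for constraint (ii)
    in_web_of_science: bool           # (iii) journal indexed by the Web of Science
    authors: frozenset                # author identifiers, however they are keyed
    wos_citations_through_2011: int   # used for constraint (v)


def in_delineated_field(rec, source_authors, retraction_year):
    """Apply constraints (i)-(v) to one record returned by PMRA."""
    return (
        rec.is_original_article                        # (i)
        and rec.pub_year <= retraction_year - 1        # (ii)
        and rec.in_web_of_science                      # (iii)
        and rec.authors.isdisjoint(source_authors)     # (iv)
        and rec.wos_citations_through_2011 >= 1        # (v)
    )


def normalize_relatedness(raw_scores):
    """Rescale raw scores so the first non-trivial record scores 100%.

    Assumes `raw_scores` is ordered by PMRA rank and that position 0 is the
    source article itself (always trivially related to itself).
    """
    first_nontrivial = raw_scores[1]
    return [100.0 * score / first_nontrivial for score in raw_scores[1:]]


example = RelatedRecord(True, 2003, True, frozenset({"lee k"}), 12)
keep = in_delineated_field(example, {"smith j"}, retraction_year=2005)
```

Keeping the filter as a single boolean expression makes it easy to see which of the five conditions excludes a given record when one of them fails.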
[...] which they are indirectly associated is almost two orders of magnitude smaller than the rate of citation that links the retractions with the "treated" (i.e., related) articles.

Citation data. PubMed does not contain citation data but we were able to retrieve this information from the Web of Science (up to the end of 2011) using a Perl script. We further process these data to make them amenable to statistical analysis. First, we eliminate all self-citations, where self-citation is inferred by overlap between any of the cited authors with any of the citing authors (an author name is the combination formed by the last name and the first initial for the purpose of this filter). Second, we parse the citing article data to distinguish between the institutional affiliations of citers, in particular by flagging the citing articles for which at least one of the addresses recorded by the Web of Science is a corporate address, which we infer from the presence of abbreviations such as Inc, Corp, GmbH, Ltd, etc. We then aggregate this information at the cited article-year level of analysis. In other words, we can decompose the total number of citations flowing to individual articles at a given point in time into a "private" and a "public" set, where public citations should be understood as stemming from academic scientists, broadly construed (this will also include scientists employed in the public sector as well as those employed by non-profit research institutes).

Citations are a noisy and widely used measure of the impact of a paper and the attention it receives. But the use of citation data to trace out the diffusion of individual bits of scientific knowledge is subject to an important caveat. Citations can be made for "strategic" rather than "substantial" reasons (cf. Lampe [2012] for evidence in this spirit in the context of patent citations). For example, authors of a paper may prefer to reduce the number of citations in order to make larger claims for their own paper; they may be more likely to "get away with it" (i.e., not having editors and referees request to add citations) if the strategically uncited papers are close in intellectual space to a retracted paper. Unfortunately, we do not have the ability to parse the citation data to distinguish strategic from substantial citations, a limitation that the reader should bear in mind when interpreting our results.

Descriptive Statistics. Table 3 provides basic information about the matched sample. By construction, control and treated articles are matched on year of publication and journal, and they appear to match very closely on the length of the authorship roster. Because in many cases retraction occurs relatively quickly after publication, only 30% of the related articles in the data are published after the publication of the source article, and only 7.9% of these articles cite the soon-to-be-retracted source. Conversely, only 6.1% of the articles [...]

[...] It is then straightforward to compute yearly "entry rates" into treated and control fields by counting the number of related articles published in the field in each year. Capturing funding information at the field level is slightly more involved. PubMed systematically records NIH grant acknowledgements using grant numbers, but without referencing the particular grant cycle to which the publication should be credited. To address this issue, we adopt the following procedure: for each related publication, we identify the closest preceding year in a three-year window during which funding was awarded through either a new award or a competitive renewal; we then sum all the funding in the grant year that ultimately generates publications in the focal field.

The descriptive statistics for the field-level analyses are displayed in Table 4. The number of observations differs across the publication frequency dataset and the funding dataset because (i) the funding data are available only until 2007, whereas the publication data are available until the end of our observation period (2011); and (ii) we drop from the funding analysis the fields for which there is not a single publication acknowledging NIH funding for the entire 1970-2007 period.

4 Results

The exposition of the econometric results proceeds in four stages. After a brief exposition of the main econometric issues, we present descriptive statistics and results pertaining to the effect of retractions on the rate of citations that accrue to the retracted articles. Second, we examine the extent of the retraction effect on the set of related articles. Third, we study whether the retraction events altered patterns of entry and funding into the scientific fields associated with the retracted articles. Fourth, we explicate the mechanism(s) underlying the results.

4.1 Econometric Considerations

Our estimating equation relates the number of citations that are received by related article j in year t to characteristics of j and of retracted article i:

$$E\left[\mathit{CITES}_{jt} \mid X_{ijt}\right] = \exp\left[\beta_0 + \beta_1\, \mathit{RLTD}_j \times \mathit{AFTER}_{it} + f(\mathit{AGE}_{jt}) + \delta_t + \gamma_{ij}\right]$$

where AFTER denotes an indicator variable that switches to one the year after the retraction, RLTD denotes an indicator variable that is equal to one for related articles and zero for [...]

4.2 Effect of Retraction on Retracted Papers

Table 5 reports the results from simple difference-in-differences analyses for the sample of 1,037 retractions and 1,922 nearest neighbors in the journals in which the retracted articles appeared.17 Column 1 reports the estimate of the retraction effect for the baseline specification. The result implies that, relative to the controls, retracted papers lose 69% of their citations in the post-retraction period. The magnitude of the effect is in line with the 60% decline estimated by Furman et al. (2012a) in a smaller sample of PubMed-indexed retractions. Column 2 shows that the effect is barely affected when we drop from the sample those observations corresponding to retracted articles for which the retraction reason is missing.

Column 3 includes in the specification the main effect of the retraction treatment as well as two interactions with the "shaky shoulders" and "absent shoulders" indicator variables. In this model, the main effect implicitly captures the post-retraction fate of the retracted papers that still maintain "strong shoulders." While this effect is negative and statistically significant (with an implied decrease in the citation rate equal to 38%), its magnitude is markedly smaller than that of the effect corresponding to the "shaky shoulders" retractions (66%) and smaller still than the effect for the "absent shoulders" category (73%). Dropping the "strong shoulders" group from the sample increases the magnitude of the retraction effect in absolute value (to 72%, column 4), while focusing on the earliest retraction event in each case slightly lowers the estimated effect (66%, column 5).

In short, our results confirm the earlier findings of Furman et al. (2012a). In addition, the results in column 3 provide important empirical validation for the coding exercise detailed in the online appendix. Although the coefficients in this specification are not statistically different from each other, their magnitudes are ordered in an intuitive way, with the post-retraction penalty decreasing monotonically with the strength of the shoulders provided to follow-on researchers.

4.3 Effect of Retraction on Related Papers

We now turn to the core of the empirical analysis, examining the effect of retraction on the citation outcomes for the related articles identified by the PubMed Related Citations Algorithm. The first set of results appears in Table 6, which is structured analogously to Table 5. Column 1 reports the difference-in-difference estimate for the entire sample. We find [...]

17 Sixty-seven retracted articles needed to be dropped from the estimation sample because they appeared in journals not indexed by the Web of Science.

[...] Moreover, with an average of 60 related papers per retracted article, the aggregate citation consequences of the retraction events for the scientific fields involved are not trivial. To provide a better sense of the magnitude of these aggregate losses, we estimate an analog of Table 6 using OLS in Section IV of the online appendix. The dependent variable is the number of citations received in levels. The results are substantially unchanged compared to our benchmark Poisson specification. Furthermore, the citation decline estimated therein (-0.173 citation per year) can form the basis of a back-of-the-envelope calculation. Using this estimate of the citation penalty and aggregating to the field level (taking into account both the average number of articles per field and the average length of the post-retraction period in the sample), we conclude that retraction-afflicted fields experience, on average, a loss of 75 citations relative to control fields. Stated differently, this is as if we deleted from the average field one paper in the Top 7% of the distribution for the total number of long-run citations.

Dynamics of the treatment effect. We also explore the dynamics of the effects uncovered in Table 6. We do so in Figure 3 by estimating a specification in which the treatment effect is interacted with a set of indicator variables corresponding to a particular year relative to the retraction year, and then graphing the effects and the 95% confidence interval around them. Two features of the figure are worthy of note. First, there is no discernible evidence of an effect in the years leading up to the retraction, a finding that validates ex post our identification strategy.19 Second, after the retraction, the treatment effect increases monotonically in absolute value with no evidence of recovery.

Exploring heterogeneity in the effect of retractions. We explore a number of factors that could modulate the magnitude of the retraction effect on intellectual neighbors' citation rates. Table 7 reports the results of seven specifications that include interaction terms between the retraction treatment effect and characteristics of either the retracted article or the retracted/related article dyad. Column 1 evaluates how the cumulative attention to the retracted article affects the reduction of citation to related articles. The rationale for this analysis is that citations are a proxy for the amount of attention that scientists in the field (and other related fields) gave to the retracted paper prior to retraction, and may be a predictor for the amount of collateral damage in a given field. The coefficient on the interaction term shows that highly cited retracted papers, those in the top 25th [...]

19 This finding is also reassuring as it suggests that retractions are not endogenous to the exhaustion of a particular intellectual trajectory, i.e., it does not appear as if researchers resort to the type of misconduct that yields retractions after uncovering evidence that their field is on the decline.

[...] related articles that are also cited by the retraction experience a 6.1% boost in the citation rate following the retraction event. This result is consistent with the idea that the researchers who continue to work in the field in spite of the retraction event choose to build instead on prior, unretracted research. The overall effect on the field can still be negative since only a small fraction (7.9%) of articles related to the source are also cited by the source. Column 6 uses our coding of author "intent" to compare how the treatment effect of retraction differs in clear cases of fraud from fraud-free retraction cases or those with uncertain intent. We see that cases of "Intentional Deception" largely drive the negative effect on the field's citations (-7.8%), while fields that experienced retractions with "No Sign of Intentional Deception" (the omitted category) had no citation decline, on average.

Figure D of Section I in the online appendix explores the extent to which the age of a related article at the time of the retraction event influences the magnitude of the treatment effect. In this figure, each circle corresponds to the coefficient estimates stemming from a specification in which the citation rates for related articles and their controls are regressed onto year effects, article age indicator variables, as well as interaction terms between the treatment effect and the vintage of each related article at the time of the retraction. Since related articles in the sample are published between one and ten years before their associated retraction event, there are ten such interaction terms.22 The results show that only recent articles (those published one, two, or three years before the retraction) experience a citation penalty in the post-retraction period, whereas older articles are relatively immune to the retraction event.

Finally, Figure 4 and Figure E (Section I in the online appendix) investigate the extent to which "relatedness" (in the sense of PMRA) exacerbates the magnitude of the response. In Figure 4, we use the ordinal measure of relatedness, namely the rank received by a focal article in the list returned by PMRA for a specific source article. We create 22 interaction variables between the retraction effect and the relatedness rank: Top 5, Top 6-10, ..., Top 95-100, 100 and above. The results show that lower-ranked (i.e., more closely related) articles bear the brunt of the negative citation response in the post-retraction period. Figure E is conceptually similar, except that it relies on the cardinal measure of relatedness. We create one hundred variables interacting the retraction effect with each percentile of the relatedness measure, and estimate the baseline specification of Table 7, column 1 in which the main retraction effect has been replaced by the 100 corresponding interaction terms. Figure E graphs the estimates along with the 95% confidence interval around them. The results are [...]

22 The 95% confidence intervals (corresponding to robust standard errors, clustered around case codes) are denoted by the blue vertical bars.
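The percentage effects quoted in this section (for example, the 69% citation loss for retracted papers) can be related to coefficients of an exponential-mean specification through the usual conversion: a coefficient β on a 0/1 indicator implies a proportional change of exp(β) − 1 in the expected count. The sketch below assumes that this standard conversion is what underlies the reported percentages, which the text does not spell out; the function names are illustrative and not drawn from the paper.

```python
import math


def pct_change_from_coef(beta: float) -> float:
    """Proportional change in the expected count implied by the coefficient
    on a 0/1 indicator in an exponential-mean (Poisson-type) model."""
    return math.exp(beta) - 1.0


def coef_from_pct_change(pct: float) -> float:
    """Inverse mapping: the coefficient implied by a proportional change."""
    if pct <= -1.0:
        raise ValueError("a decline of 100% or more has no finite coefficient")
    return math.log1p(pct)


# A 69% decline corresponds to a coefficient of ln(0.31).
beta_retracted = coef_from_pct_change(-0.69)
```

Because the mapping is nonlinear, large declines correspond to coefficients whose absolute value exceeds the percentage itself, which is worth keeping in mind when comparing magnitudes across columns.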
[...] investigators (post-docs and graduate students are often listed as first authors) are at fault for retraction. Furthermore, we find that retracted first authors and middle authors are less likely to reappear in fields in which papers have been retracted than are retracted last authors (Section VI of the online appendix). These analyses suggest that (1) the strength of the treatment effect is greatest when the author culpable for retraction is the first author (rather than the last author) and (2) the publication decline is not driven by the exit of PIs or lab directors, but may be driven by the exit of first authors.24

To summarize, these results help explain why we observe downward movement in the citations received by related articles highlighted earlier: there are fewer papers being published in these fields and also less funding available to write such papers. While these effects constitute the proximate causes of the negative spillovers that are the central finding of the paper, they raise the question of what the underlying mechanisms are. What explains the flight of resources away from these fields?

4.5 Underlying Mechanisms of the Retraction Effect

A number of mechanisms may underlie our findings regarding the declines in citation, entry, and funding. We investigate evidence regarding two possibilities. First, a relative decrease in attention subsequent to retraction may reflect scientists' learning about the limited potential for follow-on research in retraction-afflicted fields. The case of Jan-Hendrik Schön is consistent with this explanation. Schön's research at Bell Labs initially produced spectacular results using organic materials to achieve a field-transistor effect; his results were eventually demonstrated to have been the result of fraudulent behavior, and subsequent efforts building on his work suggest the impossibility of achieving field-transistor effects using the materials Schön employed (Reich, 2009). Second, the field-level declines in citation, entry, and funding we observe could also arise from a fear of reputational association with the "contaminated" fields or authors. The case of Woo-Suk Hwang that we invoke at the beginning of the paper is consistent with this type of explanation: follow-on researchers eschewed all implications of Hwang's work, although some would prove promising when the field revisited his work a few years after the retractions.

Although we may not be able to rule out either explanation entirely, exploring the relative importance of these mechanisms matters because their welfare implications differ. For example, it may be ideal from a social planner's perspective if scientists simply redirect [...]

24 These results accord well with the evidence presented in Jin et al. (2013).
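As background for the comparison of academic and firm-based citers, the two citation filters described in the citation data discussion (self-citation removal keyed on last name plus first initial, and flagging of corporate addresses by suffix) can be sketched as follows. This is a minimal illustration, not the authors' Perl script: the "Last, First" name format, the exact suffix list, and all function names are assumptions made for the example.

```python
import re

# Suffixes named in the text; the list and the matching rule are illustrative.
_CORPORATE = re.compile(r"\b(?:Inc|Corp|GmbH|Ltd|LLC)\b", re.IGNORECASE)


def author_key(name):
    """Reduce an assumed 'Last, First' string to (last name, first initial)."""
    last, _, first = name.partition(",")
    return (last.strip().lower(), first.strip()[:1].lower())


def is_self_citation(cited_authors, citing_authors):
    """True if any cited author matches any citing author on the reduced key."""
    cited_keys = {author_key(a) for a in cited_authors}
    return not cited_keys.isdisjoint(author_key(a) for a in citing_authors)


def is_private_citation(addresses):
    """True if at least one citing address carries a corporate suffix."""
    return any(_CORPORATE.search(addr) for addr in addresses)


mixed = is_private_citation(["Harvard Univ, Boston", "Genentech Inc, San Francisco"])
```

Under this rule a citing article with both academic and corporate addresses counts as private, which matches the classification described in the text.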
[...] have a disproportionate effect on future citations does not lend support to these explanatory mechanisms.

To further investigate the possibility that a reputational mechanism may be at work, we examine heterogeneous responses between academic- and firm-based citers. We start from the premise that scientists employed by profit-seeking firms would persist in investigating topics that university-based scientists (and NIH study sections) frown upon (post-retraction), as long as the possibility of developing a commercial product remains.28 We parse the forward citation data to separate the citations that stem from private firms (mostly pharmaceutical and biotechnology firms, identified by suffixes such as Inc., Corp., LLC, Ltd., GmbH, etc.) from those that originate in academia (broadly defined to include non-profit research institutes and public research institutions as well as universities). Even though we classify as "private" any citing article with a mix of private and academic addresses, almost 90% of the citations in our sample are "academic" according to this definition.

In Table 9, columns 1a and 1b, we find that academic and private citers do not differ at all in the extent to which they penalize the retracted articles. Conversely, columns 2a and 2b indicate that private citers hardly penalize related articles, whereas academic citers do to the extent previously documented.29 The difference between the coefficients is statistically significant (p < 0.01). These findings are consistent with the view that the retraction-induced spillovers we have documented stem, at least in part, from academic scientists' concern that their peers will hold them in lower esteem if they remain within an intellectual field whose reputation has been tarnished by retractions, even though these researchers were neither coauthors on the retracted article itself nor building directly upon it.

It is possible, however, that these differences arise because industry scientists find it easier to substitute citations within a field because their work is more applied in nature.30 To investigate this possibility, we have matched the PubMed database with the US patent data to identify the citations received from patents by published scientific articles.31 Our [...]
citations and entry in affected fields, as scientists spend time trying to investigate and verify results related to the retracted paper.

28 We ground our assumptions regarding the potentially differential responses of academic- and industry-based scientists by appealing to prior work on differences in incentives and status concerns among academic and industrial scientists, the former of whom have principally (though not exclusively) priority-based incentives and the latter of whom face stronger (though not exclusive) financial and organizational incentives that are not directly tied to standing in the research community (Dasgupta and David, 1994; Stern, 2004).

29 The estimation sample is limited to the set of related articles and their controls that receive at least one citation of each type over the observation period.

30 We thank an anonymous referee for this suggestion.

31 See Appendix D in Azoulay et al. (2012) for more details on the patent-to-publication matching process that provides a foundation for the analyses presented in Table 9.

evaluation has a number of interesting implications. Through the coding scheme we have developed to understand the particular circumstance of each retraction event, we highlight the limitations of the institutional practices that are supposed to ensure the fidelity of scientific knowledge. In particular, the analysis brings systematic evidence to bear on the heightened attention devoted to the topic of scientific misconduct in science policy circles. Some analysts suggest that the scientific reward system has been corrupted and is in need of wholesale, radical reform (Fang et al. 2012). This view points to the increase in detected frauds and errors as a strong indication that much invalid science goes undetected. Acknowledging this possibility, others retort that a system of retractions is precisely what the "Republic of Science" requires: a mechanism that swiftly identifies false science and effectively communicates its implications for follow-on research (Furman et al. 2012a). The validity of the more optimistic view hinges crucially on what is signaled by a retraction notice and on how scientists in the affected fields process this information and act upon it.

Our results suggest that retractions do have the desired effect on the particular paper in question, but also lead to spillover effects onto the surrounding intellectual fields, which become less vibrant. If these negative spillovers simply reflected the diminished scientific potential of the affected fields, then the "collateral damage" induced by retractions would not be a cause for concern and would reinforce the belief that the retraction process is a relatively effective way to police the scientific commons (Furman et al. 2012a). However, our evidence indicates that broad perceptions of legitimacy are an important driver of the direction of scientific inquiry.

Unfortunately, retraction notices often obfuscate the underlying reason for retraction, which diminishes the information content of the signal they provide to follow-on researchers. As a result, there could be high returns to developing a standardized coding approach for retractions that journals and scientific societies could draw upon to help the scientific community update their beliefs regarding the nature and scope of false science. While journal editors may understandably balk at the suggestion that it is incumbent upon them to make clear determinations regarding the underlying causes of retractions, a clearly articulated schema would increase the incentives of authors to report problems emerging after the publication of an article and provide a more nuanced context within which universities themselves (as well as funding bodies) might investigate and adjudicate instances of false science.32

32 Alternative mechanisms, such as "replication rings," have been proposed to counteract the negative spillovers in intellectual space associated with retraction events (Kahneman 2012). Whether "local" responses of this type can be implemented successfully is questionable, in light of the costs they would impose on researchers active in retraction-affected fields.
Section II
Coding of Retraction Reasons

The purpose of this document is to describe the retractions coding scheme that forms the basis of the analysis implemented in the main body of the paper, as well as to provide a method for classification of future retractions. The goal is to reconcile two contradictory objectives: on the one hand, group retractions into a small number of mutually exclusive categories; on the other hand, capture in a meaningful way the inherent heterogeneity in retraction reasons.

The coding scheme has been developed by the authors solely for the purpose of scholarly academic research. The coding of each individual retraction is based on a range of public information sources, ranging from the notice of retraction itself, to entries in the "Retraction Watch" blog, to results of Google searches. No additional information has been gathered from the authors of the retracted papers or others involved in these cases. As such, the coding represents an informed judgment of the context in which each retraction event took place, rather than the outcome of a formal investigation. The list of retractions, article characteristics, and reasons can be downloaded from the internet at the following URL: http://jkrieger.scripts.mit.edu/retractions/.
Methods Summary: Analysis of retractions indexed by PubMed, published between 1973 and 2008, and retracted before the end of 2009, yielded 13 mutually exclusive "reasons" categories (see list below). In a first step, we assign one of these reasons to each retracted article based solely on the information contained in the retraction notice. In a second step, we assign a reason to each retracted article based on information in the notice as well as any additional information found through internet sleuthing (e.g., news articles, blogs, press releases, etc.).

We also code each retraction observation based on its validity as a foundation for future research. These "shoulders" categories are Strong Shoulders, Shaky Shoulders, and Absent Shoulders. Strong Shoulders means that the retraction does not cast doubt on the validity of the paper's underlying claims. A publisher mistakenly printing an article twice, an author plagiarizing someone else's description of a phenomenon, or an institutional dispute about the ownership of samples are all examples where the content of the retracted paper is not in question. Shaky Shoulders means that the validity of claims is uncertain or that only a portion of the results are invalidated by the retraction. Absent Shoulders is the appropriate code in fraud cases, as well as in instances where the main conclusions of the paper are compromised by an error.

Lastly, we attempt to discern the level of intentional deceit involved in each case. Deception might involve the paper's actual claims (results, materials, method), its attribution of scholarly credit through authorship and citations, or the originality of the work. We use No Sign of Intentional Deception to code instances where the authors did not intend to deceive, such as in the case of "honest mistakes" or miscommunications. Uncertain Intent applies where fraud is not firmly established, but negligence or unsubstantiated claims raise questions about an author's motives. The Intentional Deception code covers cases where falsification, intentional misconduct, or willful acts of plagiarism appear to have occurred.

The "intent" and "shoulders" coding are inherently more subjective than that of the underlying retraction reasons. In fact, there is no simple mapping of the latter into the former: each shoulders or intent code is assigned based on a thorough review of the available evidence in each case and according to the guidelines below. The reasons categories capture a combination of context, validity, and intent, while the "shoulders" code only pertains to the validity of the article's content, and the "intent" code relates only to intent.1

1 For each category, we mention a small number of PubMed IDs of notices that can serve as good illustrations of the coding choice.

the author guilty of falsification and plagiarism (#12411512, #12833069, #19575288). If a retraction meets the criteria of "Fake Data & Plagiarism" then Absent Shoulders and Intentional Deception are the logical complementary codes.

6. Duplication. The important criterion for "Duplication" is that the authors copied from themselves. Most of the articles in this category already appeared in another journal before the second journal realized that the entire article is an exact duplicate or virtually identical to an article by the same authors in a different journal (#12589830). Some of the "Duplication" cases are not entirely republished articles, but will reproduce important content, such as data, charts, and conclusions (#15580694). As with plagiarism cases, these cases are assigned the Strong Shoulders code by default, but may fall into the Shaky Shoulders bucket when meaningful differences exist between the duplicated article and its original version (#1930642). The "intent" coding follows a similar logic, with Intentional Deception being the primary classification. Yet, Uncertain Intent sometimes is the more logical choice when duplication resulted from an apparent miscommunication (#17047133, #16683328).

7. Questions about Validity. This category captures retraction cases associated with vague misconduct allegations (#118049464), suspicious "irregularities" (#118560433), and "questionable" data (#118951275). The hallmark of these retraction notices is that they obfuscate the nature of the misconduct. The vague nature of this category's notices makes Shaky Shoulders and Uncertain Intent the frequent choice for complementary codes.

8. Author Dispute. These cases involve disagreements between authors about content, credit, and permission. Often, these different types of disputes will be combined (#14723797, #19727599). Paper submission without the consent of coauthors is the most common underlying reason for this code. Unless warranted by information gained through sleuthing, Shaky Shoulders is the appropriate code for "Author Dispute" cases: most disputes stem from conflicts surrounding credit attribution and the verification of results, rather than outright fraud. Intentional Deception is the prevalent intent code in "Author Dispute" cases, though exceptions do exist (#17081259; #16003050).

9. Lack of Consent/IRB Approval. This category includes cases where the authors did not get IRB approval or did not secure patient informed consent before conducting their study. Ambiguous cases of "ethics violations" (#19819378, #18774408) also fall into the "Lack of Consent/IRB Approval" category. The default "shoulders" code is Shaky Shoulders. Strong Shoulders may be appropriate if there is evidence indicating that the authors believed they had IRB approval (#14617761), or that the paper's results are devoid of fraud/deception (#16832233). Determining the level of intent is less straightforward for this category. In general, ethics violations count as Intentional Deception, but uncertainty about author intent may warrant other coding choices. For example, the authors may have erroneously thought they received IRB approval (#14617761), or approval may have been officially obtained only after the authors completed the study (#16842490).

10. Did Not Maintain Proper Records. Although the dataset only has three retractions that fall into this category, we include it as a distinct retraction reason. The defining characteristic of this category is absence of proper data records. With proper records, the scientific community could better determine the reliability of the claims contained in these papers. Shaky Shoulders and Uncertain Intent are the proper complementary codes.

11. Publisher Error. Retractions occasionally stem from publisher mistakes rather than author misconduct or error. The associated notices establish that the publisher is solely responsible for the error, which is usually a duplicate publication (#17452723, #15082607) or printing of an earlier draft (#19662582, #15685781). Strong Shoulders is a natural fit for publisher errors resulting from duplicates, while Shaky Shoulders is appropriate when the journal prints the wrong draft. By definition, the proper intent coding is No Sign of Intentional Deception.

12. Not Enough Information to Classify or Missing. The essential difference between these two categories is that we have a notice for the former and do not have a notice for the latter. "Not Enough Information to Classify" implies that the notice is so vague that we cannot assign another code. Such

Section III
PubMed Related Citations Algorithm [PMRA]

The following paragraphs were extracted from a brief description of PMRA:2

The neighbors of a document are those documents in the database that are the most similar to it. The similarity between documents is measured by the words they have in common, with some adjustment for document lengths. To carry out such a program, one must first define what a word is. For us, a word is basically an unbroken string of letters and numerals with at least one letter of the alphabet in it. Words end at hyphens, spaces, new lines, and punctuation. A list of 310 common, but uninformative, words (also known as stopwords) are eliminated from processing at this stage. Next, a limited amount of stemming of words is done, but no thesaurus is used in processing. Words from the abstract of a document are classified as text words. Words from titles are also classified as text words, but words from titles are added in a second time to give them a small advantage in the local weighting scheme. MeSH terms are placed in a third category, and a MeSH term with a subheading qualifier is entered twice, once without the qualifier and once with it. If a MeSH term is starred (indicating a major concept in a document), the star is ignored. These three categories of words (or phrases in the case of MeSH) comprise the representation of a document. No other fields, such as Author or Journal, enter into the calculations.

Having obtained the set of terms that represent each document, the next step is to recognize that not all words are of equal value. Each time a word is used, it is assigned a numerical weight. This numerical weight is based on information that the computer can obtain by automatic processing. Automatic processing is important because the number of different terms that have to be assigned weights is close to two million for this system. The weight or value of a term is dependent on three types of information: 1) the number of different documents in the database that contain the term; 2) the number of times the term occurs in a particular document; and 3) the number of term occurrences in the document. The first of these pieces of information is used to produce a number called the global weight of the term. The global weight is used in weighting the term throughout the database. The second and third pieces of information pertain only to a particular document and are used to produce a number called the local weight of the term in that specific document. When a word occurs in two documents, its weight is computed as the product of the global weight times the two local weights (one pertaining to each of the documents).

The global weight of a term is greater for the less frequent terms. This is reasonable because the presence of a term that occurred in most of the documents would really tell one very little about a document. On the other hand, a term that occurred in only 100 documents of one million would be very helpful in limiting the set of documents of interest. A word that occurred in only 10 documents is likely to be even more informative and will receive an even higher weight.

The local weight of a term is the measure of its importance in a particular document. Generally, the more frequent a term is within a document, the more important it is in representing the content of that document. However, this relationship is saturating, i.e., as the frequency continues to go up, the importance of the word increases less rapidly and finally comes to a finite limit. In addition, we do not want a longer document to be considered more important just because it is longer; therefore, a length correction is applied.

The similarity between two documents is computed by adding up the weights of all of the terms the two documents have in common. Once the similarity score of a document in relation to each of the other documents in the database has been computed, that document's neighbors are identified as the most similar (highest scoring) documents found. These closely related documents are pre-computed for each document in PubMed so that when one selects Related Articles, the system has only to retrieve this list. This enables a fast response time for such queries.

We illustrate the use of PMRA with an example taken from our sample. Amitav Hajra is a former University of Michigan graduate student who falsified data in three papers retracted in 1996. One of Hajra's retracted papers (PubMed ID #7651416) appeared in the September 1995 issue of Molecular and Cellular Biology and lists 27 MeSH terms. Its 10th most related paper (PubMed ID #8035830), according to the PMRA algorithm, appeared in the same journal in August 1994 and has 23 MeSH terms, 10 of which overlap with the Hajra article. These terms include common terms such as "Mice" and "DNA-Binding Proteins/genetics" as well as more specific keywords including "Core Binding Factor Alpha Subunits," "Neoplasm Proteins/metabolism,"

2 Available at http://ii.nlm.nih.gov/MTI/related.shtml
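
The weighting logic in the PMRA excerpt above can be illustrated with a small sketch. The excerpt does not give the actual formulas, so the functional forms below are assumptions chosen only to match its qualitative description: a logarithmic inverse document frequency stands in for the global weight, and a saturating, length-corrected ratio stands in for the local weight. The function names, the constant `k`, and the toy documents are hypothetical and are not part of PMRA or of our data.

```python
import math
from collections import Counter

def global_weight(term, docs):
    # Assumed form: rarer terms receive larger weights (log inverse document frequency).
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def local_weight(term, doc, avg_len, k=1.0):
    # Assumed form: increases with term frequency but saturates below 1,
    # and is discounted for documents longer than average.
    tf = doc[term]
    length = sum(doc.values())
    return tf / (tf + k * length / avg_len) if tf else 0.0

def similarity(a, b, docs):
    # Sum, over shared terms, of global weight times the two local weights.
    avg_len = sum(sum(d.values()) for d in docs) / len(docs)
    shared = set(a) & set(b)
    return sum(
        global_weight(t, docs) * local_weight(t, a, avg_len) * local_weight(t, b, avg_len)
        for t in shared
    )

def neighbors(i, docs, top=3):
    # Rank every other document by similarity to document i, highest first.
    scores = [(similarity(docs[i], d, docs), j) for j, d in enumerate(docs) if j != i]
    return [j for _, j in sorted(scores, reverse=True)[:top]]

# Toy term bags (hypothetical); each Counter maps a term to its frequency.
docs = [
    Counter("mice dna binding protein core".split()),
    Counter("mice dna binding neoplasm".split()),
    Counter("mice retraction citation".split()),
]
```

In this toy corpus the term "mice" appears in every document, so its assumed global weight is zero and it contributes nothing to any similarity score, which mirrors the excerpt's remark that a term occurring in most documents tells one very little. Calling `neighbors(0, docs, top=1)` ranks the second document as the closest neighbor of the first because they share the rarer terms "dna" and "binding".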