Uploaded On 2016-07-24


The spillover effects of retractions on the evolution of research fields are particularly important given the broader welfare implications that arise from scientists shifting their position in "intellectual space" (Aghion et al. 2008, Acemoglu 2012, Borjas and Doran 2012). However, evidence is currently limited. As a starting point, systematic data on journal article retractions shows a strong upward trend in frequency, but as in the case of criminal activity, the underlying magnitude of scientific mistakes and misdeeds remains poorly established (Martinson, Anderson, and de Vries 2005). In addition, a recent analysis shows that the majority of retractions are caused by misconduct (Fang et al. 2012). More salient for the evolution of fields, Furman, Jensen, and Murray (2012) provide evidence that retraction notices are effective in alerting follow-on researchers to the shaky foundations of a particular paper. Citations to retracted papers decline by over 60% in the post-retraction period relative to carefully matched controls. Their analysis, however, focuses on the fate of the retracted papers themselves, not whether and to what extent retractions influence the evolution of adjacent research areas. It also does not distinguish between different types of false science associated with retraction events, although this heterogeneity is of primary importance since the information that retraction provides regarding the veracity of associated knowledge can vary widely. Thus, the challenge for our paper is to elucidate the impact of different types of retractions on related research lines and the magnitude of spillovers to research in proximate intellectual space.

Our conceptual approach follows Acemoglu (2012), Aghion and co-authors (2009), and others in understanding research as arising through a cumulative process along and across research lines that can be traced out empirically through citations from one publication to another (e.g., Murray and Stern 2007). This approach is grounded in the assumption that knowledge accumulates as researchers take the knowledge in a particular publication and use it as a stepping stone for their follow-on investigations (Mokyr 2002). Although it is a commonplace insight that the process of knowledge accumulation unfolds within an intellectual space (e.g., Hull 1988), it has proven surprisingly difficult for social scientists to gain empirical traction on this concept (see Azoulay, Graff Zivin, and Wang [2010] and Borjas and Doran [2012] for rare exceptions). We conceptualize retraction events as "shocks" to the structure of the intellectual neighborhoods around the retracted papers, and implement a procedure to delineate the boundaries of this space in terms of related publications in a way that is scalable and transparent, and with scant reliance on human judgement. We are then interested in studying whether researchers increase or decrease their reliance on related papers following the retraction event. We differentiate this cumulative response across three [...]

[...] contemporaneously with the ultimately retracted articles, and "crowded" fields, in which the most-related articles achieve particularly high PubMed relatedness rankings. These results suggest that the degree of scientific competition within a field impacts the way in which negative shocks affect knowledge accumulation.

We conclude our analysis by examining the proximate causes and potential underlying mechanisms behind the observed citation decline. We find evidence that publication rates in the fields affected by a retraction markedly decrease following retraction, relative to control fields. Similarly, we find that funding by National Institutes of Health (NIH) in these fields declines in an even sharper fashion.

We consider two mechanisms that may lie behind these effects. First, we examine evidence regarding the strength of a learning interpretation relative to one based on status concerns. On the one hand, we might simply be observing that retraction events enable scientists to discover that a particular field offers fewer prospects of important findings than was previously believed, leaving them to substitute away from that field onto lines of research that are not directly adjacent to the retracted knowledge. Alternatively, scientists in the affected fields might believe that their reputation will be besmirched if they tie their scientific agenda too tightly to a field that has been "contaminated" by a retraction. Status concerns of this kind would just as surely drive away previous (or potential) participants in the field, but such shifts would this time be construed as constituting under-investment in the affected areas from a welfare standpoint.

We find suggestive evidence that the status interpretation accounts for at least part of the damage suffered by retraction-afflicted fields. First, we document that, even in the set of articles related to retractions offering entirely absent shoulders to follow-on researchers, intent matters in modulating the observed citation responses: the penalty suffered by related articles is much more severe when the associated source article was retracted because of fraud or misconduct, relative to cases where the retraction occurred because of "honest mistakes." Second, starting from the premise that status considerations are less likely to drive the citing behavior of scientists employed in industry, relative to that of academic citers, we show that the former are much less responsive to the retraction event than the latter. While a learning story suggests strengthening the retraction system in its current incarnation, the evidence for a status explanation suggests that researchers overreact to retraction notices under the current system.

In the remainder of the paper, we examine the institutional context for retractions as the central approach to governing scientific mistakes and misconduct and lay out our broad empirical strategy. We then turn to data, methods and a detailed presentation of our results.
2.1 Institutional Context

Very few practices or systems exist to identify and signal research misconduct or error. In the United States, key public funders have created an Office of Research Integrity (ORI) to investigate allegations of fraud or misconduct (Pozzi and David 2007). More broadly applicable is the system of retractions used by journals themselves to alert readers when a research publication is stricken from the scientific literature. Retractions can be made by all or some of the authors of a publication, or by the journal's editor, directly or at the request of the authors' employer. These events can occur for a variety of reasons, as we describe below. Retraction events remain very rare, with the unconditional odds of retraction standing at about one per ten thousand, regardless of the data source used to calculate these odds (see Lu et al. 2013 for tabulations stemming from Thomson-Reuters' Web of Science database). Figure A of Section I in the online appendix documents secular increases in the incidence of retractions in PubMed, where this incidence is measured both as a raw frequency and as a proportion relative to the total size of the PubMed universe.[4]

As a matter of institutional design, the system of retractions treads a treacherous middle ground in managing the integrity of scientific knowledge. At one end of the spectrum, scientific societies and journals could make significant investments in replicating and verifying all studies prior to publication, while at the other end, a knowledge registration system with no filtering mechanism could require researchers to expend considerable time and energy on replication and validation. The actual system in existence today relies heavily upon peer-review but provides only limited guarantee that published knowledge is of high fidelity. As a result, reputational incentives play an essential role to ensure the integrity of the scientific enterprise (Merton 1973).

In practice, retraction notices are idiosyncratic and vary widely in the amount of information they provide, ranging from a one line sentence to a more elaborated statement of the rationale behind the retraction event. Understanding their impact on the scientific community is of central importance to the process of cumulative knowledge production and in deriving implications for the allocation of resources, human and financial, within and across scientific fields.

[4] While this paper is not focused on the determinants of false science but rather its impact, it is worth noting that the rise in instances of false science (or at least the increase in its documentation via retraction notices) may be linked to a range of factors including the increasingly complex and collaborative organization of the scientific enterprise (Wuchty, Jones and Uzzi 2007) and the growing competition for resources in science. Lacetera and Zirulia (2009) note that competition has ambiguous effects on the incidence of scientific misconduct since scientists can also gain prominence by detecting instances of false science.

[...] of Intentional Deception" to code cases where the authors did not intend to deceive, such as instances of miscommunication, contamination of research materials, or coding error. "Uncertain Intent" applies where fraud is not firmly established, but negligence or unsubstantiated claims raise questions about the authors' motives. The "Intentional Deception" code is reserved for cases where falsification, misconduct, or willful acts of plagiarism and self-plagiarism appear to have occurred and were verified by author admissions or independent reviews of misconduct.

Delineating research fields. To delineate the boundaries of the research fields affected by retracted articles, we develop an approach based on topic similarity as inferred by the overlap in keywords between each retracted article and the rest of the (unretracted) scientific literature. Specifically, we use the PubMed Related Citations Algorithm (PMRA) which relies heavily on Medical Subject Headings (MeSH). MeSH terms constitute a controlled vocabulary maintained by the National Library of Medicine that provides a very fine-grained partition of the intellectual space spanned by the biomedical research literature. Importantly for our purposes, MeSH keywords are assigned to each scientific publication by professional indexers and not by the authors themselves; the assignment is made without reference to the literature cited in the article. We then use the "Related Articles" function in PubMed to harvest journal articles that are proximate to the retracted articles, implicitly defining a scientific field as the set of articles whose MeSH keywords overlap with those tagging the ultimately retracted article. As a byproduct, PMRA provides us with both an ordinal and a cardinal dyadic measure of intellectual proximity between each related article and its associated retraction.

For the purposes of our main analysis, we only consider related articles published prior to the retraction date. We distinguish those published prior to the retracted article and those published in the window between the retracted article's publication date and the retraction event itself. Further, we also exclude related articles with any co-authors in common with the retracted article in order to strip bare our measure of intellectual proximity from any "associational baggage" stemming from collaboration linkages. Finally, we build a set of control articles by selecting the "nearest neighbors" of the related articles, i.e., the articles appearing immediately before or immediately after in the same journal and issue, as in Furman and Stern (2011) and Furman et al. (2012a).[6]

[6] We select the nearest neighbors as controls on the premise that the ordering of papers in journal issues is random or close to random. To validate this premise, in analyses available from the author, we replicate the results in Table 8 with an alternative control group where one control is selected from each journal issue literally at random. The results do not differ substantially.
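The nearest-neighbor control selection described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation, which the paper does not describe: the `Article` record, its `position` field (order of appearance within an issue), and the function name `nearest_neighbors` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record; field names are illustrative, not from the paper."""
    pmid: int
    journal: str
    issue: str
    position: int  # assumed: order of appearance within the journal issue


def nearest_neighbors(treated, candidates):
    """Return the articles printed immediately before and immediately after
    `treated` in the same journal and issue (zero, one, or two articles)."""
    same_issue = sorted(
        (a for a in candidates
         if a.journal == treated.journal
         and a.issue == treated.issue
         and a.pmid != treated.pmid),
        key=lambda a: a.position,
    )
    before = [a for a in same_issue if a.position < treated.position]
    after = [a for a in same_issue if a.position > treated.position]
    neighbors = []
    if before:
        neighbors.append(before[-1])  # closest article preceding the treated one
    if after:
        neighbors.append(after[0])    # closest article following the treated one
    return neighbors
```

An article at the start or end of an issue yields a single control in this sketch. How the paper handles such edge cases, or neighbors that are themselves related articles, is not stated in the text.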
After purging from the list a few odd observations,[7] we are left with a sample of 1,104 articles.[8] As detailed in Section II of the online appendix, we develop an exhaustive category scheme to code the reasons that explain the retraction event. These reasons are tabulated in Table 1.[9] In our next step, we classify each retraction into one of three categories that denote whether the results contained in the source article can be relied upon for follow-on research. The "strong shoulders" subsample comprises 202 articles retracted for reasons that do not cast any aspersion on the validity of the results contained therein. In contrast, we classify 589 retractions (53.4%) as providing "absent shoulders" for follow-on scientists to stand on, often because of fraudulent data or other types of misconduct. Finally, the "shaky shoulders" category (289 events or 26.2% of the cases) groups those retraction events for which the validity of the results remains shrouded in uncertainty.

Most of our analyses focus on the 589 observations belonging to the "absent shoulders" subsample (Table 2). The papers in this subsample were published between 1973 and 2007 and took an average time of three years to be retracted, though many of the more recent articles were retracted within one year, perhaps because of a higher probability of detection since the dawn of the electronic publishing era. Although this subsample is dominated by instances of fraud or other types of misconduct, 31% of the events appear to be the results of honest mistakes on the part of the investigators involved, with a further 8% for which it is unclear whether the scientists actively subverted the scientific process in the course of performing the research and reporting its results.[10] Regardless of intent, however, it would be a mistake to consider each observation as completely independent from all the others in the sample. Close to sixty percent of the observations can be grouped into cases involving more than one retraction event, for example because the same rogue investigator committed fraud in multiple papers, or because the same contaminated research materials were used in multiple published articles. Figure B [...]
[7] These include an article retracted and subsequently unretracted, an erratum that was retracted because of disagreement within the authorship team about whether the original article indeed contained an error, along with a few others.
[8] In comparison, Lu et al. (2013) extract 1,465 retraction events from Thomson Reuters' Web of Science over the same period. The Web of Science covers a wider cross-section of scientific fields (including the social sciences and engineering), but has shallower coverage than PubMed in the life sciences. By combining the events corresponding to life sciences journals as well as multidisciplinary journals, such as Science, PNAS, or Nature, it appears that the life sciences account for between 60% and 70% of the total number of retractions in the Lu et al. sample.
[9] Despite extensive efforts, we were unable to locate a retraction notice in 24 (2.17%) cases.
[10] This represents an inversion of the relative prevalence of fraud and mistakes, compared to an earlier analysis performed by Nath et al. (2006), but it is in line with the recent results reported by Fang et al. (2012).

[...] space that includes 100 related records.[12] Given our set of source articles, we delineate the scientific fields to which they belong by focusing on the set of articles returned by PMRA that satisfy five additional constraints: (i) they are original articles (as opposed to editorials, comments, reviews, etc.); (ii) they were published up to the year that precedes the calendar year of the underlying retraction event; (iii) they appear in journals indexed by the Web of Science (so that follow-on citation information can be collected); (iv) they do not share any author with the source, and (v) they are cited at least once by another article indexed by the Web of Science in the period between their publication year and 2011. Figure C of Section I in the online appendix runs through a specific example in the sample to illustrate the use of PMRA.[13] Section III of the online appendix illustrates through an example how PMRA processes MeSH keyword information to delineate the boundaries of research fields.

For the set of 589 retractions with absent shoulders, the final dataset comprises 32,699 related articles that can be ordered by relatedness using both an ordinal measure (the rank returned by PMRA) as well as a cardinal measure which we normalize such that a score of 100% corresponds to the first "non-trivial" related record.[14]

As a result of these computational and design choices, the boundaries of the fields we delineate are derived from semantic linkages to the exclusion of other considerations such as backward and forward citation relationships, or coauthorships. Judgement and subjectivity are confined to the initial indexing task which assigns keywords to individual articles. The individuals performing these tasks are trained in a consistent way, draw the keywords from a controlled vocabulary which evolves only slowly over time, and do not have any incentives to "window-dress" the articles they index with terms currently in vogue in order to curry attention from referees, editors, or members of funding panels. Of course, the cost of this approach is that it may result in boundaries between fields that might only imperfectly dovetail with the contours of the scientific communities with which the authors in our sample would self-identify. The main benefit, however, is that it makes it sensible to use citation information to evaluate whether the narrow fields around each retracted article atrophy or expand following each retraction event.
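The five constraints used to delineate a field can be expressed as a simple filter over the records returned by PMRA. The sketch below is a hedged illustration only: the `RelatedRecord` structure, its field names, the `"article"` type label, and the function `belongs_to_field` are assumptions introduced here and are not taken from the paper or from any PubMed or Web of Science interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedRecord:
    """Hypothetical record for one article returned by PMRA."""
    pmid: int
    pub_type: str                # assumed label, e.g. "article", "review", "editorial"
    pub_year: int
    in_web_of_science: bool      # journal is indexed by the Web of Science
    authors: frozenset           # author identifiers (format is an assumption)
    citations_through_2011: int  # citations from publication year through 2011


def belongs_to_field(rec, source_authors, retraction_year):
    """Apply constraints (i)-(v) from the text to one PMRA record."""
    return (
        rec.pub_type == "article"                    # (i) original articles only
        and rec.pub_year <= retraction_year - 1      # (ii) published by the year before retraction
        and rec.in_web_of_science                    # (iii) journal indexed by the Web of Science
        and rec.authors.isdisjoint(source_authors)   # (iv) no author shared with the source
        and rec.citations_through_2011 >= 1          # (v) cited at least once through 2011
    )
```

A field would then be the list of PMRA records for a given source article that pass this predicate. How author identity is established for constraint (iv) is not specified in this passage, so the set comparison here is a placeholder.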
12However,thealgorithmembodiesatransitivityruleaswellasaminimumdistancecuto rule,suchthatthee ectivenumberofrelatedarticlesreturnedbyPMRAvariesbetween4and2,642inthelargersampleof1,104retractions,withameanof172recordsandamedianof121.13TofacilitatetheharvestingofPubMed-relatedrecordsonalargescale,wehavedevelopedanopen-sourcesoftwaretoolthatqueriesPubMedandPMRAandstorestheretrieveddatainaMySQLdatabase.Thesoftwareisavailablefordownloadathttp://www.stellman-greene.com/FindRelated/.14Asourcearticleisalwaystriviallyrelatedtoitself.TherelatednessmeasuresarebasedontherawdatareturnedbyPMRA,andignorethe ltersappliedtogeneratethe nalanalysisdataset,e.g.,eliminatingreviews,etc.12 whichtheyareindirectlyassociatedisalmosttwoordersofmagnitudesmallerthantherateofcitationthatlinkstheretractionswiththe\treated"(i.e.,related)articles.Citationdata.PubMeddoesnotcontaincitationdatabutwewereabletoretrievethisinformationfromtheWebofScience(uptotheendof2011)usingaperlscript.Wefurtherprocessthesedatatomakethemamenabletostatisticalanalysis.First,weeliminateallself-citations,whereself-citationisinferredbyoverlapbetweenanyofthecitedauthorswithanyofthecitingauthors(anauthornameisthecombinationformedbythelastnameandthe rstinitialforthepurposeofthis lter).Second,weparsethecitingarticledatatodistinguishbetweentheinstitutionalaliationsofciters,inparticularby aggingthecitingarticlesforwhichatleastoneoftheaddressesrecordedbytheWebofScienceisacorporateaddress,whichweinferfromthepresenceofabbreviationssuchasInc,Corp,GmbH,Ltd,etc.Wethenaggregatethisinformationatthecitedarticle-yearlevelofanalysis.Inotherwords,wecandecomposethetotalnumberofcitations owingtoindividualarticlesatagivenpointintimeintoa\private"anda\public"set,wherepubliccitationsshouldbeunderstoodasstemmingfromacademicscientists,broadlyconstrued(thiswillalsoincludescientistsemployedinthepublicsectoraswellasthoseemployedbynon-pro 
tresearchinstitutes).Citationsareanoisyandwidely-usedmeasureoftheimpactofapaperandtheattentionitreceives.Buttheuseofcitationdatatotraceoutthedi usionofindividualbitsofscienti cknowledgeissubjecttoanimportantcaveat.Citationscanbemadefor\strategic"ratherthan\substantial"reasons(cf.Lampe[2012]forevidenceinthisspiritinthecontextofpatentcitations).Forexample,authorsofapapermayprefertoreducethenumberofcitationsinordertomakelargerclaimsfortheirownpaper;theymaybemorelikelyto\getawaywithit"(i.e.,nothavingeditorsandrefereesrequesttoaddcitations)ifthestrategicallyuncitedpapersarecloseinintellectualspacetoaretractedpaper.Unfortunately,wedonothavetheabilitytoparsethecitationdatatodistinguishstrategicfromsubstantialcitations,alimitationthatthereadershouldbearinmindwheninterpretingourresults.DescriptiveStatistics.Table3providesbasicinformationaboutthematchedsample.Byconstruction,controlandtreatedarticlesarematchedonyearofpublicationandjournal,andtheyappeartomatchverycloselyonthelengthoftheauthorshiproster.Becauseinmanycases,retractionoccursrelativelyquicklyafterpublication,only30%oftherelatedarticlesinthedataarepublishedafterthepublicationofthesourcearticle,andonly7.9%ofthesearticlescitethesoon-to-be-retractedsource.Conversely,only6.1%ofthearticles14 Itisthenstraightforwardtocomputeyearly\entryrates"intotreatedandcontrol eldsbycountingthenumberofrelatedarticlespublishedinthe eldineachyear.Capturingfund-inginformationatthe eldlevelisslightlymoreinvolved.PubMedsystematicallyrecordsNIHgrantacknowledgementsusinggrantnumbers,butwithoutreferencingtheparticulargrantcycletowhichthepublicationshouldbecredited.Toaddressthisissue,weadoptthefollowingprocedure:foreachrelatedpublication,weidentifytheclosestprecedingyearinathree-yearwindowduringwhichfundingwasawardedthrougheitheranewawardoracompetitiverenewal;wethensumallthefundinginthegrantyearthatultimatelygeneratespublicationsinthefocal eld.Thedescriptivestatisticsforthe 
eld-levelanalysesaredisplayedonTable4.Thenum-berofobservationsacrossthepublicationfrequencydatasetandthefundingdatasetdi erbecause(i)thefundingdataareavailableonlyuntil2007,whereasthepublicationdataisavailableuntiltheendofourobservationperiod(2011);and(ii)wedropfromthefundinganalysisthe eldsforwhichthereisnotasinglepublicationacknowledgingNIHfundingfortheentire1970-2007period.4ResultsTheexpositionoftheeconometricresultsproceedsinfourstages.Afterabriefexpositionofthemaineconometricissues,wepresentdescriptivestatisticsandresultspertainingtothee ectofretractionsontherateofcitationsthataccruetotheretractedarticles.Second,weexaminetheextentoftheretractione ectonthesetofrelatedarticles.Third,westudywhethertheretractioneventsalteredpatternsofentryandfundingintothescienti c eldsassociatedwiththeretractedarticles.Fourth,weexplicatethemechanism(s)underlyingtheresults.4.1EconometricConsiderationsOurestimatingequationrelatesthenumberofcitationsthatarereceivedbyrelatedarticlejinyearttocharacteristicsofjandofretractedarticlei:E[CITESjtjXijt]=exp[ 0+ 1RLTDjAFTERit+f(AGEjt)+t+ ij]whereAFTERdenotesanindicatorvariablethatswitchestoonetheyearaftertheretrac-tion,RLTDdenotesanindicatorvariablethatisequaltooneforrelatedarticlesandzerofor16 4.2E ectofRetractiononRetractedPapersTable5reportstheresultsfromsimpledi erence-in-di erencesanalysesforthesampleof1,037retractionsand1,922nearestneighborsinthejournalsinwhichtheretractedarticlesappeared.17Column1reportstheestimateoftheretractione ectforthebaselinespeci -cation.Theresultimpliesthat,relativetothecontrols,retractedpaperslose69%oftheircitationsinthepost-retractionperiod.Themagnitudeofthee ectisinlinewiththe60%declineestimatedbyFurmanetal.(2012a)inasmallersampleofPubMed-indexedretrac-tions.Column2showsthatthee ectisbarelya ectedwhenwedropfromthesamplethoseobservationscorrespondingtoretractedarticlesforwhichtheretractionreasonismissing.Column3includesinthespeci cationsthemaine 
ectoftheretractiontreatmentaswellastwointeractionswiththe\shakyshoulders"and\absentshoulders"indicatorvariables.Inthismodel,themaine ectimplicitlycapturesthepost-retractionfateoftheretractedpapersthatstillmaintain\strongshoulders."Whilethise ectisnegativeandstatisticallysigni cant(withanimplieddecreaseinthecitationrateequalto38%)itsmagnitudeismarkedlysmallerthanthatofthee ectcorrespondingtothe\shakyshoulders"retractions(66%)andsmallerstillthanthee ectforthe\absentshoulders"category(73%).Droppingthe\strongshoulders"groupfromthesampleincreasesthemagnitudeoftheretractione ectinabsolutevalue(to72%,column4),whilefocusingontheearliestretractioneventineachcaseslightlylowerstheestimatede ect(66%,column5).Inshort,ourresultscon rmtheearlier ndingsofFurmanetal.(2012a).Inaddition,theresultsincolumn3provideimportantempiricalvalidationforthecodingexercisedetailedintheonlineappendix.Althoughthecoecientsinthisspeci cationarenotstatisticallydi erentfromeachother,theirmagnitudesareorderedinanintuitiveway,withthepost-retractionpenaltydecreasingmonotonicallywiththestrengthoftheshouldersprovidedtofollow-onresearchers.4.3E ectofRetractiononRelatedPapersWenowturntothecoreoftheempiricalanalysis,examiningthee ectofretractiononthecitationoutcomesfortherelatedarticlesidenti edbythePubMedRelatedCitationsAlgorithm.The rstsetofresultsappearsinTable6,whichisstructuredanalogouslytoTable5.Column1reportsthedi erence-in-di erenceestimatefortheentiresample.We nd 17SixtysevenretractedarticlesneededtobedroppedfromtheestimationsamplebecausetheyappearedinjournalsnotindexedbytheWebofScience.18 Moreover,withanaverageof60relatedpapersperretractedarticle,theaggregatecitationconsequencesoftheretractioneventsforthescienti c eldsinvolvedarenottrivial.Toprovideabettersenseofthemagnitudeoftheseaggregatelosses,weestimateananalogofTable6usingOLSinSectionIVoftheonlineappendix.Thedependentvariableisthenumberofcitationsreceivedinlevels.TheresultsaresubstantiallyunchangedcomparedtoourbenchmarkPoissonspeci 
cation.Furthermore,thecitationdeclineestimatedtherein(-0.173citationperyear)canformthebasisofback-of-the-envelopecalculation.Usingthisestimateofthecitationpenaltyandaggregatingtothe eldlevel(takingintoaccountboththeaveragenumbersofarticlesper eldandtheaveragelengthofthepost-retractionperiodinthesample),weconcludethatretraction-aicted eldsexperience,onaverage,alossof75citationsrelativetocontrol elds.Stateddi erently,thisisasifwedeletedfromtheaverage eldonepaperintheTop7%ofthedistributionforthetotalnumberoflong-runcitations.Dynamicsofthetreatmente ect.Wealsoexplorethedynamicsofthee ectsuncov-eredinTable6.WedosoinFigure3byestimatingaspeci cationinwhichthetreatmente ectisinteractedwithasetofindicatorvariablescorrespondingtoaparticularyearrel-ativetotheretractionyear,andthengraphingthee ectsandthe95%con denceintervalaroundthem.Twofeaturesofthe gureareworthyofnote.First,thereisnodiscernibleevidenceofane ectintheyearsleadinguptotheretraction,a ndingthatvalidatesexpostouridenti cationstrategy.19Second,aftertheretraction,thetreatmente ectincreasesmonotonicallyinabsolutevaluewithnoevidenceofrecovery.Exploringheterogeneityinthee ectofretractions.Weexploreanumberoffactorsthatcouldmodulatethemagnitudeoftheretractione ectonintellectualneighbors'citationrates.Table7reportstheresultsofsevenspeci cationsthatincludeinteractiontermsbetweentheretractiontreatmente ectandcharacteristicsofeithertheretractedarticleortheretracted/relatedarticledyad.Column1evaluateshowthecumulativeattentiontotheretractedarticlea ectsthereductionofcitationtorelatedarticles.Therationaleforthisanalysisisthatcitationsareaproxyfortheamountofattentionthatscientistsinthe eld(andotherrelated elds)gavetotheretractedpaperpriortoretraction,andmaybeapredictorfortheamountofcollateraldamageinagiven eld.Thecoecientontheinteractiontermshowsthathighlycitedretractedpapers|thoseinthetop25th 19This 
ndingisalsoreassuringasitsuggeststhatretractionsarenotendogenoustotheexhaustionofaparticularintellectualtrajectory,i.e.,itdoesnotappearasifresearchersresorttothetypeofmisconductthatyieldsretractionsafteruncoveringevidencethattheir eldisonthedecline.20 relatedarticlesthatarealsocitedbytheretractionexperiencea6.1%boostinthecitationratefollowingtheretractionevent.Thisresultisconsistentwiththeideathattheresearcherswhocontinuetoworkinthe eldinspiteoftheretractioneventchoosetobuildinsteadonprior,unretractedresearch.Theoveralle ectonthe eldcanstillbenegativesinceonlyasmallfraction(7.9%)ofarticlesrelatedtothesourcearealsocitedbythesource.Column6usesourcodingofauthor\intent"tocomparehowthetreatmente ectofretractiondi ersinclearcasesoffraudfromfraud-freeretractioncasesorthosewithuncertainintent.Weseethatcasesof\IntentionalDeception"largelydrivethenegativee ectonthe eld'scitations(-7.8%),while eldsthatexperiencedretractionswith\NoSignofIntentionalDeception"(theomittedcategory)hadnocitationdecline,onaverage.FigureDofSectionIintheonlineappendixexplorestheextenttowhichtheageofarelatedarticleatthetimeoftheretractioneventin uencesthemagnitudeofthetreatmente ect.Inthis gure,eachcirclecorrespondstothecoecientestimatesstemmingfromaspeci cationinwhichthecitationratesforrelatedarticlesandtheircontrolsareregressedontoyeare ects,articleageindicatorvariables,aswellasinteractiontermsbetweenthetreatmente 
ectandthevintageofeachrelatedarticlesatthetimeoftheretraction.Sincerelatedarticlesinthesamplearepublishedbetweenoneandtenyearsbeforetheirassociatedretractionevent,therearetensuchinteractionterms.22Theresultsshowthatonlyrecentarticles(thosepublishedone,two,orthreeyearsbeforetheretraction)experienceacitationpenaltyinthepost-retractionperiod,whereasolderarticlesarerelativelyimmunetotheretractionevent.Finally,Figure4andFigureE(SectionIintheonlineappendix)investigatetheextenttowhich\relatedness"(inthesenseofPMRA)exacerbatesthemagnitudeoftheresponse.InFigure4,weusetheordinalmeasureofrelatedness,namelytherankreceivedbyafocalarticleinthelistreturnedbyPMRAforaspeci csourcearticle.Wecreate22interactionvariablesbetweentheretractione ectandtherelatednessrank:Top5,Top6-10,...,Top95-100,100andabove.Theresultsshowthatlower-ranked(i.e.,morecloselyrelated)articlesbearthebruntofthenegativecitationresponseinthepost-retractionevent.FigureEisconceptuallysimilar,exceptthatitreliesonthecardinalmeasureofrelatedness.Wecreateonehundredvariablesinteractingtheretractione ectwitheachpercentileoftherelatednessmeasure,andestimatethebaselinespeci cationofTable7,column1inwhichthemainretractione ecthasbeenreplacedbythe100correspondinginteractionterms.FigureEgraphstheestimatesalongwiththe95%con denceintervalaroundthem.Theresultsare 22The95%con denceintervals(correspondingtorobuststandarderrors,clusteredaroundcasecodes)aredenotedbytheblueverticalbars.22 investigators(post-docsandgraduatestudentsareoftenlistedas rstauthors)areatfaultforretraction.Furthermore,we ndthatretracted rstauthorsandmiddleauthorsarelesslikelytoreappearin eldsinwhichpapershavebeenretractedthanareretractedlastauthors(SectionVIoftheonlineappendix).Theseanalysessuggestthat(1)thestrengthofthetreatmente ectisgreatestwhentheauthorculpableforretractionisthe rstauthor(ratherthanthelastauthor)and(2)thatthepublicationdeclineisnotdrivenbytheexitofPIsorlabdirectors,butmaybedrivenbytheexitof 
rstauthors.24Tosummarize,theseresultshelpexplainwhyweobservedownwardmovementinthecitationsreceivedbyrelatedarticleshighlightedearlier:Therearefewerpapersbeingpub-lishedinthese eldsandalsolessfundingavailabletowritesuchpapers.Whilethesee ectsconstitutetheproximatecausesofthenegativespilloversthatarethecentral ndingofthepaper,theybegthequestionofwhattheunderlyingmechanismsare.Whatexplainsthe ightofresourcesawayfromthese elds?4.5UnderlyingMechanismsoftheRetractionE ectAnumberofmechanismsmayunderlieour ndingsregardingnegativecitation,entry,andfunding.Weinvestigateevidenceregardingtwopossibilities.First,arelativedecreaseinattentionsubsequenttoretractionmayre ectscientists'learningaboutthelimitedpotentialforfollow-onresearchinretraction-aicted elds.ThecaseofJan-HendrikSchoniscon-sistentwiththisexplanation.Schon'sresearchatBellLabsinitiallyproducedspectacularresultsusingorganicmaterialstoachievea eld-transistore ect;hisresultswereeventuallydemonstratedtohavebeentheresultoffraudulentbehaviorandsubsequente ortsbuildingonhisworksuggesttheimpossibilityofachieving eld-transistore ectsusingthematerialsSchonemployed(Reich,2009).Second,the eld-leveldeclinesincitation,entry,andfundingweobservecouldalsoarisefromafearofreputationalassociationwiththe\contaminated" eldsorauthors.ThecaseofWoo-SukHwangthatweinvokeatthebeginningofthepaperisconsistentwiththistypeofexplanation:Follow-onresearcherseschewedallimplicationsofHwang'swork,althoughsomewouldprovepromisingwhenthe eldrevisitedhisworkafewyearsaftertheretractions.Althoughwemaynotbeabletoruleouteitherexplanationentirely,exploringtherela-tiveimportanceofthesemechanismsmattersbecausetheirwelfareimplicationsdi er.Forexample,itmaybeidealfromasocialplanner'sperspectiveifscientistssimplyredirect 24TheseresultsaccordwellwiththeevidencepresentedinJinetal(2013).24 haveadisproportionatee 
ectonfuturecitationsdoesnotlendsupporttotheseexplanatorymechanisms.Tofurtherinvestigatethepossibilitythatareputationalmechanismmaybeatwork,weexamineheterogeneousresponsesbetweenacademic-and rm-basedciters.Westartfromthepremisethatscientistsemployedbypro t-seeking rmswouldpersistininvestigatingtopicsthatuniversity-basedscientists(andNIHstudysections)frownupon(postretrac-tion),aslongasthepossibilityofdevelopingacommercialproductremains.28Weparsetheforwardcitationdatatoseparatethecitationsthatstemfromprivate rms(mostlypharma-ceuticalandbiotechnology rms,identi edbysuxessuchasInc.,Corp.,LLC,Ltd.,GmbH,etc.)fromthosethatoriginateinacademia(broadlyde nedtoincludenon-pro tresearchinstitutesandpublicresearchinstitutionsaswellasuniversities).Eventhoughweclassifyas\private"anycitingarticlewithamixofprivateandacademicaddresses,almost90%ofthecitationsinoursampleare\academic"accordingtothisde nition.InTable9,columns1aand1b,we ndthatacademicandprivatecitersdonotdi eratallintheextenttowhichtheypenalizetheretractedarticles.Conversely,columns2aand2bindicatethatprivatecitershardlypenalizerelatedarticles,whereasacademiccitersdototheextentpreviouslydocumented.29Thedi erencebetweenthecoecientsisstatisticallysigni cant(p0:01).These ndingsareconsistentwiththeviewthattheretraction-inducedspilloverswehavedocumentedstem,atleastinpart,fromacademicscientists'concernthattheirpeerswillholdtheminloweresteemiftheyremainwithinanintellectual eldwhosereputationhasbeentarnishedbyretractions,eventhoughtheseresearcherswereneithercoauthorsontheretractedarticleitselfnorbuildingdirectlyuponit.Itispossible,however,thatthesedi erencesarisebecauseindustryscientists nditeasiertosubstitutecitationswithina eldbecausetheirworkismoreappliedinnature.30Toinvestigatethispossibility,wehavematchedthePubMeddatabasewiththeUSpatentdatatoidentifythecitationsreceivedfrompatentsbypublishedscienti carticles.31Our citationsandentryina ected 
fields, as scientists spend time trying to investigate and verify results related to the retracted paper.

28 We ground our assumptions regarding the potentially differential responses of academic- and industry-based scientists by appealing to prior work on differences in incentives and status concerns among academic and industrial scientists, the former of whom have principally (though not exclusively) priority-based incentives and the latter of whom face stronger (though not exclusive) financial and organizational incentives that are not directly tied to standing in the research community (Dasgupta and David, 1994; Stern, 2004).

29 The estimation sample is limited to the set of related articles and their controls that receive at least one citation of each type over the observation period.

30 We thank an anonymous referee for this suggestion.

31 See Appendix D in Azoulay et al. (2012) for more details on the patent-to-publication matching process that provides a foundation for the analyses presented in Table 9.

evaluation has a number of interesting implications. Through the coding scheme we have developed to understand the particular circumstance of each retraction event, we highlight the limitations of the institutional practices that are supposed to ensure the fidelity of scientific knowledge. In particular, the analysis brings systematic evidence to bear on the heightened attention devoted to the topic of scientific misconduct in science policy circles. Some analysts suggest that the scientific reward system has been corrupted and is in need of wholesale, radical reform (Fang et al. 2012). This view points to the increase in detected frauds and errors as a strong indication that much invalid science goes undetected. Acknowledging this possibility, others retort that a system of retractions is precisely what the "Republic of Science" requires: a mechanism that swiftly identifies false science and effectively communicates its implications for follow-on research (Furman et al. 2012a). The validity of the more optimistic view hinges crucially on what is signaled by a retraction notice and on how scientists in the affected fields process this information and act upon it. Our results suggest that retractions do have the desired effect on the particular paper in question, but also lead to spillover effects onto the surrounding intellectual fields, which become less vibrant.

If these negative spillovers simply reflected the diminished scientific potential of the affected fields, then the "collateral damage" induced by retractions would not be a cause for concern and would reinforce the belief that the retraction process is a relatively effective way to police the scientific commons (Furman et al. 2012a). However, our evidence indicates that broad perceptions of legitimacy are an important driver of the direction of scientific inquiry. Unfortunately, retraction notices often obfuscate the underlying reason for retraction, which diminishes the information content of the signal they provide to follow-on researchers. As a result, there could be high returns to developing a standardized coding approach for retractions that journals and scientific societies could draw upon to help the scientific community update their beliefs regarding the nature and scope of false science. While journal editors may understandably balk at the suggestion that it is incumbent upon them to make clear determinations regarding the underlying causes of retractions, a clearly-articulated schema would increase the incentives of authors to report problems emerging after the publication of an article and provide a more nuanced context within which universities themselves (as well as funding bodies) might investigate and adjudicate instances of false science.32

32 Alternative mechanisms, such as "replication rings," have been proposed to counteract the negative spillovers in intellectual space associated with retraction events (Kahneman 2012). Whether "local" responses of this type can be implemented successfully is questionable, in light of the costs they would impose on researchers active in retraction-affected fields.

Section II
Coding of Retraction Reasons

The purpose of this document is to describe the retractions coding scheme that forms the basis of the analysis implemented in the main body of the paper, as well as to provide a method for classification of future retractions. The goal is to reconcile two contradictory objectives: on the one hand, group retractions into a small number of mutually exclusive categories; on the other hand, capture in a meaningful way the inherent heterogeneity in retraction reasons.

The coding scheme has been developed by the authors solely for the purpose of scholarly academic research. The coding of each individual retraction is based on a range of public information sources, ranging from the notice of retraction itself, to entries in the "Retraction Watch" blog, to results of Google searches. No additional information has been gathered from the authors of the retracted papers or others involved in these cases. As such, the coding represents an informed judgment of the context in which each retraction event took place, rather than the outcome of a formal investigation. The list of retractions, article characteristics, and reasons can be downloaded from the internet at the following URL: http://jkrieger.scripts.mit.edu/retractions/.

Methods Summary: Analysis of retractions indexed by PubMed, published between 1973 and 2008, and retracted before the end of 2009, yielded 13 mutually exclusive "reasons" categories (see list below). In a first step, we assigned one of these reasons to each retracted article solely based on the information contained in the retraction notice. In a second step, we assigned a reason to each retracted article based on
information in the notice as well as any additional information found through internet sleuthing (e.g., news articles, blogs, press releases, etc.):

We also code each retraction observation based on its validity as a foundation for future research. These "shoulders" categories are Strong Shoulders, Shaky Shoulders, and Absent Shoulders. Strong Shoulders means that the retraction does not cast doubt on the validity of the paper's underlying claims. A publisher mistakenly printing an article twice, an author plagiarizing someone else's description of a phenomenon, or an institutional dispute about the ownership of samples are all examples where the content of the retracted paper is not in question. Shaky Shoulders means that the validity of claims is uncertain or that only a portion of the results are invalidated by the retraction. Absent Shoulders is the appropriate code in fraud cases, as well as in instances where the main conclusions of the paper are compromised by an error.

Lastly, we attempt to discern the level of intentional deceit involved in each case. Deception might involve the paper's actual claims (results, materials, method), its attribution of scholarly credit through authorship and citations, or the originality of the work. We use No Sign of Intentional Deception to code instances where the authors did not intend to deceive, such as in the case of "honest mistakes" or miscommunications. Uncertain Intent applies where fraud is not firmly established, but negligence or unsubstantiated claims raise questions about an author's motives. The Intentional Deception code covers cases where falsification, intentional misconduct or willful acts of plagiarism appear to have occurred.

The "intent" and "shoulders" coding are inherently more subjective than that of the underlying retraction reasons. In fact, there is no simple mapping of the latter into the former: each shoulders or intent code is assigned based on a thorough review of the available evidence in each case and according to the guidelines below. The reasons categories capture a combination of context, validity and intent, while the "shoulders" code only pertains to the validity of the article's content, and the "intent" code relates only to intent.1
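The three coding dimensions described above (a reason category, a "shoulders" code, and an "intent" code) can be pictured as one record per retraction. The following is only an illustrative sketch: the class and field names are hypothetical and are not taken from the authors' dataset, and only the code labels themselves come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Shoulders(Enum):
    """Validity of the retracted paper as a foundation for future research."""
    STRONG = "Strong Shoulders"
    SHAKY = "Shaky Shoulders"
    ABSENT = "Absent Shoulders"


class Intent(Enum):
    """Level of intentional deceit involved in the case."""
    NO_SIGN = "No Sign of Intentional Deception"
    UNCERTAIN = "Uncertain Intent"
    INTENTIONAL = "Intentional Deception"


@dataclass(frozen=True)
class RetractionCode:
    """One coded retraction; field names are hypothetical."""
    reason: str            # one of the mutually exclusive "reasons" categories
    shoulders: Shoulders   # pertains only to the validity of the content
    intent: Intent         # pertains only to intent


def casts_doubt_on_claims(code: RetractionCode) -> bool:
    """True unless the retraction leaves the paper's claims unquestioned."""
    return code.shoulders is not Shoulders.STRONG
```

Keeping the two subjective codes as separate fields, rather than deriving them from the reason, mirrors the statement that there is no simple mapping from reasons to shoulders or intent.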
1 For each category, we mention a small number of PubMed IDs of notices that can serve as good illustrations of the coding choice.

the author guilty of falsification and plagiarism (#12411512, #12833069, #19575288). If a retraction meets the criteria of "Fake Data & Plagiarism" then Absent Shoulders and Intentional Deception are the logical complementary codes.

6. Duplication. The important criterion for "Duplication" is that the authors copied from themselves. Most of the articles in this category already appeared in another journal before the second journal realized that the entire article is an exact duplicate or virtually identical to an article by the same authors in a different journal (#12589830). Some of the "Duplication" cases are not entirely republished articles, but will reproduce important content, such as data, charts and conclusions (#15580694). As with plagiarism cases, these cases are assigned the Strong Shoulders code by default, but may fall into the Shaky Shoulders bucket when meaningful differences exist between the duplicated article and its original version (#1930642). The "intent" coding follows a similar logic, with Intentional Deception being the primary classification. Yet, Uncertain Intent sometimes is the more logical choice when duplication resulted from an apparent miscommunication (#17047133, #16683328).

7. Questions about Validity. This category captures retraction cases associated with vague misconduct allegations (#118049464), suspicious "irregularities" (#118560433), and "questionable" data (#118951275). The hallmark of these retraction notices is that they obfuscate the nature of the misconduct. The vague nature of this category's notices makes Shaky Shoulders and Uncertain Intent the frequent choice for complementary codes.

8. Author Dispute. These cases involve disagreements between authors about content, credit, and permission. Often, these different types of disputes will be combined (#14723797, #19727599). Paper submission without the consent of coauthors is the most common underlying reason for this code. Unless warranted by information gained through sleuthing, Shaky Shoulders is the appropriate code for "Author Dispute" cases; most disputes stem from conflicts surrounding credit attribution and the verification of results, rather than outright fraud. Intentional Deception is the prevalent intent code in "Author Dispute" cases, though exceptions do exist (#17081259; #16003050).

9. Lack of Consent/IRB Approval. This category includes cases where the authors did not get IRB approval or did not secure patient informed consent before conducting their study. Ambiguous cases of "ethics violations" (#19819378, #18774408) also fall into the "Lack of Consent/IRB Approval" category. The default "shoulders" code is Shaky Shoulders. Strong Shoulders may be appropriate if there is evidence indicating that the authors believed they had IRB approval (#14617761), or that the paper's results are devoid of fraud/deception (#16832233). Determining the level of intent is less straightforward for this category. In general, ethics violations count as Intentional Deception, but uncertainty about author intent may warrant other coding choices. For example, the authors may have erroneously thought they received IRB approval (#14617761), or approval may have been officially obtained only after the authors completed the study (#16842490).

10. Did Not Maintain Proper Records. Although the dataset only has three retractions that fall into this category, we include it as a distinct retraction reason. The defining characteristic of this category is absence of proper data records. With proper records, the scientific community could better determine the reliability of the claims contained in these papers. Shaky Shoulders and Uncertain Intent are the proper complementary codes.

11. Publisher Error. Retractions occasionally stem from publisher mistakes rather than author misconduct or error. The associated notices establish that the publisher is solely responsible for the error, which is usually a duplicate publication (#17452723, #15082607) or printing of an earlier draft (#19662582, #15685781). Strong Shoulders is a natural fit for publisher errors resulting from duplicates, while Shaky Shoulders is appropriate when the journal prints the wrong draft. By definition, the proper intent coding is No Sign of Intentional Deception.

12. Not Enough Information to Classify or Missing. The essential difference between these two categories is that we have a notice for the former and do not have a notice for the latter. "Not Enough Information to Classify" implies that the notice is so vague that we cannot assign another code. Such

Section III
PubMed Related Citations Algorithm [PMRA]

The following paragraphs were extracted from a brief description of PMRA:2

The neighbors of a document are those documents in the database that are the most similar to it. The similarity between documents is measured by the words they have in common, with some adjustment for document lengths. To carry out such a program, one must first define what a word is. For us, a word is basically an unbroken string of letters and numerals with at least one letter of the alphabet in it. Words end at hyphens, spaces, new lines, and punctuation. A list of 310 common, but uninformative, words (also known as stopwords) are eliminated from processing at this stage. Next, a limited amount of stemming of words is done, but no thesaurus is used in processing. Words from the abstract of a document are classified as text words. Words from titles are also classified as text words, but words from titles are added in a second time to give them a small advantage in the local weighting scheme. MeSH terms are placed in a third category, and a MeSH term with a subheading qualifier is entered twice, once without the qualifier and once with it. If a MeSH term is starred (indicating a major concept in a document), the star is ignored. These three categories of words (or phrases in the case of MeSH) comprise the representation of a document. No other fields, such as Author or Journal, enter into the calculations.

Having obtained the set of terms that represent each document, the next step is to recognize that not all words are of equal value. Each time a word is used, it is assigned a numerical weight. This numerical weight is based on information that the computer can obtain by automatic processing. Automatic processing is important because the number of different terms that have to be assigned weights is close to two million for this system. The weight or value of a term is dependent on three types of information: 1) the number of different documents in the database that contain the term; 2) the number of times the term occurs in a particular document; and 3) the number of term occurrences in the document. The
first of these pieces of information is used to produce a number called the global weight of the term. The global weight is used in weighting the term throughout the database. The second and third pieces of information pertain only to a particular document and are used to produce a number called the local weight of the term in that specific document. When a word occurs in two documents, its weight is computed as the product of the global weight times the two local weights (one pertaining to each of the documents).

The global weight of a term is greater for the less frequent terms. This is reasonable because the presence of a term that occurred in most of the documents would really tell one very little about a document. On the other hand, a term that occurred in only 100 documents of one million would be very helpful in limiting the set of documents of interest. A word that occurred in only 10 documents is likely to be even more informative and will receive an even higher weight.

The local weight of a term is the measure of its importance in a particular document. Generally, the more frequent a term is within a document, the more important it is in representing the content of that document. However, this relationship is saturating, i.e., as the frequency continues to go up, the importance of the word increases less rapidly and finally comes to a finite limit. In addition, we do not want a longer document to be considered more important just because it is longer; therefore, a length correction is applied.

The similarity between two documents is computed by adding up the weights of all of the terms the two documents have in common. Once the similarity score of a document in relation to each of the other documents in the database has been computed, that document's neighbors are identified as the most similar (highest scoring) documents found. These closely related documents are pre-computed for each document in PubMed so that when one selects Related Articles, the system has only to retrieve this list. This enables a fast response time for such queries.

We illustrate the use of PMRA with an example taken from our sample. Amitav Hajra is a former University of Michigan graduate student who falsified data in three papers retracted in 1996. One of Hajra's retracted papers (PubMed ID #7651416) appeared in the September 1995 issue of Molecular and Cellular Biology and lists 27 MeSH terms. Its 10th most related paper (PubMed ID #8035830), according to the PMRA algorithm, appeared in the same journal in August 1994 and has 23 MeSH terms, 10 of which overlap with the Hajra article. These terms include common terms such as "Mice" and "DNA-Binding Proteins/genetics" as well as more specific keywords including "Core Binding Factor Alpha Subunits," "Neoplasm Proteins/metabolism,"

2 Available at http://ii.nlm.nih.gov/MTI/related.shtml
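The PMRA description quoted in this section (word definition, title double-counting, MeSH qualifiers, global and local weights, summed similarity) can be sketched in a few lines of Python. This is a toy illustration, not the NLM implementation: the excerpt gives no formulas, so the logarithmic global weight and the saturating, length-corrected local weight below are assumed functional forms, the stopword set is a hypothetical stand-in for the 310-word list, stemming is omitted, and all function names are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the 310-word stopword list (not reproduced in the text).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Break text at any non-alphanumeric character; keep tokens with at least one letter."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if re.search(r"[a-z]", t) and t not in STOPWORDS]


def represent(title, abstract, mesh_terms):
    """Return term counts for one document (no stemming in this sketch)."""
    counts = Counter(tokenize(abstract))
    title_words = tokenize(title)
    counts.update(title_words)
    counts.update(title_words)  # title words are added a second time
    for term in mesh_terms:
        counts["mesh:" + term.split("/")[0]] += 1  # MeSH term without qualifier
        if "/" in term:
            counts["mesh:" + term] += 1  # and once more with its qualifier
    return counts


def global_weights(docs):
    """Assumed IDF-style form: the fewer documents contain a term, the larger its weight."""
    doc_freq = Counter(term for doc in docs for term in doc)
    return {term: math.log(len(docs) / n) for term, n in doc_freq.items()}


def local_weight(count, doc_len, avg_len, k=1.0):
    """Assumed saturating form with a length correction; tends to 1 as count grows."""
    return count / (count + k * doc_len / avg_len)


def similarity(a, b, gw, avg_len):
    """Sum, over shared terms, of global weight times the two local weights."""
    len_a, len_b = sum(a.values()), sum(b.values())
    return sum(
        gw[t] * local_weight(a[t], len_a, avg_len) * local_weight(b[t], len_b, avg_len)
        for t in a.keys() & b.keys()
    )


def neighbors(index, docs, top=10):
    """Rank the other documents by similarity to docs[index], highest first."""
    gw = global_weights(docs)
    avg_len = sum(sum(d.values()) for d in docs) / len(docs)
    scored = [
        (similarity(docs[index], d, gw, avg_len), j)
        for j, d in enumerate(docs)
        if j != index
    ]
    return [j for _, j in sorted(scored, reverse=True)[:top]]
```

In this sketch a "10th most related paper" would simply be the tenth entry of the list returned by `neighbors`; the real system pre-computes such lists for every PubMed record.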
