
PERSPECTIVE

Ensuring the Data-Rich Future of the Social Sciences

Gary King

Massive increases in the availability of informative social science data are making dramatic ...

Institute for Quantitative Social Science, 1737 Cambridge Street, Harvard University, Cambridge, MA 02138, USA. E-mail: king@harvard.edu
www.sciencemag.org VOL 331 11 FEBRUARY 2011

... collapsing unless the improvements in methods for sharing sensitive, private, or proprietary data are able to be modified fast enough to keep up with the changes in the types and quantities of data becoming available, and unless public policy adapts to permit and encourage researchers to use them.

The necessary technological innovations are more difficult than they may seem. For example, the venerable strategy of anonymizing data is not very useful when date of birth, gender, and ZIP code alone are enough to personally identify 87% of the U.S. population. And the cross-classification of 10 survey questions of 10 categories each contains more unique classifications (10^10, or 10 billion) than there are people on the planet. Now think of the challenges of sharing continuous-time cell phone location information from a whole city, or biological information with hundreds of thousands of variables.

The political situation is also complicated. Each new revelation of how personal information is becoming publicly available generates a media storm, yet at the same time citizens are voluntarily giving up more privacy than ever, for example through the rapid transition from private e-mail to public or semi-public social media posts. If privacy can be protected in a way
that still allows data sharing, considerable progress can be made for people everywhere without harm coming to any research subject. This seems easier than, for example, the situation with most randomized medical experiments, in which, if everything works as expected, those in one treatment arm will be harmed relative to those in the other arms. Moreover, most concern about data sharing involves individuals, whereas social scientists usually seek to make generalizations about aggregates, and so spanning the divide is often possible with appropriate statistical methods.

What can we do to take advantage of the new data while facilitating data sharing and at the same time protecting privacy?

First, before we try to convince other parts of society to give us some leeway, we social scientists need to get our own act together. At present, large data sets collected by social scientists in most fields are routinely shared, but the far more prevalent smaller data sets that are unique or derived from larger data sets are regularly lost, hidden, or unavailable, often making the related publications unreplicable. In most cases, many data sets associated with individual publications are not available unless you obtain permission from the original author, and neither are the related computer code and other information necessary to reproduce the published tables and figures from the input data. No enforceable rules govern when access must be provided. This deserves serious reconsideration and action. We need to devolve Web visibility and scholarly credit for the data to the original author while ensuring that the data are professionally archived, with access standards formalized in rules that do not require ad hoc decisions or control by the original author.

Second, we need to nurture the growing replication movement. More individual scholars should see it as their responsibility to deposit data and replication information in public archives, such as those associated with the Data Preservation Alliance for the Social Sciences (16). More journals should encourage or require authors to make data available as a condition of publication, and granting agencies should continue to encourage data sharing norms. More importantly, when we teach, we should explain that data sharing and replication are an integral part of the scientific process. Students need to understand that one of the biggest contributions they or anyone is likely to be able to make is through data sharing.

Third, we need to continue research into privacy-enhanced data sharing protocols and to communicate better to government officials what is possible. Modern technology allows hundreds of millions of people to do electronic banking, commerce, and investing on the Web; to view their personal medical records; to store their photographs, videos, and personal documents online; and to share with selected individuals their most private thoughts and secrets. So why, when analyzing these and other personally identifiable sensitive data for the public good, does policy regularly require researchers (through university Institutional Review Boards) to do their work in locked rooms without access to the Internet, other data sources, electronic communication with other researchers, or many of their usual software and hardware tools? Surely we can develop policies, protocols, legal standards, and computer security so that privacy can be maintained while data sharing and analysis proceed in far more convenient, efficient, and productive ways. Progress in social science research would be greatly accelerated if policies merely allowed researchers more often (as they do corporations, governments, and private citizens) to analyze sensitive data using appropriate digital rather than physical security.

Fourth, even when privacy is not an issue, data sharing involves more than putting the data on a Web site. Scientists and editors of scholarly journals are not professional archivists, and many homegrown one-off solutions do not last long. Data formats have been changing so fast that archiving standards require special preservation formatting, using internationally agreed-upon metadata protocols and appropriate data citation standards. Social scientists need to continue to build a common, open-source, collaborative infrastructure that makes data analysis and sharing easy. However, unless we are content to let data sharing work only within disciplinary silos (which of course makes little sense in an era when social science research is more interdisciplinary than ever), we need to develop solutions that operate, or at least interoperate, across scholarly fields.

Last, social
scientists could use additional help from the legal community. Standard intellectual property rules and data use agreements need to be developed so that every data set does not need its own, essentially artisanal, legal work that merely increases transaction costs and reduces data sharing. The federal government should reconsider and relax the rules that prevent academic researchers from collecting, sharing, and publishing from data in the ways that those in other sectors of society do routinely.

Of course, social scientists have plenty to do even before we publish and share data. We must find ways of educating students about nonstandard data types, computational methods that scale, legal protocols, data sharing norms, and statistical tools that can take advantage of the new opportunities. Data are now arriving fast enough that the work life of many current social scientists is observably changing: whereas social scientists once sat in their offices working on their own, rates of co-authorship are now increasing fast, and a collaborative, laboratory-type work model is emerging in many subfields. These trends would be greatly facilitated by universities and funding agencies recognizing the need to build the infrastructure to support social science research.

For the first time in many areas of the social sciences, new forms and quantities of information may well make dramatic progress possible. Will we be ready?

THE FUTURE OF SOCIAL SCIENCE DATA
Fig. 1. New types of research data about human behavior and society pose many opportunities if crucial infrastructural challenges are tackled.

References and Notes
1. H. Weintraub et al., 1609 (1995).
2. D. E. Koshland, 1575 (1995).
3. G. King, K. Schlozman, N. Nie, Eds., The Future of Political Science: 100 Perspectives (Routledge, New York, 2009), pp. 91.
4. D. Hopkins, G. King, Am. J. Pol. Sci., 229.
5. A. M. Blair, Too Much to Know: Managing Scholarly Information before the Modern Age (Yale Univ. Press, New Haven, 2010).
6. C. Mackie, N. Bradburn, Eds., Improving Access to and Confidentiality of Research Data (National Research Council, Washington, DC, 2000), p. 49.
7. R. F. White, The Independent Review, 547.
8. G. King, PS Pol. Sci. Polit., 119 (2006).
9. The Dataverse Network, http://TheData.org.
10. C. C. Aggarwal, P. S. Yu, Eds., Privacy-Preserving Data Mining: Models and Algorithms (Springer, New York, 2008).
11. L. Sweeney, J. Law Med. Ethics, 98 (1997).
12. G. King, Sociol. Methods Res., 173 (2007).
13. M. Altman, G. King, 10.1045/march2007-altman (2007).
14. G. King, PS Pol. Sci. Polit., 494 (1995).
15. R. G. Anderson, W. H. Green, B. D. McCullough, H. D. Vinod, J. Econ. Methodol., 99 (2008).
16. DATA-Pass, www.icpsr.umich.edu/icpsrweb/DATAPASS/.
17. V. Stodden, Int. J. Comm. Law Pol.
18. My thanks to M. Altman and M. Crosas for helpful comments on an earlier version.

10.1126/science.1197872

PERSPECTIVE

James A. Evans* and Jacob G. Foster

The growth of electronic publication and informatics archives makes it possible to harvest vast quantities of knowledge about knowledge, or "metaknowledge." We review the expanding scope of metaknowledge research, which uncovers regularities in scientific claims and infers the beliefs, preferences, research tools, and strategies behind those regularities. Metaknowledge research also investigates the effect of knowledge context on content. Teams and collaboration networks, institutional prestige, and new technologies all shape the substance and direction of research. We argue that as metaknowledge grows in breadth and quality, it will enable researchers to reshape science: to identify areas in need of reexamination, reweight former certainties, and point out new paths that cut across revealed assumptions, heuristics, and disciplinary boundaries.

What knowledge is contained in a scientific article? The results, of course; a description of the methods; and references that locate its findings in a specific scientific discourse. As an artifact, however, the article contains much more. Figure 1 highlights many of the latent pieces of data we consider when we read a paper in a familiar field, such as the status and history of the authors and their institutions, the focus and audience of the journal, and idioms (in text, figures, and equations) that index a broader context of ideas, scientists, and disciplines. This context suggests how to read the paper and assess its importance. The scope of such knowledge about knowledge, or metaknowledge, is illustrated by comparing the summary information a first-year graduate student might glean from reading a collection of scientific articles with the insight accessible to a leading scientist in the field. Now consider the perspective that could be gained by a computer trained to extract and systematically analyze information across millions of scientific articles (Fig. 1).

Metaknowledge results from the critical scrutiny of what is known, how, and by whom. It can now be obtained on large scales, enabled by a concurrent informatics revolution. Over the past 20 years, scientists in fields as diverse as molecular biology and astrophysics have drawn on the power of information technology to manage the growing deluge of published findings. Using informatics archives spanning the scientific process, from data and preprints to publications and citations, researchers can now track knowledge claims across topics, tools, outcomes, and institutions. Such investigations yield metaknowledge about the explicit content of science, but they also expose implicit content: the beliefs, preferences, and research strategies that shape the direction, pace, and substance of scientific discovery. Metaknowledge research further explores the interaction of knowledge content with knowledge context, from features of the scientific system, such as multi-institutional collaboration, to global trends and forces, such as the growth of the Internet.

The quantitative study of metaknowledge builds on a large and growing corpus of qualitative investigations into the conduct of science from history, anthropology, sociology, philosophy, psychology, and interdisciplinary studies of science. Such investigations reveal the existence of many intriguing processes in the production of scientific knowledge. Here, we review quantitative assessments of metaknowledge that trace the distribution of such processes at large scales. We argue that these distributional assessments, by characterizing the interaction and relative importance of competing processes, will not only provide new insight into the nature of science but will also create novel opportunities to improve it.

Patterns of Scientific Content

The analysis of explicit knowledge content has a long history. Content analysis, or assessment of the frequency and co-appearance of words, phrases, and concepts throughout a text, has been pursued since the late 1600s. Examples range from efforts in 18th-century Sweden to quantify the heretical content of a Moravian hymnal to mid-20th-century studies of mass media content in totalitarian regimes. Contemporary approaches focus on the computational identification of topics in a corpus of texts. These can be tracked over time, as in a recent study of the news cycle. Culturomics projects now follow topics over hundreds of years, using texts digitized in the Google Books project. Topics can also be used to identify similarities between documents, as in topic modeling, which represents documents statistically as unstructured collections of topics or phrases. With the rise of the Internet and computing power, statistical methods have also become central to natural language processing (NLP), including information extraction, information retrieval, automatic summarization, and machine reading. Advances in NLP have made it one of the most rapidly growing fields of artificial intelligence.

Now that the vast majority of scientific publications are produced electronically, they are natural objects for topic modeling and NLP. Some recent work, for example, uses computational parsing to extract relational claims about genes and proteins. It then compares these claims across hundreds of thousands of papers to reconcile contradictory results and to identify likely missing elements from molecular pathways. In such fields as biomedicine, electronic publications are further enriched with structured metadata (e.g., keywords) organized into hierarchical ontologies to enhance search. Citations have long been used in investigations to explore dependencies among

Department of Sociology, University of Chicago, Chicago, IL 60637, USA.
*To whom correspondence should be addressed. E-mail: jevans@uchicago.edu