
Figure 1: Open Discovery: The General Model

using algorithms for MEDLINE which we developed, we replicated the eight open and closed discoveries made by Swanson and Smalheiser. In comparison with other replication studies these algorithms were the most effective (Srinivasan, 2004). They also require the least amount of manual input and analysis. For example, in open discovery, our methods expect the user to specify only the type of B terms of interest. Following this, our algorithm selects B terms automatically. In contrast, the other methods rely more on user input for selecting B terms. Our current research demonstrates that our open discovery algorithm can be used to generate new hypotheses for disease treatment that could be tested. In particular, we apply our open discovery procedure to explore the therapeutic potential of curcumin/turmeric (Curcuma longa), a dietary substance commonly used in Asia. We show that our automatic discovery algorithm identifies retinal diseases as the novel context for research on curcumin. We review genetic and biochemical evidence to indicate that curcumin may be beneficial for treating retinal diseases.

We first describe our open discovery algorithm. Next we show its application with curcumin as the starting point (topic A). We then present an analysis of the curcumin-retinal diseases connection. The next section is on related research. The final section presents our conclusions and plans for the next phase of this research.

2 Open Discovery

Our open discovery approach is founded on the notion of topic profiles. A topic is any subject of interest such as treatment of hypertension or ATM gene. A profile is essentially a representation of a topic that is derived from the text collection being mined. For MEDLINE our topic profiles are vectors of weighted Medical Subject Headings (MeSH). These terms belong to a controlled vocabulary and are manually assigned to each MEDLINE document by trained indexers. Given a topic of interest, our algorithm first retrieves relevant MEDLINE documents. MeSH terms are then extracted from these documents and their weights are calculated. These weighted terms form the profile vector for the topic. We discuss the method for calculating weights shortly.

We also exploit the fact that MeSH terms have been classified using 134 UMLS (Unified Medical Language System) [1] semantic types, for example Cell Function and Sign or Symptom. Each MeSH term is assigned one or more semantic types. For example, interferon type II falls within both the Immunologic Factor and Pharmacologic Substance semantic types. More generally, semantic types represent 'categories' that have been used to classify the MeSH metadata. Semantic types are useful because, depending on the nature of the discovery goals, we may adopt a particular view, i.e., we may restrict the discovery process to consider only MeSH terms that belong to certain semantic types. In these cases the topic profiles are restricted to MeSH terms belonging to semantic types specified by the view.

We calculate term weights for the MeSH terms. Term weights are a slight modification of the commonly used TF*IDF scores. Since a MeSH term typically occurs once in a MEDLINE record, here $TF_i$ (term frequency) equals the number of documents in which the MeSH term $t_i$ occurs within the retrieved document set. $IDF_i$ (inverse document frequency) is $\log(N / TF_i)$. $N$ is the number of documents retrieved for the topic. Weights are normalized as shown below for term $t_i$. This vector of weighted MeSH terms forms the topic profile.

$$\mathrm{weight}(t_i) = \frac{v_i}{\sqrt{v_1^2 + v_2^2 + \cdots + v_r^2}}, \tag{1}$$

where $v_i = TF_i \times \log(N / TF_i)$ and there are $r$ terms in the profile.

Algorithm: Figure 2 outlines our open discovery algorithm, which follows the framework shown in Figure 1. We begin by building the A topic profile restricted to ST-B semantic types. Note that all MEDLINE searches are conducted automatically via the PubMed interface [2]. We then automatically select M MeSH terms for each ST-B semantic type from this A profile and call these the B terms. Next, profiles are built for each of these B terms, limited to another selected set of semantic types, ST-C. These B profiles are analysed in combination to select an initial pool of candidate C terms. These candidate terms are then checked for novelty in the context of the starting A topic. When the algorithm terminates, the user is provided a final list of ranked, novel C terms. The higher the rank, the greater the estimated confidence in the potential connection with the A topic.

[1] http://umlsks.nlm.nih.gov
[2] http://www.nlm.nih.gov

At this point the rest of the process depends almost entirely on the user. (This is also the case in other implementations of the open discovery process (e.g., Lindsay & Gordon 1999; Weeber et al., 2001)). It is up to the user to select A-C pairs of interest and explore the literature for supporting evidence.

The role of ST-B and ST-C in the algorithm is to apply reasonable constraints to the problem and shape the path of the discovery process. Similarly, parameter M may be used to focus the discovery process. The higher this number, the bigger the scope through which one looks for novel C topics. Obviously it takes experience to come up with reasonable values for these parameters. But we already see some patterns emerge in the MEDLINE mining literature. For example, when looking for substances likely to influence a disease, several researchers have used functional semantic types such as Cell Function and Molecular Dysfunction for selecting intermediate pathways (e.g., Weeber et al., 2001). Experiments varying these semantic types have been described in our previous work (Srinivasan, 2004). Unique aspects of our algorithm in comparison to open discovery methods explored by others include, for example, the fact that our weighting scheme identifies interesting and relevant B terms at high ranks. Also, C terms are assessed by combining the evidence on their connection to the different intermediate B terms.

3 Open Discovery with Turmeric

Our interest in curcumin was sparked by the fact that this spice is widely used in Asia and is highly regarded for its curative and analgesic properties. Its uses include the treatment of burns, stomach ulcers and ailments, and various skin diseases. Curcumin is also used as an antiseptic, in alleviating symptoms of the common cold, and as a depilatory. A number of MEDLINE records have reported on the anti-cancer and anti-inflammatory properties of curcumin (12680238, 12678737, 12676044) [3]. Our open discovery goal is to determine whether there are novel disease contexts in which curcumin could prove beneficial, and to propose evidence-based hypotheses that can be experimentally verified.

We executed our open discovery algorithm with curcumin as the starting topic (A). The specific PubMed search conducted was turmeric OR curcumin OR curcuma (done on November 15, 2003). A total of 1,175 PubMed documents were retrieved. As Figure ?? shows the majority of these publications (1,043, 89%) are rela-

[3] Numbers within parentheses such as these refer to PubMed record ids. The reader may enter these directly into the PubMed interface to retrieve the corresponding records.

Input from user: (1) an A topic of interest, (2) a set of UMLS semantic types (ST-B) for selecting B terms and a set (ST-C) for selecting C terms. Parameter: M

Step 1: Conduct an appropriate PubMed search for topic A, and build its MeSH profile limited to the semantic types in ST-B. Call this profile AP.

Step 2: For each semantic type in ST-B, select the M top ranking MeSH terms from AP. Remove duplicate terms if any. These are designated the B terms (B1, B2, B3, etc.).

Step 3: Conduct an independent PubMed search for each B term and build its profile limited to the semantic types ST-C. Call these profiles BP1, BP2, BP3, etc.

Step 4: Compute a final combined profile where the combined weight of a MeSH term is the sum of its weights in BP1, BP2, BP3, etc. Call this initial profile CP.

Step 5: For each term t in CP, if a MEDLINE search on topic A AND t returns nonzero results, eliminate t from CP.

Output: For each semantic type in ST-C, output the MeSH terms in CP ranked by combined weight. These are the C terms organized by semantic type and ranked by estimated potential.

Figure 2: Open Discovery Algorithm: Outline of Steps.
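The weighting in Equation (1) and the steps of Figure 2 can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_profile`, `open_discovery`, `toy_search`), the seven-record corpus, and the three semantic type labels are hypothetical and chosen only for illustration. The real system queries PubMed, whereas here a search is simulated by matching terms in an in-memory list, and topic A is assumed to appear as an ordinary term in each record.

```python
import math
from collections import defaultdict

def build_profile(docs, term_types, allowed_types):
    """Equation (1): v_i = TF_i * log(N / TF_i), normalized to unit length.
    docs is a list of MeSH-term sets; only terms whose semantic types
    intersect allowed_types (the view) enter the profile."""
    n = len(docs)
    tf = defaultdict(int)
    for mesh_terms in docs:
        for t in mesh_terms:
            if term_types.get(t, set()) & allowed_types:
                tf[t] += 1  # one count per document containing the term
    v = {t: f * math.log(n / f) for t, f in tf.items()}
    norm = math.sqrt(sum(x * x for x in v.values()))
    if norm == 0:
        return {t: 0.0 for t in v}
    return {t: x / norm for t, x in v.items()}

def open_discovery(search, topic_a, term_types, st_b, st_c, m):
    """Steps 1-5 of Figure 2; `search` is a stand-in for a PubMed query."""
    ap = build_profile(search(topic_a), term_types, st_b)       # Step 1
    b_terms = []
    for st in sorted(st_b):                                     # Step 2
        ranked = sorted((t for t in ap if st in term_types[t]),
                        key=lambda t: -ap[t])
        for t in ranked[:m]:
            if t not in b_terms:                                # drop duplicates
                b_terms.append(t)
    cp = defaultdict(float)
    for b in b_terms:                                           # Step 3
        bp = build_profile(search(b), term_types, st_c)
        for t, w in bp.items():
            cp[t] += w                                          # Step 4
    # Step 5: keep only terms that never co-occur with topic A.
    novel = {t: w for t, w in cp.items()
             if not search(f"{topic_a} AND {t}")}
    return {st: sorted((t for t in novel if st in term_types[t]),
                       key=lambda t: -novel[t])
            for st in sorted(st_c)}

# Hypothetical toy data: each record is the set of terms indexed for it.
TERM_TYPES = {
    "curcumin": {"Substance"},
    "inflammation": {"Function"},
    "apoptosis": {"Function"},
    "retinal disease": {"Disease"},
    "neoplasm": {"Disease"},
}
CORPUS = [
    {"curcumin", "inflammation", "neoplasm"},
    {"curcumin", "inflammation"},
    {"curcumin", "apoptosis", "neoplasm"},
    {"inflammation", "retinal disease"},
    {"inflammation", "retinal disease"},
    {"apoptosis", "retinal disease"},
    {"apoptosis", "neoplasm"},
]

def toy_search(query):
    """Return records containing every term in an 'x AND y' style query."""
    wanted = [q.strip() for q in query.split(" AND ")]
    return [doc for doc in CORPUS if all(w in doc for w in wanted)]

result = open_discovery(toy_search, "curcumin", TERM_TYPES,
                        st_b={"Function"}, st_c={"Disease"}, m=2)
print(result)
```

In this toy corpus "neoplasm" already co-occurs with "curcumin", so Step 5 removes it and only "retinal disease" survives as a novel C term. Note that under Equation (1) a term present in every retrieved record gets weight zero, since log(N / TF) is then zero.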
Diabetic retinopathy is a leading cause of blindness. Early signs of the disease are the adhesion of leukocytes to the vessels of the retina, endothelial cell injury, and the breakdown of the blood-retina barrier (12000720). Even acute intensive insulin therapy constitutes an additional risk factor for diabetic retinopathy, due to insulin-induced hypoxia and an associated acceleration in the blood-retina barrier breakdown (11901189). Glaucoma is the second most common cause of blindness in the world (8695555) and is caused by mutations in a number of genes on chromosomes 1 and 10 as well as in other loci on chromosomes 2, 3, 8, and 7.

While several diseases have one or a few genetic loci that control disease progression and familial transmission, it is often the case that a variety of genes may be involved in their pathophysiology. Following is a brief survey of some of the genes that may be involved in the process of tissue injury or inflammation and regulation of cell division. Control of the immune process and of the inflammatory response is important in combating infection and autoimmune diseases. Regulation of cell division, particularly programmed cell death, is critical in diverse diseases such as cancer and tissue regeneration, e.g. retinal injury and diseases. Regulation of the activity of such genes could provide strategies for therapeutic intervention using curcumin.

In diabetes and during inflammation, periods of hypoxia, i.e. low oxygen concentration, occur in various tissues and organs. At such times an early cellular response results in the elevated expression of interleukin-1beta (IL-1beta) and cyclooxygenase 2 (COX-2) genes (11527948, 14507857, 11821258), which in turn stimulate new blood vessel growth leading to retinopathy (12821538, 12601017). Similarly, the expression of COX-2 was associated with the development of glaucoma (9441697). Treatment with COX-2 inhibitors suppressed blood-retinal barrier breakdown and had an antiangiogenic effect, i.e. they prevented the growth of new blood vessels and thus had a protective effect on the retina (12821538, 11980873).

Another gene, tumor necrosis factor alpha (TNF-alpha), was elevated during the early stages of diabetic retinopathy and inflammation (11821258, 12706995, 11161842). Anti-TNF-alpha treatment reduced leukocyte adhesion to blood vessels of the eye and vascular leakage (12714660), indicating a potential therapeutic effect for such a treatment to reduce ocular inflammation. Activation of TNF-alpha and other genes may also lead to the pathophysiology of glaucoma (10975909, 10815159).

The family of mitogen-activated protein kinases (MAPK) is another group of genes that has an important role in retinal disease. These include extracellular signal-regulated kinases (ERK), c-Jun amino (N)-terminal kinase (JNK), and p38. One of these, ERK, was induced in glaucoma (12824248). Often inflammatory responses include the induction of apoptosis, or programmed cell death. The involvement of JNK in inducing apoptosis was demonstrated in prostate cancer (12859962, 12663665) and retinal cells (12270637). There is also a link to TNF-alpha (discussed above), which was shown to activate phosphorylation of ERKs, p38, and JNK MAPK in human chondrocytes (12878172).

IL-1beta activation, induced by the presence of retinal holes, a key feature of diabetic retinopathy, is also reported to result in the activation of a number of the MAPK genes ERK, JNK, and p38 (12824248). These conditions in turn exacerbate the disease process in that they result in proliferative and migratory cells accumulating in the wounded retina (12500176). Inhibitors of MAPK and phosphatidylinositol 3-kinase (PI3) inhibited retinal pigment epithelial cell proliferation (12782163). The breakdown in the blood-retina barrier is also suppressed by inhibitors of p38 MAPK and PI3 (11901189).

Changes in the levels of the gene NF-kappaB are an early cellular response to inflammation. Activation of TNF-alpha (discussed above) is followed by increased transcription of NF-kappaB, which in turn stimulates ERK, p38, and JNK MAPK (12878172). Also, activation of NF-kappaB subsequently stimulated COX-2 and matrix metalloproteinase-9 expression (12807725).

Curcumin was shown to be effective in inhibiting cell proliferation of tumorigenic and non-tumorigenic breast cancer cells (12527329) and other tumor cells (12680238). As described previously, the gene COX-2 is involved in early inflammatory diabetic retinopathy (11821258). Curcumin was able to suppress COX-2 in a dose-related manner (12844482) and neutralized the effect of IL-1beta, possibly through its effect on p38 and COX-2 and JNK (12957788). Curcumin is also a known inhibitor of JNK (12957788, 12854631, 12582006, 12130649, 12105223, 9674701) and a suppressor of NF-kappaB activation (11753638, 11506818, 12878172, 12825130). For example, it suppressed the induction of NF-kappaB and its dependent genes by cigarette smoke (12807725), in alcoholic liver disease (12388178) and in cultured endothelial cells (12368225).

Having shown that these genes, in particular IL-1beta, COX-2, TNF-alpha, JNK, ERK, NF-kappaB, etc., are involved in retinopathy and in regulating cell proliferation, leukocyte attachment, and the breakdown of the blood-retina barrier, and having established that curcumin is capable of inhibiting the activity of these genes, we hypothesize that curcumin may have therapeutic value in preventing or ameliorating a number of retinal pathologies.

Our approach has focused on specific genes, in particular to provide clues regarding the relevant biochemical pathways. In some cases the evidence is gathered in the context of other diseases such as alcoholic liver disease

knowledge domain of protein families. Bioinformatics, 14(7):600-607.

Chaussabel, D., & Sher, A. (2002). Mining microarray expression data by literature profiling. Genome Biology, 3(10):research0055.1-0055.16.

Gordon, M.D., & Lindsay, R.K. (1996). Toward discovery support systems: A replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil. Journal of the American Society for Information Science, 47:116-128.

Hearst, M. (1999). Untangling text data mining. In: Proceedings of ACL, Annual Meeting of the Association for Computational Linguistics (invited talk), University of Maryland, Maryland, June 20-26, 1999.

Jenssen, T-K., Laegreid, A., Komorowski, J., & Hovig, E. (2001). A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28:21-28.

Lindsay, R.K., & Gordon, M.D. (1999). Literature-based discovery by lexical statistics. Journal of the American Society for Information Science, 50(7):574-587.

Masys, D.R., Welsh, J.B., Fink, J.L., Gribskov, M., Klacansky, I., & Corbeil, J. (2001). Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics, 17(4):319-326.

Shatkay, H., Edwards, S., Wilbur, W.J., & Boguski, M. (2000). Genes, Themes and Microarrays. Using information retrieval for large-scale gene analysis. In: Proceedings of Intelligent Systems for Molecular Biology, La Jolla, California, 317-328.

Smalheiser, N.R., & Swanson, D.R. (1996a). Indomethacin and Alzheimer's disease. Neurology, 46:583.

Smalheiser, N.R., & Swanson, D.R. (1996b). Linking estrogen to Alzheimer's disease: An informatics approach. Neurology, 47:809-810.

Smalheiser, N.R., & Swanson, D.R. (1998). Calcium-independent phospholipase A2 and Schizophrenia. Archives of General Psychiatry, 55(8):752-753.

Srinivasan, P. (To appear 2004). Text Mining: Generating Hypotheses from MEDLINE. Journal of the American Society for Information Science.

Srinivasan, P., & Wedemeyer, M. (2003). Mining Concept Profiles with the Vector Model or Where on Earth are Diseases being Studied? In: Proceedings of Text Mining Workshop. Third SIAM International Conference on Data Mining. San Francisco, CA.

Swanson, D.R. (1986). Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30:7-18.

Swanson, D.R. (1988). Migraine and Magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31:526-557.

Swanson, D.R. (1990). Somatomedin C and Arginine: Implicit connections between mutually isolated literatures. Perspectives in Biology and Medicine, 33(2):157-179.

Swanson, D.R., Smalheiser, N.R., & Bookstein, A. (2001). Information discovery from complementary literatures: categorizing viruses as potential weapons. Journal of the American Society for Information Science, 52(10):797-812.

Weeber, M., Klein, H., Aronson, A.R., Mork, J.G., Jong-van den Berg, L., & Vos, R. (2000). Text-based discovery in biomedicine: the architecture of the DAD-system. In: Proceedings of AMIA, the Annual Conference of the American Medical Informatics Association, November 4-8, 2000, 903-907.

Weeber, M., Klein, H., Berg, L., & Vos, R. (2001). Using concepts in literature-based discovery: Simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium discoveries. Journal of the American Society for Information Science, 52(7):548-557.

Weeber, M., Vos, R., Klein, H., de Jong-Van den Berg, L.T.W., Aronson, A., & Molema, G. (2003). Generating hypotheses by discovering implicit associations in the literature: A case report for new potential therapeutic uses for Thalidomide. Journal of the American Medical Informatics Association, 10(3):252-259.