


two widely used agricultural thesauri: the UN FAO AGROVOC Thesaurus (28,169 concepts, multilingual) and the USDA NAL Agricultural Thesaurus (41,577 concepts, monolingual English).

We consider the automation a success if and only if the following criteria are met:

1. The key concepts of each important pathway through which a carcinogen can reach humans should be found. (i.e., Recall should be very high.)
2. The researchers should not be distracted by too many red herrings. (i.e., Precision should be sufficient.)

We measure Precision by counting how many of the returned part-whole relations are valid. We estimate Recall by counting how many of a given set of known sources are found for four carcinogens (acrylamide, asbestos, benzene and dioxins). A more detailed description of part of the experiment described in this paper can be found in [6]. The data sets and the results of the experiment can be found on the web. [7]

2 Learning Part-Whole Patterns

To learn part-whole patterns we use a training set of correct part-whole relations that consists of 503 part-whole pairs, derived from a list of various kinds of food additives and food product types they can occur in, created by the International Food Information Council (IFIC) and the FDA.⁴ The list contains 58 additives (parts) and 113 food products (wholes), grouped together in 18 classes of additives such as sweeteners and preservatives. It is not specified which additives occur in which food products. To discover this, we took the cartesian product of the additives and the food products and filtered out the pairs that yielded no hits on Google⁵ when put together in a wild-card query. For example, the pair ⟨table-top sugar, aspartame⟩ is filtered out, because the query "table-top sugar * aspartame" or "aspartame * table-top sugar" yields no hits.

For all 503 part-whole pairs that did yield results we collected the first 1000 snippets (or as many snippets as were available). We looked up all consistent phrases from these snippets that connected the part and whole from the query. In these phrases we substituted all parts and wholes by the variables "part" and "whole". This yielded 4502 unique patterns, which we sorted by frequency of occurrence.

Due to the fact that there were many lists of substances in our data there were also many patterns that did not describe a part-whole relation, but that were merely part of a list of substances containing the part and the whole. These patterns can be easily recognized, because they contain names of substances. For example, for the pair ⟨cheese, enzymes⟩ the following snippet was returned: "cheese (pasteurized milk, cheese cultures, salt, enzymes)". An example of a good snippet is: "All cheese contains enzymes.". To exclude lists we removed all patterns that contain, apart from the part and whole, labels of concepts in the two agricultural thesauri. This filtered out 1491 patterns, of which only 12 were correct part-whole patterns.

⁴ http://www.ific.org, http://www.cfsan.fda.gov/~dms/foodic.html
⁵ http://www.google.com

pattern                                 freq.
part in {the}? whole                      488
part for {use in|making}? whole           129
part-fortified whole                       74
part from whole                            60
part-enriched whole                        49
part found in {many}? whole                35
part used in {making}? whole               34
part-based whole                           32
part into whole                            25
part added to whole                        23

pattern                                 # wholes found    Prec.
part in {the}? whole                             26799      .84
whole with part                                   8787      .68
part from whole                                   4249      .96
part for {use in|making}? whole                   5917      .68
part content {in|in the}? whole                   5794      .60
whole contain part                                3949      .88
whole containing {high}? part                     2934        1
part based whole                                  4415      .64
whole using part                                  3558      .72
part levels in whole                              2591      .92

Table 1. (left) The top-10 most frequent correct patterns extracted from Google snippets. (right) The top-10 most productive patterns. (Precision × freq.)

4 Analysis

In the introduction we stated two criteria that have to be met for the application of our part-whole learning method to be a success. Precision has to be sufficient, and Recall has to be very high. In Secs. 2 and 3 we analyzed the results in terms of frequency and Precision. In this section we will assess Recall. Since even the knowledge of experts of whether or not a substance is contained in some whole is far from complete we cannot create a complete gold standard to measure Recall. It is simply infeasible. We can, however, approximate Recall by computing it on samples.

We set up four test cases centered towards discovering possible causes of exposure to a specific carcinogenic agent. The agents we chose are acrylamide, asbestos, benzene, and dioxins. For each case we decided on 15 important concepts that contain the carcinogen and define a possible exposure route. The selection of the wholes was based on reports from the United States Environmental Protection Agency (EPA) and the Netherlands Organization for Applied Scientific Research (TNO) Quality of Life. Recall and the sets of wholes are shown in Table 2, along with the rank at which the whole occurs in the list of discovered wholes.

5 Discussion

We showed that learning part-whole relations in the health and safety domain is feasible. Our method achieves an average Precision around .7 and an average Recall around .8. For the use case described in this paper this satisfies the criteria for success. However, our experimental setup assumes that all interesting information pertaining to some carcinogenic substance can be obtained in one single retrieval step. The construction of complex paths from the substance to the eventual exposure has to happen in the mind of the user, and depends solely on his expertise and ingenuity. This is a severe limitation that leaves room for considerable improvement. A relatively straightforward extension would be to iterate the retrieval step using suitable wholes found in retrieval step n - 1 in the part slot in retrieval step n. We assume that extra measures will be required to limit the inevitable loss of precision caused by the iteration.

References

3. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Methods for domain-independent information extraction from the web: An experimental comparison. In Proc. of the AAAI Conference, 2004.
4. Michal Finkelstein-Landau and Emmanuel Morin. Extracting semantic relationships between terms: Supervised vs. unsupervised methods. In International Workshop on Ontological Engineering on the Global Information Infrastructure, pages 71–80, 1999.
5. Roxana Girju, Adriana Badulescu, and Dan Moldovan. Learning semantic constraints for the automatic discovery of part-whole relations. In Proc. of the HLT-NAACL, 2003.
6. Willem Robert van Hage, Hap Kolb, and Guus Schreiber. A method for learning part-whole relations. In International Semantic Web Conference, 2006.
7. http://www.few.vu.nl/~wrvhage/carcinogens
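
As an illustration of the pattern-learning steps of Section 2 (wild-card queries for candidate pairs, substituting the part and the whole by variables, dropping list-like patterns, and sorting by frequency), here is a minimal Python sketch. It is not the authors' implementation: the function names, the 40-character gap limit, the single-word thesaurus labels, and the aspartame snippets are all assumptions made for the example, and no search engine is actually queried.

```python
import re
from collections import Counter
from itertools import product


def candidate_pairs(parts, wholes):
    """Cartesian product of additives (parts) and food products (wholes)."""
    return list(product(parts, wholes))


def wildcard_queries(part, whole):
    """The two wild-card queries described in Section 2 for one pair."""
    return [f'"{whole} * {part}"', f'"{part} * {whole}"']


def extract_pattern(snippet, part, whole, max_gap=40):
    """Return the phrase connecting part and whole, with both replaced by
    the variables 'part' and 'whole', or None if they are not connected.
    max_gap (an assumed limit) bounds the text allowed between the two."""
    text = snippet.lower()
    slots = {"part": part.lower(), "whole": whole.lower()}
    for first, second in (("part", "whole"), ("whole", "part")):
        regex = (r"\b" + re.escape(slots[first]) + r"\b(.{0," + str(max_gap)
                 + r"}?)\b" + re.escape(slots[second]) + r"\b")
        match = re.search(regex, text)
        if match:
            return f"{first}{match.group(1)}{second}"
    return None


def is_list_pattern(pattern, thesaurus_labels):
    """True if the pattern mentions a thesaurus label besides the variables.
    Simplification: only single-word, lower-case labels are checked."""
    words = set(re.findall(r"[a-z]+", pattern)) - {"part", "whole"}
    return any(label in words for label in thesaurus_labels)


def learn_patterns(snippets_by_pair, thesaurus_labels):
    """Count patterns over all snippets, most frequent first."""
    counts = Counter()
    for (part, whole), snippets in snippets_by_pair.items():
        for snippet in snippets:
            pattern = extract_pattern(snippet, part, whole)
            if pattern and not is_list_pattern(pattern, thesaurus_labels):
                counts[pattern] += 1
    return counts.most_common()


# The cheese snippets come from the text; the aspartame ones are made up.
EXAMPLE_SNIPPETS = {
    ("enzymes", "cheese"): [
        "All cheese contains enzymes.",
        "cheese (pasteurized milk, cheese cultures, salt, enzymes)",
    ],
    ("aspartame", "diet soda"): [
        "Aspartame in diet soda is safe.",
        "Most diet soda contains aspartame.",
    ],
}

if __name__ == "__main__":
    for pattern, freq in learn_patterns(EXAMPLE_SNIPPETS, {"salt", "milk"}):
        print(freq, pattern)
```

In this sketch the ingredient-list snippet for cheese yields a pattern that still contains the substance name "salt", so the thesaurus filter drops it, while "All cheese contains enzymes." contributes to the pattern "whole contains part".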
