two widely used agricultural thesauri: the UN FAO AGROVOC Thesaurus (28,169 concepts, multilingual) and the USDA NAL Agricultural Thesaurus (41,577 concepts, monolingual English).

We consider the automation a success if and only if the following criteria are met:

1. The key concepts of each important pathway through which a carcinogen can reach humans should be found. (i.e., Recall should be very high.)
2. The researchers should not be distracted by too many red herrings. (i.e., Precision should be sufficient.)

We measure Precision by counting how many of the returned part-whole relations are valid. We estimate Recall by counting how many of a given set of known sources are found for four carcinogens (acrylamide, asbestos, benzene and dioxins). A more detailed description of part of the experiment described in this paper can be found in [6]. The data sets and the results of the experiment can be found on the web [7].

2 Learning Part-Whole Patterns

To learn part-whole patterns we use a training set of correct part-whole relations that consists of 503 part-whole pairs. The pairs are derived from a list, created by the International Food Information Council (IFIC) and the FDA,⁴ of various kinds of food additives and the food product types they can occur in. The list contains 58 additives (parts) and 113 food products (wholes), grouped together in 18 classes of additives such as sweeteners and preservatives. It is not specified which additives occur in which food products. To discover this, we took the cartesian product of the additives and the food products and filtered out the pairs that yielded no hits on Google⁵ when put together in a wild-card query. For example, the pair ⟨table-top sugar, aspartame⟩ is filtered out, because neither the query "table-top sugar * aspartame" nor "aspartame * table-top sugar" yields any hits.

For all 503 part-whole pairs that did yield results we collected the first 1000 snippets (or as many snippets as were available). We looked up all consistent phrases from these snippets that connected the part and whole from the query. In these phrases we substituted all parts and wholes by the variables part and whole. This yielded 4502 unique patterns, which we sorted by frequency of occurrence.

Because there were many lists of substances in our data, there were also many patterns that did not describe a part-whole relation, but that were merely part of a list of substances containing the part and the whole. These patterns can be easily recognized, because they contain names of substances. For example, for the pair ⟨cheese, enzymes⟩ the following snippet was returned: cheese (pasteurized milk, cheese cultures, salt, enzymes). An example of a good snippet is: All cheese contains enzymes. To exclude lists we removed all patterns that contain, apart from the part and whole, labels of concepts in the two agricultural thesauri. This filtered out 1491 patterns, of which only 12 were correct part-whole patterns.

⁴ http://www.ific.org, http://www.cfsan.fda.gov/dms/foodic.html
⁵ http://www.google.com

pattern                              freq.
part in {the}? whole                 488
part for {use in|making}? whole      129
part -fortified whole                74
part from whole                      60
part -enriched whole                 49
part found in {many}? whole          35
part used in {making}? whole         34
part -based whole                    32
part into whole                      25
part added to whole                  23

pattern                              # wholes found    Prec.
part in {the}? whole                 26799             .84
whole with part                      8787              .68
part from whole                      4249              .96
part for {use in|making}? whole      5917              .68
part content {in|in the}? whole      5794              .60
whole contain part                   3949              .88
whole containing {high}? part        2934              1
part based whole                     4415              .64
whole using part                     3558              .72
part levels in whole                 2591              .92

Table 1. (left) The top-10 most frequent correct patterns extracted from Google snippets. (right) The top-10 most productive patterns. (Precision × freq.)

4 Analysis

In the introduction we stated two criteria that have to be met for the application of our part-whole learning method to be a success. Precision has to be sufficient, and Recall has to be very high. In Secs. 2 and 3 we analyzed the results in terms of frequency and Precision. In this section we will assess Recall. Since even the knowledge of experts of whether or not a substance is contained in some whole is far from complete, we cannot create a complete gold standard to measure Recall. It is simply infeasible. We can, however, approximate Recall by computing it on samples.

We set up four test cases centered on discovering possible causes of exposure to a specific carcinogenic agent. The agents we chose are acrylamide, asbestos, benzene, and dioxins. For each case we decided on 15 important concepts that contain the carcinogen and define a possible exposure route. The selection of the wholes was based on reports from the United States Environmental Protection Agency (EPA) and the Netherlands Organization for Applied Scientific Research (TNO) Quality of Life. Recall and the sets of wholes are shown in Table 2, along with the rank at which the whole occurs in the list of discovered wholes.

5 Discussion

We showed that learning part-whole relations in the health and safety domain is feasible. Our method achieves an average Precision around .7 and an average Recall around .8. For the use case described in this paper this satisfies the criteria for success. However, our experimental setup assumes that all interesting information pertaining to some carcinogenic substance can be obtained in one single retrieval step. The construction of complex paths from the substance to the eventual exposure has to happen in the mind of the user and depends solely on his expertise and ingenuity. This is a severe limitation that leaves room for considerable improvement. A relatively straightforward extension would be to iterate the retrieval step, using suitable wholes found in retrieval step n−1 in the part slot in retrieval step n. We assume that extra measures will be required to limit the inevitable loss of precision caused by the iteration.

3. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Methods for domain-independent information extraction from the web: An experimental comparison. In Proc. of the AAAI Conference, 2004.
4. Michal Finkelstein-Landau and Emmanuel Morin. Extracting semantic relationships between terms: Supervised vs. unsupervised methods. In International Workshop on Ontological Engineering on the Global Information Infrastructure, pages 71-80, 1999.
5. Roxana Girju, Adriana Badulescu, and Dan Moldovan. Learning semantic constraints for the automatic discovery of part-whole relations. In Proc. of the HLT-NAACL, 2003.
6. Willem Robert van Hage, Hap Kolb, and Guus Schreiber. A method for learning part-whole relations. In International Semantic Web Conference, 2006.
7. http://www.few.vu.nl/wrvhage/carcinogens
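
As an illustration of the pattern-learning step of Sec. 2 (substituting the part and the whole in each snippet by variables, dropping list-like patterns, and counting what remains), the following is a minimal Python sketch. It is not the implementation used in the experiment: the function names, the toy snippets for the calcium and juice pair, and the two-word label set standing in for the agricultural thesauri are all assumptions made for this example, and only the first occurrence of each term in a snippet is considered.

```python
import re
from collections import Counter


def find_link(snippet, part, whole):
    """Return (first_var, link, second_var), or None if the snippet does not
    contain both terms as separate, non-overlapping words.
    Simplification: only the first occurrence of each term is used."""
    text = snippet.lower()
    spans = {}
    for var, term in (("part", part), ("whole", whole)):
        match = re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        if match is None:
            return None
        spans[var] = match.span()
    first, second = sorted(spans, key=lambda var: spans[var][0])
    if spans[first][1] > spans[second][0]:
        return None  # terms overlap, e.g. "sugar" inside "table-top sugar"
    # Text between the two terms, with whitespace collapsed.
    link = " ".join(text[spans[first][1]:spans[second][0]].split())
    return first, link, second


def mentions_label(link, labels):
    """True if the link text contains any thesaurus label as a whole word."""
    return any(
        re.search(r"\b" + re.escape(label.lower()) + r"\b", link)
        for label in labels
    )


def count_patterns(snippets_by_pair, labels=()):
    """Count patterns over all snippets, skipping list-like patterns."""
    counts = Counter()
    for (part, whole), snippets in snippets_by_pair.items():
        for snippet in snippets:
            found = find_link(snippet, part, whole)
            if found is None or mentions_label(found[1], labels):
                continue
            # Join the variables and the link; an empty link is left out.
            counts[" ".join(filter(None, found))] += 1
    return counts


# Toy input. The two cheese snippets are the examples quoted in Sec. 2;
# the calcium/juice snippet is hypothetical.
snippets_by_pair = {
    ("enzymes", "cheese"): [
        "All cheese contains enzymes.",
        "cheese (pasteurized milk, cheese cultures, salt, enzymes)",
    ],
    ("calcium", "juice"): ["Calcium-fortified juice"],
}
labels = {"milk", "salt"}  # hypothetical stand-in for the thesaurus labels

for pattern, freq in count_patterns(snippets_by_pair, labels).most_common():
    print(freq, pattern)
```

With this toy input the ingredient-list snippet is dropped because its link text mentions the labels "milk" and "salt", and the two remaining snippets yield the patterns `whole contains part` and `part -fortified whole`. Optional elements such as `{the}?` in Table 1 would need an additional generalization step that this sketch does not attempt.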