1. Macro-reading instead of micro-reading. We use the term "micro-reading" to refer to the traditional NLP task where a single text document is input, and the desired output is the full information content of that document. In contrast, we define "macro-reading" as a task where the input is a large text collection (e.g., the web), and the desired output is a large collection of facts expressed by the text collection, without requiring that every fact be extracted. We argue that macro-reading is much easier than micro-reading, for two reasons. First, macro-reading does not require extracting every bit of information contained in the text collection. Second, in text corpora as large as the web, many important facts will be stated redundantly, thousands of times, using different wordings. A macro-reader can benefit from this redundancy by focusing on analyzing only the simple wordings of the fact, ignoring hopelessly complex sentences, and by statistically combining evidence from many text fragments in order to determine how strongly to believe a particular candidate hypothesis.

2. Ontology-driven reading. Much of the difficulty in truly understanding free-form text follows from the fact that it can say anything. In contrast, we formulate our macro-reading problem as a task of populating an ontology that is given as input, and that defines the categories (e.g., sport, person, team) and relations (e.g., plays-sport, plays-on-team) of interest. This is a natural way to frame a problem of populating some portion of the semantic web for which an ontology is available. It also makes our macro-reading problem easier in two ways. First, the system can focus only on a subset of text that is on-topic relative to the ontology. Second, the ontology itself can define meta-properties of its categories and relations that make extraction easier and more accurate (e.g., it can state that the relation 'plays-on-team' relates arguments of type 'person' and 'team').

3. Machine learning methods whose accuracy improves with ontology complexity. A third design choice is to use semi-supervised machine learning methods that automatically discover patterns of text and hypertext that support reliable fact extraction. Our machine learning approach acquires extraction patterns (e.g., "mayor of X" often implies X is a city) for each predicate (category or relation) in the input ontology. We build on earlier semi-supervised bootstrap learning methods [5–8] that learn from just a handful of labeled training examples, plus a large corpus of unlabeled text. While these earlier methods showed the feasibility of semi-supervised learning of extraction patterns, they were limited because accurate learning requires more constraints than are provided by a few dozen labeled training examples. Our algorithm achieves significantly higher accuracy by using the input ontology itself to provide additional constraints that guide the learner [9]. For example, when our algorithm learns extraction patterns for the predicates 'person', 'team' and 'plays-on-team', prior knowledge from the ontology requires that for any unlabeled sentence containing noun phrases A and B, the extractor for 'plays-on-team' can label ⟨A, B⟩ as a positive example of the relation only if the 'person' classifier labels A positive, and the 'team' classifier labels B positive. As the ontology grows in complexity, the set of constraints on the learner also grows, resulting in even higher accuracy.

In summary, our approach uses a coupled semi-supervised learning algorithm to acquire extraction strategies for each predicate in the input ontology, and applies these to macro-read millions of web pages to populate that ontology.
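Item 1 says a macro-reader decides how strongly to believe a candidate fact by statistically combining evidence from many text fragments. No formula is given here, so the sketch below uses a noisy-or combination, an assumption chosen only for illustration: each matching fragment is treated as independent evidence whose strength is the precision of the pattern that matched it, and belief is 1 - prod(1 - p_i). The functions `combine_evidence` and `promote`, the 0.9 threshold, and the precision values are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# One piece of evidence: (candidate fact, precision of the pattern that matched).
# Hypothetical: the fact strings and precision values are made up for illustration.
Evidence = Tuple[str, float]


def combine_evidence(evidence: Iterable[Evidence]) -> Dict[str, float]:
    """Noisy-or (an assumed combination rule): belief(fact) = 1 - prod(1 - p_i),
    treating every matching fragment as independent evidence."""
    disbelief: DefaultDict[str, float] = defaultdict(lambda: 1.0)
    for fact, precision in evidence:
        disbelief[fact] *= 1.0 - precision
    return {fact: 1.0 - d for fact, d in disbelief.items()}


def promote(beliefs: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Return, sorted, the candidate facts believed at least `threshold` strongly."""
    return sorted(fact for fact, b in beliefs.items() if b >= threshold)


# Three fragments matched a pattern such as "mayor of X" for Pittsburgh;
# only one fragment supports the Lincoln candidate.
evidence = [
    ("city(Pittsburgh)", 0.6),
    ("city(Pittsburgh)", 0.7),
    ("city(Pittsburgh)", 0.5),
    ("city(Lincoln)", 0.6),
]
beliefs = combine_evidence(evidence)
print(promote(beliefs))  # prints ['city(Pittsburgh)']
```

Redundancy does the work here: three moderately reliable matches combine to a belief of about 0.94, while a single match stays at 0.6. Real web fragments are rarely independent (the same sentence is often copied across many pages), so noisy-or tends to overstate belief; this is a limitation of the sketch, not a description of the system in the text.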
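The earlier bootstrap methods [5–8] that item 3 builds on alternate between learning patterns from known instances and using those patterns to find new instances. The toy loop below sketches that generic idea for a single category; it is not the authors' algorithm. Its simplifications are assumptions: a "pattern" is just the two words immediately preceding a one-word instance, and the corpus, the seeds, the helper names and the `min_support` value are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical pattern representation: the two words right before an instance.
Pattern = Tuple[str, str]


def learn_patterns(corpus: List[str], instances: Set[str], min_support: int) -> Set[Pattern]:
    """Collect two-word left contexts of known instances; keep frequent ones."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i in range(2, len(words)):
            if words[i] in instances:
                counts[(words[i - 2], words[i - 1])] += 1
    return {p for p, c in counts.items() if c >= min_support}


def apply_patterns(corpus: List[str], patterns: Set[Pattern]) -> Set[str]:
    """Extract every word that follows a learned left context."""
    found: Set[str] = set()
    for sentence in corpus:
        words = sentence.split()
        for i in range(2, len(words)):
            if (words[i - 2], words[i - 1]) in patterns:
                found.add(words[i])
    return found


def bootstrap(corpus: List[str], seeds: Iterable[str],
              iterations: int = 2, min_support: int = 2) -> Set[str]:
    """Alternate pattern learning and instance extraction, starting from seeds."""
    instances = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(corpus, instances, min_support)
        instances |= apply_patterns(corpus, patterns)
    return instances


corpus = [
    "the mayor of Pittsburgh spoke today",
    "the mayor of Boston resigned",
    "a mayor of Denver was elected",
    "the mayor of the city declined",
]
print(sorted(bootstrap(corpus, {"Pittsburgh", "Boston"})))
# prints ['Boston', 'Denver', 'Pittsburgh', 'the']
```

From the seeds Pittsburgh and Boston the loop learns the context "mayor of", which correctly yields Denver but also pulls in "the" from "the mayor of the city". With only a handful of labeled examples nothing stops this kind of drift; a check against another predicate's classifier is the kind of additional constraint the text describes.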
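The 'plays-on-team' constraint in item 3 can be expressed as a small filter. The sketch assumes a hypothetical ontology fragment that records each relation's argument types, and treats each category classifier as a plain yes/no function on a noun phrase; `RELATION_SIGNATURES`, `coupled_positives` and the membership-test "classifiers" are illustrative stand-ins, not the learned classifiers of [9]. It shows only the labeling rule for candidate pairs; applying such constraints during training, as the coupled algorithm does, is not attempted here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ontology fragment: relation -> (type of argument A, type of argument B).
RELATION_SIGNATURES: Dict[str, Tuple[str, str]] = {
    "plays-on-team": ("person", "team"),
    "plays-sport": ("person", "sport"),
}

# A category classifier labels a noun phrase positive (True) or negative (False).
Classifier = Callable[[str], bool]


def coupled_positives(
    relation: str,
    candidate_pairs: List[Tuple[str, str]],
    relation_extractor: Callable[[str, str], bool],
    category_classifiers: Dict[str, Classifier],
) -> List[Tuple[str, str]]:
    """Label (A, B) positive for `relation` only if the relation extractor
    proposes it AND the argument-type classifiers accept A and B."""
    type_a, type_b = RELATION_SIGNATURES[relation]
    is_a = category_classifiers[type_a]
    is_b = category_classifiers[type_b]
    return [(a, b) for a, b in candidate_pairs
            if relation_extractor(a, b) and is_a(a) and is_b(b)]


# Toy stand-ins for learned classifiers: simple membership tests.
classifiers: Dict[str, Classifier] = {
    "person": lambda np: np in {"Derek Jeter"},
    "team": lambda np: np in {"Yankees"},
    "sport": lambda np: np in {"baseball"},
}


def proposes_everything(a: str, b: str) -> bool:
    """An over-eager relation extractor that accepts every pair."""
    return True


pairs = [("Derek Jeter", "Yankees"), ("Yankees", "baseball"), ("New York", "Yankees")]
print(coupled_positives("plays-on-team", pairs, proposes_everything, classifiers))
# prints [('Derek Jeter', 'Yankees')]
```

Every relation whose signature names 'person' or 'team' reuses the same category classifiers, so each relation added to the ontology adds further checks tied to those classifiers; this is one way to read the claim that the constraints on the learner grow with ontology complexity.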
Table 1. Horn clause rules learned from extracted instances. Numbers indicate the conditional probability that the literal to the left of the ":-" will be satisfied if the literals to its right are satisfied.

0.84  playsSport(?x,?y) :- playsFor(?x,?z), teamPlaysSport(?z,?y)
0.70  playsSport(?x,baseball) :- playsFor(?x,yankees)
0.82  teamPlaysSport(?x,?y) :- playsFor(?x,?z), playsSport(?z,?y)
0.73  teamPlaysSport(?x,baseball) :- playsAgainst(?x,yankees)

… especially for annotating individual web pages, and for extracting information that is stated only infrequently on the web (e.g., personal information that appears only on a person's home page). One direction for future work is to explore whether and how a macro-reader like ours can help train a micro-reader.

While we believe machine reading will play an important role in populating the semantic web, other approaches will be valuable too, and it is useful to understand the different roles these different approaches can play. For example, publishing pre-existing databases may be especially useful for providing deep coverage over fairly narrow domains (e.g., the census data). In contrast, our approach of macro-reading the web may be better suited to populating broader ontologies, especially with items that are mentioned frequently on the web (and hence easiest to macro-read). Because it can be retrained to fairly arbitrary ontologies, our approach might also be useful for more specialized applications where manual methods are prohibitively expensive.

Acknowledgements

This research has been supported by DARPA, Google, and the Brazilian Research Agency CAPES. Yahoo! Inc. has provided graduate fellowship support as well as access to their M45 computing cluster.

References

1. Nivre, J.: Incremental non-projective dependency parsing. In: HLT-NAACL. (2007) 396–403
2. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: Proceedings of CoNLL-2003. (2003) 142–147
3. Voorhees, E.: Overview of TREC 2007. In: TREC. (2007)
4. Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39 (2000) 103–134
5. Yarowsky, D.: Unsupervised word sense disambiguation rivaling supervised methods. In: ACL. (1995) 189–196
6. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: COLT. (1998)
7. Riloff, E., Jones, R.: Learning dictionaries for information extraction by multi-level bootstrapping. In: AAAI. (1999)
8. Brin, S.: Extracting patterns and relations from the world wide web. In: WebDB. (1998)
9. Carlson, A., Betteridge, J., Hruschka Jr., E.R., Mitchell, T.M.: Coupling semi-supervised learning of categories and relations. In: Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for NLP. (2009)
10. Wang, R.C., Cohen, W.W.: Language-independent set expansion of named entities using the web. In: ICDM. (2007)