
Improving Question Retrieval in Community Question Answering Using World Knowledge

Guangyou Zhou, Yang Liu, Fang Liu, Daojian Zeng, and Jun Zhao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
95 Zhongguancun East Road, Beijing 100190, China
{gyzhou, liuyang09, fliu, djzeng, jzhao}@nlpr.ia.ac.cn

Abstract
Community question answering (cQA), which provides a platform for people with diverse backgrounds to share information and knowledge, has become an increasingly popular research topic. In this paper, we focus on the task of question retrieval. […]

[…], which surpasses other knowledge bases in its coverage of concepts, rich semantic information and up-to-date content. In particular, we first build an easy-to-use thesaurus from Wikipedia that explicitly derives concept relationships from the structural knowledge in Wikipedia, including synonymy, polysemy, hypernymy and associative relations. The thesaurus facilitates the integration of Wikipedia's rich world knowledge into questions, because it resolves synonyms and introduces more general and associative concepts, which may help identify related topics between the queried questions and the historical questions. Besides, it provides a rich context for disambiguating the senses of polysemous concepts. We then weight the different relations in the thesaurus according to their importance, in order to improve the traditional similarity measure for question retrieval. Experiments conducted on a real cQA data set show that, with the help of the Wikipedia thesaurus, the performance of question retrieval improves over the traditional methods.

The rest of this paper is organized as follows. Section 2 describes a way to build a concept thesaurus based on the semantic relations extracted from Wikipedia. Section 3 describes the method to leverage the world knowledge of Wikipedia for question retrieval. Section 4 presents the experimental results. In Section 5, we conclude with ideas for future research.

2 Wikipedia Thesaurus
In this section, we propose a way to mine synonym, hypernym and associative relations explicitly for each concept by analyzing the rich links in Wikipedia, and to organize them as an easy-to-use thesaurus.

Wikipedia is today the largest encyclopedia in the world and surpasses other knowledge bases in its coverage of concepts, rich semantic knowledge and up-to-date content. Recently, Wikipedia has gained wide interest in the IR community and has been used for many problems, ranging from document classification [Gabrilovich and Markovitch, 2006; Wang and Domeniconi, 2008; Wang et al., 2007] to text clustering [Hu et al., 2008; 2009a; 2009b]. Each article in Wikipedia describes a single topic: its title is a succinct, well-formed phrase that resembles a term in a conventional thesaurus [Milne et al., 2006]. Each article belongs to at least one category, and hyperlinks between articles capture their semantic relations. These semantic relations include equivalence (synonym), hierarchical relations (hypernym) and associative relations. However, Wikipedia is an open data resource built for human use, so it inevitably includes much noise, and the semantic knowledge within it is not suitable for direct use in question retrieval in cQA. To make it clean and easy to use as a thesaurus, we first preprocess the Wikipedia data to collect Wikipedia concepts, and then explicitly derive relationships between Wikipedia concepts based on the structural knowledge of Wikipedia.

2.1 Wikipedia Concepts
Each article of Wikipedia describes a single topic, and its title can be used to represent the concept, e.g., "United States". However, some articles are meaningless for our purpose because they are only used for Wikipedia management and administration, such as "1980s", "List of newspapers", etc. Following the literature [Hu et al., 2008], we filter Wikipedia titles according to the rules described there; a title is filtered out if it satisfies any of the following:
- The article belongs to categories related to chronology, e.g., "Years", "Decades" and "Centuries".
- The first letter is not a capital letter.
- The title is a single stop word.

2.2 Semantic Relations in Wikipedia
Wikipedia contains rich relation structures, such as synonymy (redirect link pages), polysemy (disambiguation pages), hypernymy (hierarchical relations) and associative relations (internal page links). All these semantic relations are expressed in the form of hyperlinks between Wikipedia articles [Milne et al., 2006].

Synonym
Wikipedia contains only one article for any given concept, using redirect hyperlinks to group equivalent concepts under the preferred one. These redirect links cope with capitalization and spelling variations, abbreviations, synonyms, and colloquialisms. Synonyms in Wikipedia mainly come from these redirect links. For example, "IBM" is an entry with a large number of redirect pages: synonyms (I.B.M, Big blue, …) [Cai et al., 2011]. In addition, Wikipedia articles often mention other concepts that already have corresponding articles in Wikipedia. The anchor text on each hyperlink may differ from the title of the linked article. Thus, anchor texts can be used as another source of synonymy [Hu et al., 2008].

Polysemy
In Wikipedia, disambiguation pages are provided for polysemous concepts. A disambiguation page lists all possible meanings associated with the corresponding concept, where each meaning is discussed in its own article. For example, the disambiguation page of the term "IBM" lists 3 associated concepts, including "Inclusion body myositis", "Injection blow molding", and "International Business Machine" [Cai et al., 2011].

Hypernym
In Wikipedia, both articles (concepts) and categories belong to at least one category, and categories are organized in a hierarchical structure. The resulting hierarchy is a directed acyclic graph in which multiple categorization schemes co-exist [Milne et al., 2006]. To extract the real hierarchical relations from Wikipedia categories, we use the method proposed in [Ponzetto and Strube, 2007] to derive generic hierarchical relations from category links. Thus, we obtain hypernyms for each Wikipedia concept.

Associative Relation
Each Wikipedia article contains many hyperlinks, which express relatedness between articles. As Milne et al. [2006] noted, some links connect articles that are only tenuously related. For example, compare the following two links: one from the article "IBM" to the article "Apple Inc.", the other from the article "IBM" to the article "Software engineer" [Cai et al., 2011]. Clearly, the former pair of articles is more closely related than the latter pair. Hence, measuring the relatedness of hyperlinked articles in Wikipedia is an important issue. In this paper, we adopt three measurements, which have been described in [Wang et al., 2007]: content-based, out-link category-based, and distance-based.

The content-based measure is based on the BOWs representation of Wikipedia articles. Clearly, this measure (denoted $S_{BOWs}$) has the same limitations as the BOWs approach, since it only considers the words appearing in text documents.

The out-link category-based measure (denoted $S_{OLC}$) compares the out-link categories of two associative articles. The out-link categories of a given article are the categories to which the articles it links to belong.

The distance-based measure captures the length of the shortest path connecting the two categories the articles belong to, in the acyclic graph of the category taxonomy. This measure is normalized by taking into account the depth of the taxonomy, and is denoted $D_{cat}$.

Following [Wang et al., 2007; Wang and Domeniconi, 2008; Cai et al., 2011], the overall strength of an associative relation between concepts can be written as:

$$S_{overall} = \lambda_1 S_{BOWs} + \lambda_2 S_{OLC} + (1 - \lambda_1 - \lambda_2)(1 - D_{cat}) \qquad (1)$$

where $\lambda_1$ and $\lambda_2$ reflect the relative importance of the individual measures. Using equation (1), we rank all the out-linked concepts of each given concept, and then treat the out-linked concepts whose relatedness exceeds a certain threshold as its associative concepts.

3 Improving Question Retrieval with Wikipedia Concepts

3.1 Traditional BOWs Question Similarity
Traditional methods represent each question as BOWs. After stop words are removed and the remaining words are stemmed with the Porter stemmer, the stemmed terms form a tf-idf vector representation of each queried question. Similarly, the stemmed terms of each historical question also form a tf-idf vector. Finally, the similarity between a queried question q and a historical question d in the vector space is calculated as the cosine similarity between their vectors $\vec{q}$ and $\vec{d}$:

$$Sim_{term}(q,d) = \frac{\vec{q} \cdot \vec{d}}{\|\vec{q}\|\,\|\vec{d}\|} \qquad (2)$$
3.2 Mapping Questions into Wikipedia Concepts
To use the Wikipedia thesaurus to enrich questions, one of the key issues is how to map words in questions to Wikipedia concepts. Following the literature [Hu et al., 2008], we build a phrase index that includes the phrases of Wikipedia concepts, their synonyms, and their polysemous senses in the Wikipedia thesaurus. Based on the generated Wikipedia phrase index, all candidate phrases can be recognized in a question. We use the Forward Maximum Matching algorithm [Wong and Chan, 1996], a dictionary-based word segmentation approach, to search for candidate phrases. After this step, if a candidate concept is polysemous, word sense disambiguation is needed to find the meaning intended in the question. Wang et al. [2007] proposed a disambiguation method that considers document similarity and contextual information, and their experiments showed high disambiguation accuracy. We adopt Wang et al.'s [2007] method to perform word sense disambiguation for the polysemous concepts in questions.

Figure 1 shows an example of the Wikipedia concepts identified for a question using the above method [Cai et al., 2011]. The phrase "software engineer" in the question is mapped to the Wikipedia concept "Software engineer", and "Big Blue" is mapped to the Wikipedia concept "IBM".

Stop word list: http://truereader.com/manuals/onix/stopwords1.html
Porter stemmer: http://tartarus.org/martin/PorterStemmer/
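Forward Maximum Matching, as used in Section 3.2, scans a question left to right and at each position takes the longest token span found in the phrase index. The sketch below is a minimal illustration under stated assumptions: the phrase index is a plain set of lowercased phrases, `max_len=5` is a hypothetical cap on phrase length, and word sense disambiguation is not shown.

```python
def forward_maximum_matching(tokens, phrase_index, max_len=5):
    """Greedy left-to-right longest match of token spans against a phrase set.

    Returns the matched phrases in order; tokens that start no phrase are skipped.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest window first and shrink until a known phrase is found.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + length])
            if candidate in phrase_index:
                matches.append(candidate)
                i += length
                break
        else:
            i += 1  # no phrase starts at this token; advance by one
    return matches
```

For the tokens of "does a software engineer at big blue need a degree" and the index {"software", "software engineer", "big blue", "degree"}, the function returns ["software engineer", "big blue", "degree"]: the longer "software engineer" wins over "software", and "big blue" would then be resolved to the concept "IBM" through its synonym entry.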
[Figure 1: An example of the identified Wikipedia concepts for a question.]

3.3 Measuring Question Similarity with Hypernyms
In Wikipedia, each concept belongs to one or more categories. Moreover, these categories in turn belong to higher-level categories, forming an acyclic category graph [Hu et al., 2008]. The set of categories contained in the category graph of a given concept c is represented as $Cate_c = \{cate_1, \ldots, cate_m\}$. In the category graph, a category may have several paths linked to a concept. We calculate the distance $Dis(c, cate_i)$, for $cate_i \in Cate_c$, as the length of the shortest path from concept c to category $cate_i$. As noted by Hu et al. [2008], higher-level categories have less influence than lower-level categories, since the lower-level categories are more specific and can therefore depict the articles more accurately. In this paper, we denote the influence of the n-th layer categories on concept c as $Inf^n(c)$ and define $Inf^1(c) = 1$. For higher levels of categories, similar to [Hu et al., 2008; Cai et al., 2011], we introduce a decay factor $\mu \in [0, 1]$. Thus, we have $Inf^n(c) = \mu \cdot Inf^{n-1}(c) = \mu^{n-1} Inf^1(c)$. As each Wikipedia concept has more than one category, and each category has more than one parent category, a large value of n would introduce too many categories. Therefore, we set n = … in our experiments. Thus, for each concept c we can build a category vector $\vec{c}_{cat} = (Inf(c, cate_1), \ldots, Inf(c, cate_m))$, where $Inf(c, cate_i) = Inf^{Dis(c, cate_i)}(c)$ indicates the influence of category $cate_i$ on concept c. Let $\{c_1, \ldots, c_k\}$ denote the concept representation of historical question d; the corresponding category vector $\vec{d}_{cat}$ is built from the category vectors of these concepts. Similarly, we represent the queried question q in the category vector space as $\vec{q}_{cat}$. The similarity between the queried question and the historical question in the category space is then calculated as the cosine similarity between $\vec{q}_{cat}$ and $\vec{d}_{cat}$:

$$Sim_{cat}(q,d) = \frac{\vec{q}_{cat} \cdot \vec{d}_{cat}}{\|\vec{q}_{cat}\|\,\|\vec{d}_{cat}\|} \qquad (3)$$

[Figure 2: An example of the synonyms and associative concepts for the Wikipedia concept "IBM". Synonyms shown: I.B.M. (1.0), IBM PC Company (1.0), International Business Machines Corporation (1.0), …; associative concepts shown: Lenovo (0.31), Apple Inc. (0.32), Computer (0.27), ….]

3.4 Measuring Question Similarity with Synonyms and Associative Concepts
To further improve the performance, synonyms and associative concepts in Wikipedia can be used to include more related concepts and thereby overcome data sparseness. For each concept c in Wikipedia, a set of related concepts $\{(c_1, w(c, c_1)), \ldots, (c_k, w(c, c_k))\}$ is selected from its synonyms and associative concepts, in which $c_i$ is the i-th related concept of c and $w(c, c_i)$ is the relatedness between c and $c_i$. The relatedness is defined as follows:

$$w(c, c_i) = \begin{cases} 1 & \text{if } c \text{ and } c_i \text{ are synonyms} \\ S_{overall}(c, c_i) & \text{if } c \text{ and } c_i \text{ have an associative relation} \end{cases}$$

where $S_{overall}$ is defined by equation (1). For a historical question d with concept representation $\{c_1, \ldots, c_k\}$, the corresponding synonym and associative vector collects the related concepts of each $c_j$ together with their relatedness weights.

Figure 2 gives an example of the synonyms and associative concepts for the Wikipedia concept "IBM". For the concept "IBM", the set of related concepts is {"International Business Machines Corporation", "IBM PC Company", "I.B.M.", "Apple Inc.", "Lenovo", …}. For the question in Figure 1, the concept vector is ("Software engineer", "IBM"), and its synonym and associative vector is formed from the related concepts of "Software engineer" and "IBM". We expand d with these synonym and associative concepts and obtain the final concept representation $\vec{d}_{SAC}$. Similarly, we represent the queried question q in the final synonym and associative concept space as $\vec{q}_{SAC}$. The similarity between the queried question and the historical question in this space is then calculated as the cosine similarity between $\vec{q}_{SAC}$ and $\vec{d}_{SAC}$:

$$Sim_{SAC}(q,d) = \frac{\vec{q}_{SAC} \cdot \vec{d}_{SAC}}{\|\vec{q}_{SAC}\|\,\|\vec{d}_{SAC}\|} \qquad (4)$$

3.5 The Combination of Question Similarity
In the previous sections, we described methods that exploit Wikipedia category, synonym and associative relations for question similarity computation. In this section, we combine the above question similarity scores with the term matching score $Sim_{term}(q,d)$ using a linear combination:

$$Score(q,d) = \alpha\, Sim_{cat}(q,d) + \beta\, Sim_{SAC}(q,d) + \gamma\, Sim_{term}(q,d) \qquad (5)$$

where the relative importance of the hierarchical relation (category) score, the synonym and associative relation score, and the term matching score is adjusted through $\alpha$, $\beta$ and $\gamma$, respectively. That is to say, for a queried question q (or a historical question d), we represent the question through three different vectors: "Words", "Hypernyms", and "Synonyms and Associative Concepts" (SAC). An example of the feature vectors for the question in Figure 1 is shown in Table 1 [Cai et al., 2011].

Words: require (0.019), software (0.031), engineer (0.027), big (0.018), blue (0.022)
Hypernyms: Software engineering (1.0), Software engineers (1.0), Computer hardware companies (1.0), Cloud computing (0.5), International business (0.5), computer companies (0.5), Multinational companies (1.0), Technology companies (0.25), …
Synonyms: International Business Machines Corporation (1.0), International Business Machines (1.0), IBM computer (1.0), IBM Corporation (1.0), I.B.M. (1.0), …
Associative Concepts: Apple Inc. (0.32), IBM Personal Computer (0.60), Corporation (0.36), Computer science (0.47), software architecture (0.72), …
Table 1: An example of the different vector representations for the question in Figure 1.

4 Experiments

4.1 Wikipedia Data for Building the Thesaurus
Wikipedia data can be obtained from http://download.wikipedia.org free of charge for research use. It is available in the form of database dumps that are released periodically. The version we used in our experiments was released on Sep. 9, 2007. We identified over 4 million distinct entities that constitute the vocabulary of the thesaurus. After preprocessing and filtering, the Wikipedia dump we use contains 126,465 categories and 1,590,321 concepts.

4.2 cQA Data for Question Retrieval
We collect the data set from Yahoo! Answers and use the getByCategory function provided in the Yahoo! Answers API to obtain cQA threads from the Yahoo! site. More specifically, we use the resolved questions, and the resulting question repository that we use for question retrieval contains 2,288,607 questions. Each resolved question consists of four parts: "question title", "question description", "question answers" and "question category". For question retrieval, we only use the "question title" part. It is assumed that the titles of the questions already provide enough semantic information for understanding the users' information needs [Duan et al., 2008]. There are 26 categories at the first level and 1,262 categories at the leaf level. Each question belongs to a unique leaf category. Table 2 shows the distribution of the questions in the archives across first-level categories.
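The category vector of Section 3.3 can be sketched with a breadth-first search over the category graph, which reaches each category first along a shortest path, so the search depth equals $Dis(c, cate_i)$ and the weight is $\mu^{Dis - 1}$. This is a sketch under assumptions: `parents` is a hypothetical adjacency map, $\mu = 0.5$ and `max_layer=3` are placeholder values, and summing concept vectors into a question vector is an assumed aggregation rule.

```python
import math
from collections import deque


def category_vector(concept, parents, mu=0.5, max_layer=3):
    """Weight each category within `max_layer` hops of `concept` by mu ** (dist - 1).

    `parents` maps a concept or category to the categories it directly belongs to.
    BFS reaches every node first along a shortest path, so dist = Dis(c, cate_i)
    and direct categories (dist 1) get weight Inf^1(c) = 1.0.
    """
    vector = {}
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        if dist[node] == max_layer:
            continue  # do not expand beyond the deepest layer that is kept
        for cat in parents.get(node, ()):
            if cat not in dist:
                dist[cat] = dist[node] + 1
                vector[cat] = mu ** (dist[cat] - 1)
                queue.append(cat)
    return vector


def question_category_vector(concepts, parents, mu=0.5, max_layer=3):
    """Sum the category vectors of a question's concepts (assumed aggregation)."""
    total = {}
    for c in concepts:
        for cat, w in category_vector(c, parents, mu, max_layer).items():
            total[cat] = total.get(cat, 0.0) + w
    return total


def cosine(u, v):
    """Cosine similarity of two sparse vectors, as in equation (3)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

On a toy graph where "IBM" sits under "Computer hardware companies", which sits under "Computer companies", the weights come out as 1.0, 0.5 and 0.25 by depth; two questions mentioning "IBM" and "Lenovo" then get a non-zero $Sim_{cat}$ through shared parent categories even though they share no words.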
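The SAC expansion of Section 3.4 can be sketched as follows: each question concept keeps weight 1.0 and pulls in its related concepts with weight $w(c, c_i)$, i.e. 1.0 for synonyms and $S_{overall}$ for associative concepts. How duplicates are merged is not specified in the text, so keeping the maximum weight is an assumption; the `related` map in the test reuses the Figure 2 weights for illustration only.

```python
import math


def sac_vector(concepts, related):
    """Expand a question's Wikipedia concepts with synonyms and associative concepts.

    `related` maps a concept to (related_concept, w) pairs, with w = 1.0 for
    synonyms and w = S_overall for associative concepts. Each original concept
    keeps weight 1.0; a concept reached more than once keeps its largest weight
    (an assumed merge rule).
    """
    vector = {}
    for c in concepts:
        vector[c] = 1.0
        for rc, w in related.get(c, ()):
            vector[rc] = max(vector.get(rc, 0.0), w)
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors, as in equation (4)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With "IBM" related to "Apple Inc." at 0.32, a question about IBM and a question about Apple Inc. now share a dimension, so $Sim_{SAC}$ is positive where exact concept matching would give zero.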
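Equation (5) reduces to a weighted sum per candidate followed by a sort. The sketch assumes the three similarities of equations (2) to (4) are already computed for every historical question; the weights in the test, $\alpha = 0.2$, $\beta = 0.3$, $\gamma = 0.5$, are the BOWs+Category+SAC settings reported in Section 4.4, and the candidate scores are hypothetical. `top_k=20` mirrors the 20 results kept per queried question.

```python
def combined_score(sim_cat, sim_sac, sim_term, alpha, beta, gamma):
    """Equation (5): Score(q, d) = alpha*Sim_cat + beta*Sim_SAC + gamma*Sim_term."""
    return alpha * sim_cat + beta * sim_sac + gamma * sim_term


def rank_candidates(candidates, alpha, beta, gamma, top_k=20):
    """Return the ids of the top_k historical questions by combined score.

    `candidates` maps a historical-question id to its precomputed
    (sim_cat, sim_sac, sim_term) triple for one queried question.
    """
    scores = {}
    for qid, (sim_cat, sim_sac, sim_term) in candidates.items():
        scores[qid] = combined_score(sim_cat, sim_sac, sim_term, alpha, beta, gamma)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Setting $\beta = 0$ or $\alpha = 0$ recovers the BOWs+Category and BOWs+SAC variants compared in Section 4.4.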
Yahoo! Answers API: http://developer.yahoo.com/answers

Category | Questions | Category | Questions
Arts & Humanities | 86,744 | Home & Garden | 35,029
Business & Finance | 105,453 | Beauty & Style | 37,350
Cars & Transportation | 145,515 | Pet | 54,158
Education & Reference | 80,782 | Travel | 305,283
Entertainment & Music | 152,769 | Health | 132,716
Family & Relationships | 34,743 | Sports | 214,317
Politics & Government | 59,787 | Social Science | 46,415
Pregnancy & Parenting | 43,103 | Dining Out | 46,933
Science & Mathematics | 89,856 | Food & Drink | 45,055
Computers & Internet | 90,546 | News & Events | 20,300
Games & Recreation | 53,458 | Environment | 21,276
Consumer Electronics | 90,553 | Local Businesses | 51,551
Society & Culture | 94,470 | Yahoo! Products | 150,445
Table 2: Number of questions in each first-level category.

We use the same test set as in previous work [Cao et al., 2009; …]. This set contains 252 queried questions and can be freely downloaded by the research community (http://homepages.inf.ed.ac.uk/gcong/qa/). For each method, the top 20 retrieval results are kept. For each queried question, an annotator is asked to label each returned result as "relevant" or "irrelevant". If a returned result is considered semantically equivalent to the queried question, the annotator labels it as "relevant"; otherwise, the annotator labels it as "irrelevant". Two annotators are involved in the annotation process. If a conflict arises, a third person makes the final judgement. During manual judging, the annotators are shown only the questions.

4.3 Evaluation Metrics
We evaluate the performance of question retrieval using the following metrics: Mean Average Precision (MAP) and Precision@N (P@N). MAP rewards methods that return relevant questions early and also rewards correct ranking of the results. P@N reports the fraction of the top-N retrieved questions that are relevant. We perform a significance test, i.e., a t-test with a default significance level of 0.05.

4.4 Experimental Results
In this subsection, we present the experimental results. To demonstrate the effectiveness of the proposed method, we introduce the following methods for comparison:
- BOWs: This method measures the question similarity based on the BOWs representation using cosine similarity (equation (2)).
- Category: This method measures the question similarity with hypernyms derived from Wikipedia using equation (3).
- SAC: This method measures the question similarity with synonyms and associative concepts derived from Wikipedia using equation (4).
- BOWs+Category: This method measures the question similarity using equation (5), except that we set $\beta = 0$.
- BOWs+SAC: This method measures the question similarity using equation (5), except that we set $\alpha = 0$.
- BOWs+Category+SAC: This method measures the question similarity using equation (5).

# | Methods | MAP | P@10
1 | BOWs | 0.242 | 0.226
2 | Category | 0.364 | 0.235
3 | SAC | 0.397 | 0.244
4 | BOWs+Category | 0.425 | 0.252
5 | BOWs+SAC | 0.448 | 0.266
6 | BOWs+Category+SAC | 0.463 | 0.272
Table 3: Experimental results for question retrieval.

The parameters used in the paper are set as follows:
- For SAC, the parameters in equation (1) are tuned according to the methodology suggested in [Wang and Domeniconi, 2008; Cai et al., 2011], and the resulting values of $\lambda_1$ and $\lambda_2$ are used in our experiments.
- For BOWs+Category, the parameter $\beta$ is set to 0. We tune the parameters on a small development set of 50 questions. This development set is also extracted from Yahoo! Answers, and it is not included in the test set. We vary $\alpha$ and $\gamma$ from 0.1, 0.2 up to 1.0, and thus find their proper values on the development set. Finally, $\alpha$ is set to 0.3 and $\gamma$ is set to ….
- For BOWs+SAC, the parameter $\alpha$ is set to 0. We vary $\beta$ and $\gamma$ from 0.1, 0.2 up to 1.0 and select their values on the development set. Finally, $\beta$ is set to 0.4 and $\gamma$ is set to 0.6.
- For BOWs+Category+SAC, we vary $\alpha$, $\beta$ and $\gamma$ from 0.1, 0.2 up to 1.0 and select their values on the development set. Finally, $\alpha$ is set to 0.2, $\beta$ is set to 0.3 and $\gamma$ is set to 0.5.

Table 3 presents the experimental results for question retrieval. From Table 3, we make several observations:
- Comparing the results of our proposed Category and SAC methods with BOWs, we find that hierarchical relations, synonyms and associative relations significantly improve the performance of question retrieval (row 1 vs. rows 2 and 3; the comparisons are statistically significant at p < 0.05).
- Synonym and associative relations play a more important role than hierarchical relations: Category improves MAP over BOWs by 12.2 points (0.364 vs. 0.242), while SAC improves it by 15.5 points (0.397 vs. 0.242).
- Combining the different methods further improves the performance of question retrieval (row 2 vs. rows 4 and 6; row 3 vs. rows 5 and 6).

4.5 Comparison with the State-of-the-art
This paper aims to tackle the limitation of BOWs for question retrieval. Many researchers have proposed the use of translation models [Berger et al., 2000; Jeon et al., 2005; Riezler et al., 2007; Xue et al., 2008; Lee et al., 2008; Bernhard and Gurevych, 2009; Zhou et al., 2011] to capture semantic word relations and overcome the limitation of BOWs by using the translated words. Jeon et al. [2005] (Jeon2005) proposed a word-based translation model to automatically address the lexical gap problem. Experimental results demonstrated that the word-based translation model significantly outperformed the traditional methods. Xue et al. [2008] (Xue2008) proposed a word-based translation language model for question retrieval. The results indicated that the word-based translation language model further improved the retrieval results and obtained state-of-the-art performance. Zhou et al. [2011] (Zhou2011) proposed a monolingual phrase-based translation model for question retrieval. To implement the word-based translation models, we use the GIZA++ alignment toolkit, trained on one million question-answer pairs from another data set, to learn the word-to-word translation probabilities. For the phrase-based translation model described in [Zhou et al., 2011], we employ the Moses toolkit to extract the phrase translations and set the maximum phrase length to 5. Recently, Singh [2012] (Singh2012) extended the word-based translation model and explored strategies to learn the translation probabilities between words and concepts using the cQA archives and a popular entity catalog. Zhou et al. [2012] (Zhou2012) employed bilingual translation and expanded queried questions with translated words for question retrieval. Furthermore, Zhou et al. [2013] (Zhou2013) borrowed statistical machine translation to expand the question representation via a matrix factorization framework.

Besides, Cao et al. [2009] (Cao2009) and Cao et al. [2010] (Cao2010) also proposed to utilize category information for question retrieval. Cao et al. [2010] introduced different combinations to compute the global relevance and local relevance, and the combination VSM+TRLM showed superior performance over the others. In this paper, we therefore compare the proposed method with the combination VSM+TRLM. To implement these two methods, we employ the same parameter settings as Cao et al. [2009] and Cao et al. [2010].

Table 4 shows the comparison with the state-of-the-art for question retrieval. From this table, we can see that our proposed method outperforms the monolingual translation models (Jeon2005, Xue2008, Zhou2011, Singh2012) and the category-based methods (Cao2009, Cao2010). The results indicate that, for question retrieval, using the world knowledge of Wikipedia is more helpful than these translation models. In addition, we find that Zhou2012 and Zhou2013 obtain better performance than ours (BOWs+Category+SAC) by using external information beyond the texts. However, such external information (e.g., bilingual translation or category information) is orthogonal to ours, and we suspect that combining bilingual translation or category information with our proposed method might yield even better performance. We leave this for future research.
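The metrics of Section 4.3 can be sketched over binary relevance labels listed in rank order. One assumption is flagged in the code: average precision is normalized by the number of relevant results found in the returned list, since relevance is judged only on the top 20 results; other conventions divide by the total number of relevant questions in the collection.

```python
def precision_at_n(labels, n):
    """Fraction of the top-n results that are relevant (labels are 1/0 in rank order)."""
    return sum(labels[:n]) / n


def average_precision(labels):
    """Mean of precision@k over the ranks k that hold a relevant result.

    Normalizing by the relevant results found in the returned list is an
    assumption; it matches judging relevance only on the returned top results.
    """
    hits, total = 0, 0.0
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP over queried questions; `runs` holds one label list per query."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

For example, the ranking [1, 0, 1, 0] has $AP = (1/1 + 2/3)/2 = 5/6$ and P@2 = 0.5; averaged with a second query [0, 1] (AP = 0.5), MAP is 2/3.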
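The parameter tuning in Section 4.4 (varying each weight from 0.1, 0.2 up to 1.0 on a 50-question development set) can be sketched as an exhaustive grid search that maximizes MAP. The data layout is a hypothetical assumption: each development question is a list of (sim_cat, sim_sac, sim_term, relevant) tuples for its candidates. The sketch does not constrain the weights to sum to 1, since the text does not say so; because ranking is invariant to scaling all three weights, many grid points are equivalent, and ties keep the first setting found.

```python
import itertools


def average_precision(labels):
    """Average of precision@k at each rank k holding a relevant result (1/0 labels)."""
    hits, total = 0, 0.0
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def score(candidate, weights):
    """Equation (5) for one (sim_cat, sim_sac, sim_term, relevant) tuple."""
    alpha, beta, gamma = weights
    return alpha * candidate[0] + beta * candidate[1] + gamma * candidate[2]


def grid_search(dev_set, grid=None):
    """Try every (alpha, beta, gamma) on the grid; return (best MAP, best weights)."""
    if grid is None:
        grid = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
    best_map, best_weights = -1.0, None
    for weights in itertools.product(grid, repeat=3):
        aps = []
        for candidates in dev_set:
            # sorted() is stable, so tied candidates keep their input order.
            ranked = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
            aps.append(average_precision([c[3] for c in ranked]))
        current = sum(aps) / len(aps)
        if current > best_map:
            best_map, best_weights = current, weights
    return best_map, best_weights
```

To reproduce a restricted variant such as BOWs+Category, the same loop can be run with $\beta$ fixed at 0 instead of drawn from the grid.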
GIZA++: …aachen.de/Colleagues/och/software/GIZA++.html
The Yahoo! Webscope dataset "Yahoo answers comprehensive questions and answers version 1.0", available at …
Moses: http://www.statmt.org/moses/

# | Methods | MAP | P@10
1 | BOWs | 0.242 | 0.226
2 | Jeon2005 | 0.405 | 0.247
3 | Xue2008 | 0.436 | 0.261
4 | Zhou2011 | 0.452 | 0.268
5 | Singh2012 | 0.450 | 0.267
6 | Zhou2012 | 0.483 | 0.275
7 | Zhou2013 | 0.564 | 0.291
8 | Cao2009 | 0.408 | 0.247
9 | Cao2010 | 0.456 | 0.269
10 | BOWs+Category+SAC | 0.463 | 0.272
Table 4: Comparison with the state-of-the-art for question retrieval.

5 Conclusions and Future Work
In this paper, we first propose a way to build a concept thesaurus based on the semantic relations extracted from the world knowledge of Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance the similarity measure for question retrieval in the concept space. Experiments conducted on a real cQA data set show that, with the help of the Wikipedia thesaurus, the performance of question retrieval improves over the traditional methods.

In the future, we would like to combine bilingual translation (e.g., [Zhou et al., 2012; 2013]) or category information (e.g., [Cao et al., 2010]) with our proposed method for question retrieval. Besides, we also want to further investigate the use of the proposed method on other kinds of data sets, such as categorized questions from forum sites and FAQ sites.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61070106, No. 61272332 and No. 61202329), the National High Technology Development 863 Program of China (No. 2012AA011102), the National Basic Research Program of China (No. 2012CB316300), …. We thank the anonymous reviewers for their insightful comments. We also thank Dr. Gao Cong for providing the data set and Dr. Li Cai for helpful discussions.

References
[Berger et al., 2000] A. Berger, R. Caruana, D. Cohn, D. Freitag, and V. Mittal. Bridging the lexical chasm: statistical approaches to answer-finding. In Annual International ACM SIGIR Conference (SIGIR), pages 192–199, 2000.
[Bernhard and Gurevych, 2009] D. Bernhard and I. Gurevych. Combining lexical semantic resources with question & answer archives for translation-based answer finding. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 728–736, 2009.
[Cai et al., 2011] L. Cai, G. Zhou, K. Liu, and J. Zhao. Large-scale question classification in cQA by leveraging Wikipedia semantic knowledge. In Conference on Information and Knowledge Management (CIKM), pages 1321–1330, 2011.
[Cao et al., 2009] X. Cao, G. Cong, B. Cui, C. Jensen, and C. Zhang. The use of categorization information in language models for question retrieval. In …, pages 265–274, 2009.
[Cao et al., 2010] X. Cao, G. Cong, B. Cui, and C. Jensen. A generalized framework of exploring category information for question retrieval in community question answer archives. In …, pages 201–210, 2010.
[Duan et al., 2008] H. Duan, Y. Cao, C. Y. Lin, and Y. Yu. Searching questions by identifying question topic and question focus. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 156–164, 2008.
[Gabrilovich and Markovitch, 2006] E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedia knowledge. In International Joint Conference on Artificial Intelligence (IJCAI), pages 1301–1306, 2006.
[Gupta and Gupta, 2012] P. Gupta and V. Gupta. A survey of text question answering techniques. International Journal of Computer Applications, 53(4):1–8, 2012.
[Hu et al., 2008] J. Hu, L. Fang, Y. Cao, H. Zeng, H. Li, Q. Yang, and Z. Chen. Enhancing text clustering by leveraging Wikipedia semantics. In Annual International ACM SIGIR Conference (SIGIR), 2008.
[Hu et al., 2009a] X. Hu, N. Sun, C. Zhang, and T.-S. Chua. Exploiting internal and external semantics for the clustering of short texts using world knowledge. In Conference on Information and Knowledge Management (CIKM), 2009.
[Hu et al., 2009b] X. Hu, X. Zhang, C. Lu, E. K. Park, and X. Zhou. Exploiting Wikipedia as external knowledge for document clustering. In …, 2009.
[Jeon et al., 2005] J. Jeon, W. Croft, and J. Lee. Finding similar questions in large question and answer archives. In Conference on Information and Knowledge Management (CIKM), pages 84–90, 2005.
[Lee et al., 2008] J. T. Lee, S. B. Kim, Y. I. Song, and H. C. Rim. Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 410–418, 2008.
[Maybury, 2004] Maybury. New directions in question answering. AAAI/MIT Press, 2004.
[Milne et al., 2006] D. Milne, O. Medelyan, and I. H. Witten. Mining domain-specific thesauri from Wikipedia: A case study. In …, 2006.
[Ponzetto and Strube, 2007] S. P. Ponzetto and M. Strube. Deriving a large scale taxonomy from Wikipedia. In Association for the Advancement of Artificial Intelligence (AAAI), 2007.
[Riezler et al., 2007] S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, and Y. Liu. Statistical machine translation for query expansion in answer retrieval. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 464–471, 2007.
[Robertson et al., 1994] S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In …, pages 109–126, 1994.
[Singh, 2012] A. Singh. Entity based Q&A retrieval. In Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1266–1277, 2012.
[Wang and Domeniconi, 2008] P. Wang and C. Domeniconi. Building semantic kernels for text classification using Wikipedia. In …, 2008.
[Wang et al., 2007] P. Wang, J. Hu, H.-J. Zeng, L. Chen, and Z. Chen. Improving text classification by using encyclopedia knowledge. In International Conference on Data Mining (ICDM), 2007.
[Wang et al., 2009] K. Wang, Z. Ming, and T.-S. Chua. A syntactic tree matching approach to finding similar questions in community-based QA services. In Annual International ACM SIGIR Conference (SIGIR), pages 187–194, 2009.
[Wang et al., 2010] B. Wang, X. Wang, C. Sun, B. Liu, and L. Sun. Modeling semantic relevance for question-answer pairs in web social communities. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 1230–1238, 2010.
[Wong and Chan, 1996] P. Wong and C. Chan. Chinese word segmentation based on maximum matching and word binding force. In International Conference on Computational Linguistics (COLING), 1996.
[Xue et al., 2008] X. Xue, J. Jeon, and W. B. Croft. Retrieval models for question and answer archives. In 31st Annual International ACM SIGIR Conference (SIGIR), pages 475–482, 2008.
[Zhou et al., 2011] G. Zhou, L. Cai, J. Zhao, and K. Liu. Phrase-based translation model for question retrieval in community question answer archives. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 653–662, 2011.
[Zhou et al., 2012] G. Zhou, K. Liu, and J. Zhao. Exploiting bilingual translation for question retrieval in community-based question answering. In 24th International Conference on Computational Linguistics (COLING), pages 3153–3170, 2012.
[Zhou et al., 2013] G. Zhou, F. Liu, Y. Liu, S. He, and J. Zhao. Statistical machine translation improves question retrieval in community question answering via matrix factorization. In 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013.