Will my Question be Answered? Predicting "Question Answerability" in Community Question-Answering Sites

Gideon Dror, Yoelle Maarek and Idan Szpektor
Yahoo! Labs, MATAM, Haifa 31905, Israel
{gideondr,yoelle,idan}@yahoo-inc.com

Abstract. All askers who post questions in Community-based Question Answering (CQA) sites such as Yahoo! Answers, Quora or Baidu's Zhidao expect to receive an answer, and are frustrated when their questions remain unanswered. We propose to provide a type of "heads up" to askers by predicting how many answers, if any at all, they will get. Giving a preemptive warning to the asker at posting time should reduce the frustration effect and hopefully allow askers to rephrase their questions if needed. To the best of our knowledge, this is the first attempt to predict the actual number of answers, in addition to predicting whether the question will be answered or not. To this effect, we introduce a new prediction model, specifically tailored to hierarchically structured CQA sites. We conducted extensive experiments on a large corpus comprising 1 year of answering activity on Yahoo! Answers, as opposed to a single day in previous studies. These experiments show that the F1 we achieved is 24% better than in previous work, mostly due to the structure built into the novel model.

1 Introduction

In spite of the huge progress of Web search engines in the last 20 years, many users' needs still remain unanswered. Query assistance tools such as query completion and related queries cannot, as of today, deal with complex, heterogeneous needs. In addition, there will always exist subjective and narrow needs for which content has little chance to have been authored prior to the query being issued.

Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Quora, Stack Overflow or Baidu Zhidao, have been precisely devised to answer these different needs. These services differ from the extensively investigated Factoid Question Answering that focuses on questions such as "When was Mozart born?", for which unambiguous answers typically exist [1]. Though CQA sites also feature factoid questions, they typically address other needs, such as opinion seeking, recommendations, open-ended questions or very specific needs, e.g. "What type of bird should I get?" or "What would you choose as your last meal?".

Questions not only reflect diverse needs but can also be expressed in very different styles, yet all askers expect to receive answers, and are disappointed otherwise. Unanswered questions are not a rare phenomenon, reaching 13% of the questions in the Yahoo! Answers dataset that we studied, as detailed later, and users whose questions remain unanswered are considerably more prone to churning from the CQA service [2]. One way to reduce this frustration is to proactively recommend questions to potential answerers [3–6]. However, the asker has little or no influence on the answerers' behavior. Indeed, a question by itself may exhibit some characteristics that reduce its potential for answerability. Examples include a poor or ambiguous choice of words, a given type of underlying sentiment, the time of the day when the question was posted, as well as sheer semantic reasons if the question refers to a complex or rare need.

In this work, we focus on the askers, investigate a variety of features they can control, and attempt to predict, based on these features, the expected number of answers a new question might receive, even before it is posted. With such predictions, askers can be warned in advance, and adjust their expectations, if their questions have little chance of being answered. This work represents a first step towards the more ambitious goal of assisting askers in posting answerable questions, by not only indicating the expected number of answers but also suggesting adequate rephrasing. Furthermore, we can imagine additional usages of our prediction mechanism, depending on the site priorities. For instance, a CQA site such as Yahoo! Answers that attempts to satisfy all users might decide to promote questions with few predicted answers in order to achieve a higher answering rate. Alternatively, a socially oriented site like Quora might prefer to promote questions with many predicted answers in order to encourage social interaction between answerers.

We cast the problem of predicting the number of answers as a regression task, while the special case of predicting whether a question will receive any answer at all is viewed as a classification task. We focus on Yahoo! Answers, one of the most visited CQA sites, with 30 million questions and answers a month and 2.4 asked questions per second [7]. For each question in Yahoo! Answers, we generate a set of features that are extracted only from the question attributes and are available before question submission. These features capture the asker's attributes, the textual content of the question, the category to which the question is assigned and the time of submission.

In spite of this rich feature set, off-the-shelf regression and classification models do not provide adequate predictions in our tasks. Therefore, we introduce a series of models that better address the unique attributes of our dataset. Our main contributions are threefold:

1. we introduce a novel task of predicting the number of expected answers for a question before it is posted,
2. we devise hierarchical learning models that consider the category-driven structure of Yahoo! Answers and reflect their associated heterogeneous communities, each with its own answering behavior, and finally,
3. we conduct the largest experiment to date on answerability, as we study a year-long question and answer activity on Yahoo! Answers, as opposed to a day-long dataset in previous work.

2 Background

With millions of active users, Yahoo! Answers hosts a very large amount of questions and answers on a wide variety of topics and in many languages. The system is content-centric, as users are socially interacting by engaging in multiple activities around a specific question. When a user asks a new question, she also assigns it to a specific category, within a predefined hierarchy of categories, which should best match the general topic of the question. For example, the question "What can I do to fix my bumper?" was assigned to the category 'Cars & Transportation > Maintenance & Repairs'. Each new question remains "open" for four days (with an option for extension), or less if the asker chose a best answer within this period. Registered users may answer a question as long as it remains "open".

One of the main issues in Yahoo! Answers, and in community-based question answering in general, is the high variance in perceived question and answer quality. This problem drew a lot of research in recent years. Some studies attempted to assess the quality of answers [8–11], or questions [12,13], and rank them accordingly. Others looked at active users for various tasks, such as scoring their "reliability" as a signal for high quality answers or votes [14–16], identifying spammers [17], predicting whether the asker of a question will be satisfied with the received answers [18,19] or matching questions to specific users [3–5].

Our research belongs to the same general school of work but focuses on estimating the number of answers a question will receive. Prior work that analyzed questions did so in retrospect, either after the questions had been answered [9], or as a ranking task for a given collection of questions [12,13]. In contrast, we aim at predicting the number of answers for every new question before it is submitted.

In a related work, Richardson and White [20] studied whether a question will receive an answer or not. Yet, they conducted their study in a different environment, an IM-based synchronous system, in which potential answerers are known. Given this environment, they could leverage features pertaining to the potential answerers, such as reputation. In addition, they considered the specific style of messages sent over IM, including whether a newline was entered and whether some polite words were added. Their experiment was of a small scale, on 1,725 questions, for which they showed improvement over the majority baseline. We note that their dataset is less skewed than in Yahoo! Answers. Indeed, their dataset counted about 42% of unanswered questions, while Yahoo! Answers datasets typically count about 13% of unanswered questions. We will later discuss the challenges involved in dealing with such a skewed dataset.

A more closely related prior work that investigated question answerability is Yang et al. [21], who addressed the same task of coarse (yes/no) answerability as above but in the same setting as ours, namely Yahoo! Answers. Yang et al. approached the task as a classification problem with various features ranging from content analysis, such as category matching, polite words and hidden topics, to asker reputation and time of day. They used a one-day dataset of Yahoo! Answers questions and observed the same ratio of unanswered questions as we did in our one-year dataset, namely 13%. Failing to construct a classifier for this heavily skewed dataset, Yang et al. resorted to learning from an artificially balanced training set, which resulted in improvements over the majority baseline. In this paper, we also address this classification task, with the same type of skewed dataset. However, unlike Yang et al., we attempt to improve over the majority baseline without artificially balancing the dataset.

Finally, another major differentiator from the above previous work is that we do not stop at simply predicting whether a question will be answered or not, but predict the exact number of answers the question would receive.

3 Predicting Question Answerability

One key requirement of our work, as well as a differentiator from typical prior work on question analysis, is that we want to predict answerability before the question is posted. This imposes constraints on the type of data and signals we can leverage. Namely, we can only use data that is intrinsic to a new question before submission. In the case of Yahoo! Answers, this includes: (a) the title and the body of the question, (b) the category to which the question is assigned, (c) the identity of the user who asked the question and (d) the date and time the question is being posted.

We view the prediction of the expected number of answers as a regression problem, in which a target function (a.k.a. the model) $\hat{y} = f(x)$ is learned, with $x$ being a vector-space representation of a given question, and $\hat{y} \in \mathbb{R}$ an estimate for $y$, the number of answers this question will actually receive. All the different models we present in this section are learned from a training set of example questions and their known number of answers, $D = \{(x_i, y_i)\}$.

The prediction task of whether a question will receive an answer at all is addressed as a classification task. It is similarly modeled by a target function $\hat{y} = f(x)$ and the same vector-space representation of a question, yet the training target is binary, with answered (unanswered) questions being the positive (negative) examples.

To fully present our models for the two tasks, we next specify how a question representation $x$ is generated, and then introduce for each task novel models (e.g. $f(x)$) that address the unique properties of the dataset.

3.1 Question Features

In our approach, each question is represented by a feature vector. For any new question, we extract various attributes that belong to three main types of information: question metadata, question content, and user data. In the rest of this paper we use the term feature family to denote a single attribute extracted from the data. Question attributes may be numerical, categorical or set-valued (e.g. the set of word tokens in the title). Hence, in order to allow learning by gradient-based methods, we transformed all categorical attributes to binary features, and binned most of the numeric attributes. For example, the category of a question is represented as 1,287 binary features and the hour it was posted is represented as 24 binary features. Tables 1, 2 and 3 describe the different feature families we extract, grouped according to their information source: the question text, the asker and the question metadata.

Table 1. Features extracted from title and body texts

| Feature Family | Description | # Features |
|---|---|---|
| Title tokens | The tokens extracted from the title, not including stop words | 45,011 |
| Body tokens | The tokens extracted from the body, not including stop words | 45,508 |
| Title sentiment | The positive and negative sentiment scores of the title, calculated by the SentiStrength tool [22] | 2 |
| Body sentiment | The mean positive and negative sentiment scores of the sentences in the body | 2 |
| Supervised LDA | The number of answers estimated by supervised Latent Dirichlet Allocation (SLDA) [23], which was trained over a small subset of the training set | 1 |
| Title WH | WH-words (what, when, where ...) extracted from the question's title | 11 |
| Body WH | WH-words extracted from the question's body | 11 |
| Title length | The title length measured by the number of tokens after stopword removal, binned on a linear scale | 10 |
| Body length | The body length, binned on an exponential scale since this length is not constrained | 20 |
| Title URL | The number of URLs that appear within the question title | 1 |
| Body URL | The number of URLs that appear within the question body | 1 |

Table 2. Features extracted based on the asker

| Feature Family | Description | # Features |
|---|---|---|
| Asker ID | The identity of the asker, if it asked at least 50 questions in the training set. We ignore askers who asked fewer questions since their ID statistics are unreliable | 175,714 |
| Mean # of answers | The past mean number of answers the asker received for her questions, binned on an exponential scale | 26 |
| # of questions | The past number of questions asked by the asker, binned on a linear scale and on an exponential scale | 26 |
| Log # of questions | The logarithm of the total number of questions posted by the asker in the training set, and the square of the logarithm. For both features we add 1 to the argument of the logarithm to handle test users with no training questions | 2 |

Table 3. Features extracted from the question's metadata

| Feature Family | Description | # Features |
|---|---|---|
| Category | The ID of the category that the question is assigned to | 1,287 |
| Parent Category | The ID of the parent category of the assigned category for the question, based on the category taxonomy | 119 |
| Hour | The hour at which the question was posted, capturing daily patterns | 24 |
| Day of week | The day of week in which the question was posted, capturing weekly patterns | 7 |
| Week of year | The week in the year in which the question was posted, capturing yearly patterns | 51 |

3.2 Regression Models

Following the description of the features extracted from each question, we now introduce different models (by order of complexity) that use the question feature vector in order to predict the number of answers. We remind the reader that our training set consists of pairs $D = \{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^F$ is the $F$-dimensional feature vector representation of question $q_i$, and $y_i \in \{0, 1, 2, \ldots\}$ is the known number of answers for $q_i$.

Baseline Model. Yang et al. [21] compare the performance of several classifiers, linear and non-linear, on a similar dataset. They report that a linear SVM significantly outperforms all other classifiers. Given these findings, as well as the fact that a linear model is both robust [24] and can be trained very efficiently for large scale problems, we chose a linear model $f(x_i) = w^T x_i + b$ as our baseline model.

Feature Augmentation Model. One of the unique characteristics of the Yahoo! Answers site is that it consists of questions belonging to a variety of categories, each with its own community of askers and answerers, temporal activity patterns, jargon etc., and that the categories are organized in a topical taxonomy. This structure, which is inherent to the data, suggests that more complex models might be useful in modeling the data. One effective way of incorporating the category structure of the data in a regression model is to enrich the features with category information. Specifically, we borrowed the idea from [25], which originally utilized such information for domain adaptation.

To formally describe this model, we consider the Yahoo! Answers category taxonomy as a rooted tree $T$ with "All Categories" as its root. When referring to the category tree we will use the terms node and category interchangeably. We denote the category of a question $q_i$ by $C(q_i)$. We further denote by $P(c)$ the set of all nodes on the path from the tree root to node $c$ (including $c$ and the root). For notational purposes, we use a binary representation for $P(c)$: $P(c) \in \{0,1\}^{|T|}$, where $|T|$ is the number of nodes in the category tree.

The feature augmentation model represents each question $q_i$ by $\widehat{x}_i \in \mathbb{R}^{F|T|}$, where $\widehat{x}_i = P(C(q_i)) \otimes x_i$, where
$\otimes$ represents the Kronecker product. For example, given question $q_i$ that is assigned to category 'Dogs', the respective node path in $T$ is 'All Questions/Pets/Dogs'. The feature vector $\widehat{x}_i$ for $q_i$ is all zeros except for three copies of $x_i$ corresponding to each of the nodes 'All Questions', 'Pets' and 'Dogs'.

The rationale behind this representation is to allow a separate set of features for each category, thereby learning category-specific patterns. These include learning patterns for leaf categories, but also learning lower resolution patterns for intermediate nodes in the tree, which correspond to parent and top categories in Yahoo! Answers. This permits a good tradeoff between high resolution modeling and robustness, obtained by the higher level category components shared by many examples.

Fig. 1. An illustration of the subtree model structure: (a) tree of categories, (b) tree of models. Shaded nodes in (a) represent categories populated with questions, while unshaded nodes are purely navigational.

Subtree Model. An alternative to the feature augmentation model is to train several linear models, each specializing on a different subtree of $T$. Let us denote a subset of the dataset as $D_c = \{(x_i, y_i) \mid q_i \in S(c)\}$, where $S(c)$ is the set of categories in the category subtree rooted at node $c$. We also denote a model trained on $D_c$ as $f_c$. Since there is a one-to-one correspondence between models and nodes in $T$, the set of models $f_c$ can be organized as a tree isomorphic to $T$. Models in deeper levels of the tree are specialized on fewer categories than models closer to the root. Figure 1 illustrates a category tree and its corresponding model tree structure. The shaded nodes in 1(a) represent categories to which some training questions are assigned.

One simplistic way of using the model tree structure is to apply the root model ($f_A$ in Figure 1(b)) to all test questions. Note that this is identical to the baseline model. Yet, there are many other ways to model the data using the model tree. Specifically, any set of nodes that also acts as a tree cut defines a regression model, in which the number of answers for a given question $q_i$ is predicted by the first model in the set encountered when traversing from the category $c_i$, assigned to $q_i$, to the root of $T$. In this work, we shall limit ourselves to three such cuts:

TOP: $q_i$ is predicted by model $f_{Top(c_i)}$, where $Top(c)$ is the category in $P(c)$ directly connected to the root.
PARENT: $q_i$ is predicted by model $f_{Parent(c_i)}$.
NODE: $q_i$ is predicted by model $f_{c_i}$.

In Figure 1 the TOP model refers to $\{f_B, f_C\}$, the PARENT model refers to $\{f_B, f_C, f_F, f_H\}$ and the NODE model refers to $\{f_D, f_E, \ldots, f_M\}$.

Ensemble of Subtree Models. In order to further exploit the structure of the category taxonomy in Yahoo! Answers, the questions in each category $c$ are addressed by all models in the path between this category and the tree root, under the subtree framework described above. Each model in this path introduces a different balance between robustness and specificity. For example, the root model is the most robust, but also the least specific in terms of the idiomatic attributes of the target category $c$. At the other end of the spectrum, $f_c$ is specifically trained for $c$, but it is more prone to overfitting the data, especially for categories with few training examples.

Instead of picking just one model on the path from $c$ to the root, the ensemble model for $c$ learns to combine all subtree models by training a meta linear model:

$$f(x_i) = \sum_{c' \in P(c)} \alpha_{cc'} f_{c'}(x_i) + b_c \qquad (1)$$

where $f_{c'}$ are the subtree models described previously and the weights $\alpha_{cc'}$ and $b_c$ are optimized over a validation set. For example, the ensemble model for questions assigned to category E in Figure 1(a) is a linear combination of models $f_A$, $f_B$ and $f_E$, which are trained on training sets $D$, $D_B$ and $D_E$ respectively.

3.3 Classification Models

The task of predicting whether a question will be answered or not is an important special case of the regression task. In this classification task, we treat questions that were not answered as negative examples and questions that were answered as positive examples. We emphasize that our dataset is skewed, with the negative class constituting only 12.68% of the dataset. Furthermore, as already noted in [26], the distribution of the number of answers per question is very skewed, with a long tail of questions having a high number of answers.

As described in the background section, this task was studied by Yang et al. [21], who failed to provide a solution for the unbalanced dataset. Instead, they artificially balanced the classes in their training set by sampling, which may reduce the performance of the classifier on the still skewed test set. Unlike Yang et al., who used off-the-shelf classifiers for the task, we devised classifiers that specifically address the class imbalance attribute of the data. We noticed that a question that received one or two answers could have easily gone unanswered, while this is unlikely for questions with a dozen answers or more. When projecting the number of answers $y_i$ onto two values, this difference between positive examples is lost, which may produce inferior models. The following models attempt to deal with this issue.

Baseline Model. Yang et al. [21] found that linear SVM provides superior performance on this task. Accordingly, we choose as baseline a linear model, $f(x_i) = w^T x_i + b$, trained with hinge loss. We train the model on the binarized dataset $D_0 = \{(x_i, \mathrm{sign}(y_i - 1/2))\}$ (see our experiments for more details).

Feature Augmentation Model. In this model, we train the same baseline classifier presented above. Yet the feature vector fed into the model is the augmented feature representation introduced for the regression models.

Ensemble Model. In order to capture the intuition that not "all positive examples are equal", we use an idea closely related to works based on Error Correcting Output Coding for multi-class classification [27]. Specifically, we construct a series of binary classification datasets $D_t = \{(x_i, z_i^t)\}$ where $z_i^t = \mathrm{sign}(y_i - 1/2 - t)$ and $t = 0, 1, 2, \ldots$. In this series, $D_0$ is a dataset where questions with one or more answers are considered positive, while in $D_{10}$ only examples with more than 10 answers are considered positive. We note that these datasets have varying degrees of imbalance between the positive and the negative classes. Denoting by $f_t$ the classifier trained on $D_t$, we construct the final ensemble classifier by using a logistic regression

$$f(x) = \sigma\Big(\sum_t \alpha_t f_t(x) + b\Big) \qquad (2)$$

where $\sigma(u) = (1 + e^{-u})^{-1}$ and the coefficients $\alpha_t$ and $b$ are learned by minimizing the log-likelihood loss on the validation set.

Ensemble of Feature Augmentation Models. In this model, we train the same ensemble classifier presented above. Yet the feature vector fed into the model is the augmented feature representation introduced for the regression models.

Classification Ensemble of Subtree Models. As our last classification model, we directly utilize regression predictions to differentiate between positive examples. We use the same model tree structure used in the regression by ensemble of subtree models. All models are linear regression models trained exactly as in the regression problem, in order to predict the number of answers for each question. The final ensemble model is a logistic regression function of the outputs of the individual regression models:

$$f(x_i) = \sigma\Big(\sum_{c' \in P(c)} \alpha_{cc'} f_{c'}(x_i) + b_c\Big) \qquad (3)$$

where $f_{c'}$ are the subtree regression models and the weights $\alpha_{cc'}$ and $b_c$ are trained using the validation set.

4 Experiments

We describe here the experiments we conducted to test our regression and classification models, starting with our experimental setup, then presenting our results and analyses.

4.1 Experimental Setup

Our dataset consists of a uniform sample of 10 million questions out of all non-spam English questions submitted to Yahoo! Answers in 2009. The questions in this dataset were asked by more than 3 million different users and were assigned to 1,287 categories out of the 1,569 categories. A significant fraction of the sampled questions (12.67%) remained unanswered. The average number of answers per question is 4.56 ($\sigma$ = 6.11). The distribution of the number of answers follows approximately a geometric distribution. The distributions of questions among users and among categories are extremely skewed, with a long tail of users who posted one or two questions and sparsely populated categories. These distributions are depicted in Figure 2, showing a power law behavior for the questions per asker distribution. A large fraction of the categories have only a few questions; for example, about half of all categories in our dataset count fewer than 50 examples.

Fig. 2. Distribution of number of questions depicted as a function of ranks: (a) questions per user, (b) questions per category.

We randomly divided our dataset into three sets: 80% training, 15% test and 5% validation (for hyper-parameter tuning). Very few questions in the dataset attracted hundreds of answers. To eliminate the ill effect of these questions on model training, we capped the number of answers per question at 64. This resulted in changing the target of about 0.03% of the questions.

Due to its speed, robustness and scalability, we used the Vowpal Wabbit tool (http://hunch.net/~vw/) whenever possible. All regression models were trained with squared loss, except for the ensemble of subtree models, Eq. 1, whose coefficients were learned by a least squares fit. All classification models were trained using Vowpal Wabbit with hinge loss, except for the ensemble models, Eq. 2 and 3, whose coefficients were learned by maximizing the log-likelihood of the validation set using Stochastic Gradient Descent. We note that for a node $c$ where the ensemble model would have been trained on fewer than 50 validation examples, we refrained from training the ensemble model and used the NODE subtree model of $c$ as the single component of the ensemble model.

Table 4 compares the various trained models with respect to the number of basic linear models used in composite models and the average number of features observed per linear model. The meta-parameters of the ensemble models (Eq. 1 and 3) were not included in the counting.

Table 4. Details of the regression and classification models, including the number of linear models in each composite model, and the average number of features used by each linear model

Regression

| Model | # linear models | features per model |
|---|---|---|
| Baseline | 1 | 267,781 |
| Feature augmentation | 1 | 12,731,748 |
| Subtree - TOP | 26 | 88,358 |
| Subtree - PARENT | 119 | 26,986 |
| Subtree - NODE | 924 | 9,221 |
| Ens. of subtree models | 955 | 13,360 |

Classification

| Model | # linear models | features per model |
|---|---|---|
| Baseline | 1 | 267,781 |
| Feature augmentation | 1 | 12,731,748 |
| Ensemble | 7 | 267,781 |
| Feature augmentation Ens. | 7 | 12,731,748 |
| Ens. of subtree models | 955 | 13,360 |

4.2 Results

The performance of the different regression models on our dataset was measured by Root Mean Square Error (RMSE) [28] and by the Pearson correlation between the predictions and the target. Table 5 presents these results. As can be seen, all our models outperform the baseline off-the-shelf linear regression model, with the best performing model achieving about 10% relative improvement. These results indicate the importance of explicitly modeling the different answering patterns within the heterogeneous communities in Yahoo! Answers, as captured by categories. Interestingly, the feature-augmentation model, which attempts to combine categories and their ancestors, performs worse than any specific subtree model. One of the reasons for this is the huge number of parameters this model had to train (see Table 4), compared to the ensemble of separately trained subtree models, each requiring considerably fewer parameters to tune. A t-test based on the Pearson correlations shows that each model in Table 5 is significantly better than the preceding one, with P-values close to zero.

Table 5. Test performance for the regression models

| Model | RMSE | Pearson Correlation |
|---|---|---|
| Baseline | 5.076 | 0.503 |
| Feature augmentation | 4.946 | 0.539 |
| Subtree - TOP | 4.905 | 0.550 |
| Subtree - PARENT | 4.894 | 0.552 |
| Subtree - NODE | 4.845 | 0.564 |
| Ens. of subtree models | 4.606 | 0.620 |

Table 6. Test performance for the classification models

| Model | AUC |
|---|---|
| Baseline | 0.619 |
| Feature augmentation | 0.646 |
| Ensemble | 0.725 |
| Feature augmentation ensemble | 0.739 |
| Ensemble of subtree models | 0.781 |

The performance of models for the classification task was measured by the area under the ROC Curve (AUC) [29]. AUC is a preferred performance measure when class distributions are skewed, since it measures the probability that a positive example is scored higher than a negative example. Specifically, the AUC of a majority model is always 0.5, independently of the distribution of the targets.

Inspecting the classification results in Table 6, we can see that all the novel models improve over the baseline classifier, with the best performing ensemble of subtrees classifier achieving an AUC of 0.781, a substantial relative improvement of 26% over the baseline's result of 0.619. A t-test based on the estimated variance of AUC [30]
shows that each model in Table 6 is statistically significantly superior to its predecessor, with P-values practically zero.

We next examine in more depth the performance of the ensemble of classifiers and the ensemble of subtree regressors (the third and fifth entries in Table 6 respectively). We see that the ensemble of classifiers explicitly models the differences between questions with many and few answers, significantly improving over the baseline. Yet, the ensemble of subtree regressors not only models this property of the data but also the differences in answering patterns within different categories. Its higher performance indicates that both attributes are key factors in prediction. Thus the task of predicting the actual number of answers has an additional benefit: it allows for a better understanding of the structure of the dataset, which also helps the classification task.

Finally, we compared our results to those of Yang et al. [21]. They measured the F1 value on the predictions of the minority class of unanswered questions, for which their best classifier achieved an F1 of 0.325. Our best model for this measure was again the ensemble of subtree models classifier, which achieved an F1 of 0.403. This is a substantial increase of 24% over Yang et al.'s best result, showing again the benefits of a structured classifier.

4.3 Error Analysis

We investigated where our models err by measuring the average performance of our best performing models as a function of the number of answers per test question, see Figure 3. We split the test examples into disjoint sets characterized by a fixed number of answers per question and averaged the RMSE of our best regressor on each set (Figure 3(a)). Since our classifier is not optimized for the Accuracy measure, we set a specific threshold on the classifier output, choosing the 12.67 percentile of test examples with the lowest scores as negatives (12.67% is the fraction of negative examples in our training set). Figure 3(b) shows the error rate for this threshold. We note that the error rate for zero answers refers to the false positives rate and for all other cases it refers to the false negatives rate.

Fig. 3. Performance of the Subtree Ensemble models as a function of the number of answers: (a) regression task, (b) classification task.

Figure 3(a) exhibits a clear minimum in the region most populated with questions, which shows that the regressor is optimized for predicting values near 0. Although the RMSE increases substantially with the number of answers, it is still moderate. In general, the RMSE we obtained is approximately linear in the square root of the number of answers. Specifically, for questions with a large number of answers, the RMSE is much smaller than the true number of answers. For example, for questions with more than 10 answers, which constitute about 13% of the dataset, the actual number of answers is approximately twice the RMSE on average. This shows the benefit of using the regression models as input to an answered/unanswered classifier, as we did in our best performing classifier. This is reflected, for example, in the very low error rates (0.0064 or less) for questions with more than 10 answers in Figure 3(b).

While the regression output effectively directs the classifier to the correct decision for questions with around 5 or more answers, Figure 3(b) still exhibits substantial error rates for questions with very few or no answers. This is due to the inherent randomness in the answering process, in which questions that received very few answers could have easily gone unanswered and vice versa, and are thus difficult to predict accurately. In future work, we want to improve these results by employing boosting approaches and constructing specialized classifiers for questions with very few answers.

4.4 Temporal Analysis

Intuitively, the time at which a question is posted should play a role in a social-media site; therefore, like Yang et al. [21], we use temporal features. Our dataset, which spans over one year, confirms their reported patterns of hourly answering: questions posted at night are most likely to be answered, while "afternoon questions" are about 40% more likely to remain unanswered.

To extend this analysis to longer time periods, we analyzed weekly and yearly patterns. We first calculated the mean number of answers per question and the fraction of unanswered questions as a function of the day of week, as shown in Figure 4. A clear pattern can be observed: questions are more often answered towards the end of the week, with a sharp peak on Fridays and a steep decline over the weekend. The differences between the days are highly statistically significant (t-test, two-sided tests).

Fig. 4. Mean number of answers (a) and fraction of unanswered questions (b) as a function of the day of the week, where '1' corresponds to Monday and '7' to Sunday.

The two graphs in Figure 4 exhibit extremely similar characteristics, indicating that the fraction of unanswered questions is negatively correlated with the average number of answers per question. This suggests that both phenomena are controlled by a supply and demand equilibrium. This can be explained by two hypotheses: (a) both phenomena are driven by an increase in questions (Yang et al.'s hypothesis) or (b) both phenomena are driven by a decrease in the number of answers.

To test the above two hypotheses, we extracted the number of questions, number of answers and fraction of unanswered questions on a daily basis. Each day is represented in Figure 5 as a single point, as we plot the daily fraction of unanswered questions as a function of the daily average number of answers per question (Figure 5(a)) and as a function of the total number of daily questions (Figure 5(b)). We note that while some answers are provided over a window of time longer than a day, this is a rare phenomenon. The vast majority of answers are obtained within about twenty minutes from the question posting time [5], hence our daily analysis.

Fig. 5. The daily fraction of unanswered questions as a function of the daily mean number of answers (a) and as a function of the total number of questions (b).

Figure 5(a) exhibits a strong negative correlation (Pearson correlation $r = -0.631$), while almost no effect is observed in Figure 5(b) ($r = 0.010$). We further tested the correlation between the daily total number of answers and the fraction of unanswered questions, and here as well a significant negative correlation was observed ($r = -0.386$). These findings support the hypothesis that deficiency in answerers is the key factor affecting the fraction of unanswered questions, and not the overall number of questions, which was Yang et al.'s hypothesis. This result is important, because it implies that more questions in a community-based question answering site will not reduce the performance of the site, as long as an active community of answerers thrives at its core.
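
The daily correlation analysis of Section 4.4 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: the record format `(day, number_of_answers)`, the helper names `daily_stats` and `pearson`, and the toy records are all hypothetical and were made up for this example. It aggregates questions per day and correlates the daily fraction of unanswered questions with the daily mean number of answers and with the daily number of questions.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def daily_stats(questions):
    """Aggregate (day, n_answers) records into per-day statistics.

    Returns one (n_questions, mean_answers, frac_unanswered) tuple per day,
    ordered by day.
    """
    by_day = defaultdict(list)
    for day, n_answers in questions:
        by_day[day].append(n_answers)
    stats = []
    for day in sorted(by_day):
        counts = by_day[day]
        n = len(counts)
        unanswered = sum(1 for c in counts if c == 0)
        stats.append((n, sum(counts) / n, unanswered / n))
    return stats


# Toy, made-up records (hypothetical; NOT data from the paper).
toy = [
    (1, 0), (1, 2), (1, 4), (1, 6),
    (2, 0), (2, 0), (2, 1), (2, 3), (2, 1),
    (3, 1), (3, 5), (3, 6),
]

stats = daily_stats(toy)
n_questions = [s[0] for s in stats]
mean_answers = [s[1] for s in stats]
frac_unanswered = [s[2] for s in stats]

# Analogue of Figure 5(a): unanswered fraction vs. mean answers per question.
r_mean = pearson(mean_answers, frac_unanswered)
# Analogue of Figure 5(b): unanswered fraction vs. number of questions.
r_count = pearson(n_questions, frac_unanswered)

print(f"r(mean answers, frac unanswered) = {r_mean:.3f}")
print(f"r(num questions, frac unanswered) = {r_count:.3f}")
```

On real data, each day of the year would contribute one point to each correlation, as in Figure 5. The toy records only show the mechanics and say nothing about the values reported in the paper.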
5 Conclusions

In this paper, we investigated the answerability of questions in community-based question answering sites. We went beyond previous work that returned a binary result of whether or not the question will be answered. We focused on the novel task of predicting the actual number of expected answers for new questions in community-based question answering sites, so as to return feedback to askers before they post their questions. We introduced a series of novel regression and classification models explicitly designed for leveraging the unique attributes of category-organized community-based question answering sites. We observed that these categories host diverse communities with different answering patterns.

Our models were tested over a large set of questions from Yahoo! Answers, showing significant improvement over previous work and baseline models. Our results confirmed our intuition that predicting answerability at a finer-grained level is beneficial. They also showed the strong effect of the different communities interacting with questions on the number of answers a question will receive. Finally, we discovered an important and somewhat counter-intuitive fact, namely that an increased number of questions will not negatively impact answerability, as long as the community of answerers is maintained.

We constructed models that are performant at scale: even the ensemble models are extremely fast at inference time. In future work, we intend to reduce response time even further and consider incremental aspects in order to return predictions as the asker types, thus providing, in real-time, dynamic feedback and a more engaging experience. To complement this scenario, we are also interested in providing question rephrasing suggestions for a full assistance solution.

References

1. Voorhees, E.M., Tice, D.M.: Building a question answering test collection. In: SIGIR. (2000)
2. Dror, G., Pelleg, D., Rokhlenko, O., Szpektor, I.: Churn prediction in new users of yahoo! answers. In: WWW (Companion Volume). (2012) 829–834
3. Li, B., King, I.: Routing questions to appropriate answerers in community question answering services. In: CIKM. (2010) 1585–1588
4. Horowitz, D., Kamvar, S.D.: The anatomy of a large-scale social search engine. In: Proceedings of the 19th international conference on World wide web. WWW '10,
New York, NY, USA, ACM (2010) 431–440
5. Dror, G., Koren, Y., Maarek, Y., Szpektor, I.: I want to answer; who has a question?: Yahoo! answers recommender system. In: KDD. (2011) 1109–1117
6. Szpektor, I., Maarek, Y., Pelleg, D.: When relevance is not enough: Promoting diversity and freshness in personalized question recommendation. In: WWW. (2013)
7. Rao, L.: Yahoo mail and im users update their status 800 million times a month. TechCrunch (Oct 28 2009)
8. Jeon, J., Croft, W.B., Lee, J.H., Park, S.: A framework to predict the quality of answers with non-textual features. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. SIGIR '06, New York, NY, USA, ACM (2006) 228–235
9. Agichtein, E., Castillo, C., Donato, D., Gionis, A., Mishne, G.: Finding high quality content in social media, with an application to community-based question answering. In: Proceedings of ACM WSDM. WSDM '08, Stanford, CA, USA, ACM Press (February 2008)
10. Shah, C., Pomerantz, J.: Evaluating and predicting answer quality in community qa. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. SIGIR '10, New York, NY, USA, ACM (2010) 411–418
11. Surdeanu, M., Ciaramita, M., Zaragoza, H.: Learning to rank answers on large online qa collections. In: ACL. (2008) 719–727
12. Song, Y.I., Lin, C.Y., Cao, Y., Rim, H.C.: Question utility: A novel static ranking of question search. In: AAAI. (2008) 1231–1236
13. Sun, K., Cao, Y., Song, X., Song, Y.I., Wang, X., Lin, C.Y.: Learning to recommend questions based on user ratings. In: CIKM. (2009) 751–758
14. Jurczyk, P., Agichtein, E.: Discovering authorities in question answer communities by using link analysis. In: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. CIKM '07, New York, NY, USA, ACM (2007) 919–922
15. Bian, J., Liu, Y., Zhou, D., Agichtein, E., Zha, H.: Learning to recognize reliable users and content in social media with coupled mutual reinforcement. In: WWW. (2009) 51–60
16. Lee, C.T., Rodrigues, E.M., Kazai, G., Milic-Frayling, N., Ignjatovic, A.: Model for voter scoring and best answer selection in community q&a services. In: Web Intelligence. (2009) 116–123
17. Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers
: social honeypots + machine learning. In: SIGIR. (2010) 435–442
18. Liu, Y., Agichtein, E.: You've got answers: Towards personalized models for predicting success in community question answering. In: ACL (Short Papers). (2008) 97–100
19. Agichtein, E., Liu, Y., Bian, J.: Modeling information-seeker satisfaction in community question answering. ACM Transactions on Knowledge Discovery from Data 3(2) (April 2009) 10:1–10:27
20. Richardson, M., White, R.W.: Supporting synchronous social q&a throughout the question lifecycle. In: WWW. (2011) 755–764
21. Yang, L., Bao, S., Lin, Q., Wu, X., Han, D., Su, Z., Yu, Y.: Analyzing and predicting not-answered questions in community-based question answering services. In: AAAI. (2011)
22. Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 61(12) (December 2010) 2544–2558
23. Blei, D., McAuliffe, J.: Supervised topic models. In Platt, J., Koller, D., Singer, Y., Roweis, S., eds.: Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA (2008)
24. Draper, N.R., Smith, H.: Applied Regression Analysis (Wiley Series in Probability and Statistics). Third edn. Wiley-Interscience (April 1998)
25. Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, Association for Computational Linguistics (June 2007) 256–263
26. Adamic, L.A., Zhang, J., Bakshy, E., Ackerman, M.S.: Knowledge sharing and yahoo answers: everyone knows something. In: Proceedings of the 17th international conference on World Wide Web. WWW '08, New York, NY, USA, ACM (2008) 665–674
27. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2 (1995)
28. Bibby, J., Toutenburg, H.: Prediction and Improved Estimation in Linear Models. John Wiley & Sons, Inc., New York, NY, USA (1978)
29. Provost, F.J., Fawcett, T.: Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In: KDD. (1997) 43–48
30. Cortes, C., Mohri, M.: Confidence intervals for the area under the roc curve. In: NIPS. (2004)