
effort and better utilizing the human editorial experts to focus on categorizing d - PDF document

alexa-scheidler
Uploaded On 2016-05-06



Presentation Transcript

effort and better utilizing the human editorial experts to focus on categorizing difficult ads. The ad image and landing page features extracted in this ad categorization system can also be used to improve the matching and ranking steps of ad selection algorithms in display ad serving systems. We learn multiple one-versus-rest SVM models to categorize the

classification; I.4.9 [Image Processing and Computer Vision]: Applications

General Terms
Display Ad Categorization

Keywords
Display Advertising, Multi-Class classifi

and leveraging them for use in categorization. Specifically, we present results from training machine learned categorization models using the features extracted from the text in the display ads themselves along with other features from the web page of the advertiser that the user is redirected to when clicking the ad (also known as the landing page of the ad).

An automated machine learned categorization system to categorize display ads has multiple uses including:

• Improve the efficiency of manual categorization by suggesting a list of plausible categories from which the human editors can choose the best set of categories.
• Backfill the categories of ads which the human editors are not able to classify on time. This is especially useful for ads from advertising campaigns of short duration where the campaigns may even end by the time the ads pop up in the ad categorization editorial queue.
• Better utilize the human editorial judgments by classifying the ads that are easy to categorize using the tool itself, and request human editorial help only for those ads that are difficult to categorize in an automated manner.
• The features from the ad image creatives and their landing pages can be used as a channel of attributes in matching and ranking algorithms used in ad selection.

The outline of the paper is as follows. Section 2 describes the dataset and the features used to train the categorization models to predict the category of the display ad. Section 3 discusses the experiment setup, the training pipeline and the evaluation metrics. Section 4 presents the performance results of the ad categorization models and Section 5 concludes with some ideas for future work to improve the categorization models.

2. DISPLAY AD CATEGORIZATION

2.1 Dataset

We analyzed a random sample of display ads from the display ad campaign database of a large ad network which contains the URLs of the display ad images and the URLs of the landing pages, along with other attributes of the advertiser. These display ads were used in a sample of display ad campaigns that have run in the past on the ad network. However, since the display ad images and landing pages were not saved, we crawled both the image and landing page URLs to extract the image creatives and the contents of the landing pages.

Some of the ads contained multiple images. For example, Flash or animated ads typically also had a corresponding static image creative in order to display on browsers that do not support Flash or animation. Similarly, some ads had multiple landing pages, depending on which part of the ad is clicked. On the other hand, some campaigns had ads served by third party ad servers, in which case the image URLs were single pixels that redirected to other ad servers and were therefore no longer available, or had image URLs which were not valid any longer and hence did not return valid ad image creatives during the crawl. In addition, some of the landing page URLs are Javascripts that generated the

Ad subset                                                                      Number of ads   Percentage
Ads with manually labeled categories                                           1,501,192       100
Images only (ads with only images)                                             189,160         12.6
Landing Page only (ads with only landing pages)                                292,921         19.5
Both Images and Landing Page (ads with both images and landing pages)          503,041         33.5
Neither Images nor Landing Page (ads with neither images nor landing pages)    516,070         34.4

[Figure 1 plot: "Distribution of Image Area Ranges"; x-axis: Image Area Ranges; y-axis: Number of Ads]

Figure 1: Distribution of pixel areas of the creative with the largest area for each ad. The spikes at 0 pixels and 1 pixel correspond to ads that did not have a valid image creative from the crawl, and images that were single pixels used to redirect the ad calls to third party ad servers.

(iii) content features extracted from the text in the body of the landing page of the ad, and (iv) features extracted from only the text in the title and the meta tag of the landing page. We elaborate more on each of these four feature types below.

2.2.1 Advertiser Category Feature

This feature is the category of the advertiser from the same taxonomy into which the ads are categorized. These advertiser categories are assigned by human editors depending on the industry of the advertiser. Of the total 1,501,192 categorized ads, 848,786 (56%) also have an advertiser category. Less than half of the categorized ads do not have a corresponding advertiser category. Among the ads with advertiser categories, there is no overlap between the advertiser and ad categories in about one-third of the ads. Hence, in more than two-thirds of the ads the advertiser category cannot be used to reliably categorize the ads. However, when the advertiser category is available, the advertiser category and the ad category are identical in about one-third of the ads and one set of categories is a subset of the other set in about 40% of these cases. The advertiser category may not cover all the products for which the advertiser

[Figure 2 plot: "Distribution of Ad Filesizes"; x-axis: Filesize Ranges; y-axis: Number of Ads]

Figure 2: Distribution of sizes of the largest landing page for each ad. The spike at 0 bytes corresponds to ads where the crawl was not successful in retrieving the contents of the landing page.

was set to one and

: TP/(TP+FP), Recall at operation point: TP/(TP+FN), F1 score at operation point, and max-F1 score, the maximum F1 score achieved for any point along the Precision-Recall curve. TP are the true positives at the operation threshold, TN the true negatives, FP the false positives, and FN the false negatives. Accuracy and AUC are the two most widely explored measures throughout the information retrieval lit-

Experiment    AUC    Max-F1
Cruises hotels
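The operating-point metrics defined above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the convention that an ad is predicted positive when its score is at or above the threshold, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when score >= threshold.

    The ">=" convention is an assumption; the paper does not state it.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def max_f1(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest F1 over all thresholds taken from the observed scores."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tp, fp, _tn, fn = confusion_counts(scores, labels, threshold)
        best = max(best, precision_recall_f1(tp, fp, fn)[2])
    return best
```

Sweeping only the observed scores is sufficient for max-F1, because the confusion counts can change only when the threshold crosses one of those values.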
ed. However, they do not try to infer a correlation between the topical category of the ad and its landing page. Instead, here we demonstrate landing page (augmented with OCR features) categorization within a much larger interest taxonomy of over 1000 nodes, which is a significantly harder task but provides us with a topical category for the ad. This is essential for successfully applying subsequent advertising strategies such as content matching or behavioral targeting. Choi et al. [7] also study the effect of landing page features in improving ad relevance in textual advertising. They show that augmenting the ad textual features with features from the content of the page increases metrics such as the Discounted Cumulative Gain (DCG) when computing the relevance

terparts. Furthermore, pages used for placing contextual ads also typically contain rich and clean content such as financial websites and news articles. Landing pages are usually sparser in content and often of poorer content quality, containing links to other websites, multiple images or links offering other products with diverse and uncertain origin.

In a study of display ad categorization, Edelman [10] discusses the problem of brand safety or sensitivity [15] at the Right Media ad exchange. Advertisers there are required to mark their ads into several categories, such as suggestive, violent, deceptive, and publishers then decide whether such ads are safe to show on their content. As advertisers do not have a clear understanding of these categories, they often end up manually labeling their ads into categories which prevent them from being displayed effectively. There-
first attempt at using OCR features to automatically categorize display ads on a large scale. We are currently working on enhancing the quality and coverage of the categorization models by postprocessing the noisy OCR features (correcting them using edit distance metric to the nearest word in a dictionary), performing feature selection to reduce the dimensionality of features by either retaining features with higher support or features with the highest tf-idf values in the corpus, and experimenting with the category of the landing page itself as a feature both instead of and in addition to the LP and Content features. We are also experimenting with using different modeling techniques, including (i) training models using different feature sets in isolation, and combining the results using voting methods, (ii) using unlabeled data in a semi-supervised setting to increase the volume of labeled data. Finally, encouraged by the initial results from this experiment of using image features to train categorization models, we are experimenting with more advanced image features using computer vision techniques including the texture, and whether an image contains a face or not.

7. REFERENCES

[1] http://code.google.com/p/tesseract-ocr/.
[2] http://www.imagemagick.org/script/index.php/.
[3] Hila Becker, Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski, and Bo Pang. Context transfer in search advertising. In SIGIR '09: Proc. of 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 656–657, 2009.
[4] Hila Becker, Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski, and Bo Pang. What happens after an ad click?: quantifying the impact of landing pages in web advertising. In CIKM '09: Proc. of the 18th ACM Conference on Information and Knowledge Management, pages 57–66, 2009.
[5] Ron Bekkerman and James Allan. Using bigrams in text categorization, 2003.
[6] Andrei Broder, Marcus Fontoura, Vanja Josifovski, and Lance Riedel. A semantic approach to contextual advertising. In SIGIR '07: Proc. of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 559–566, New York, NY, USA, 2007. ACM.
[7] Yejin Choi, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Mauricio Mediano, and Bo Pang. Using landing pages for sponsored search ad selection. In WWW '10: Proceedings of the 19th international conference on World

23rd International Conference on Machine Learning
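The OCR post-processing idea mentioned in the future-work discussion (correcting a noisy OCR token to the nearest dictionary word by edit distance) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the Levenshtein variant of edit distance, the `max_distance` cutoff, the alphabetical tie-break, and the toy vocabulary are all hypothetical choices.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute) via row-wise DP."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def correct_token(token: str, dictionary: Iterable[str],
                  max_distance: int = 2) -> str:
    """Map a noisy OCR token to its nearest dictionary word.

    The token is returned unchanged when no word lies within max_distance
    (a hypothetical cutoff that avoids rewriting unrelated tokens).
    Ties are broken alphabetically so the result is deterministic.
    """
    words = sorted(set(dictionary))
    if not words or token in words:
        return token
    best = min(words, key=lambda word: (edit_distance(token, word), word))
    return best if edit_distance(token, best) <= max_distance else token


# Toy vocabulary, assumed for illustration only.
vocabulary = {"hotel", "cruise", "flight"}
corrected = [correct_token(t, vocabulary) for t in ["hote1", "crulse", "zzzzzz"]]
```

A linear scan over the dictionary is adequate for a sketch; at the scale described in the paper, an index over the vocabulary would likely be needed to keep lookups fast.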
