Exposing Inconsistent Web Search Results with Bobble



Snoeren
Georgia Institute of Technology; University of California, San Diego
xxing8weiddoozan3feamsterwenke gatechedu snoerencsucsdedu

Abstract. Given their critical role as gateways to Web content, the search results a Web search engine provides to its



might prevent them from seeing. Moreover, personalization frequently occurs without the user's involvement, or even explicit agreement, so users may not even be aware that their search results have been tailored according to their profile and preferences. The goal of our work is to expose and characterize inconsistencies that result from personalization. In particular, we seek to quantify the extent to which search personalization algorithms return results that are inconsistent with those that would be returned to other users, and to expose any differences to the user in real time.

We present Bobble, a Chrome Web browser extension that allows users to see how the search results that Google returns to them differ from the results that are returned to other users. Bobble captures a user's search query and reissues it from a subset of over 300 worldwide vantage points, including both dedicated PlanetLab measurement nodes and the hosts of other consenting Bobble users. In contrast to research tools that have been developed to measure search personalization offline [5], we intend users to use Bobble while they browse the Web, giving them critical insight into how personalization may be distorting their online experience.

To understand the nature of the inconsistencies uncovered by Bobble, we study more than 75,000 real search queries issued by hundreds of Bobble users over nine months. We quantify the extent to which personalization affects search results and determine how users' Google search results vary based on factors ranging from their geographic locations to their past search histories. Our study focuses exclusively on Google search, one of the most widely used search engines, but we expect that similar phenomena exist for other popular search engines. We find that 98% of Google Web searches return at least one set of inconsistent search results, typically from a vantage point in a different geographic region than the user's, even though Bobble performs these searches without exposing any information that links to the searchers' Google profiles. In sum, our study provides the first large-scale glimpse into the nature of inconsistent results that arise from search personalization and opens many avenues for future research. We quantify how geography and search history may influence search results, but others have noted that many other factors (e.g., device type, time of day) may also affect the results that a user sees for a given search term [5]. Bobble has been deployed and publicly available for 21 months; users and researchers can extend it to measure how other factors might induce inconsistencies in search results.

2 Related Work

Researchers have previously studied means to personalize Web search results. Dou et al. performed a large-scale evaluation and analysis of five personalized search algorithms using a twelve-day MSN query log [2]. They find that profile-based personalization algorithms are sometimes unstable. Teevan et al. conduct a user study to investigate the value of personalized Web search [11]. In contrast, we are less interested in the distinction between different personalization methods, and focus instead on the effects of a single search personalization algorithm. We aim to quantify the effects of different personalization factors on search inconsistency. In a contemporaneous study, Hannak et al. measure the personalization of Google search. The bulk of their effort focuses on understanding the features leading to person-

OS        With same browser   With Chrome agent   p-value
Windows   11/1,000            16/1,000            0.1725
Linux     23/1,000            21/1,000            0.7517
Mac       15/1,000            15/1,000            1.0
Table 1: The number of terms that generate inconsistent sets of search results when searching 1,000 distinct terms from Chrome browsers/agent on different OSes.

the Bobble server for pending search terms (Step 3) and reissue them locally as search queries to Google without signing in to a Google account or revealing a trackable browser cookie to Google (Step 4). Each agent pushes the results it receives from Google to the Bobble server. To establish a baseline for comparing inconsistencies in search results, we would ideally like to also reissue the user's query locally from a separate browser session that is not signed in to Google and does not pass session cookies to Google. We call these anonymous queries "organic", as they are as free as possible from user-specific influences (in contrast to queries that are issued when a user is logged in or passing browser cookies to Google). Unfortunately, collecting true organic results is challenging due to the technical and usability obstacles involved in logging the user out in order to issue such a query from an extension running within the same Web browser. Instead, Bobble collects organic search results by issuing a duplicate query from a nearby Chrome browser agent. (Section 3.2 presents a detailed discussion of the effects of using a nearby agent to stand in for the user's browser.)

3.2 Validation

To evaluate whether Bobble accurately reports results that regular users would actually receive, we first validate that Bobble's Chrome browser agent correctly emulates major version releases of Chrome browsers; specifically, that the results returned to a Bobble agent reflect those that would be returned to an actual query issued by a user in her Web browser. Second, we measure the effects of collecting organic search results indirectly by issuing queries from nearby agents as opposed to inside the user's browser.

Do Bobble agents emulate browser behavior? We begin by ensuring that the Google search results collected using the Chrome browser agent do not differ statistically from the results obtained when the query is issued from the Google homepage viewed with the Chrome browser itself. We randomly select 1,000 unique search terms from the daily top-20 Google trending search terms between August 2011 and December 2011 and search each of these terms three times from machines running Linux, Windows, and Mac operating systems. On each machine, we run a Chrome browser agent and two Google Chrome browsers with the same release version. We use the Selenium ChromeDriver [9] to drive the two Chrome browsers and the browser agent so that all three perform the same Google search simultaneously. One might expect that simultaneously issued queries from identical Web browsers would return identical sets of results, since the queries do not involve any search history and are issued from the same location at essentially the same time. While this expectation generally holds, it is not always the case. Table 1 shows the number of

Fig. 2: CDF plot: the distribution of the number of search queries.

Fig. 3: The distribution of the number of search queries when sending queries to google.com and a Google IP address, respectively.

search results. This result indicates that the organic search results of most Google search queries are tailored on the basis of the location where these searches are performed, even though Google users neither sign in to their accounts nor reveal their browser cookies to Google's personalized search services. In the following section, we design a more careful experiment to examine whether the observed search inconsistency results from location-based personalization rather than from data diversity across different Google data centers.

To quantify the effect of geographic location on search inconsistency, we classified the inconsistent search results in three ways:

– At least one search result appears in the top-3 search results of other PlanetLab nodes but not at all in a Google user's organic search result set. We find that 23,394 out of 76,307 search queries (30.66%) give rise to this situation.
– At least one search result appears in the top-10 (but not top-3) search results of other PlanetLab nodes, but does not appear in a Google user's organic search result set; 65,939 out of 76,307 search queries (86.41%) fit this situation.
– At least one search result appears in the Google user's organic search result set but does not appear in the search results of other PlanetLab nodes; 1,434 out of 76,307 search queries (1.88%) fit this situation.

Considering that the top-10 Google search results receive about 90% of clicks and the top-3 Google search results usually receive the most attention [10], the inconsistency that arises due to location likely has significant implications for a user's experience.

5.2 Distributed index inconsistencies

To validate that the observed search inconsistency in fact derives from location-based personalization rather than from data diversity across different data centers, we conduct an experiment. In particular, we modify Bobble to attempt to isolate the inconsistency contributed by location-based personalization from that contributed by inconsistencies in

Fig. 4: % of search results changed at each rank.

Dataset      Location   Profile
Signed-in    97.64%     64.19%
Signed-out   97.80%     58.77%

Table 2: How location and user profile contribute to search inconsistency. Location has more effect on inconsistency than search history does.

et al. (see Figure 5 in previous work [5]). One possible reason for this discrepancy is the difference in measurement method. Previous work recruited different Google users to search the same set of keywords, where the keywords were chosen such that they were deemed unrelated to user profiles. In contrast, our study takes place in a more natural setting: it measures the influence of profile-based personalization using each user's own search queries. Because a user's past queries are typically relevant to personalization that may occur in the future, we observe that profile-based personalization has more influence on Google users' search results.

In addition to inconsistencies in the search result sets, we also discovered the following inconsistencies:

– For signed-in users, 22,405 out of 66,138 search queries (33.88%) have at least one search result that appears in the profile-based personalized search result set but not in the organic search result set.
– For anonymous users, 3,148 out of 10,169 search queries (30.96%) have at least one search result that appears in the profile-based personalized search result set but not in the organic search result set.
– For signed-in users, 7,352 out of 66,138 search queries (11.12%) have at least one search result that appears in the top 3 of the organic search result set but not in the profile-based personalized search result set.
– For anonymous users, 1,484 out of 10,169 search queries (14.59%) have at least one search result that appears in the top 3 of the organic search result set but not in the profile-based personalized search result set.

Table 2 also shows that the Google search inconsistencies resulting from signed-in users' profiles are stronger than those resulting from signed-out users' profiles. Finally, we also observe that location-based factors introduce more inconsistencies than profile-based factors do.

7 Conclusion

We have designed, implemented, and deployed Bobble, a distributed system that tracks and monitors the inconsistency of search results for user search queries. Using Bobble, we collect user search terms and results and measure the search inconsistency that arises from both geographic location and search history. We find that the geographic …
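The three ways of classifying location-based inconsistency described above reduce to set comparisons between a user's organic results and the ranked results returned to other vantage points. The Python sketch below shows one way to express them; it is an illustration, not Bobble's code. It assumes each result list holds top-10 result URLs in rank order, and it reads "does not appear in search results of other PlanetLab nodes" as "returned to none of the other vantage points". The names `classify_query`, `tally`, `share` and the category labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical labels for the three location-based categories.
MISSING_TOP3 = "missing_top3"    # another node's top-3 result is absent from the organic set
MISSING_TOP10 = "missing_top10"  # another node's rank 4-10 result is absent from the organic set
ORGANIC_ONLY = "organic_only"    # an organic result was returned to no other node


def classify_query(organic: List[str], vantage: Dict[str, List[str]]) -> Set[str]:
    """Return the inconsistency categories that one query falls into.

    organic: the user's organic results, assumed to be top-10 URLs in rank order.
    vantage: ranked top-10 URLs returned to each other vantage point.
    """
    organic_set = set(organic)
    seen_elsewhere: Set[str] = set()
    categories: Set[str] = set()
    for results in vantage.values():
        seen_elsewhere.update(results[:10])
        if any(url not in organic_set for url in results[:3]):
            categories.add(MISSING_TOP3)
        if any(url not in organic_set for url in results[3:10]):
            categories.add(MISSING_TOP10)
    # Assumption: "does not appear in the results of other nodes" means the
    # URL was returned to none of them.
    if any(url not in seen_elsewhere for url in organic):
        categories.add(ORGANIC_ONLY)
    return categories


def tally(queries: Iterable[Tuple[List[str], Dict[str, List[str]]]]) -> Counter:
    """Count queries per category; a single query may fall into several."""
    counts: Counter = Counter()
    for organic, vantage in queries:
        counts.update(classify_query(organic, vantage))
    return counts


def share(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals, as in the text."""
    return f"{100 * count / total:.2f}%"
```

For example, `share(23394, 76307)` returns `"30.66%"`, which matches the figure reported for the first category. A query is counted once per category it triggers, regardless of how many vantage points disagree, which is consistent with the "at least one search result" wording of each case.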
