Exposing Inconsistent Web Search Results with Bobble


Uploaded On 2015-05-07


Snoeren
Georgia Institute of Technology; University of California, San Diego
{xxing8, wei, ddoozan3, feamster, wenke}@gatech.edu, snoeren@cs.ucsd.edu

Abstract. Given their critical role as gateways to Web content, the search results a Web search engine provides to its



Presentation Transcript

might prevent them from seeing. Moreover, personalization frequently occurs without the user's involvement, or even explicit agreement, so users may not even be aware that their search results have been tailored according to their profile and preferences.

The goal of our work is to expose and characterize inconsistencies that result from personalization. In particular, we seek to quantify the extent to which search personalization algorithms return results that are inconsistent with those that would be returned to other users, and expose any differences to the user in real time. We present Bobble, a Chrome Web browser extension that allows users to see how the search results that Google returns to them differ from the results that are returned to other users. Bobble captures a user's search query and reissues it from a subset of over 300 world-wide vantage points, including both dedicated PlanetLab measurement nodes and the hosts of other consenting Bobble users. In contrast to research tools that have been developed to measure search personalization offline [5], we intend users to use Bobble while they browse the Web, providing them critical insight into how their online experience is being potentially distorted by personalization.

To understand the nature of the inconsistencies uncovered by Bobble, we study more than 75,000 real search queries issued by hundreds of Bobble users over nine months. We quantify the extent to which personalization affects search results and determine how users' Google search results vary based on factors ranging from their geographic locations to their past search histories. Our study focuses exclusively on Google search, one of the more widely used search engines, but we expect that similar phenomena exist for other popular search engines.

We find that 98% of Google Web searches return at least one set of inconsistent search results, typically from a vantage point in a different geographic region than the user, even though Bobble performs these searches without exposing any information that links to the searchers' Google profiles. In sum, our study provides the first large-scale glimpse into the nature of inconsistent results that arise from search personalization and opens many avenues for future research. We quantify how geography and search history may influence search results, but others have noted that many other factors (e.g., device type, time of day) may also affect the results that a user sees for a given search term [5]. Bobble has been deployed and publicly available for 21 months; users and researchers can extend it to measure how other factors might induce inconsistencies in search results.

2 Related Work

Researchers have previously studied means to personalize Web search results. Dou et al. performed a large-scale evaluation and analysis of five personalized search algorithms using a twelve-day MSN query log [2]. They find that profile-based personalization algorithms are sometimes unstable. Teevan et al. conduct a user study to investigate the value of personalized Web search [11]. In contrast, we are less interested in the distinction between different personalization methods, and focus instead on the effects of a single search personalization algorithm. We aim to quantify the effects of different personalization factors on search inconsistency.

In a contemporaneous study, Hannak et al. measure the personalization of Google search. The bulk of their effort focuses on understanding the features leading to person-

OS        With same browser   With Chrome agent   p-value
Windows   11/1,000            16/1,000            0.1725
Linux     23/1,000            21/1,000            0.7517
Mac       15/1,000            15/1,000            1.0
Table 1: The number of terms that generate inconsistent sets of search results when searching 1,000 distinct terms from Chrome browsers/agent on different OSes.

the Bobble server for pending search terms (Step 3) and reissue them locally as search queries to Google without signing in to a Google account or revealing a trackable browser cookie to Google (Step 4). Each agent pushes the results it receives from Google to the Bobble server.

To establish a baseline for comparing inconsistencies in search results, we would ideally like to also reissue the user's query locally from a separate browser session that is not signed in to Google and does not pass session cookies to Google. We call these anonymous queries “organic”, as they are as free as possible from user-specific influences (in contrast to queries that are issued when a user is logged in or passing browser cookies to Google). Unfortunately, collecting true organic results is challenging due to the technical and usability obstacles surrounding logging the user out in order to issue such a query from an extension running within the same Web browser. Instead, Bobble collects organic search results by issuing a duplicate query from a nearby Chrome browser agent. (Section 3.2 presents a detailed discussion of the effects of using a nearby agent to stand in for the user's browser.)

3.2 Validation

To evaluate whether Bobble accurately reports results that regular users would actually receive, we first validate that Bobble's Chrome browser agent correctly emulates major version releases of Chrome browsers; specifically, that the results returned to a Bobble agent reflect those that would be returned to an actual query issued by a user in her Web browser. Second, we measure the effects of collecting organic search results indirectly by issuing queries from nearby agents as opposed to inside the user's browser.

Do Bobble agents emulate browser behavior? We begin by ensuring that the Google search results collected using the Chrome browser agent do not differ statistically from the results obtained when the query is issued from the Google home page viewed with the Chrome browser itself. We randomly select 1,000 unique search terms from the daily top-20 Google trending search terms between August 2011 and December 2011 and search each of these terms three times from machines running Linux, Windows, and Mac operating systems. On each machine, we run a Chrome browser agent and two Google Chrome browsers with the same release version. We use the Selenium Chrome driver [9] to automate the two Chrome browsers and one browser agent to perform the same Google search simultaneously.

One might expect that simultaneously issued queries from identical Web browsers would return identical sets of results, since the queries do not involve any search history and are issued from the same location at essentially the same time. While this expectation generally holds true, it is not always the case. Table 1 shows the number of

Fig. 2: CDF plot: the distribution of the number of search queries.

Fig. 3: The distribution of the number of search queries when sending queries to google.com and a Google IP address, respectively.

search results. This result indicates that organic search results of most Google search queries are tailored on the basis of the location where these searches are performed, even though Google users neither sign in to their accounts nor reveal their browser cookies to Google personalized search services. In the following section, we further design a careful examination to explore whether the observed search inconsistency results from location-based personalization rather than data diversity across different Google data centers.

To quantify the effect of geographic location on search inconsistency, we classified the inconsistent search results in three ways:
– At least one search result appears in the top-three search results of other PlanetLab nodes but not at all in a Google user's organic search result set. We find that 23,394 out of 76,307 search queries (30.66%) give rise to this situation.
– At least one search result appears in the top-10 (but not top-3) search results of other PlanetLab nodes, but does not appear in a Google user's organic search result set; 65,939 out of 76,307 search queries (86.41%) fit this situation.
– At least one search result appears in the Google user's organic search result set but does not appear in search results of other PlanetLab nodes; 1,434 out of 76,307 search queries (1.88%) fit this situation.

Considering the fact that the top-10 Google search results receive about 90% of clicks and the top-3 Google search results usually receive the most attention [10], the inconsistency that arises due to location likely has significant implications for a user's experience.

5.2 Distributed index inconsistencies

To validate that the observed search inconsistency is in fact derived from location-based personalization rather than data diversity across different data centers, we conduct an experiment. In particular, we modify Bobble to attempt to isolate the inconsistency contributed by location-based personalization from that contributed by inconsistencies in

Fig. 4: % of search results changed at each rank.

Signed-in dataset        Signed-out dataset
Location    Profile      Location    Profile
97.64%      64.19%       97.80%      58.77%

Table 2: How location and user profile contribute to search inconsistency. Location has more effect on inconsistency than search history does.

et al. (see Figure 5 in previous work [5]). One possible reason for this discrepancy is the difference in the measurement method. Previous work recruited different Google users to search the same set of keywords, where the keywords were chosen such that they were deemed to not be related to user profiles. In contrast, we perform our study in a more natural setting because it measures the influence of the profile-based personalization using each user's own search queries. Because a user's past queries are typically relevant to personalization that may occur in the future, we observe that profile-based personalization has more influence on Google users' search results.

In addition to inconsistencies in the search result sets, we also discovered the following inconsistencies:
– For signed-in users, 22,405 out of 66,138 search queries (33.88%) have at least one search result that shows in the profile-based personalized search result set but not in the organic search result set.
– For anonymous users, 3,148 out of 10,169 search queries (30.96%) have at least one search result that shows in the profile-based personalized search result set but not in the organic search result set.
– For signed-in users, 7,352 out of 66,138 search queries (11.12%) have at least one search result that shows in the top 3 of the organic search result set but not in the profile-based personalized search result set.
– For anonymous users, 1,484 out of 10,169 search queries (14.59%) have at least one search result that shows in the top 3 of the organic search result set but not in the profile-based personalized search result set.

Table 2 also shows that the Google search inconsistencies resulting from signed-in users' profiles are stronger than those resulting from signed-out users' profiles. Finally, we also observe that location-based factors introduce more inconsistencies than profile-based factors do.

7 Conclusion

We have designed, implemented, and deployed Bobble, a distributed system that tracks and monitors the inconsistency of search results for user search queries. Using Bobble, we collect user search terms and results and measure the search inconsistency that arises from both geographic location and search history. We find that the geographic