Continuously Adaptive Continuous Queries over Streams

Samuel Madden, Mehul Shah, Joseph M. Hellerstein
UC Berkeley
{madden, mashah, jmh}@cs.berkeley.edu

Vijayshankar Raman
IBM Almaden Research Center
ravijay@us.ibm.com

This work has been supported in part by the National Science Foundation under ITR/IIS grant 0086057 and ITR/SI grant 0122599, by DARPA under contract N66001-99-2-8913, and by IBM, Microsoft, Siemens, and the UC MICRO program.

Work done while author was at UC Berkeley.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGMOD '2002 June 4-6, Madison, Wisconsin, USA. Copyright 2002 ACM 1-58113-497-5/02/06 ...$5.00.

ABSTRACT

We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive cross-query sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators (both selections and join state) at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates. We include measurements of the performance of our core system, along with a comparison to existing continuous query approaches.

1. INTRODUCTION

Traditional query processors utilize a request-response paradigm where a user poses a logical query against a database and a query engine processes that query to generate a finite answer set. Recently, there has been interest in the continuous query paradigm, in which users register logical specifications of interest over streaming data sources, and a continuous query engine filters and synthesizes the data sources to deliver streaming, unbounded results to users (e.g., [13, 3]). An aspect of continuous query processing that has been overlooked in the literature to date is the need for adaptivity to change: unbounded queries will, by definition, run long enough to experience changes in system and data properties as well as system workload during their run. A continuous query engine should adapt gracefully to these changes, in order to ensure efficient processing over time.

With this motivation in mind, we used the Telegraph adaptive dataflow engine [8] as a platform for a continuous query engine; in this paper we discuss our continuous query implementation. We show how the eddy [1], a continuously adaptive query processing operator, can be applied to continuous queries. Our architecture, which we dub Continuously Adaptive Continuous Queries (CACQ), offers significant performance and robustness gains relative to existing continuous query systems. Interestingly, our scheme provides benefits even in scenarios where no change is evident, due to its ability to share computation and storage across queries more aggressively than earlier approaches that used static query plans.

Our interest in continuous queries arose in the context of our work on handling streams of data from sensor networks [14]. Researchers from the TinyOS and SmartDust projects at UC Berkeley or the Oxygen project at MIT [9, 11, 7] predict that our environments will soon be teeming with tens of thousands of small, low-power, wireless sensors. Each of these devices will produce a stream of data, and those streams will need to be monitored and combined to detect interesting changes in the environment.

To clarify the techniques presented in this paper, we consider a scenario from sensor networks. One application of sensor networks is building monitoring, where a variety of sensors such as light, temperature, sound, vibration, structural strain, and magnetic field are distributed throughout a building to allow occupants and supervisors of that building to monitor environmental properties or human activity. For instance, structural engineers might wish to use vibration sensors to detect earthquakes and strain sensors to assess structural integrity. Employees might wish to use light or motion sensors to tell if their boss is in her office. Building managers could use temperature and motion readings to automatically adjust heating and lighting. Autonomous devices such as lighting systems, door locks, sprinkler systems and window shades could register queries to drive their behavior. We assume that for each distinct type of sensor there is one logical sensor-reading "table" or data source that accumulates the readings from all the sensors of that type. Each entry in a readings table contains a sensor id, a timestamp, and a sensor value. In a large office building, there might be several thousand such sensors feeding dozens of logical tables, with thousands of continuous queries.

This scenario illustrates the requirements of our continuous query system: there are numerous long running queries posed over a number of unbounded streams of sensor readings. As sensor readings arrive, queries currently in the system must be applied to them, and updates to queries must be disseminated to the users who registered the queries. Users can pose or cancel queries at any time, so the operations that must be applied to any given tuple vary depending on the current set of queries in the system.

Our CACQ design incorporates four significant innovations that make it better suited to continuous query processing over streams than other continuous query systems. First, we use the eddy operator to provide continuous adaptivity to the changing query workload, data delivery rates, and overall system performance. Second, we explicitly encode the work which has been performed on a tuple, its lineage, within the tuple, allowing operators from many queries to be applied to a single tuple. Third, we use an efficient predicate index for applying many different selections to a single tuple. Finally, we split joins into unary operators called SteMs (State Modules) that allow pipelined join computation and sharing of state between joins in different queries. The next section motivates each of these techniques with specific examples.

2. CHALLENGES AND CONTRIBUTIONS

The challenge in designing a continuous query system is to minimize the amount of storage and computation that is required to satisfy many simultaneous queries running in the system. Given thousands of queries over dozens of logical sources, queries will overlap significantly in the data sources they require. It is highly likely that queries over the same source will contain selection predicates over overlapping ranges of attributes, or request that the same pairs of sources be joined. To efficiently process the outstanding queries, the continuous query processor must leverage this overlap as much as possible. Query processing is further complicated by the long running nature of continuous queries: query cost estimates that were sound when a query was first posed may be dead wrong by the time the query is removed from the system. In this section we discuss the four main contributions of CACQ for addressing these challenges.

2.1 Adapting to Long Running Queries

To illustrate the problems that can arise when a static query optimizer is used to build query plans for long running queries, consider an example from our building monitoring scenario: many queries may request building locations where the lights are on, since illuminated rooms correspond to areas that are occupied by people. Thus, many queries will include a selection predicate looking for light levels above some threshold. During normal working hours, this predicate is not very selective: most areas of the building are lit. Thus, a static query optimizer would normally place the predicate towards the top of the query plan. However, at night, few locations are lit (or occupied), so this becomes a very selective predicate that should be pushed to the bottom of the plan. A static optimizer cannot easily change its decision; it is possible that the optimizer could be periodically re-run, but deciding when to do so would be complicated. Moreover, it is difficult in a traditional query engine, including one designed for continuous queries, to modify the order of operations in a query plan while the query is in flight.

Eddies circumvent this problem through continuous adaptivity: the route that a tuple takes through operators in a query is dynamically chosen so that tuples which arrive during working hours can have operators applied in a different order than tuples that arrive at night. In order to enable this flexible routing, a system that uses eddies by necessity incorporates query processing algorithms that are amenable to in-flight reordering of operations [1]. The eddy determines the order in which to apply operators by observing their recent cost and selectivity and routing tuples accordingly. The basic mechanism for continuous adaptivity is discussed in Section 3.1 below.

2.2 Explicit Tuple Lineage

As a result of the reordering endemic to eddies, the path that each tuple takes through the operators (its lineage) is explicitly encoded in the tuple. Different tuples accumulate different lineages over time, but in the end an eddy produces a correct query result. Note that a query processing operator connected to an eddy (for example, a pipelined hash join [28]) may process tuples with different lineages, depending on whether or not they have been previously routed through selections, other joins, etc. This contrasts with systems based on static query plans, in which the state of intermediate tuples is implicit in the query plan. Query operators in a static plan operate on tuples of a single lineage. Because Telegraph is designed to work correctly with eddies, its query operators correctly handle tuples within a query that have multiple different lineages. In CACQ we extend this ability to multiple overlapping queries, maximizing the sharing of work and state across queries.

As an example, consider a number of queries over our building network, each of which is looking to temporally join temperature and light sensor readings above some threshold, with the light threshold varying from query to query. Each query consists of two operators: a selection over light readings and a windowed join [22] within some time window between the two sets of readings. All queries have the same join predicate, but each query selects a different set of light tuples (some of which satisfy multiple queries). Our CQ goals dictate that we should try to share work whenever possible; since all queries contain a join with an identical predicate (equality join on time between light and temperature tuples), an obvious trick would be to perform only a single join. A detailed discussion of our techniques for maintaining a tuple's lineage is provided in Section 3.2 below.

2.3 Grouped Filter: Predicate Index

Our third technique for continuous query processing is a predicate indexing operator called a grouped filter that reduces computation when selection predicates have commonalities. We maintain a grouped-filter index for each attribute of each source that appears in a query, and use that index to efficiently compute overlapping portions of range queries. Details of the grouped filter are discussed in Section 3.2.6; for now, it should be thought of as an opaque object that takes in multiple predicates and a tuple, and efficiently returns the set of predicates that accept the tuple.

Consider our building monitoring scenario again: different users may have different preferences for temperature in their offices, and the central heating system may use the sensor network to determine temperature in those offices. The heating system could decide to direct heat to a particular part of the building by posing a number of continuous queries looking for temperatures under the user-specified threshold in each office. Each query is thus a pair of selections on location and temperature. It is very likely that temperature predicates will overlap: the comfort range for most people is fairly similar. Thus, with an index over temperature predicates, we can avoid applying each predicate independently: we ask the grouped filter to find all predicates requesting a temperature above the value of the tuple's temperature field. The tuple is then filtered by building location and output to the queries that match, which the heating system uses to adjust the building temperature in the appropriate location.

2.4 SteMs: Multiway Pipelined Joins

Users may issue queries that join data from distinct but overlapping subsets of the sources. For example, continuing with our building monitoring scenario, imagine that one user wants the blinds in a region of the building to close if it is both warm and sunny at the same time, while another user wants the windows to open if it is both warm and quiet at the same time. Assume that readings from the temperature, light, and sound sensors are all tagged with a location and time and arrive in time order. As new data arrives, our continuous query system must compute a join on the location and time attributes over these sources and stream results to clients so they can react quickly to changing conditions. Moreover, a continuous query system must simultaneously handle numerous such queries that have varying overlap in the set of sources they combine.

To fulfill these requirements for computing joins over streaming (and non-streaming) sources, our CACQ system employs two techniques. First, we modify our notion of join in a standard way [22]: tuples can only join if they co-occur in a time window. This modification bounds the state we need to maintain to compute joins. Second, we use a space-efficient generalization of doubly-pipelined joins [28] within our eddy framework. For each incoming source, we build an index on-the-fly and encapsulate the index in a unary operator called a SteM, introduced in [18]. SteMs are exposed to the eddy as first class operators, and the eddy encapsulates the logic for computing joins over the incoming sources using SteMs. This technique permits us to perform a multiway pipelined join. That is, it allows us to incrementally compute a join over any subset of the sources and stream the results to the user. Moreover, this technique allows us to share the state used for computing joins across numerous queries. We describe the details of this scheme in Section 3.

In the following section, we discuss the implementation of our CACQ system, focusing on the four techniques presented above.

3. IMPLEMENTATION

We implemented our CACQ system in the context of the Telegraph query processing engine that has been developed over the past two years by the UC Berkeley Database group [8]. It supports read-only SQL-style queries (without nested queries) over a variety of data sources: files, network and sensor streams, and web pages. Streams are treated as infinite relational data sources, and web pages are mapped into relational tables via simple, regular-expression-based wrappers [12]. Instead of a conventional query plan, Telegraph uses the eddy operator to dynamically route tuples arriving from data sources into operators that operate on those tuples. Telegraph provides operators to perform basic dataflow operations like select and join.

Given Telegraph as our development platform, we now discuss our CACQ implementation. We will describe techniques to fully implement select-project-join (SPJ) queries without nesting or aggregation. In this work, we only describe queries over streaming data. It is assumed that queries apply to data present in the system from the moment the query is registered and any future data which may appear until the query is removed from the system. Queries over historical or non-streaming data are not a part of this implementation, although we will turn to them briefly in Section 6 on related work.

Throughout this work, we map stream elements such as sensor readings onto relations, as proposed in [17]. This allows queries posed over streaming data to refer to relations and relational attributes. This mapping is done in the obvious way: each field of a stream element corresponds to an attribute of the relation representing that stream. We assume that each stream element has the same fields and a timestamp indicating when it was produced.

Given these caveats, we now present the design of our system. For clarity of exposition, we consider designs in increasing order of complexity. We begin with a rudimentary CACQ system in which a single query without joins runs over a single source with multiple attributes. We then show how multiple queries without joins can be processed simultaneously, sharing tuples and selection operators. Finally, we show how joins can be added to the system and how they share state via SteMs.

3.1 Single Query without Joins

With only a single query, the CACQ system is very similar to Telegraph with a standard eddy operator, as in [1]. The query is decomposed into a set of operators that constitute the processing that must be applied to every tuple flowing into the system. Since we are not considering joins at the moment, the only operators that can exist are scan operators, which fetch tuples, and selection operators, which filter those tuples based on a user-specified Boolean predicate. For now, we assume that queries contain only conjunctions (ANDs) of predicates; we discuss disjunctive predicates (ORs) in Section 3.2.7.

At the core of the system is a single eddy that routes tuples to operators for processing. Each operator has an input queue of tuples waiting to be processed by it. Operators dequeue tuples, process them, and return them to the eddy for further routing. The eddy maintains a pool of tuples that are waiting to be placed on the input queue of some operator. When the pool is empty, the eddy can schedule a scan operator to cause more tuples to be fetched or produced. Notice that the eddy can vary the route a tuple takes through operators in the system on a per-tuple basis. Also note that tuples are never copied: once allocated, they are passed by reference between operators. For a detailed discussion of Telegraph, see [23].

3.1.1 Routing in the Single Query Case

As in [1], to facilitate routing, the eddy maintains two bit vectors with each tuple. Each bit vector contains a number of bits equal to the number of operators in the system. These vectors are used to track the operators which have been or may still be applied to a tuple. The first, the ready bits, indicate which operators can be applied to a tuple. In the single table case, any tuple can be routed to any operator, so the ready bits are initially all set. The second bit vector contains the done bits that indicate the operators to which a tuple has already been routed. Initially all of the done bits are cleared. Once all of a tuple's done bits have been set, it can be output. For the simple selection case, the done bits are the complement of the ready bits: once a tuple has been processed by a particular operator, that operator's ready bit is cleared and its done bit is set. In Section 3.3, we will see cases where the two bitmaps will not be complements of each other.

Our only query processing operator so far, the selection operator, uses these bits as follows: when a tuple arrives, it applies its predicate to the tuple. If the tuple does not satisfy the selection predicate, the operator discards the tuple by deallocating it and not returning it to the eddy. If the tuple satisfies the predicate, the operator's done bit is set and its ready bit is cleared; the tuple is then returned to the eddy for further processing. The total storage overhead of these vectors, in bits per tuple, is twice the number of operators in the query.

The final element of this simple single query CACQ system is a way to determine the order in which tuples are routed through operators. This is a policy decision: any ordering will eventually result in a tuple being fully processed, but some orderings, such as those which place highly selective selections earlier in routing, will be more efficient than others. The eddy employs a routing policy to choose the tuple to route and the next operator to process it. The routing policy implements the per-tuple adaptivity of eddies. In the single query case, assuming all selections cost the same to apply, the policy should route to more selective operators first. We discuss routing policies in our CACQ system in Section 4.

Given this simple single query CACQ approach, we now describe how to extend this solution to work with multiple queries, each of which queries a single source.

3.2 Multiple Queries without Joins

The goal of the multiple-query solution in the absence of joins is to use a single eddy to route tuples among all of the continuous queries currently in the system. In our solution, tuples are never copied: two different queries with two different predicates over the same relation should operate over exactly the same tuples. This is important for two reasons: tuples occupy storage that must be conserved, and copying takes valuable processor cycles. A key part of the multiple query solution without joins is the grouped filter that allows us to share work between multiple selections over the same attribute of a relation. We present the design of this data structure in Section 3.2.6 below.

[Figure 1: The Basic Continuous Query Architecture. A data source S[a,b] feeds a single eddy with two grouped filters: a filter over S.a holding predicates s1, s2, s3 and a filter over S.b holding predicates s4, s5, s6. The queries are Q1 = [s1, s4], Q2 = [s2, s5], Q3 = [s3, s6], that is:
  Q1 = from S select * where s1(S.a), s4(S.b)
  Q2 = from S select * where s2(S.a), s5(S.b)
  Q3 = from S select * where s3(S.a), s6(S.b)]

Figure 1 shows the basic architecture. Users submit queries Q1, Q2, and Q3, consisting of selection predicates s1 through s6 over two fields, a and b of source S. All queries are submitted to a single eddy, with just one filter operator (and its associated grouped filter) for field a, and one for field
b. The eddy tracks when tuples are ready to be output to each query, and sends the tuples back to the appropriate end-users as required. We will refer to this single eddy, and the operators associated with all queries running into it, as a flow. The rest of this Section uses this example to show how queries are added and tuples routed through flows without joins.

There are two modifications that must be made to the single query system to allow it to handle multiple queries: new queries must be combined with old queries, sharing operators wherever possible, and tuples must be properly routed through the merged set of queries and output to the appropriate end users.

3.2.1 Adding a Query

Combining old and new queries is not complicated. A new query which scans the same relation as an existing query will share the existing scan operator. Similarly, a new query with a selection over some attribute a for which a grouped filter already exists will simply add its predicate over a to the filter.

As an example, consider the case where a second query is added to an eddy that already has a single query over a single source. We'll begin with Q1 from Figure 1. By itself, this query consists of three operators: a scan on S and two selections, one over a and one over b. Now, Q2 arrives, which also contains a scan on S, a selection over a and a selection over b. We begin by matching the scans: both are over the same relation, so we do not need to instantiate a new scan for Q2. Similarly, we can add the predicates from the selections into the grouped filters created over a and b
when instantiating Q1. (Remember, we are not considering the case where Q2 is interested in historical tuples; if this were the case, we would have to create a different scan operator.)

3.2.2 Routing in the Multiple Query Case

We now turn to routing tuples through a flow. We use the same approach as in the single query case: the eddy repeatedly uses its routing policy to choose a tuple to route and an operator to which the tuple should be routed. A complexity arises when routing through a predicate index. When s1 accepts a tuple and s2 rejects it, we need to record this information somewhere, since it means that the tuple must not be output to Q2, but might be output to Q1 (if s4 does not reject it).

Our solution is to encode information about queries that accept or reject a tuple in the tuple itself, just as we already store the ready and done bits with the tuple. We allocate a bitmap, queriesCompleted, with one bit per query, and store it in the tuple. When a query's bit is set, it indicates that this tuple has already been output or rejected by the query, so the tuple does not need to be output to that query. Thus, Q2 will have its bit turned on in this bitmask when s2 rejects a tuple, and Q1 will have its bit turned on when a tuple is output to it.

The queriesCompleted bitmap, along with the ready and done bits, completely encodes the notion of a tuple's lineage discussed above. Lineage does not simply capture a tuple's path through a single query: it concisely expresses a tuple's path through all queries in the system. By looking at any tuple at any point in the flow, it is possible to determine where that tuple has been (via its done bits), where it must go next (via its ready bits) and, most importantly, where it may still be output (via its queriesCompleted bits). In our CACQ approach, we are never dependent on the structure of the query plan for implicit information about a tuple's lineage. This means that any operator common to any two queries can be treated as a single operator that handles tuples for both queries. Similarly, any tuple common to two queries can be used without copying. The fact that the tuple may be output to only one of the queries is explicitly encoded in the queriesCompleted bits. In existing continuous query systems (like NiagaraCQ [3]) that use a static query plan, a pair of operators that could otherwise be merged must be kept separate if the query optimizer cannot guarantee that the set of tuples flowing into one operator is identical to the set of tuples flowing into the other. Again, this leads to extra copies of each tuple which would not be allocated in the CACQ approach.

As an implementation detail, we have chosen to preallocate the queriesCompleted bits of each tuple as a fixed size bitmap rather than attempting to dynamically resize the bitmap in every tuple as new queries arrive. Dynamic resizing could be very expensive if there are many tuples flowing in the system when a query arrives. Note that this limits the maximum number of queries that may be in the system at any one time. In our approach, the queriesCompleted bits for queries that do not exist at the time the tuple is created are set to one. This means that the tuple will not be output to the queries that arrive while it is in the system. Similarly, when a query is removed from the system, we set all of the queriesCompleted bits for the query in in-flight tuples to one. This allows us to reuse the queriesCompleted bit associated with the deregistered query in a new query: in-flight tuples will never be output to new queries, in accordance with the specification that queries in CACQ are only over data that arrives after the query.

3.2.3 Outputting Tuples

We have now shown how to track the queries to which a tuple may or may not need to be output, but a mechanism to determine when a tuple should be output is still needed. We will accomplish this by associating a compact query signature, the completionMask, with each query. The completion bitmask is the same size as the done bitmap, and has a bit turned on for each operator that must process the tuple before it can be output. To determine if a tuple t should be output to a query q that has not already output or rejected t, we AND q's completionMask with t's done bits; if the value is equal to the completionMask, the tuple can be output to q. We maintain a separate data structure, outQueues, which associates a query ID with an output queue that will deliver tuples to the user who posed each query.

The above system will properly merge together queries and route tuples through them. There is, however, an optimization that significantly improves the space efficiency of this approach. Consider what happens when a new query, Q4, with a single selection over some source R is added to the queries shown in Figure 1. This query shares no operators with the other queries in the system and an R tuple will never be routed through one of the selection operators on S, but space must be reserved in every R tuple for the done and ready bits of the selections on S and in every R tuple's queriesCompleted bits for Q1, Q2, and Q3. In a system with many queries over many sources, this could lead to a significant waste of space in every tuple.

[Figure 2: Continuous Query Data Structures. For three input queries (two over source S, one over source R), the figure shows each source's operators list, queries list, and per-query completionMasks: 110 and 001 for the two queries over S, and 1 for the single query over R.]

The solution is to partition our state by source. State that was previously system-wide, namely information about queries and operators, now becomes specific to a particular data source. Each tuple is tagged with a sourceId which tells the eddy which scan operator created a tuple and is used to determine how that tuple's done, ready, and queriesCompleted bits are interpreted. Two auxiliary data structures are created for each sourceId: an operators table that lists the operators that apply to the source and a queries table that lists the queries over the source. The i-th entry in the operators list corresponds to the i-th bit in the done and ready bitmasks for tuples with this sourceId. Similarly, entries in queries correspond to bits in a tuple's queriesCompleted bitmap. We must also ensure that the completionMasks built above are associated with the appropriate source's operators list. Figure 2 shows these data structures for three sample queries.

Figure 3 shows the extra fields stored in the tuple header for routing in CACQ eddies. The fields that are inherited from the single-query tuple are shown first, with the additional fields for the multi-query case shown next.

3.2.4 Performance Effects of Additional Storage

We begin this Section by estimating the amount of storage required to maintain the per-tuple, per-source, and per-query state in our CACQ system. Table 1 summarizes the storage overhead of these additional data structures (not including the ready and done bits from the basic Eddies implementation). In this table,
"operators per source" refers to the number of distinct selection operators reachable per source; this is the total number of predicate indices (equal to the number of attributes), divided by the total number of queried sources. "Queries per source" refers to the average number of queries that involve tuples from this source; in the absence of joins this is equal to the average number of queries divided by the number of sources. Table 2 gives representative values from our building monitoring scenario for these parameters, assuming the (very aggressive) goal of 100,000 simultaneous queries. These are used to provide numeric values in Table 1. Notice that the additional cost of continuous queries is just 6.83MB, the majority of which is the output queues for queries. The state per active tuple is 2.5KB, which seems troublesome until one considers the case in which 100,000 queries are run independently. In this case, 100,000 copies of each tuple will be made, which for 200 byte tuples is 20MB of state required for all copies of each tuple. In the next Section, we present experimental evidence showing how query performance relates to tuple size in CACQ.

[Figure 3: Continuous Query Tuple Format]

Table 1: Continuous Query Storage. Extra data structures required for continuous queries in a flow, with estimates for 100,000 queries.

  Structure               Size Expression                                        Estimated Size (bytes)
  Per Source State (SS)
    queries               sizeof(int) x queries per source                       80,000
    operators             sizeof(operator) x (operators per source + scan)       400
    completionMask        queries per source x operators per source / 8          7,500
  Per Tuple State (TS)
    tupleQueryMask        queries per source / 8                                 2,500
  Per Query State (QS)
    outQueues             sizeof(output queue)                                   64
  Total Flow State        sources x SS + queries x QS + active tuples x TS       6.83MB + active tuples x 2.5kB

Table 2: Parameters, CACQ Monitoring Scenario.

  Parameter               Value
  Queries                 100,000
  Sources                 5 (light, temperature, sound, accel., mag.)
  Attributes              15 (3 attributes per source)
  Predicates              150,000 (avg. 1.5 filters/query)
  Queries per source      20,000
  sizeof(int)             4 bytes
  sizeof(output queue)    64 bytes
  sizeof(predicate)       100 bytes
  sizeof(operator)        100 bytes

3.2.5 Storage Overhead

In order for the system to be able to scale to an arbitrary number of queries, we must be able to extend the size of tuples without seriously impacting performance. It is expected that larger tuples will lead to slower performance, simply because more bits have to be allocated and copied every time a tuple is operated upon.

All of our experimental results are generated from the actual Telegraph query engine running in real time. As a continuous data source, we use a generated stream S of random tuples, each with six fields: a sequence number uniquely identifying the tuple and five integer valued fields, S.a through S.e, with random values uniformly sampled from the range [0, 100). The server was an unloaded Pentium III 933 MHz with 256 megabytes of RAM. Telegraph was running under the Sun HotSpot JDK 1.3.1, on Debian Linux with a 2.4.2 kernel. Client connections came from a separate machine, running HotSpot JDK 1.3. To avoid variations in network latency, tuples were counted and discarded just before being sent back to the client machine.

In these studies, we ran 5 simultaneous queries over the source S described above. We varied the tuple state size from 15 bits/tuple (the minimum required for 5 queries) to 3000 bits/tuple (the default value used for other experiments) and measured the tuple throughput rate. We purposely kept the number of queries small to measure the impact of additional tuple state independently from the cost of additional queries and operators, whose performance we will discuss in Section 3.2.8. The results are shown in Figure 4. Notice that tuple throughput drops off by more than a factor of five between 15 and 3000 bits per tuple, but that the slope of the curve is decreasing, such that adding many more bits will not significantly decrease query performance. In fact, the tail of the graph is inversely proportional to the tuple size. This represents the memory bandwidth of our system: there is a fixed number of bytes per second we can allocate and format for tuples. Longer tuples require more bytes, and so fewer of them can be allocated and formatted in one second.

[Figure 4: Eddy Performance vs. Tuple Size. Plot of tuple throughput against tuple state (bits).]

These results demonstrate that although tuple size does dramatically affect performance, it does not cripple our query processor. It is important to bear in mind that this amount of source state is enough to run one thousand simultaneous queries, an amount of parallelism that would severely stress any other database system.

3.2.6 Predicate Index: The Grouped Filter

As previously mentioned, our CACQ system includes a predicate index that allows us to group selection predicates, combining all selections over a single field into a grouped-filter operator, which can apply many range predicates over an ordered domain to a single tuple more efficiently than applying each predicate independently.

When a selection operator is encountered in a new query, its source field is checked to see if it matches an already instantiated grouped filter. If so, its predicate is merged into that filter. Otherwise, a new grouped filter is created with just a single predicate.

A grouped filter consists of four data structures: a greater-than balanced binary tree, a less-than tree, an equality hash-table, and an inequality hash-table. When a new predicate arrives, it is inserted into the appropriate data structure (i.e., > predicates are put into the greater-than tree) at the location specified by its constant value; greater-than-or-equal-to and less-than-or-equal-to predicates are inserted into both a tree and the equality hash-table. Note that we do not have to store the entire predicate data structure: we keep the predicate's constant value and the query-id of the query it applies to (in particular, we do not need to store the database table or field the predicate applies to).

When a tuple arrives at the filter, each of these data structures is probed with the value of the tuple. For the greater-than tree, all predicates that are to the left of the value of the tuple are matches; likewise, for the less-than tree, all predicates to the right of the value of the tuple are matches (see Figure 5). For the equality hash, a match only occurs if the value of the tuple is in the table. Conversely, in the inequality case, all predicates are matches except those stored in the table under the tuple's value.

As matches are retrieved, a bit-mask of queries is marked with those queries whose predicates the tuple passes. Once all of the matches have been found, the mask is scanned, and the tuple's queriesCompleted bitmap is modified to indicate that the tuple should not be output to those queries which the tuple did not pass; in this way, these queries are prevented from seeing non-matching tuples. Figure 5 illustrates these data structures for a set of predicates over a single field. A probe tuple is shown on the right, and the gray boxes indicate the matching predicates within the data structures.

[Figure 5: Grouped Filter Example. The grouped filter for S.a is searched for matching predicates when a tuple arrives. The submitted predicates are range predicates on S.a with constants 1, 7, 11, 3, and 5, plus the equality predicates S.a = 6 and S.a = 8; the probe tuple has S.a = 8. The grayed regions correspond to matching predicates for the tuple in the upper right.]

In addition to significantly reducing the number of predicates that must be evaluated when many predicates exist over a single field, grouped predicates are interesting for another reason: they represent a significant reduction in the number of operators through which an eddy must route a typical tuple. This provides a number of benefits: First, it serves to reduce the average tuple size, since tuples need fewer operator-bits in their headers. Second, it reduces the size of the operator-state stored with each source. Finally, it eliminates a large number of routing steps in the flow of the tuple: even though each step is not particularly expensive, routing a tuple through thousands of filters will incur a non-trivial routing overhead.

3.2.7 Queries with Disjunction

Up to this point, we have only considered queries with AND predicates. To handle ORs, we follow standard practice [20] and reduce boolean expressions into conjunctive normal form, that is, a conjunction of disjuncts such as (A OR B) AND (C OR D). The eddy is still free to choose the order in which tuples are routed through the selection operators or grouped filters. Because many queries may share each predicate in the disjunction, we cannot short circuit the evaluation of such expressions by aborting the evaluation of other disjuncts when one disjunct fails, or skipping the evaluation of other predicates in a disjunct when one predicate succeeds. Instead, we choose to associate an additional bit per disjunct with each tuple, and when any predicate from that disjunct evaluates to true, we set the bit. Then, we modify the logic that determines if a tuple should be output to a query to check that the bit is set for every disjunct. We omit a detailed description of the implementation and overhead of this solution due to a lack of space.

Thus, our CACQ system fully supports expressions containing ANDs and ORs of selection predicates. We now turn to
a performance assessment of this complete multi-query, single source system.

Performance of CACQ without Joins

One of the stated goals for our system was to allow it to scale to a large number of simultaneous queries over a number of data sources. We believe our system has made significant progress towards that goal. To demonstrate this, we ran two experiments: in the first, we measured the effect of increasing the number of queries; in the second, we varied the number of data sources over which those queries were posed. Queries in both scenarios were randomly generated.

[Figure 6: Eddy Performance vs. Number of Queries (a) and Number of Sources (b). Panel (a): Tuple Throughput vs. Number of Queries (total tuples output per second), comparing continuous query eddies with conventional eddies. Panel (b): Tuple Throughput vs. Number of Sources (tuples per second; 20 queries, 1 predicate each).]

Randomly generated queries had a 50% chance of having a predicate over any field; if a predicate existed for a given field, the predicate was randomly and uniformly selected from the set {<, >, <=, >=}. Equality and inequality predicates were omitted because randomly generated equality queries will rarely overlap. The comparison value was a random uniform selection from the range [0,100).

To measure the performance of our CACQ implementation against the number of queries, we issued many queries against a single data source and measured the number of tuples output from the system. We compared the performance of continuous queries against the basic implementation in Telegraph, in which each query runs with its own eddy and operators.

Figure 6(a) shows the results from these experiments. Notice that for the continuous query case, throughput increases sharply to about 20 queries, at which point the system is fully utilized; the system cannot handle more queries without decreasing the per-query delivery rate. It continues to scale at this throughput rate to fifty queries and beyond. The existing eddy implementation reaches maximum throughput at five queries, with a total tuple throughput of about half of our continuous query system.

To measure the ability of our system to scale to a large number of sources, we experimented with
running twenty queries over a variable number of sources identical to the S source described above. Each query had a single predicate randomly selected as above, all over field a, with the source for each query randomly chosen from the available sources. Multiple queries were issued over the same source. Figure 6(b) plots the number of tuples output versus the number of sources. As expected, additional sources decrease the tuple throughput somewhat. This is due to two factors: first, there are now many more scan operators that must be scheduled by the eddy. Second, because filters of independent streams cannot be combined, many more filter-operators are created and a larger number of predicates evaluated as more sources are added.

3.3 Multiple Queries with Joins

Thus far, we have only presented queries containing selection operators. In this section, we present our mechanism for computing joins over streaming sources. As mentioned before, we have two requirements for join processing in our CACQ system. First, we must ensure that the join operations are pipelined to support continuous streaming results so that users may receive updates quickly. Second, we must scale with the number of queries, where each query can specify a join and predicates over any subset of the sources. To accomplish these goals, we use a generalization, within our eddy framework, of doubly-pipelined hash-joins called SteMs [18], which allows multiway-pipelined join computation for any subset of the incoming sources. This scheme reduces the state needed for join computation by sharing the in-flight index structures built among the various joins specified.

[Figure 7: Conventional Query Plans vs. CACQ. Panel (a): A Tree of Doubly-Pipelined Hash Joins over the Light, Temp, and Noise sources. Panel (b): Eddy and SteMs over the same sources.]

3.3.1 SteMs: Multiway-Pipelined Joins

One goal of our CACQ system is to allow users to quickly react to changing conditions in their input data. Thus, we ensure our computation is pipelined, so we can quickly produce new results from the data collected so far when a new tuple arrives from any source. Results must be produced incrementally for all queries, not just some of the specified queries. We fulfill these requirements by using a space-efficient generalization of doubly-pipelined hash joins called SteMs. SteMs were first developed in [18] in the context of adaptive query processing.

First, we review doubly-pipelined joins, their properties, and why cascades of such joins can be inefficient. A doubly-pipelined hash join is a binary join which maintains an in-flight hash index on each of its input relations, call them $R$ and $S$. When a new tuple arrives from one of the input relations, say $R$, it is first inserted into the index for $R$, and then used to probe the index of $S$ for matches. Note that both the insertion and probe phases for one tuple must complete before the next tuple can be processed. In order to build pipelined joins over more than two sources, we can compose such joins into a tree of joins, exactly as one would in a static query plan. An example is shown in Figure 7(a), where readings from light, temperature, and noise sensors are joined.

There are several disadvantages of computing joins in this manner. First, intermediate results must be materialized in the hash indices of "internal" joins in each plan. Even with left-deep plans which join $n$ sources, $n - 2$ additional in-flight indices are needed for intermediate results. For example, in Figure 7(a), intermediate tuples of light and temperature readings are stored in the left hash index in the topmost join. We call these intermediate indices.

Second, this scheme does not scale well with the number of user queries. For example, imagine we have $n$ sources and one query for each possible 3-way join over the sources on the time attribute. Then there are at least $\binom{n}{2}$ queries requiring the use of an intermediate index. In this example, a given source, $s$, needs to probe at least $\binom{n}{2}$ intermediate indices, which contain joined tuples from the other sources, to satisfy the queries that range over $s$. These indices can also be shared among the other sources to satisfy the rest of the queries. Thus, we need to maintain at least $\binom{n}{2}$ intermediate indices to support pipelined joins for all the queries.

This can be a significant amount of state. Consider an example with 15 distinct types of sensors. We would need to maintain $\binom{15}{2} = 105$ intermediate indices to support our hypothetical queries. Further, imagine each tuple is 128 bytes, for each join every tuple matches exactly one other tuple, the sensors produce a tuple every second, and the indices retain the last hour of readings. Then each index on a single stream would be 0.46MB, and each intermediate index would be 0.9MB. Just to support $\binom{15}{3} = 455$ distinct queries, the total size for all in-flight indices would be 101MB.

Third, pipelined joins arranged in a query plan do not permit fine-grain adaptivity of the form offered by the eddy. Every time the join order changes, we must recompute some intermediate indices.

In our CACQ system, we avoid these problems by promoting transient indices on individual sources to first class operators, called SteMs, and placing them into an eddy. The cascade of doubly-pipelined joins in our example in Figure 7(a) would be converted to the plan in Figure 7(b) in the eddy framework.

SteMs in the CACQ system are simply operators that encapsulate a single index built on a stream using a particular attribute as the key. These indices can be hash indices to support equality joins, which arise in our building monitoring scenario. They can also be B-trees or other types of indices, depending on the query workload. SteMs can be passed tuples that are inserted (or built) into the index, or tuples that are used to search (or probe) the index. A SteM will return all tuples passed to it by the eddy back to the eddy. The ready and done bits are marked to indicate the operators that still need to process the tuple. In addition, an intermediate tuple that is the concatenation of the tuple used to probe the index and the match is output for each match (also marked appropriately).

A multiway join is computed by starting with a new tuple pushed into the eddy from some source, a singleton tuple, and routing it through SteMs to produce the joined result(s). For example, imagine a query that ranges over all three sources in Figure 7(b). When a new light tuple arrives, one possible route is that it is first inserted into the light SteM. Then it is sent to probe the temperature SteM and joined with a temperature reading. Then the intermediate tuple is sent to the noise SteM, joined with noise readings, and the resulting tuple is then output. For a query that ranges over only the light and temperature sources, the eddy can output the intermediate tuple produced after the probe into the temperature SteM. Note that a tuple used to probe SteMs can be either a singleton or an intermediate tuple. Thus the SteM can apply any predicate containing its indexed source and attribute. The eddy routes these tuples by obeying some constraints for correctness, and following a routing policy for efficiency.

Because we have interposed an eddy between the indices, we have lost the atomic "build then probe" property of pipelined joins, leading to two constraints on the eddy and SteMs to ensure correctness. The first constraint is that a singleton tuple must be inserted into all its associated SteMs before it is routed to any of the other SteMs with which it needs to be joined. When it is inserted, it is tagged with a globally unique sequence number. Thus, SteMs only index singleton tuples. The second constraint is that an intermediate tuple returned from a SteM is valid only if the sequence number of the tuple used to probe the SteM is greater than the sequence number of the indexed tuple (i.e., the probing tuple arrived later). All valid intermediate tuples retain the larger of the two sequence numbers, and invalid tuples are discarded by the SteM. These constraints maintain the "build then probe" property between any two tuples that are joined, and are sufficient to prevent duplicate tuples from arising. Within these constraints, the eddy is free to choose the order in which to route a tuple to generate results. These routing decisions are discussed next.

Thus, there are several advantages to using SteMs with an eddy for join processing. First, only a single SteM is built for each source, and these SteMs are shared across all the joins among all the queries posed. Contrast the scalability of this scheme with the scalability of pipelined joins in a tree. Using our previous example with 15 sensors, SteMs would only need to maintain 6.9MB of data to support any subset of the possible 32K ($2^{15}$) joins, compared with 101MB to support only 455 queries. Second, we can compute joins in a pipelined fashion for all possible joins over the sources. Third, the join order is decided on a per-tuple basis, providing fine-grain adaptivity.

3.3.2 Routing with Joins

Routing tuples in an eddy in our CACQ system involves two computations. The first is to determine the set of operators to which a tuple can be sent next or the set of queries to which it can be output. The second is to choose from the candidate operators the one that will process the tuple next. For the first computation, we need to maintain additional data-structures and augment our current ones to handle generating and routing intermediate tuples. For the second decision, the routing policy for SteMs is the same as the one used in the no-join case described above.

First,
we need to augment the state associated with each source. We add a separate SteMs list containing SteM operators with which the source needs to be joined. The query list remains the same; it includes the queries that range over only that source. Thus, the masks in the completionMask list are padded with 0s for each SteM in the SteMs list. Similarly, we augment the ready and done bits in the tuple state to include bits for the new SteMs. These changes provide a scheme for routing singleton tuples into SteMs; we now describe data-structures that handle intermediate tuples.

Intermediate tuples can contain data from some subset of sources flowing into the eddy. Given $n$ input sources, there are at most $2^n$ possible types of intermediate tuples. Analogous to the state we maintain for existing sources, we create a virtual source, with a unique sourceId, when an intermediate tuple of a particular type is first materialized. Thus, each source or virtual source is associated with some distinct subset of the sources in the system. With each virtual source, we associate an operators list, SteMs list, query list, and completionMask list. All queries that range over all the sources of a virtual source are in the queries list corresponding to that virtual source. The operators list is the union of all selection operators that need to be applied for each query in the queries list. The SteMs list contains all the SteM modules with which an intermediate tuple needs to be joined to satisfy queries which range over additional sources. The completionMask list contains a bitmask for each query. Likewise, each completionMask indicates which operators in the operators list need to process a tuple before it can be output.

When an intermediate tuple is formed, its queriesCompleted bitmap is cleared and it is tagged with the sourceId of its new virtual source. The ready bits are set to reflect the operators in the operators and SteMs lists that still need to process the tuple. Also, the done bits are set to indicate which operators and SteMs have already processed the tuple. As usual, the eddy compares the completionMask to the done bits to determine the queries to which an intermediate tuple can be output. Similarly, the eddy uses the ready bits to determine the SteMs and selection operators a tuple can be sent to. We omit the details for efficiently performing these bit vector initializations and manipulations due to lack of space.

When a new query arrives into the system, it is first added to the queries list of the virtual source corresponding to the sources over which the query ranges. If a virtual source does not exist, it is created. We determine the selection operators and the SteMs that the query will need. The selection operators are folded into the system as described in Section 3.2. SteMs are treated no differently than selection operators. If a new SteM is added, then that SteM is added to the SteMs list for all existing sources and virtual sources which contain the source associated with the SteM.

3.3.3 Purging SteMs

Because our CACQ system is designed to operate over streams, a mechanism is needed to limit the state that accumulates in joins as streams flow endlessly into them. This mechanism, proposed in [22], is to limit the number of tuples in a particular SteM by imposing a window on the stream. Windows specify a particular number of tuples or period of time to which the join applies; tuples outside the window are not included in the join. Thus, they are a key component of our join solution, although, from a research perspective, they have been thoroughly discussed.

We allow windows to be specified as a component of a join predicate. In the current implementation, windows are simply a fixed number of tuples; extending the system to allow windows over a fixed time period would be fairly simple. Our windows are sliding: the window always incorporates the most recent data in the stream. As new tuples flow in, old tuples are forced out of the window. Since SteMs may contain multiple predicates, we cannot simply discard tuples from the index that do not fall within the window of a particular predicate. We keep the maximum number of tuples specified among all the windows associated with the predicates. For a given predicate, we reject matches that are outside of that predicate's window but still within the index. In this way, we do not have to create multiple SteMs to support different window sizes.

In the next section, we discuss building a routing policy to efficiently route tuples between operators in a continuous eddy.

4. ROUTING POLICIES

The routing policy is responsible for choosing the tuple to process next and the operator to process it. The original eddy implementation used two ideas for routing: the first, called back-pressure, limits the size of the input queues of operators, capping the rate at which the eddy can route tuples to slow operators. This causes more tuples to be routed to fast operators early in query execution, which is intuitively a good idea, since those fast operators will filter out some tuples before they reach the slower operators. The second approach augments back-pressure with a ticket scheme, whereby the eddy gives a ticket to an operator whenever it consumes a tuple and takes a ticket away whenever it sends a tuple back to the eddy. In this way, higher selectivity operators accumulate more tickets. When choosing an operator to which a new tuple should be routed, the ticket-routing policy conducts a lottery between the operators, with the chances of a particular operator winning proportional to the number of tickets it owns. Thus, higher selectivity operators will receive more tuples early in their path through the eddy.

We have implemented a variant of the ticket scheme. In our variant, a grouped filter or SteM is given a number of tickets equal to the number of predicates it applies, and penalized a number of tickets equal to the number of predicates it applies when it returns a tuple back to the eddy. A SteM that outputs more tuples than it receives could thus accumulate negative tickets; we lower bound the number of tickets any module receives at one. Multiple SteMs with only one ticket will be scheduled via back-pressure, since higher cardinality joins (which should be scheduled towards the top of the plan) will require longer to completely process each input tuple. Thus, highly selective grouped filters will receive more tickets, and tuples will be routed to these filters earlier in processing. In this way, we favor low-selectivity via tickets and quick work via back-pressure. We weight the value of that work by the number of predicates applied by each operator.

We now present a performance evaluation of our modified ticket-based routing scheme as it functions with a number of selection-only queries. We will discuss the performance of our routing policy with respect to joins as a part of the experiments in Section 5 below.

Table 3: Queries for Routing Scheme Comparison.
1. select index from S where a > 90
2. select index from S where a > 90 and b > 70
3. select index from S where a > 90 and b > 70 and c > 50
4. select index from S where a > 90 and b > 70 and c > 50 and d > 30
5. select index from S where a > 90 and b > 70 and c > 50 and d > 30 and e > 10

4.1 Ticket Based Routing Studies

The modified ticket-based routing scheme presented above is designed to order filter-operators such that the most selective grouped filter that applies to the most predicates is applied first. We compare this scheme to three alternatives. In the random scheme, tuples are routed to a random operator that they have not previously visited. In the optimal scheme, tuples are routed to the minimum set of filters required to process the tuple. This is a hypothetical scheme that provides an upper bound on the quality of a routing scheme. For any given tuple, it applies the smallest possible number of filters.

The optimal approach orders selections from most to least selective, and always applies them in that order. Determining this optimal ordering is not always possible, since the underlying distribution of an attribute may be unknown or may not closely match any statistics gathered for that attribute. However, for the workload shown in Table 3, clearly the optimal ordering first applies the selection over a, then the selection over b, then c, d, and e. All of the tuples will pass through the a selection. Only ten percent of the tuples will pass a, thirty percent of those will pass b, and so on. This leads to the following expression for the expected number of filters each tuple will enter in this approach:

\[
\underbrace{1}_{\text{all tuples apply the S.a filter}}
+ \underbrace{0.1}_{\text{S.b filter}}
+ \underbrace{0.1 \times 0.3}_{\text{S.c filter}}
+ \underbrace{0.1 \times 0.3 \times 0.5}_{\text{S.d filter}}
+ \underbrace{0.1 \times 0.3 \times 0.5 \times 0.7}_{\text{S.e filter}}
\approx 1.16
\]

We do not expect any routing scheme to perform this well, but it serves as a useful lower bound on the number of filters that must be applied.

The final alternative is a hypothetical worst-case approach, in which every filter is applied to every tuple: 5 filters, in the workload shown in Table 3. No routing scheme should perform this badly.

We ran experiments to show how the ticket-based scheme compares to these other approaches for the fixed queries shown in Table 3. We chose to use a fixed set of queries rather than random queries because queries with predicates over a uniformly selected random range will tend to experience little overlap and all select about the same number of tuples, causing the random and ticket schemes to perform similarly. Since the goal of this experiment is to show that the ticket-based scheme can effectively determine selectivities of grouped filters when that affects performance, we felt this was an appropriate decision.

We ran the system for one minute and compared the total number of tuples scanned to the tuples entering each filter operator. Figure 8 shows the results: our ticket-based routing scheme routes the average tuple to just 1.3 filters, while the randomized scheme routes every tuple to about 3.2 filters.

[Figure 8: Comparison of Various Routing Schemes. Bar chart of average filters per tuple (smaller is better) for the Random, Tickets, Optimal, and Worst Case routing policies.]

4.2 Adapting to Changing Workloads

In addition to routing tuples efficiently, one of the properties of our continuous query system is that it can rapidly adapt to changing workloads. To demonstrate this, we ran experiments with three query workloads, as shown in Table 4. Queries are over the same source S as in the previous experiments. In these experiments, the first query was introduced at time 0, and each successive query was introduced five seconds later. In the first workload, queries are independent, and so, just as with a conventional eddy, the most selective predicates should be executed first since those are the most likely to filter out tuples. In this case, query five is the most selective. The second workload shows the capability of the ticket-based scheme to prioritize filters that apply to different numbers of predicates: all filters have the same selectivity, but five times as many predicates are applied to S.a as S.e. The final workload is much more complex: queries share work and have filters with a range of selectivities. The correct ordering of these queries is not immediately apparent.

Table 4: Query Workloads for Adaptivity Scenario.
Workload 1
1. select index from S where a > 30
2. select index from S where b > 50
3. select index from S where c > 10
4. select index from S where d > 40
5. select index from S where e > 90
Workload 2
1. select index from S where a > 10
2. select index from S where a > 10 and b > 10
3. select index from S where a > 10 and b > 10 and c > 10
4. select index from S where a > 10 and b > 10 and c > 10 and d > 10
5. select index from S where a > 10 and b > 10 and c > 10 and d > 10 and e > 10
Workload 3
1. select index from S where a > 10
2. select index from S where a > 10 and b > 30
3. select index from S where a > 10 and b > 30 and c > 50
4. select index from S where a > 10 and b > 30 and c > 50 and d > 70
5. select index from S where a > 10 and b > 30 and c > 50 and d > 70 and e > 90

Figure 9 shows the percentage of tickets routed to each filter over time for the three workloads. Percentage of tickets received is a measure of the routing policy's perceived value of an operator. Highly selective operators are of higher value because they reduce the number of tuples in the system, as are operators which apply predicates for many queries, because they perform more net work. Before a filter is introduced it receives zero tickets; notice how quickly the system adapts to newly introduced filters: in most cases, four seconds after a filter is added the percentage of tuples it receives has reached steady-state.

Workloads 1 and 2 settle to the expected state, with the most selective, most frequently applied filters receiving the bulk of the tickets. Workload 3 has results similar to Workload 2, except that the S.a, S.b, and S.c filters all receive about the same number of tickets once all queries have been introduced. This is consistent, because S.b and S.c are more selective, but apply to fewer queries and so are weighted less heavily. Also note that S.d and S.e receive slightly more tickets than in Workload 2; this is due to the increased selectivity of their predicates.

[Figure 9: Percentage of Tickets Routed to Filters Over Time. Panels (a) Workload 1, (b) Workload 2, and (c) Workload 3, each plotting % of tickets per filter against time (s). Notice that the most selective predicates (S.e > 90 in (a) and S.a > 10 in (b)) rapidly adapt to receive the most tickets, which are correlated with their routing priority in the eddy.]

5. PERFORMANCE STUDY

To demonstrate the effectiveness of our CACQ system, we compare it with the approach used by the recently published NiagaraCQ system [3, 2]. NiagaraCQ uses a static query optimizer to build fixed query plans for continuous queries. NiagaraCQ's plans are grouped, which means that operators are shared between queries when possible. The optimizer allows two queries to share an operator if it can demonstrate that the set of tuples flowing into that operator in both queries is always the same. This "identical tuple sets" requirement must hold because, unlike in our CACQ approach, tuples have no explicitly encoded lineage, so the queries to which a tuple may be output can only be inferred from the tuple's location in the query plan. In practice, this means that very little overlap will be possible between queries of any complexity: although it may be possible to share an initial selection, any operators which follow that selection must be replicated across all queries (even if they have exactly the same predicates), because the tuples flowing into those operators are not identical.

Rather than creating a predicate index for selection operators, NiagaraCQ combines selections over an attribute into a join between the attribute and the constants from the selection predicates. Because an efficient join algorithm can be used if a B-Tree index is built on the predicates, this approach is similar in efficiency to our predicate index. However, when predicates overlap, multiple copies of every tuple are produced as output of the NiagaraCQ join, which imposes a non-trivial performance overhead.

To compare the two systems, we run experiments like those proposed in [2]. In these experiments, we execute queries of the form:

    SELECT * FROM stocks AS s, articles AS a WHERE s.price > x AND s.symbol = a.symbol

Stocks is a list of stock quotes, and articles is a set of news articles about the companies in those quotes. Articles ranged from about 200 bytes to 1 kilobyte. Stock prices were randomly selected using a uniform distribution over the domain (0,100]. We run a number of queries of this form, varying only the value of x. Notice that this workload is very favorable towards the NiagaraCQ approach, because there is complete overlap between queries. A more mixed assortment of queries would make it much harder for the NiagaraCQ optimizer to perform its grouping.

The NiagaraCQ optimizer generates two possible plans for these queries, which consist of a selection operator and a join. In the first, called PushDown (Figure 10(a)), the selection operator is placed below the join in the query plan. All selections can be placed in the same group, because they are all over the unfiltered stocks relation. However, the join operators cannot be placed into the same group because the sets of tuples from each query's selection are disjoint; a separate join must be run for each query (although the hash table over articles is shared between queries). The split operator shown in the plan is a special operator that divides the output of a grouped operator based on the query specified by the file attribute of the constants table.

The other alternative, called PullUp, shown in Figure 10(b), places the join at the bottom of the query plan. Since the tuples flowing in from both relations are unfiltered in all queries, every join operator can be placed in a single group. Since all queries use exactly the same join predicate, the output of the grouped join into every selection predicate is identical. Thus, those selection predicates can all be placed into a single group. As the results in Figure 11 show, this PullUp approach is (not surprisingly) more efficient because there are not many copies of the join operator. Notice, however, that it suffers from a disturbing problem: the selection predicates must be applied after the join, which is contrary to well established query optimization wisdom [20].

We compared these two alternatives to the same query in our CACQ system. In our system, this query consists of three operators: a grouped filter on stocks.price, and a pair of SteMs on stocks.symbol and articles.symbol. We used the modified ticket-based routing scheme discussed above to schedule tuples between the SteMs and the grouped filter. We manually constructed NiagaraCQ query plans in Telegraph with the structure shown in Figure 10. When emulating NiagaraCQ, we removed the per-tuple data structures and the code that manages them, since these are specific to our CACQ approach. Notice that we also did not include materialization operators in the NiagaraCQ plans, as was done in that work, since we were able to keep all tuples and join tables in main memory.

As was done in [2], we ran experiments that delivered a fixed size update to stock prices and news articles (2250 stocks and articles, with one article per stock). Query selection predicates were randomly chosen from the uniform distribution, although we ensured that the union of all predicates selected 100% of the tuples. We placed no limit on the sum of selectivities (as was done in the NiagaraCQ work), because doing so does not significantly affect the performance of their best approach or our system.

We varied the number of distinct queries (distinct selection predicates) from 1 to 200 and compared the performance, shown in Figure 11(a). Notice that the CACQ approach is faster than the PullUp approach for small numbers of queries, but that it later becomes slightly slower. The original performance benefit is because CACQ can apply selections on smaller tuples before it computes the join; the PullUp approach joins all tuples first. For large numbers of queries, the CACQ approach is somewhat slower due to the additional time spent maintaining the tuple state (which was not included in the NiagaraCQ experiments). The PushDown approach (as was the case in [2]) is slower in all cases.

Note that the shape of the lines for the two NiagaraCQ experiments shown
in Figure 11(a) closely matches the shape of the lines shown in Figure 4.3 of [2], suggesting that our emulation of the NiagaraCQ approach is sound.

These experiments show that our CACQ approach is capable of matching the best of the NiagaraCQ approaches without the benefit of a cost-based query optimizer. Our routing policy determines that the grouped selection on prices is more selective than the join, and thus routes tuples through the selection first.

In the next set of experiments, we modify the above scenario to apply a UDF over batches of stock quotes which flow into the system. We fix the number of simultaneous queries at 100, but we vary the number of articles per stock quote, to simulate multiple news stories and sources reporting about a particular company. In this modified approach, each user specifies a UDF that selects stock quotes of interest (instead of a single > predicate). Quotes are shipped in batches reflecting several hours' worth of activity, to allow UDFs to utilize more than just the most recent stock price in deciding to return a particular quote. This is the sort of environment serious investors might use: each user's UDF would search for the parameters he thought were particularly important in determining when to buy or sell a stock; when those parameters are satisfied, quotes and recent news articles about those quotes are returned. Notice that in this case, we cannot use a grouped filter to evaluate UDFs and NiagaraCQ cannot group UDFs via a B-Tree.

[Figure 10: Two Alternative Query Plans in NiagaraCQ. Panel (a): PushDown Plan. Panel (b): PullUp Plan. Both plans combine the Quotes and Articles sources with a constant table, join operators, and a split operator that separates the output for queries Q1 and Q2.]

[Figure 11: NiagaraCQ vs. CACQ with Two Types of Selections. Panel (a): Normal Selections, plotting time to complete against number of queries for CACQ, Niagara PushDown, and Niagara PullUp. Panel (b): UDF Selections, plotting seconds to complete against articles per quote on a logarithmic scale for NiagaraCQ (PullUp approach) and CACQ.]

The results of these experiments are shown in Figure 11(b). We compared only against the PullUp approach, as the PushDown approach remains much slower than CACQ, for the same reasons as in the previous experiment. We varied the cardinality of the articles relation so that there were 1, 10, or 100 articles per quote. We simulated the cost of a UDF by spin-looping for a randomly, uniformly selected time over the interval of 10-500 µs.

In this case, the CACQ approach is much more efficient because CACQ applies UDFs to stock quotes before they are joined with articles, while NiagaraCQ must apply UDFs after the join if it wishes to perform only a single join. As the cardinality of articles increases, the expensive UDFs must be applied many more times in the NiagaraCQ approach than in CACQ. In general, this is a limitation of the NiagaraCQ approach which cannot be overcome unless lineages are explicitly encoded. NiagaraCQ cannot push selections from two queries below a shared join without performing the join multiple times; if the fanout of the join is much greater than one, this will severely impair NiagaraCQ's performance. Furthermore, as we saw in the single-article case, NiagaraCQ pays a penalty for performing selections on larger, joined tuples.

6. RELATED WORK

The integration of Eddies and continuous queries in our CACQ system is necessarily related to both areas of research. We summarize this work, and also discuss related systems in the adaptive query processing, sensor, and temporal database communities.

Eddies were originally proposed in [1]. The basic query operator
and the back-pressure and ticket-based routing schemes were developed. Notions of adaptivity and pipelining are well established in the research community. Parallel-pipelined joins were proposed in [28]. Adaptive systems such as XJoin, Query Scrambling, and Tukwila [26, 27, 10] demonstrated the importance of pipelined operators to adaptivity.

Existing work on continuous queries provides techniques for simultaneously processing many queries over a variety of data sources. These systems propose the basic continuous query framework that we adopt and also offer some extensions for combining related operators within query plans to increase efficiency. Generally speaking, the techniques employed for doing this combination are considerably more complex and less effective at adapting to rapidly changing query environments than CACQ.

Efficient trigger systems, such as the TriggerMan system [6], are similar to continuous queries in that they perform incremental computation as tuples arrive. In general, the approach used by these systems is to use a discrimination network, such as RETE [5] or TREAT [15], to efficiently determine the set of triggers to fire when a new tuple arrives. These approaches typically materialize intermediate results to reduce the work required for each update.

Continuous queries were proposed and defined in [25] for filtering of documents via a limited, SQL-like language. In the OpenCQ system [13], continuous queries are likened to trigger systems where queries consist of four-element tuples: a SQL-style query, a trigger-condition, a start-condition, and an end-condition. The NiagaraCQ project [3] is the most recently described CQ system. Its goal is to efficiently evaluate continuous queries over changing data, typically web-sites that are periodically updated, such as news or stock quote servers. Examples of the NiagaraCQ grouping approach and a discussion of its limitations are given in Section 5 above.

The problem of sharing work between queries is not new. Multi-query optimization, as discussed in [21], seeks to exhaustively find an optimal query plan, including common subexpressions, between a small number of queries. Recent work, such as [19, 16], provides heuristics for reducing the search space, but is still fundamentally based on the notion of building a query plan, which we avoid in this work.

Fundamental notions of stream processing are presented in [22], including extensions to SQL for windows and discussions of non-blocking and timestamped operators. [4] proposes windows as a means of managing joins over very large sets of data. [24] discusses operators for processing streams in the context of network routing; it includes an interesting discussion of appropriate query languages for streaming data.

[17] discusses models of data streaming from sensors. [14] proposes using continuous queries for processing over streams of sensor data and offers motivating performance examples, but falls short of providing a specific framework for query evaluation and does not incorporate adaptivity.

7. CONCLUSIONS

In this paper we present the first continuous query implementation based on a continuously adaptive query processing scheme. We show that our eddy-based design provides significant performance benefits, not only because of its adaptivity, but also because of the aggressive cross-query sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators (both selections and join state) at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates.

8. REFERENCES

[1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In ACM SIGMOD, Dallas, TX, May 2000.
[2] J. Chen, D. DeWitt, and J. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE, San Jose, CA, February 2002.
[3] J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In ACM SIGMOD, 2000.
[4] D. DeWitt, J. Naughton, and D. Schneider. An evaluation of non-equijoin algorithms. In VLDB, Barcelona, Spain, 1991.
[5] C. Forgy. Rete: A fast algorithm for the many patterns/many objects match problem. Artificial Intelligence, 19(1):17-37, 1982.
[6] E. Hanson, N. A. Fayoumi, C. Carnes, M. Kandil, H. Liu, M. Lu, J. Park, and A. Vernon. TriggerMan: An Asynchronous Trigger Processor as an Extension to an Object-Relational DBMS. Technical Report 97-024, University of Florida, December 1997.
[7] W. Heinzelman, J. Kulik, and H. Balakrish
nan.Adaptiveprotocolsforinformationdisseminationinwirelesssensornetworks.InMOBICOM,Seattle,WA,August1999.[8]J.M.Hellerstein,M.J.Franklin,S.Chandrasekaran,A.Deshpande,K.Hildrum,S.Madden,V.Raman,andM.Shah.Adaptivequeryprocessing:Technologyinevolution.IEEEDataEngineeringBulletin,23(2):7–18,2000.[9]J.Hill,R.Szewczyk,A.Woo,S.Hollar,andD.C.K.Pister.Systemarchitecturedirectionsfornetworkedsensors.InASPLOS,November2000.[10]Z.G.Ives,D.Florescu,M.Friedman,A.Levy,andD.S.Weld.Anadaptivequeryexecutionsystemfordataintegration.InProceedingsoftheACMSIGMOD,1999.[11]J.M.Kahn,R.H.Katz,andK.S.J.Pister.Mobilenetworkingforsmartdust.InMOBICOM,Seattle,WA,August1999.[12]N.Lanham.Thetelegraphscreenscraper,2000.http://db.cs.berkeley.edu/nickl/tess.[13]L.Liu,C.Pu,andW.Tang.Continualqueriesforinternet-scaleevent-driveninformationdelivery.IEEEKnowledgeandDataEngineering,1999.SpecialIssueonWebTechnology.[14]S.MaddenandM.Franklin.Fjordingthestream:Anarchitectureforqueriesoverstreamingsensordata.SanJose,CA,February2002.ICDE.[15]D.P.Miranker.Treat:Abettermatchalgorithmforaiproductionsystemmatching.InProceedingsofAAAI,pages42–47,1987.[16]H.Mistry,P.Roy,S.Sudarshan,andK.Ramamritham.Materializedviewselectionandmaintenanceusingmulti-queryoptimization.InACMSIGMOD,2001.[17]P.Bonnet,J.Gehrke,andP.Seshadri.Towardssensordatabasesystems.In2ndInternationalConferenceonMobileDataManagement,HongKong,January2001.[18]V.Raman.InteractiveQueryProcessing.PhDthesis,UCBerkeley,2001.[19]P.Roy,S.Seshadri,S.Sudarshan,andS.Bhobe.Efcientandextensiblealgorithmsformultiqueryoptimization.InACMSIGMOD,pages249–260,2000.[20]P.Selinger,M.Astrahan,D.Chamberlin,R.Lorie,andT.Price.Accesspathselectioninarelationaldatabasemanagementsystem.pages23–34,Boston,MA,1979.[21]T.Sellis.Multiplequeryoptimization.ACMTransactionsonDatabaseSystems,1986.[22]P.Seshadri,M.Livny,andR.Ramakrishnan.Thedesignandimplementationofasequencedatabasesystems.InVLDB,Mumbai,India,September1996.[23]M.Shah,S.Madden,M.Franklin,andJ.M.Hellerstein.Javasupportfordatainte
nsivesystems.SIGMODRecord,December2001.[24]M.SullivanandA.Heybey.Tribeca:Asystemformanaginglargedatabasesofnetworktrafc.InProceedingsoftheUSENIXAnnualTechnicalConference,NewOrleans,LA,June1998.[25]D.Terry,D.Goldberg,D.Nichols,andB.Oki.Continuousqueiesoverappend-onlydatabases.InACMSIGMOD,pages321–330,1992.[26]T.UrhanandM.Franklin.XJoin:Areactively-scheduledpipelinedjoinoperator.IEEEDataEngineeringBulletin,pages27–33,20002000.[27]T.Urhan,M.J.Franklin,andL.Amsaleg.Cost-basedqueryscramblingforinitialdelays.InACMSIGMOD,1998.[28]A.WilschutandP.Apers.Dataowqueryexecutioninaparallelmain-memoryenvironment.InPDIS,pages68–77,December1991.