IEEE International Conference on Data Engineering, 1084-4627/09 $25.00

© 2009 IEEE. DOI 10.1109/ICDE.2009.79

We adopt a subset of XCore expressions in Table II, which is sufficient to capture XPath 1.0 and XQuery FLWOR expressions [8]. We use a representation of XPath paths in our XCore grammar that keeps consecutive steps together, rather than nesting each step in a separate for-loop (when allowed; the use of position() precludes this). Such an optimization is common in XQuery engines, and is part of XQuery normalization, further described in Section IV. Additionally, we define two new rules for the XRPC extension [30]:

  27: XRPCExpr  ::= "execute at" "{" ExprSingle "}" "function" XRPCParam "{" Expr "}"
  28: XRPCParam ::= "()" | "(" "$" Var ":=" VarRef ("," XRPCParam)? ")"

Rule 27 identifies an xrpc:// URI in expression ExprSingle, and declares a new anonymous function that is to be executed remotely. Note that these grammar rules lack the expressive power to define recursive functions. This does not matter for XQuery decomposition, as our decomposition strategies will not generate recursive functions. We also note that the syntax defined by rules 27 and 28 differs from the actual XRPC syntax (execute at {Expr}{FunApp(ParamList)}). The syntax used here is for presentation purposes only, to avoid the need to define all rules concerning the declaration of user-defined functions. Thus, our simple XCore rule without explicit user-defined function declarations allows all queries to be expressed in a single Expr, which in turn can be mapped to a query graph. This simplifies the formulation of analysis steps.

A. XCore Dependency Graph

We introduce a dependency graph (d-graph) for an XCore query. Consider the XQuery query \(Q_2\) in Table III, which asks for the grade in course42 of students having a tutor who is also a student, and its XCore equivalent \(Q^c_2\).
A dependency graph is a directed, ordered and connected graph \(G\) with vertices \(V(G)\) and edges \(E(G)\). Each vertex \(v\) is denoted as \(v_i\):rule[val], where \(v_i\) is a unique vertex identifier, rule is the grammar rule represented by \(v_i\), and val is an optional value indicating the right-hand side of rule. There is a single root vertex without incoming edges. \(E(G)\) consists of parse edges \(E_p(G)\) and varref edges \(E_v(G)\). Each parse edge is an ordered vertex pair \((u,v)\), where \(u\) corresponds to a parsing rule \(r_u\) that directly causes the use of another parsing rule \(r_v\). A varref edge is an ordered vertex pair \((w,x)\) denoting a variable usage. When a VarRef rule is used, an additional edge is created between the VarRef vertex and the Var vertex that defines the variable.

Example 3.1: Figure 2 shows the d-graph of \(Q^c_2\) in Table III. Solid and dashed lines represent parse and varref edges, respectively. For instance, the variable binding in the first let expression corresponds to vertices \(v_2, \ldots, v_7\), and vertices \(v_8, \ldots, v_{39}\) depict its return clause. The edge \((v_4, v_5)\) is a parse edge. The edge \((v_{30}, v_9)\) is a varref edge, as the variable used by \(v_{30}\) is a reference to variable $c introduced by \(v_9\). Thus, a d-graph is in essence a parse tree with additional (dashed) edges to indicate variable usages.
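The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `DGraph`, its methods, and the toy query `let $s := doc("students.xml") return $s/child::name` are hypothetical and are not taken from the paper or from Figure 2. Parse edges are kept as ordered child lists and varref edges as a separate map from a VarRef vertex to the Var vertex that defines the variable.

```python
from dataclasses import dataclass, field


@dataclass
class DGraph:
    """Hypothetical d-graph container: ordered parse edges plus varref edges."""
    labels: dict = field(default_factory=dict)   # vertex id -> "rule[val]"
    parse: dict = field(default_factory=dict)    # vertex id -> ordered list of child ids
    varref: dict = field(default_factory=dict)   # VarRef vertex id -> defining Var vertex id

    def add_vertex(self, vid, label, parent=None):
        """Add a vertex; if a parent is given, append a parse edge (parent, vid)."""
        self.labels[vid] = label
        self.parse.setdefault(vid, [])
        if parent is not None:
            self.parse[parent].append(vid)

    def add_varref(self, ref, var):
        """Add a varref edge from a VarRef vertex to the Var vertex it refers to."""
        self.varref[ref] = var

    def roots(self):
        """Vertices without incoming parse edges (a d-graph has exactly one)."""
        children = {c for kids in self.parse.values() for c in kids}
        return [v for v in self.labels if v not in children]


# Toy query (an assumption, not from the paper):
#   let $s := doc("students.xml") return $s/child::name
g = DGraph()
g.add_vertex("v1", "LetExpr")
g.add_vertex("v2", "Var[$s]", parent="v1")
g.add_vertex("v3", "FunCall[doc]", parent="v2")
g.add_vertex("v4", "Literal[students.xml]", parent="v3")
g.add_vertex("v5", "/name", parent="v1")
g.add_vertex("v6", "VarRef[$s]", parent="v5")
g.add_varref("v6", "v2")   # the dashed edge: $s is used at v6 and defined at v2
```

Keeping varref edges apart from parse edges makes it cheap to treat the graph as a plain parse tree when only the parse structure matters.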
We define three types of dependency relationships upon the reachability between two vertices \(x, y\) in \(V(G)\): (1) \(x\) "parse-depends on" \(y\), denoted as \(x \xrightarrow{p} y\), if \(y\) is reachable from \(x\) via only parse edges; (2) \(x\) "varref-depends on" \(y\), denoted as \(x \xrightarrow{v} y\), if \(y\) is reachable from \(x\) via at least one varref edge; and (3) \(x\) "depends on" \(y\), denoted as \(x \rightarrow y\), if either \(x \xrightarrow{p} y\) or \(x \xrightarrow{v} y\) holds. The compositional nature of XQuery means that \(x \rightarrow y\) concisely captures all semantic dependencies between subexpressions.

Fig. 2. D-graph. Vertices: v1:/grade; v2:LetExpr; v3:Var[$s]; v4:/person; v5:/people; v6:FunCall[doc]; v7:Literal[.../students.xml]; v8:LetExpr; v9:Var[$c]; v10:FunCall[doc]; v11:Literal[.../course42.xml]; v12:LetExpr; v13:Var[$t]; v14:ForExpr; v15:Var[$x]; v16:VarRef[$s]; v17:IfExpr; v18:=; v19:/tutor; v20:VarRef[$x]; v21:/name; v22:VarRef[$s]; v23:ThenElse; v24:VarRef[$x]; v25:(); v26:ForExpr; v27:Var[$e]; v28:/exam; v29:/enroll; v30:VarRef[$c]; v31:IfExpr; v32:=; v33:@id; v34:VarRef[$e]; v35:/id; v36:VarRef[$t]; v37:ThenElse; v38:VarRef[$e]; v39:().

TABLE III. EXAMPLE QUERY \(Q_2\)

Basic XQuery query \(Q_2\):

  (let $s := doc("xrpc://A/students.xml")/people/person,
       $c := doc("xrpc://B/course42.xml"),
       $t := $s[tutor = $s/name]
   for $e in $c/enroll/exam
   where $e/@id = $t/id
   return $e)/grade

XCore variant \(Q^c_2\):

  (let $s := doc("xrpc://A/students.xml")/child::people/child::person return
   let $c := doc("xrpc://B/course42.xml") return
   let $t := for $x in $s return
             if ($x/child::tutor = $s/child::name) then $x else ()
   return for $e in $c/child::enroll/child::exam return
          if ($e/attribute::id = $t/child::id) then $e else ())/child::grade

Normalized XCore variant \(Q^n_2\):

  (let $t := (let $s := doc("xrpc://A/students.xml")/child::people/child::person
              return for $x in $s return
                     if ($x/child::tutor = $s/child::name) then $x else ())
   return for $e in (let $c := doc("xrpc://B/course42.xml")
                     return $c/child::enroll/child::exam)
          return if ($e/attribute::id = $t/child::id) then $e else ())/child::grade
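The three dependency relations reduce to reachability with one extra bit of state, namely whether a varref edge has been crossed so far. The following is a minimal sketch, assuming the graph is given as two plain dictionaries; the function names and the six-vertex toy graph are hypothetical, and the graph is not the one in Figure 2.

```python
from collections import deque


def reachable(parse, varref, x):
    """Return (parse_only, via_varref) for start vertex x.

    parse_only: vertices reachable from x using only parse edges.
    via_varref: vertices reachable on some path that uses at least one varref edge.
    """
    seen = {(x, False)}
    queue = deque(seen)
    while queue:
        v, used_varref = queue.popleft()
        steps = [(c, used_varref) for c in parse.get(v, [])]
        if v in varref:
            steps.append((varref[v], True))
        for state in steps:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    parse_only = {v for v, used in seen if not used and v != x}
    via_varref = {v for v, used in seen if used}
    return parse_only, via_varref


def parse_depends(parse, varref, x, y):
    return y in reachable(parse, varref, x)[0]


def varref_depends(parse, varref, x, y):
    return y in reachable(parse, varref, x)[1]


def depends(parse, varref, x, y):
    return parse_depends(parse, varref, x, y) or varref_depends(parse, varref, x, y)


def subgraph(parse, root):
    """Vertex set of the subgraph rooted at `root`: root plus all parse-reachable vertices."""
    return {root} | reachable(parse, {}, root)[0]


# Hypothetical toy d-graph: v1 is the root, v6 is a VarRef to the Var vertex v2.
PARSE = {"v1": ["v2", "v5"], "v2": ["v3"], "v3": ["v4"], "v4": [], "v5": ["v6"], "v6": []}
VARREF = {"v6": "v2"}
```

In this toy graph, `v5` varref-depends on `v2` (via the parse edge to `v6` and then the varref edge), while `subgraph(PARSE, "v5")` contains only `v5` and `v6`, because subgraphs follow parse edges alone.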
Consider Figure 2: \(v_{15} \xrightarrow{p} v_{16}\), since \((v_{15}, v_{16})\) is a parse edge; \(v_{15} \xrightarrow{v} v_3\), as \(v_3\) is reachable from \(v_{15}\) via \((v_{15}, v_{16})\), \((v_{16}, v_3)\), and \((v_{16}, v_3)\) is a varref edge.

For a d-graph \(G\) and a vertex \(r_s \in V(G)\), we use the term subgraph to mean the vertex-induced subgraph of \(r_s\), including \(r_s\) and all \(u \in V(G)\) where \(r_s \xrightarrow{p} u\); \(r_s\) is called the root of the subgraph. For instance, the subgraph rooted at vertex \(v_{26}\) contains vertices \(v_{26}, \ldots, v_{39}\), but does not contain vertices \(v_{13}, \ldots, v_{25}\). Throughout this paper, we use the terms (sub)graph and (sub)query interchangeably, as a (sub)query is represented by the induced subgraph rooted at some vertex.

[...]

Excerpt of request message for makenodes():

  <request>
    <projection-paths>
      <used-path/>
      <returned-path>parent::a</returned-path>
    </projection-paths>
    <fragments/>

Excerpt of response message for makenodes():

  <env:Envelope ...>
    <env:Body>
      <response>
        <fragments>
          <fragment><a><b><c/></b></a></fragment>
        </fragments>
        <call>
          <sequence><element fragid="1" nodeid="2"/></sequence>
        </call>
      </response>
    </env:Body>
  </env:Envelope>

Fig. 5. Pass-By-Projection Messages

[...] used and returned nodes plus the descendants of the returned nodes, and is queried with Q.

There are three reasons why projecting XML is extremely interesting for distributed XML processing: (i) until now, when sending nodes, we had to serialize all descendants, which potentially contain huge subtrees that may remain untouched on the other side. This amounts to wasted network bandwidth as well as serialization and shredding effort. (ii) if documents are projected into lean skeletons that only contain the relevant portions, it becomes feasible to serialize XML fragments from some lowest common ancestor on, possibly even the document root. Even with pass-by-fragment, the execution of reverse/horizontal XPath axes on remote nodes is impossible. By extending projecting XML with support for reverse and horizontal axes, however, we get a tool to precisely identify this lowest common ancestor of an XML document that needs to be included to allow correct remote execution of those axes.
(iii) the projection technique can even be applied to support the built-in functions fn:root() and fn:id()/fn:idref(), i.e., by taking the lowest common ancestor of those, if a path contains one of these functions.

For these reasons, we further refine the pass-by-fragment message passing semantics into the so-called pass-by-projection semantics. XML projection can be used in both directions: to project the parameters in a request message, and to project the function's result sequence before shipping back the response.

Insertion conditions. Pass-by-projection removes the by-fragment insertion conditions (in Section V) i and iv, such that only ii and iii, i.e., the application of node comparison, node set operators and axis steps on top of multiple calls to fn:doc() with the same URI, remain illegal.

Message extension: projection paths. We introduce an optional element as a sub-element of a request tag: projection-paths, which in turn has zero or more child elements returned-path and used-path. In the new pass-by-projection semantics, the absence or presence of this element determines whether the response message should be in the original pass-by-value or the new pass-by-projection format.

  ProjectionPath ::= "doc(" Literal "::" Literal ")" ("/" SimplePath)*
  SimplePath     ::= Axis NodeTest | SimplePath "/" Axis NodeTest
  Axis           ::= "self::" | "child::" | "attribute::"
                   | "descendant::" | "descendant-or-self::"
                   | "ancestor::" | "ancestor-or-self::" | "parent::"
                   | "root()" | "id()" | "idref()"
  NodeTest       ::= ((NCName | "*") ":")? (NCName | "*") | "node()" | "text()"

TABLE V. GRAMMAR RULE EXTENSION OF ProjectionPath (extensions, set in bold in the original: ancestor::, ancestor-or-self::, parent::, root(), id(), idref())

Example 6.1: To illustrate projected XRPC messages, the upper part of Figure 5 shows part of the request message for the call from \(Q_1\) (discussed in Problem 4):

  let $bc := execute at {"example.org"} {makenodes()}

Since the projection path analysis detects that $bc will subsequently be used as context node by a parent step, $abc := $bc/parent::a, the request message specifies parent::a as a returned path. Therefore, the response message contains the full fragment <a><b><c/></b></a>, to which $abc then gets correctly bound.
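The message extension can be illustrated with a few lines of Python using the standard `xml.etree.ElementTree` module. This is a sketch only: the real XRPC messages are SOAP envelopes with namespaces that are omitted here, and the helper names `build_request` and `wants_projection` are hypothetical. Only the element names `request`, `projection-paths`, `used-path` and `returned-path` come from the text above.

```python
import xml.etree.ElementTree as ET


def build_request(used_paths, returned_paths, by_projection=True):
    """Build a bare <request> element (namespaces and call body omitted).

    The optional <projection-paths> child carries zero or more
    <used-path> and <returned-path> children.
    """
    request = ET.Element("request")
    if by_projection:
        paths = ET.SubElement(request, "projection-paths")
        for p in used_paths:
            ET.SubElement(paths, "used-path").text = p
        for p in returned_paths:
            ET.SubElement(paths, "returned-path").text = p
    return request


def wants_projection(request):
    """Presence of <projection-paths> selects the pass-by-projection response format."""
    return request.find("projection-paths") is not None


# Example 6.1: the caller will apply parent::a to the result, so it asks
# the remote peer to return that path as well.
req = build_request(used_paths=[], returned_paths=["parent::a"])
legacy = build_request([], [], by_projection=False)
```

A handler on the remote side would check `wants_projection(req)` before deciding how to serialize its result; how the paths are then evaluated is described in Section VI-B.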
A. Extending Projected XML

We extend the path grammar rules [18] and path annotations to handle full-fledged XQuery involving reverse/horizontal XPath steps and built-in functions. The extended grammar rule for ProjectionPath is given in Table V. We denote path annotations in projected XML as follows:

\[ \mathit{Env}(v_i) \vdash \mathit{Expr} \Rightarrow \mathit{Paths}_1 \ \text{using}\ \mathit{Paths}_2 \]

The notation \(\mathit{Env}(v_i)\) is used to identify the path annotation environment at a certain vertex \(v_i\) in the XQuery d-graph. Such annotations are constructed bottom-up by path analysis rules that derive the used (UPaths) and returned (Paths) paths for each XCore expression in terms of the used and returned paths of its subexpressions. The basic path analysis rules have been discussed in [18], such as those for literal values, sequences, for and let expressions, XPath steps, etc. Our extension to include reverse/horizontal XPath steps brings no changes for the path analysis rules, but must be supported by the loading algorithm, which is described in Section VI-B. We complement the rules for built-in functions, which apart from the unsolved cases mentioned under Problem 5 in Section II (fn:root(), fn:id(), fn:idref()) also include fn:doc(). The description of the basic projection technique assumes a single document. As in distributed query processing there are always multiple documents, our paths always start with fn:doc(URI).

Path analysis rules. We provide one rule for fn:doc() with a constant parameter and another for computed URIs:

\[ (\text{DOC}_1)\quad \frac{}{\mathit{Env}(v_i) \vdash \text{doc}(\mathit{Literal}_1) \Rightarrow \text{doc}(\mathit{Literal}_1 :: v_i) \ \text{using}\ \emptyset} \]

\[ (\text{DOC}_2)\quad \frac{\mathit{Env}(v_j) \vdash \mathit{Expr}_j \Rightarrow \mathit{Paths}_j \ \text{using}\ \mathit{UPaths}_j}{\mathit{Env}(v_i) \vdash \text{doc}(\mathit{Expr}_j) \Rightarrow \text{doc}(* :: v_i) \ \text{using}\ \mathit{Paths}_j \cup \mathit{UPaths}_j} \]

As mentioned in Section IV, in the definition of \(D(v_x)\) (see footnote 3), we use a wildcard URI * if the document name is an expression. (Footnote 3: We use the doc(..) prefixes of the returned paths annotations on \(v\) as a more precise form of the \(D(v)\) property. Documents that were only used but not returned were also part of the original \(D(v)\), but these will not cause semantic problems.) Also note that all paths start with doc(URI::\(v_i\)), thus identifying both the document URI and the vertex \(v_i\) where it is loaded.
This notation facilitates the identification of situations where the same URI is loaded twice (the function hasMatchingDoc()). A similar rule can be formulated for XML element construction, producing a return path doc(\(v_i\)::\(v_i\)) with an artificial unique URI. The rule for fn:root() is:

\[ (\text{ROOT})\quad \frac{\mathit{Env}(v_j) \vdash \mathit{Expr}_j \Rightarrow \mathit{Paths}_j \ \text{using}\ \mathit{UPaths}_j}{\mathit{Env}(v_i) \vdash \text{fn:root}(\mathit{Expr}_j) \Rightarrow \bigcup_{p \in \mathit{Paths}_j} p/\text{root()} \ \text{using}\ \mathit{UPaths}_j} \]

The built-in function fn:root() with a single parameter is treated in the path annotations much like XPath axis steps, where the parameter has become the path prefix. In this path notation, functions remain easily recognizable by the parentheses. The rules for the built-in functions fn:id()/fn:idref() are highly similar (only fn:id() is provided):

\[ (\text{ID})\quad \frac{\mathit{Env}(v_j) \vdash \mathit{Expr}_j \Rightarrow \mathit{Paths}_j \ \text{using}\ \mathit{UPaths}_j \qquad \mathit{Env}(v_k) \vdash \mathit{Expr}_k \Rightarrow \mathit{Paths}_k \ \text{using}\ \mathit{UPaths}_k}{\mathit{Env}(v_i) \vdash \text{fn:id}(\mathit{Expr}_j, \mathit{Expr}_k) \Rightarrow \bigcup_{p \in \mathit{Paths}_k} p/\text{id()} \ \text{using}\ \mathit{Paths}_j \cup \mathit{UPaths}_j \cup \mathit{UPaths}_k} \]

The first parameter of fn:id() is ignored by the annotations, as it contains string values, and the projection annotation framework only allows for the estimation of node sets. This has the consequence that our loading algorithm will conserve all elements with an ID/IDREF attribute.
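The four rules above can be read as small functions that map the annotations of the subexpressions to a pair (returned paths, used paths). The sketch below assumes paths are plain strings and path sets are Python sets; the function names are hypothetical, and the wildcard is written as `*`, which is an assumption about the symbol used for computed URIs.

```python
def doc_literal(uri, vid):
    """(DOC1): doc(Literal) at vertex vid returns doc(Literal::vid) and uses nothing."""
    return {f"doc({uri}::{vid})"}, set()


def doc_computed(vid, paths_j, upaths_j):
    """(DOC2): doc(Expr_j) returns a wildcard document path; everything the
    URI expression returned or used becomes a used path."""
    return {f"doc(*::{vid})"}, set(paths_j) | set(upaths_j)


def fn_root(paths_j, upaths_j):
    """(ROOT): append /root() to every returned path of the argument."""
    return {p + "/root()" for p in paths_j}, set(upaths_j)


def fn_id(paths_j, upaths_j, paths_k, upaths_k):
    """(ID): append /id() to the returned paths of the second argument; the
    first argument (string values) only contributes used paths."""
    returned = {p + "/id()" for p in paths_k}
    used = set(paths_j) | set(upaths_j) | set(upaths_k)
    return returned, used


# Usage: annotate fn:root(doc("xrpc://A/students.xml")) with the doc() call at vertex v6.
doc_ret, doc_used = doc_literal("xrpc://A/students.xml", "v6")
root_ret, root_used = fn_root(doc_ret, doc_used)
```

Because each rule only combines the sets produced for its subexpressions, the annotations can be computed in a single bottom-up pass over the d-graph.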
B. Runtime XML Projection

The extensions we made to XML projection, namely support for reverse/horizontal XPath axes and fn:root(), fn:id()/fn:idref(), could not be trivially integrated in the loading algorithm of [18]. However, in this case we are not really looking for a loading algorithm that efficiently reads (shreds) an XML file into a projected representation. Rather, the documents are already present (and indexed) in the XQuery engine, and runtime message projection is a serialization task. Therefore, we propose a new runtime approach for projection, targeted at serialization rather than at shredding. Whereas the original loading algorithm starts at the document root and evaluates absolute used and returned paths, our runtime projection algorithm starts in a run-time state, that is, with a real, materialized context sequence (e.g., the parameter values that are about to be serialized in a SOAP message), and executes only relative paths on them. Because the node sequence bound at run-time to a function parameter is only a subset of the node set characterized by its compile-time path annotation (e.g., its contents may well have been reduced by applying a selection predicate), this runtime projection technique can be much more precise than the original projection algorithm.

For these reasons, our runtime approach for projection simply relies on the normal XPath evaluation capabilities of the XQuery engine for fully evaluating all used and returned path annotations one by one (and uniting them with union()). Doing so, it produces a used node set U and a returned node set R. These two sets are the input for the runtime projection algorithm listed in Algorithm 1.
Algorithm 1: RUNTIME XML PROJECTION (U, R, D)

  input:  U - used nodes, sorted on document order
          R - returned nodes, sorted on document order
          D - the original XML document
  output: D' - the projection of U and R on D

   1  projection nodes P ← U ∪ R, sorted on document order;
   2  proj ← first node in P;
   3  cur ← first node of D, i.e., root node;
   4  while ¬P.end() do
   5    if proj is a descendant of cur then
   6      add cur to D';
   7      cur ← next node in D;
   8    else if proj = cur then
   9      if proj is a returned node then
  10        add cur and all descendants of cur to D';
  11        cur ← next following node of cur in D;
  12        while proj.next is a descendant of proj do
  13          proj ← proj.next    (prune projection nodes)
  14        end
  15      else
  16        add cur to D';
  17        cur ← next node in D;
  18      end
  19      proj ← proj.next    (next projection node)
  20    else
  21      cur ← next following node of cur in D;
  22    end
  23  end
  24  cur ← root node of D';
  25  while cur has only one child node ∧ cur ∉ {U ∪ R} do
  26    cur ← first child of cur;
  27  end

The runtime projection algorithm identifies all projection nodes in the XML tree representation of the original document, by traversing the tree top-down depth-first. During traversal, if the current node cur of the XML document is an ancestor of the current projection node proj (line 5), cur is added to the output D' and moved to the next node in document order. If a proj is found (line 8), proj is added to D'; if this proj is a returned node, all its descendants are also appended. Then cur is moved to its next following node in the document. Otherwise, if the current projection node proj is not a descendant of cur, the subtree of cur can be skipped (line 21). Though this algorithm is formulated on an abstract level that is independent of the particular XML storage scheme used in an XQuery engine, it is safe to assume that skipping a subtree is fast (either O(1) or O(log(|D|))). At the end of the algorithm (lines 24-27), post-processing is performed to remove unnecessary nodes, as we are only interested in the lowest common ancestor of all input nodes in the projected document D'.

Example 6.2: Consider an XML document D in Figure 6(a).
Assume that the used node set U is {i}, and the returned node set R is {d, k}. Figure 6(b) shows the projected document D' of applying Algorithm 1 on U, R and D.

The algorithm starts with P ← {d, i, k}, proj ← d and cur ← a. We traverse the tree using cur from a to d. Nodes a, b and c are added to D', since they are ancestors of the current context node d. Nodes d, e and f are also added to D', as d is a returned node. Then, cur is advanced to g (d's next following node). Because the next context node i is not in the subtree of g, the subtree is skipped by advancing cur to i. Recall that i is a used node, thus only i is added to D'. The last context node is k. Our current document node cur traverses from i to j, and then to k, where we can add nodes k, l and m to D'. The traversal can be terminated, because there are no more context nodes to process. However, the intermediate result D' contains all common ancestors of {d, i, k}. The post-processing removes node a from D', which produces the final projected document D' as shown in Figure 6(b).

Fig. 6. Runtime XML Projection Example. (a) Original XML tree D, with nodes a b c d e f g h i j k l m n o. (b) Projected tree D', with nodes b c d e f i k l m.

Relative projection paths. At compile time, the XQuery compiler builds a query graph (d-graph) with root \(v_{root}\), normalizes it, followed by decomposition and code motion. For each inserted XRPCExpr \(v_{xrpc}\), and for each XRPCParam parameter vertex \(v_{param}\), it then extracts the relative paths:

\[ U_{rel}(v_{param}) = \text{allSuffixes}(R(v_{param}), U(v_{xrpc})) \]
\[ R_{rel}(v_{param}) = \text{allSuffixes}(R(v_{param}), R(v_{xrpc})) \]
\[ U_{rel}(v_{xrpc}) = \text{allSuffixes}(R(v_{xrpc}), U(v_{root})) \]
\[ R_{rel}(v_{xrpc}) = \text{allSuffixes}(R(v_{xrpc}), R(v_{root})) \]

with:

\[ \text{allSuffixes}(\mathit{Paths}_i, \mathit{Paths}_j) = \{\, s_j \mid p_i/s_j \in \mathit{Paths}_j : \exists\, p_i \in \mathit{Paths}_i \,\} \]

At runtime, the paths \(U_{rel}(v_{param})\) and \(R_{rel}(v_{param})\) of all parameter vertices \(v_{param}\) are used to project the parameters in the outgoing XRPC request message. The \(U_{rel}(v_{xrpc})\) and \(R_{rel}(v_{xrpc})\) are passed in the projection-paths element such that the remote peer can appropriately apply these paths to project the response message.
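Algorithm 1 and Example 6.2 can be traced with a short Python sketch. Several things here are assumptions rather than the paper's implementation: the document is stored as a list in document order in which each node records its parent index and subtree size, so that "next node" is `cur + 1` and "next following node" is `cur + size + 1`; the function names are hypothetical; and since the exact shape of Figure 6(a) is not recoverable from the text, the tree `FIG6` is one shape that is consistent with the narrative of Example 6.2.

```python
def flatten(tree, parent=None, out=None):
    """Turn a nested (name, [children]) tree into a document-order list of
    dicts holding the node name, parent index and subtree size."""
    if out is None:
        out = []
    name, children = tree
    idx = len(out)
    out.append({"name": name, "parent": parent, "size": 0})
    for child in children:
        flatten(child, idx, out)
    out[idx]["size"] = len(out) - idx - 1
    return out


def project(doc, used, returned):
    """Sketch of Algorithm 1: return the names of the projected nodes in document order."""
    pos = {n["name"]: i for i, n in enumerate(doc)}
    proj = sorted({pos[x] for x in used | returned})      # P, sorted on document order
    ret = {pos[x] for x in returned}
    keep, cur, k = set(), 0, 0
    while k < len(proj) and cur < len(doc):
        p = proj[k]
        if cur < p <= cur + doc[cur]["size"]:             # proj is a descendant of cur
            keep.add(cur)
            cur += 1                                      # next node
        elif p == cur:
            if cur in ret:                                # returned: copy whole subtree
                end = cur + doc[cur]["size"]
                keep.update(range(cur, end + 1))
                cur = end + 1                             # next following node
                while k + 1 < len(proj) and proj[k + 1] <= end:
                    k += 1                                # prune projection nodes
            else:                                         # used: copy the node only
                keep.add(cur)
                cur += 1
            k += 1                                        # next projection node
        else:
            cur += doc[cur]["size"] + 1                   # skip the subtree of cur
    if not keep:
        return []
    # Post-processing: descend to the lowest common ancestor of the input nodes.
    root, targets = min(keep), set(proj)
    while root not in targets:
        kids = [i for i in keep if doc[i]["parent"] == root]
        if len(kids) != 1:
            break
        root = kids[0]
    last = root + doc[root]["size"]
    return [doc[i]["name"] for i in sorted(keep) if root <= i <= last]


def t(name, *kids):
    return (name, list(kids))


# Assumed shape of the tree in Figure 6(a); document order is a, b, ..., o.
FIG6 = t("a",
         t("b",
           t("c", t("d", t("e"), t("f")), t("g", t("h"))),
           t("i", t("j")),
           t("k", t("l"), t("m"))),
         t("n", t("o")))

result = project(flatten(FIG6), used={"i"}, returned={"d", "k"})
```

With this assumed tree, `result` lists the nodes b, c, d, e, f, i, k, l, m, which matches Figure 6(b). The size-based encoding is what makes the skip in line 21 of Algorithm 1 a constant-time index jump in this sketch.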
Projecting a document with Algorithm 1 requires pre-calculated used and returned node sets. These sets are simply computed using the XPath evaluation infrastructure of the underlying XQuery engine, by feeding the intermediate result $ctx_param corresponding to \(v_{param}\) as context sequence into all suffix paths \(s_i \in U_{rel}(v_{param})\) (resp. \(R_{rel}(v_{param})\)):

  union($ctx_param/s_1, union($ctx_param/s_2, .... union($ctx_param/s_{n-1}, $ctx_param/s_n)...))

Paths $ctx/path_i/root()/path_j with function fn:root() are executed as root($ctx)/path_j. Similarly, $ctx/path_i/id()/path_j is executed as root($ctx)//attribute()::(a_1|..|a_n)/../path_j, where a_1, .., a_n are all ID attributes (resp. IDREF in case of idref()). The request handler on the remote side uses the same method to evaluate the suffix paths \(U_{rel}(v_{xrpc})\) and \(R_{rel}(v_{xrpc})\), using the result sequence of the function as $ctx_xrpc during serialization of the response message.

In case of XML data with a user-defined XML Schema, the default projection algorithm is likely to throw away mandatory elements and attributes. For this reason, the runtime projection algorithm should be made schema-aware. A simple solution is to ensure that only elements with a minOccurs declaration of zero (i.e., optional elements) are removed. One can also envision more advanced variants that further reduce the size of a typed XML document.

VII. EVALUATION IN MONETDB/XQUERY

We have implemented the proposed algorithms in MonetDB/XQuery [4], a purely relational XML database system that uses the Pathfinder [10] XQuery compiler. We use the XRPC extension for remote function evaluation. The test platform consisted of three 2 GHz Athlon64 Linux machines connected via 1 Gb/s Ethernet. Each was equipped with 2 GB of RAM. The benchmark data used is XMark [23], a popular XML benchmark for evaluating XQuery efficiency and scalability. The data set was generated using scaling factors 0.1, 0.2, 0.4, 0.8 and 1.6. A data set is stored on each remote peer.
We conducted three groups of experiments: bandwidth usage, query execution time and runtime projection precision. Note that, as no other comparable results exist, the main goal of our experiments is to show the impact of the proposed techniques in a step-by-step fashion.

We slightly modified the query \(Q^n_2\) (in Table III) so that it conforms to the XMark schema, as follows:

  (let $t := let $s := doc("xrpc://peer1/xmk nn MB.xml")
                         /child::site/child::people/child::person
             return for $x in $s return if ($x/descendant::age 40) then $x else ()
   return for $e in (let $c := doc("xrpc://peer2/xmk nn MB.auctions.xml")
                     return $c/descendant::open_auction)
          return if ($c/child::seller/attribute::person = $t/attribute::id)
                 then $c/child::annotation else ())/child::author

All techniques discussed in this paper are applied on the above query: (i) under the pass-by-value semantics, only the expression doc("xrpc://peer1/xmk nn MB.xml")/.../child::person can be decomposed and executed on peer1; (ii) under the pass-by-fragment semantics, we can decompose both the second let clause ("let $s := ...") and the second for-loop ("for $e in ..."), and execute them on peer1 and peer2 respectively. The variable $t becomes the parameter of the generated function containing the second for-loop (see also Table IV); (iii) under the pass-by-projection semantics, the query is decomposed in the same way as using pass-by-fragment; however, when serializing the request messages, a projection of $t/attribute::id (parameter projection) and $c/child::annotation/child::author (result projection) is calculated. The test set thus contains four queries in total, and each of them is executed on 2 documents of sizes 10, 20, 40, 80 and 160 MB.

Bandwidth usage. Figure 7 shows, on its y-axis, the bandwidth used by each benchmark query on different sets of documents, i.e., the total size of XML documents plus the total size of XML messages transferred among peers. The x-axis is the total size of the XML documents used by each query.
The pure data-shipping XQuery query (the leftmost bar) uses the most bandwidth, as both documents have to be shipped. By-value decomposition can push the XPath step doc("xrpc://peer1/xmk nn MB.xml")/.../child::person to be evaluated on peer1, which reduces the amount of data sent from peer1 to the local peer. However, the second document [...]

[...] protocol, which lacks proper support for XML elements and sequences; a problem addressed by XRPC using a specific literal SOAP message encoding. Active XML (AXML) [1], [2] is a declarative framework that harnesses web services for data integration in a peer-to-peer architecture. Like XRPC, it also used a (document/literal encoding) SOAP protocol to represent XML subtree values. However, the focus in AXML has been on adaptive call materialization strategies, not on automatic query decomposition and the semantic challenges this brings in XQuery, such as distributed node identity. XQueryD [22], like XRPC, supports function shipping in XQuery, but it does not define an open network protocol.

Decomposing queries to address multiple data sources is by now a well-studied problem in relational databases [28] and object-oriented databases [13], [17]. Many of these ideas and methods can be applied to XQuery, yet we have shown here that the issue of efficiently managing distributed node identity and document order adds interesting challenges. [24], [25] discuss the decomposition of unstructured query languages only on a semi-structured database (a rooted, labeled graph). In XML databases, previous approaches require structural information about peers for supervising decomposition [27]. Other works [6], [7], [26] only focus on a restricted set of XQuery queries.

XML projection [18] drastically reduces the size of the data model representation using compile-time query characterization. [5] introduces a precise XML pruning technique for a subset of XQuery FLWOR expressions, based on the a priori knowledge of a data guide for underlying XML data.
However, it does not handle XPath predicates, backward axes and XQuery-like languages. A type-based XML projection technique [3] is studied to improve current solutions with comparable or higher precision and less pruning overhead, as well as supporting backward XPath axes. However, a DTD is required. [15] discusses runtime XML projection techniques. Based on the static compilation of runtime lookup tables and a runtime automaton from projection paths and a DTD, they can filter the input XML document efficiently using string matching algorithms. This technique, however, still lacks support for reverse XPath axes and XQuery built-in functions.

IX. CONCLUSION

We have described a framework for distributed execution of full-fledged XQuery, focusing on the issue of providing equivalent query decompositions, in the face of semantic differences when (parts of) nodes are shipped across the network in XML messages. We first carefully characterized the problems that may occur regarding node identity and structural XPath relationships in such a distributed setting. Then, we proposed a series of techniques, such as pass-by-fragment and the use of a novel runtime XML projection method for serializing XML messages, that remove virtually all semantic problems and strongly improve performance, as shown by experiments on the open-source MonetDB/XQuery XML database system (monetdb.cwi.nl).

Our main future work is an issue left out of scope here: deciding on distributed query placement after decomposition. In this area, we also contemplate using runtime methods to improve optimization quality. Another direction is decomposition of queries containing XQUF update expressions. The challenge here is that updates are necessarily tied to execution on their source peer, which restricts decomposition to cases where at compile time a single affected peer can be identified.

REFERENCES

[1] S. Abiteboul, O. Benjelloun, B. Cautis, I. Manolescu, T. Milo, and N. Preda. Lazy Query Evaluation for Active XML. In SIGMOD, 2004.
[2] S. Abiteboul et al. A Framework for Distributed XML Data Management. In EDBT, 2006.
[3] V. Benzaken et al. Type-Based XML Projection. In VLDB, 2006.
[4] P. Boncz et al. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine. In SIGMOD, 2006.
[5] S. Bressan et al. Accelerating queries by pruning XML documents. Data Knowl. Eng., 54(2), 2005.
[6] P. Buneman et al. Using Partial Evaluation in Distributed Query Evaluation. In VLDB, 2006.
[7] G. Cong et al. Distributed query evaluation with performance guarantees. In SIGMOD, 2007.
[8] D. Draper et al. XQuery 1.0 and XPath 2.0 Formal Semantics. W3C Candidate Recommendation 8 June 2006.
[9] M. Fernández et al. Highly Distributed XQuery with DXQ. In SIGMOD, 2007.
[10] T. Grust et al. XQuery on SQL Hosts. In VLDB, 2004.
[11] M. Gudgin, M. Hadley, N. Mendelsohn, J.-J. Moreau, and H. F. Nielsen. SOAP Version 1.2 Part 1: Messaging Framework. W3C Recommendation 24 June 2003. http://www.w3.org/TR/2003/REC-soap12-part1-20030624.
[12] M. Gudgin, M. Hadley, N. Mendelsohn, J.-J. Moreau, and H. F. Nielsen. SOAP Version 1.2 Part 2: Adjuncts. W3C Recommendation 24 June 2003. http://www.w3.org/TR/2003/REC-soap12-part2-20030624.
[13] V. Josifovski and T. Risch. Query decomposition for a distributed object-oriented mediator system. Distributed and Parallel Databases, 11(3):307–336, 2002.
[14] J. Knoop and B. Steffen. Code motion for explicitly parallel programs. SIGPLAN Not., 34(8):13–24, 1999.
[15] C. Koch et al. XML Prefiltering as a String Matching Problem. In ICDE, 2008.
[16] D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4), 2000.
[17] H. Kozankiewicz, K. Stencel, and K. Subieta. Distributed query optimization in the stack-based approach. In HPCC, 2005.
[18] A. Marian et al. Projecting XML Documents. In VLDB, 2003.
[19] N. Mitra. SOAP Version 1.2 Part 0: Primer. W3C Recommendation 24 June 2003. http://www.w3.org/TR/2003/REC-soap12-part0-20030624.
[20] N. Onose and J. Siméon. XQuery at Your Web Service. In WWW, 2004.
[21] M. T. Özsu and P. Valduriez. Principles of distributed database systems (2nd ed.). Prentice-Hall, Inc., NJ, USA, 1999.
[22] C. Re et al. Distributed XQuery. In IIWeb, September 2004.
[23] A. Schmidt et al. XMark: A Benchmark for XML Data Management. In VLDB, 2002.
[24] D. Suciu. Query decomposition and view maintenance for query languages for unstructured data. In VLDB, 1996.
[25] D. Suciu. Distributed query evaluation on semistructured data. ACM Trans. Database Syst., 27(1), 2002.
[26] K. Tajima and Y. Fukui. Answering XPath queries over networks by sending minimal views. In VLDB, 2004.
[27] L. T. T. Thuy, D. D. Duong, V. C. Bhavsar, and H. Boley. A bottom-up strategy for query decomposition. In ICDIM, 2006.
[28] E. Wong and K. Youssefi. Decomposition - a strategy for query processing. ACM Trans. Database Syst., 1(3):223–241, 1976.
[29] C. Yu and C. Chang. Distributed query processing. ACM Computing Surveys, 16(4), 1984.
[30] Y. Zhang and P. Boncz. XRPC: Interoperable and Efficient Distributed XQuery. In VLDB, 2007.
[31] Y. Zhang and P. Boncz. Distributed XQuery and updates processing with heterogeneous XQuery engines. In SIGMOD, 2008.