[Figure 1: Graphs about academic relations between professors, PhD students, and bachelor students, with advisor-of (adv) and supervises (sup) relationships. Graph G has nodes 1 to 4 labeled prof, node 5 labeled phd, and nodes 6 and 7 labeled stud. Graph I has nodes 1 (prof), 2,3 (prof), 4 (prof), 5 (phd), and 6,7 (stud). Graph I is a simulation-based structural index for G.]

[...] structural indexes are obtained by grouping together nodes in the input graph that are similar (respectively, bisimilar). These indexes are known to be covering for different fragments of the XPath query language [11, 22, 27]. That is, given a query in the fragment, its evaluation on the structural index will provide exactly the nodes that would be returned had the query been evaluated on the original data. In Figure 1, for example, the index I is actually a simulation-based index obtained by grouping together the similar nodes in G. (And, as Example 1 has already illustrated, certain queries can be immediately answered on I instead of G.) Variations of this idea underlying structural indexing have also been used in graph data management to compress [5, 10] graph-structured data sets, as well as to aid in query processing [18, 25, 30] and data analytics [9].

Given the numerous successful applications of structural indexing in graph databases, one may ask the question: Is it possible to extend structural indexing from graph databases to arbitrary relational databases? In this paper and companion work [24, 25], we embark on a formal study of this question, and show that it has an affirmative answer, both from a theoretical and practical perspective.

General methodology. Our study follows the methodology proposed by Fletcher et al. [11] for the design of covering structural indexes for a given target query language $\mathcal{Q}$. This methodology requires the development of the following three components.
(1) A language-independent structural characterization of query invariance, characterizing when data objects (in our setting: relational tuples) cannot be distinguished by any query in the target query language $\mathcal{Q}$.
(2) An efficient algorithm to group together data objects that cannot be distinguished by any query in the target language $\mathcal{Q}$.
(3) A data structure (i.e., the index) that exploits this grouping to support query answering by means of the index instead of reverting to the full
database.

In this paper, we focus on the conjunctive queries as our target query language, and devote our study to the structural characterization required for component (1). Components (2) and (3) are developed in companion work [24, 25]. Actually, we will focus on those conjunctive queries that "select" tuples in the input database rather than compute new tuples. Our focus on this fragment of the conjunctive queries as the target language instead of all conjunctive queries is motivated by the fact that, at least for the purpose of obtaining succinct structural indexes, the class of all conjunctive queries is too large.

To clarify this claim, we note that in graph databases there is a known trade-off between the arity of queries in $\mathcal{Q}$ and the size of the corresponding structural indexes: the index size increases with the arity. In graph databases, for example, $\mathcal{Q}$ is usually a language of node-selecting (i.e., unary) queries. In this setting, the data objects in the methodology of Fletcher et al. are nodes; the structural characterization is given by (bi)similarity; and the structural index data structure is built from the groups of indistinguishable nodes, as illustrated in Example 1. The groups of indistinguishable nodes are necessarily disjoint. Therefore, there can be at most as many groups as there are nodes in the input graph, and, hence, the structural index is always guaranteed to be at most the size of the input graph (although usually much smaller).

Now consider the setting where $\mathcal{Q}$ is a class of $k$-ary graph queries ($k \ge 2$) instead. Milo and Suciu have shown that essentially the same approach as before can be used to build structural indexes for $\mathcal{Q}$ [22]. However, the data objects become $k$-tuples of nodes; the structural characterization of indistinguishability is a generalization of (bi)simulation to $k$-tuples; and the structural index is composed of the groups of indistinguishable $k$-tuples. Essentially, we are no longer building a summary of the input graph, but a summary of the possible output space of queries in $\mathcal{Q}$, which can be vastly larger than the input graph. In particular, since the number of $k$-tuples for $k \ge 3$ significantly exceeds the size of the input graph, the number of groups of indistinguishable $k$-tuples (and hence, the index) exceeds the size of the input graph in practice. Clearly, this defeats the purpose of the structural index as a succinct graph summary.

Since an analogous reasoning applies to the relational setting, we are therefore not interested in a structural characterization of indistinguishability that applies to all conjunctive queries (of arbitrary arity), but in a characterization that is applicable to those conjunctive queries that "select" tuples in the input database. In the literature, these conjunctive queries are known as the strict (or variable-guarded) conjunctive queries [12]. Formally, a rule-based conjunctive query is strict if all variables in the head occur together in a single atom in the body. (See also Section 2.)

Our focus on the strict conjunctive queries as the target query language implies that we will not be able to answer non-strict queries on the structural index directly. Nevertheless, we show in companion work that query processing of all conjunctive queries can benefit from the presence of these indexes [24, 25]. (See also Section 5.)

Overview of approach and main result. What is a good notion of indistinguishability by strict conjunctive queries? It is well known that all conjunctive queries (strict and non-strict) are invariant under homomorphisms (i.e., structure preserving functions from databases to databases), in the following sense.¹

Footnote 1. For the formal development in this paper, it will be convenient to focus on the conjunctive queries that do not mention any constants. All results can be extended to account for the presence of constants, much in the same way as, e.g., the classical result on genericity in relational databases can be extended to $C$-genericity, preserving constants in the finite set $C$ [3].
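The strictness condition defined above is easy to test mechanically. The following is a minimal Python sketch, assuming a conjunctive query is encoded as a head atom plus a list of body atoms, each atom being a tuple whose first component is the relation symbol. The encoding, the helper names `variables` and `is_strict`, and the two sample queries over a Project/WorksOn schema are illustrative assumptions, not notation from the paper.

```python
def variables(atom):
    """Variables of an atom encoded as (relation, var_1, ..., var_k)."""
    return set(atom[1:])


def is_strict(head, body):
    """True if all head variables occur together in one single body atom."""
    head_vars = variables(head)
    return any(head_vars.issubset(variables(atom)) for atom in body)


# Illustrative body: projects managed and audited by the same person,
# joined with the employees working on them.
body = [("Project", "pid", "mgr", "mgr"), ("WorksOn", "pid", "emp")]

# The head variable emp occurs in the single atom WorksOn(pid, emp).
selects_employees = is_strict(("ans", "emp"), body)

# No single body atom contains pid, emp, and mgr together.
builds_new_tuples = is_strict(("ans", "pid", "emp", "mgr"), body)
```

The first query only "selects" part of an existing WorksOn tuple, whereas the second one combines values from two different atoms into a new tuple, which is exactly the distinction the strict fragment draws.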
[...] Since these concerns about index size can be transferred to the relational setting, it is hence useful to develop approximate versions of guarded simulation.

To this end, we introduce approximations of fact simulation analogously to how approximations of classical simulation are defined. These approximations are proved to be tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of bounded height. In companion work [24, 25] we show that these approximations can both be efficiently computed and used to engineer practical guarded simulation-based structural indexes for relational query engines operating on Semantic Web data.

Contributions and organization. In summary, our contributions are as follows. (1) We introduce guarded simulation as a variant of guarded bisimulation, and prove the characterization stated in Theorem 5 (Section 3). (2) We introduce fact simulation as an alternative definition of guarded simulation, and show that approximations of fact simulation are tightly linked to invariance of freely acyclic conjunctive queries of bounded height (Section 4). (3) We show how structural indexes based on (approximations of) fact simulations can be defined (Section 5).

We begin, however, in Section 2 by introducing the required background.

2. PRELIMINARIES

Atoms, facts, and databases. From the outset, we assume given a fixed universe $\mathcal{U}$ of atomic data values, a fixed universe $\mathcal{V}$ of variables, and a fixed set $\mathcal{S}$ of relation symbols, all infinite and pairwise disjoint. We call atomic data values and variables collectively terms. Every relation symbol $r \in \mathcal{S}$ is associated with a natural number called the arity of $r$. An atom (respectively a fact) is an expression of the form $r(a_1, \dots, a_k)$ with $r \in \mathcal{S}$ a relation symbol; $k$ the arity of relation symbol $r$; and each of $a_1, \dots, a_k$ a variable in $\mathcal{V}$ (respectively an atomic data value in $\mathcal{U}$). A relational database over $\mathcal{S}$ is a finite set $db$ of facts.

Notation. In what follows, we denote the set of all terms (respectively variables, respectively data values) occurring in a mathematical object $X$ (such as, e.g., an atom, fact, or set of atoms and facts) by $terms(X)$ (resp. $var(X)$, resp. $val(X)$). We write $rel(\mathbf{a})$ for the relation symbol $r$ of atom or fact $\mathbf{a} = r(a_1, \dots, a_k)$. We write $|\mathbf{a}|$ for the arity $k$ of $rel(\mathbf{a})$ and $\mathbf{a}.i$ for the $i$-th term $a_i$ in $\mathbf{a}$, provided $1 \le i \le |\mathbf{a}|$. We denote tuples $(a_1, \dots, a_k)$ as
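$\bar a$, and give the natural semantics to $|\bar a|$ and $\bar a.i$. The restriction of a set $A$ of atoms or facts to a set of terms $X \subseteq \mathcal{U} \cup \mathcal{V}$, denoted $A|_X$, consists of all atoms or facts in $A$ built only from terms in $X$: $A|_X := \{\mathbf{a} \in A \mid terms(\mathbf{a}) \subseteq X\}$.

Functions $f\colon X \to Y$ with $X$ and $Y$ sets of terms are extended point-wise to atoms, facts, tuples of terms, and sets thereof. For instance, if $\mathbf{a} = r(a_1, \dots, a_k)$ and $terms(\mathbf{a}) \subseteq X$ then $f(\mathbf{a}) = r(f(a_1), \dots, f(a_k))$. We denote by $f|_Z$ the restriction of the domain of $f$ to the set $X \cap Z$ and, extending this notation to atoms and facts, denote by $f|_{\mathbf{a}}$ the restriction of the domain of $f$ to the set $X \cap terms(\mathbf{a})$. We range over atoms by boldface letters drawn from the beginning of the alphabet ($\mathbf{a}, \mathbf{b}, \dots$) and facts by boldface letters from the end of the alphabet ($\mathbf{r}, \mathbf{s}, \dots$).

Database db1

  Project
        PID   Mgr    Auditor
  s1    1     Amy    Lex
  s2    2     Lex    Amy
  s3    3     Sue    Sue

  WorksOn
        Emp       Proj
  t1    Amy       1
  t2    Lex       2
  t3    Sue       3
  t4    Jeffrey   3
  t5    Cathy     3

Database db2

  Project
        PID   Mgr     Auditor
  u1    a     Liv     Rob
  u2    b     Rob     Liv
  u3    c     Ned     Ned
  u4    d     Ellen   Fred
  u5    e     Fred    Ellen

  WorksOn
        Emp     Proj
  v1    Liv     a
  v2    Rob     b
  v3    Ned     c
  v4    Bob     c
  v5    Ellen   d
  v6    Fred    e

Figure 2: Two company databases. For future reference, facts are labeled with identifiers ($s_1, s_2, \dots$). The dotted lines indicate a fact simulation (Section 4) between db1 and db2.

Definition 6. If $\mathbf{s}$ and $\mathbf{t}$ are two facts (resp., atoms), then the equality type of $\mathbf{s}$ and $\mathbf{t}$, denoted $eqtp(\mathbf{s}, \mathbf{t})$, is the set
$\{(i, j) \mid \mathbf{s}.i = \mathbf{t}.j, \text{ with } 1 \le i \le |\mathbf{s}|,\ 1 \le j \le |\mathbf{t}|\}$.

The equality type between two facts hence records the positions on which the facts share a value. To illustrate, referring to the facts in the database db1 of Figure 2, we have $eqtp(s_1, t_1) = \{(1, 2), (2, 1)\}$.

Homomorphisms and isomorphisms. Let $A$ and $B$ be sets of facts and atoms. A function $f\colon X \to Y$ is a homomorphism from $A$ to $B$ if $terms(A) \subseteq X$ and $f(A) \subseteq B$. It is a partial homomorphism if $f(A|_X) \subseteq B$. It is an isomorphism if $f$ is bijective, $terms(A) \subseteq X$, and $f(A) = B$.

Conjunctive queries. A (rule-based) conjunctive query (CQ for short) $Q$ consists of a rule of the form
$Q\colon ans(\bar x) \leftarrow \mathbf{a}_1, \dots, \mathbf{a}_n$,
with $ans(\bar x), \mathbf{a}_1, \dots, \mathbf{a}_n$ atoms ($n \ge 0$). The set $\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$ is called the body of $Q$ and is denoted by $body(Q)$. The atom $ans(\bar x)$ is called the head of $Q$ and is denoted by $head(Q)$. It is required that $var(head(Q)) \subseteq var(body(Q))$. We sometimes write $Q(\bar x)$ to indicate that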
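$\bar x$ is the tuple of variables in the head of $Q$.

A valuation is a partial function $\mu\colon \mathcal{V} \to \mathcal{U}$. A valuation $\mu$ is an embedding of a set of atoms $A$ in a database $db$ if it is a homomorphism from $A$ to $db$. A valuation $\mu$ is an embedding of a conjunctive query $Q$ in a database $db$ if it is an embedding of $body(Q)$ in $db$. The result of conjunctive query $Q(\bar x)$ on database $db$ is the set $Q(db) := \{\mu(\bar x) \mid \mu \text{ is an embedding of } Q \text{ in } db\}$.

Example 7. Consider the following CQ $Q$:
$ans(emp) \leftarrow Project(pid, mgr, mgr),\ WorksOn(pid, emp)$.
When applied to the databases of Figure 2 it retrieves all the employees who work on a project that is managed and audited by the same person.

A union of conjunctive queries (UCQ for short) is a finite set $\varphi$ of CQs, all with the same head, say $ans(\bar x)$, which is called the head of $\varphi$. The result of UCQ $\varphi$ on database $db$ is the set $\varphi(db) := \bigcup \{Q(db) \mid Q \in \varphi\}$.

[Figure 3: A join tree for the query in Example 9. Its nodes are the atoms R(a, b), S(b, c, e), R(b, d), S(c, e, f), R(a, k), R(g, h), R(g, i), and R(h, j).]

An atom or fact $\mathbf{a}$ is boolean if it does not mention any term. A CQ is boolean if its head is. A CQ $Q$ is strict if all variables in the head occur together in a single atom in the body. To illustrate, the query from Example 7 is strict, but the following is not: $ans(pid, emp, mgr) \leftarrow$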
$Project(pid, mgr, mgr),\ WorksOn(pid, emp)$.

Minimality. A CQ $Q$ is contained in a CQ $Q'$, denoted $Q \subseteq Q'$, if $Q(db) \subseteq Q'(db)$ for all databases $db$. $Q$ is equivalent to $Q'$, denoted $Q \equiv Q'$, if $Q \subseteq Q'$ and $Q' \subseteq Q$. A CQ $Q$ is minimal if there does not exist an equivalent conjunctive query with fewer atoms in the body. A UCQ $\varphi$ is minimal if all of its CQs are minimal, and, moreover, $Q \not\subseteq Q'$ for all distinct $Q, Q' \in \varphi$. Obviously, every UCQ has an equivalent one that is minimal.

Acyclicity. The acyclic conjunctive queries were recognized early in the history of database theory as an important subclass of the conjunctive queries that have a PTime query evaluation problem under combined complexity [1, 32]. There are many equivalent definitions of when a conjunctive query is acyclic. Here, we will use two different versions: a definition based on join trees and a definition based on acyclic hypergraphs.

Definition 8 (Join tree). Let $A$ be a finite set of atoms. A join tree for $A$ is a tree $T$ (i.e., a connected acyclic undirected graph) whose nodes are the atoms in $A$ such that, whenever the same variable $x$ occurs in two atoms $\mathbf{a}$ and $\mathbf{b}$ in $A$, then $x$ occurs in each atom on the unique path linking $\mathbf{a}$ and $\mathbf{b}$. A join tree for a conjunctive query $Q$ is a join tree for $body(Q)$.

Example 9. Consider the following query: $Q\colon ans(a, b) \leftarrow$
$R(a, b),\ S(b, c, e),\ R(b, d),\ S(c, e, f),\ R(a, k),\ R(g, h),\ R(g, i),\ R(h, j)$.
A join tree for $Q$ is shown in Figure 3.

Definition 10. A conjunctive query is acyclic if it has a join tree. It is cyclic otherwise.

The query $Q$ from Example 9 is hence acyclic.

Hypergraph acyclicity. A hypergraph is a pair $(N, \mathcal{E})$, where $N$ is a set of nodes and $\mathcal{E}$ is a set of edges (also called hyperedges), which are arbitrary nonempty subsets of $N$. If $Q$ is a conjunctive query, we define the hypergraph $H(Q) = (N, \mathcal{E})$ associated to $Q$ as follows. The set of nodes $N$ consists of all variables occurring in $Q$. For each atom $\mathbf{a}$ in the body of $Q$, the set $\mathcal{E}$ contains a hyperedge consisting of all variables occurring in $\mathbf{a}$. It is well known that a conjunctive query is acyclic if and only if $H(Q)$ is acyclic. Here, acyclicity of a hypergraph, also referred to as $\alpha$-acyclicity by Fagin [8], is defined as follows.

A path from a node $s$ to a node $t$ in a hypergraph $(N, \mathcal{E})$ is a sequence of $k \ge 1$ edges $E_1, \dots, E_k \in \mathcal{E}$ such that $s \in E_1$, $t \in E_k$, and $E_i \cap E_{i+1} \ne \emptyset$ for every $1 \le i \le k-1$. Two nodes (or two edges) are connected if there is a path from one to the other. A set of nodes (or a set of edges) is connected if all of its pairs of nodes (resp. edges) are connected. The reduction of the hypergraph $(N, \mathcal{E})$ is obtained by removing from $\mathcal{E}$ each edge that is a proper subset of another edge. A hypergraph is reduced if it is equal to its reduction.

Given a hypergraph $(N, \mathcal{E})$, the set of partial edges generated by a set of nodes $M \subseteq N$ is obtained by intersecting the edges in $\mathcal{E}$ with $M$. That is, the set of partial edges generated by $M$ is the reduction of $\{E \cap M \mid E \in \mathcal{E}\} \setminus \{\emptyset\}$. A set $B$ is said to be a node-generated set of partial edges if $B$ is the set of partial edges generated by $M$, for some $M \subseteq N$.

Let $\mathbf{F}$ be a connected, reduced set of partial edges, and let $E$ and $F$ be in $\mathbf{F}$. Let $G = E \cap F$. We say that $G$ is an articulation set of $\mathbf{F}$ if the set of partial edges $\{H \setminus G \mid H \in \mathbf{F}\} \setminus \{\emptyset\}$ is not connected.

Definition 11 (Hypergraph Acyclicity). A block of a reduced hypergraph is a connected, node-generated set of partial edges with no articulation set. A block is trivial if it contains fewer than two members. A reduced hypergraph is acyclic if all its blocks are trivial. A hypergraph is said to be acyclic if its reduction is.

Observe that no block can be formed from exactly two partial edges. Indeed, these two edges are either disconnected or their intersection forms an articulation set.

Example 12. Consider the conjunctive query $Q_2\colon ans() \leftarrow$
$R(a, b, c),\ R(a, b, d),\ R(a, c, d),\ R(b, c, d)$.
Its hypergraph $H(Q_2)$ consists of the following edges:
$E_1 = \{a, b, c\}$, $E_2 = \{a, c, d\}$, $E_3 = \{a, b, d\}$, $E_4 = \{b, c, d\}$.
Note that $H(Q_2)$ itself equals the set of partial hyperedges of $H(Q_2)$ generated by the set $\{a, b, c, d\}$. This set is clearly connected and reduced. Furthermore, it has no articulation set, and it is not trivial. Therefore, $H(Q_2)$ itself forms a non-trivial block of $H(Q_2)$. Hence $H(Q_2)$ is cyclic, and so is $Q_2$.

3. STRUCTURAL CHARACTERIZATION

Guarded bisimulation is a generalization of classical bisimulation to relational databases introduced by Andréka et al. [2]. (A formal definition of guarded bisimulation is provided in Appendix A for completeness.) Analogously to modal bisimulation, guarded bisimulation is formulated by means of back and forth conditions. In this section, we introduce guarded simulation as a variant of guarded bisimulation without the back condition, and prove Theorem 5. Towards this, we start with the definition of free acyclicity.

3.1 Free Acyclicity

The extension of CQ $Q$, denoted by $Q^+$, is the CQ obtained by adding $head(Q)$ as an atom to the body.

[...] a so-called compact winning strategy for the existential $k$-cover game between two relational structures, for the special case where $k = 1$. Chen and Dalmau link the existence of winning strategies for the $k$-cover game to invariance by conjunctive queries of so-called coverwidth (also known as generalized hypertree width) at most $k$. Since it is known that the conjunctive queries of coverwidth 1 are exactly the ACQs (e.g., [7, 13]), it is not difficult to obtain the following from their results.

Proposition 18. The following are equivalent.
(1) $db_1, \bar a \preceq_g db_2, \bar b$
(2) For all FACQs $Q$, if $\bar a \in Q(db_1)$ then
$\bar b \in Q(db_2)$.

3.3 Characterizing invariance under guarded simulation

Proposition 18 implies that the FACQs are invariant under guarded simulation. It also implies that any FO definable query that is equivalent to a union of FACQs must be invariant under guarded simulation. To obtain Theorem 5, therefore, it remains to prove that any FO definable query that is invariant under guarded simulation is equivalent to a union of FACQs. We devote the rest of this section to this proof, which starts with the following observation.

Proposition 19. If $\varphi$ is an FO formula invariant under guarded simulation (on finite databases) then $\varphi$ is equivalent (in the finite) to a UCQ.

Proof. Every homomorphism gives rise to a guarded simulation. Indeed, if $h$ is a homomorphism from $db_1$ to $db_2$ that maps $\bar a$ to $\bar b$ then it is readily verified that the set $S := \{h|_{\bar a}\} \cup \{h|_X \mid X \text{ guarded in } db_1\}$ is a guarded simulation from $db_1 \cup \{ans(\bar a)\}$ to $db_2 \cup \{ans(\bar b)\}$. Hence $db_1, \bar a \preceq_g db_2, \bar b$, since $h|_{\bar a} \in S$ maps $\bar a$ to $\bar b$. Then since $\varphi$ is invariant under guarded simulations, it is also invariant under homomorphisms. By Rossman's theorem (Theorem 3), $\varphi$ is hence equivalent to a UCQ.

Now fix throughout the remainder of this section an FO formula $\varphi(\bar x)$ invariant under guarded simulation. By Proposition 19 we may assume w.l.o.g. that $\varphi$ is a UCQ. Furthermore, we may assume w.l.o.g. that this UCQ is minimal. Now assume for the purpose of contradiction that no union of FACQs expresses $\varphi$. Then in particular there exists some CQ $Q(\bar x)$ in $\varphi$ that is not freely acyclic, i.e., $Q^+$ is cyclic. From $Q$ we will construct pairs $(canondb, \bar a)$ and $(unrolldb, \bar b)$ such that $canondb, \bar a \preceq_g unrolldb, \bar b$ and $\bar a \in \varphi(canondb)$ but $\bar b \notin \varphi(unrolldb)$. Then obviously, $\varphi$ is not invariant under guarded simulation, yielding the desired contradiction. The definition of $canondb$ and $unrolldb$ is as follows.

The canonical database. The database $canondb$ is simply what is normally called the "canonical database" (or "frozen" database) for $Q$ in the theory of conjunctive queries. Formally, fix for every variable $x$ in $Q$ a unique data value $\dot{x} \in \mathcal{U}$ such that the function $freeze$ mapping $x \mapsto \dot{x}$ for all $x \in var(Q)$ is a bijection. Let $canondb := freeze(body(Q))$ and $\bar a := freeze(\bar x)$. By construction, $freeze$ is an embedding of $Q$ in $canondb$. Therefore,

Lemma 20.
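$\bar a \in Q(canondb) \subseteq \varphi(canondb)$.

The unrolled database. Since $Q^+$ is cyclic the hypergraph $H(Q^+)$ contains a nontrivial block. Fix such a nontrivial block $B$, as well as a distinguished hyperedge $F \in B$. Let $\{x_1, \dots, x_n\}$ be the variables mentioned in $Q$. We fix a set $U = \{x_1^{\circ}, \dots, x_n^{\circ}, x_1^{\bullet}, \dots, x_n^{\bullet}\} \subseteq \mathcal{U}$ of pairwise distinct values. In what follows, we call $x_i^{\circ}$ the white colored version of $x_i$, and $x_i^{\bullet}$ the black colored version of $x_i$.

Let $var(B)$ denote the set of all variables that are mentioned in the hyperedges of block $B$. We define for every $V \subseteq var(B)$ the function $clr_V\colon var(Q) \to U$ by:
$clr_V(v) = v^{\circ}$ if $v \notin var(B)$ or $v \in V$,
$clr_V(v) = v^{\bullet}$ if $v \in var(B)$ and $v \notin V$.
Intuitively, $clr_V$ is a function that maps variables to values by "coloring" the variables. Variables not mentioned in $B$ are colored white, while a variable $v$ mentioned in $B$ is colored white if $v$ is in $V$, and black otherwise.

Definition 21 (Covering). Let $E$, $E'$, and $V$ be three sets of variables. We say that $E$ covers $E'$ w.r.t. $V$, denoted $E \sqsupseteq_V E'$, if $E \cap V \supseteq E' \cap V$. We abbreviate $E \sqsupseteq_{var(B)} E'$ by $E \sqsupseteq E'$ and write $E \sqsupset E'$ and $E \sqsupset_V E'$ to denote the corresponding strict relations.

Definition 22 (Maximum intersections). Let $B_{\sqsupset F}$ denote the set of all partial hyperedges $E \in B \setminus \{F\}$ that have a maximal intersection with $F$ among the hyperedges in $B \setminus \{F\}$. That is, $B_{\sqsupset F}$ consists of all $E \in B \setminus \{F\}$ for which there does not exist $E' \in B \setminus \{F\}$ with $E' \sqsupset_F E$. Let $M_\cap$ be the set of maximum intersections of partial hyperedges of $B \setminus \{F\}$ with $F$, $M_\cap := \{E \cap F \mid E \in B_{\sqsupset F}\}$.

Note that, since $B$ is nontrivial, the cardinality of $M_\cap$ is at least 2, and all intersections in $M_\cap$ are nonempty. Also note that for any $A \in M_\cap$, we have $F \supsetneq A$ and hence $F \sqsupseteq A$.

Example 23. Consider the query
$Q_1\colon ans() \leftarrow R(a, b, d),\ R(c, a, d),\ S(b, c, d, e),\ T(e, f),\ T(f, g)$.
It is readily verified that the set $B_1 = \{\{a, b, d\}, \{b, c, d\}, \{c, a, d\}\}$ forms a block of $H(Q_1^+)$. Consider the hyperedge $F = \{b, c, d\}$ of this block. Then $M_\cap = \{\{b, d\}, \{c, d\}\}$, resulting from the intersections with the hyperedges $\{a, b, d\}$ and $\{c, a, d\}$ respectively.

Next, consider the query $Q_2\colon ans() \leftarrow$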
$R(a, b, c),\ R(a, b, d),\ R(a, c, d),\ R(b, c, d)$ from Example 12. It is readily verified that the set $B_2 = \{\{a, b, c\}, \{a, b, d\}, \{a, c, d\}, \{b, c, d\}\}$ forms a block of $H(Q_2^+)$ (cf., e.g., Example 12). Consider the hyperedge $F = \{a, b, d\}$ of this block. Then $M_\cap$ is the set $\{\{a, b\}, \{a, d\}, \{b, d\}\}$, resulting from the intersections with the hyperedges $\{a, b, c\}$, $\{a, c, d\}$, and $\{b, c, d\}$ respectively.

We now turn to the construction of $unrolldb$.

Definition 24 (Unrolled database). Define $\mathcal{F}$ to be the set of functions that contains [...] of each fact.

Note in particular that it is not possible to embed $Q_1$ (resp. $Q_2$) into the unrolled database of $Q_1$ (resp. $Q_2$). Indeed, to construct such an embedding, we would essentially have to find an edge-label-preserving graph homomorphism of the graph in Figure 4(a) (resp. Figure 4(c)) to the graph in Figure 4(b) (resp. Figure 4(d)), which is readily verified to be impossible.

By definition of $Q^+$, $H(Q^+)$ contains a hyperedge $X$ with $var(\bar x) \subseteq X$. Now observe that, by construction, $\mathcal{F}$ contains for every hyperedge $X$ of $H(Q^+)$ a function $f$ with domain $X$. Fix $f \in \mathcal{F}$ with $var(\bar x) \subseteq dom(f)$ arbitrarily and let $\bar b = f(\bar x)$. Let $freeze^{-1}$ denote the inverse of $freeze$. The following lemmas and propositions show that $(canondb, \bar a)$ and $(unrolldb, \bar b)$ have been constructed as desired.

Lemma 26. The set $S = \{f \circ freeze^{-1} \mid f \in \mathcal{F}\}$ is a guarded simulation of $canondb$ in $unrolldb$.

Proof sketch. It suffices to prove that each $f \in \mathcal{F}$ is a partial homomorphism from $body(Q)$ into $unrolldb$ and that $\mathcal{F}$ satisfies the guarded forth condition. Indeed, since $freeze^{-1}$ is an isomorphism from $canondb$ to $body(Q)$, $S$ will then be a set of partial homomorphisms from $canondb$ into $unrolldb$ that satisfy the guarded forth condition. Establishing that each $f \in \mathcal{F}$ is a partial homomorphism is straightforward; establishing the guarded forth condition is done by a technical case analysis.

Proposition 27. $canondb, \bar a \preceq_g unrolldb, \bar b$.

Proof. Clearly $\bar b = f(freeze^{-1}(\bar a))$. Hence $canondb, \bar a \preceq_g unrolldb, \bar b$, since $S = \{g \circ freeze^{-1} \mid g \in \mathcal{F}\}$ is a guarded simulation of $canondb$ in $unrolldb$ by Lemma 26, and since $f \circ freeze^{-1} \in S$ maps $\bar a \mapsto \bar b$.

Proposition 28.
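$\bar b \notin Q(unrolldb)$.

Proof sketch. The proof is by contradiction. The essential reasoning (glossing over many important details) is as follows. Let $ans(\bar x)$ be the head of $Q$ and let $unrolldb^+$ denote $unrolldb \cup \{ans(\bar b)\}$.

First, we show that if $\bar b \in Q(unrolldb)$ then there must also exist an embedding $h$ of $Q^+$ in $unrolldb^+$ that maps $x \mapsto x^{\circ}$ or $x \mapsto x^{\bullet}$, for every $x \in var(Q)$. In particular, $x$ will not be mapped to a colored version of another variable. As a consequence, we can establish that $h$ maps each atom $\mathbf{a} \in body(Q^+)$ to a copy of $\mathbf{a}$ in $unrolldb^+$, and not to a copy of some other atom.

Then, since $F$ is a partial hyperedge of $H(Q^+)$ there exists some atom $\mathbf{a}$ in $Q^+$ that contains all variables in $F$. Since, by the first step, $h$ maps atoms in $body(Q^+)$ to their copies in $unrolldb^+$, we know in particular that $h(\mathbf{a})$ is a copy of $\mathbf{a}$. Then, since $\mathbf{a}$ contains all variables in $F$, there exists some $A \in M_\cap$ such that every variable in $A \subsetneq F$ is colored white in $h(\mathbf{a})$ and every variable in $F \setminus A$ is colored black in $h(\mathbf{a})$.

Since $A \in M_\cap$ there exists $E_1 \in B$ such that $A = E_1 \cap F$. Moreover, since $B$ is a block of $H(Q^+)$, $A$ cannot be an articulation set of $B$. As such, there must exist a path $E_1, \dots, E_n, F \in B$ that does not need to traverse any node in $A$. That is, $(E_i \cap E_{i+1}) \setminus A \ne \emptyset$ for $1 \le i \le n-1$, and $(E_n \cap F) \setminus A \ne \emptyset$. Now, it is possible to establish that $h(E_i)$ consists only of white colored variables, for all $1 \le i \le n$.

This yields the desired contradiction. Indeed, since $(E_n \cap F) \setminus A$ is non-empty there is some variable $x$ that is both in $E_n$ and $F$, but not in $A$. Since $x \in E_n$, $h$ must map $x \mapsto x^{\circ}$. On the other hand, since $x \in F \setminus A$, we have already established before that $h$ must map $x \mapsto x^{\bullet}$.

Proposition 29. $\bar b \notin \varphi(unrolldb)$.

Crux. We already know that $\bar b \notin Q(unrolldb)$ by Proposition 28. Suppose, for the purpose of contradiction, that there is some other CQ $Q' \in \varphi$ such that $\bar b \in Q'(unrolldb)$. In particular, there exists an embedding $h$ from $Q'$ into $unrolldb$ such that $h(\bar x) = \bar b$. Now, consider the function $decopy\colon im(h) \to var(Q)$ such that
$decopy(x^{\circ}) = x$ for every $x^{\circ} \in im(h)$,
$decopy(x^{\bullet}) = x$ for every $x^{\bullet} \in im(h)$.
Observe that by definition of $unrolldb$, $Q$ contains an atom $decopy(\mathbf{s})$ for each fact $\mathbf{s} \in unrolldb$ built over the image of $h$. Hence $(decopy \circ h)$ is a homomorphism of $body(Q')$ into $body(Q)$. Furthermore, $decopy(\bar b) = \bar x$ since $\bar b = f(\bar x)$ for some $f \in \mathcal{F}$. By Chandra and Merlin's classical result [6], this implies that $Q$ is contained in $Q'$, contradicting the fact that $\varphi$ is minimal.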
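The last step above uses Chandra and Merlin's result [6], which reduces containment of conjunctive queries to evaluating one query on the canonical database of the other. The following is a minimal Python sketch of that test, assuming constant-free CQs encoded as (head, body) pairs of tuples. The encoding and the helper names (`embeddings`, `evaluate`, `freeze`, `canonical_db`, `contained_in`) are illustrative assumptions and do not come from the paper; the evaluation is a naive backtracking search, not an efficient algorithm.

```python
def embeddings(body, db, mu=None):
    """Yield every valuation that maps all atoms of body to facts of db."""
    mu = dict(mu or {})
    if not body:
        yield mu
        return
    (rel, *args), rest = body[0], body[1:]
    for fact in db:
        if fact[0] != rel or len(fact) != len(args) + 1:
            continue
        ext = dict(mu)
        # setdefault binds a fresh variable or returns its existing binding.
        if all(ext.setdefault(v, c) == c for v, c in zip(args, fact[1:])):
            yield from embeddings(rest, db, ext)


def evaluate(head, body, db):
    """Q(db): images of the head tuple under all embeddings of Q in db."""
    return {tuple(mu[v] for v in head[1:]) for mu in embeddings(body, db)}


def freeze(x):
    """Hypothetical freezing: variable x becomes the value ('frozen', x)."""
    return ("frozen", x)


def canonical_db(body):
    """The canonical (frozen) database of a query body."""
    return {(a[0], *map(freeze, a[1:])) for a in body}


def contained_in(q, q2):
    """Q is contained in Q' iff freeze(head of Q) is in Q'(canondb of Q)."""
    (head, body), (head2, body2) = q, q2
    frozen_head = tuple(freeze(v) for v in head[1:])
    return frozen_head in evaluate(head2, body2, canonical_db(body))


# The query of Example 7 and a weaker query that drops the Project atom.
q = (("ans", "emp"),
     [("Project", "pid", "mgr", "mgr"), ("WorksOn", "pid", "emp")])
q_weak = (("ans", "emp"), [("WorksOn", "pid", "emp")])
```

Here `contained_in(q, q)` is Lemma 20 in miniature: the frozen head tuple always belongs to the result of a query on its own canonical database. The sketch also reports that `q` is contained in `q_weak` but not the other way around, as expected when an atom is dropped from the body.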
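Since $canondb, \bar a \preceq_g unrolldb, \bar b$ and $\bar a \in \varphi(canondb)$ but $\bar b \notin \varphi(unrolldb)$ we have our desired contradiction: $\varphi$ is not invariant under guarded simulation. This finishes the proof of Theorem 5.

4. GUARDED VS FACT SIMULATION

We next present an alternate definition for guarded simulation, called fact simulation, and show that fact simulation naturally yields approximations that are tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of a specific bounded height.

4.1 Fact simulation

Definition 30. A fact simulation of database $db_1$ in database $db_2$ is a nonempty binary relation $F \subseteq db_1 \times db_2$ between the facts of $db_1$ and $db_2$ such that for all facts $\mathbf{s} \in db_1$ and $\mathbf{t} \in db_2$ with $\mathbf{s}\,F\,\mathbf{t}$:
(1) $\mathbf{s}$ and $\mathbf{t}$ carry the same relation symbol, i.e., $rel(\mathbf{s}) = rel(\mathbf{t})$;
(2) for all $\mathbf{s}' \in db_1$ there exists $\mathbf{t}' \in db_2$ with $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}'\,F\,\mathbf{t}'$.

Example 31. To illustrate, the dotted lines in Figure 2 show a fact simulation $F$ of database $db_1$ in $db_2$. Note that fact simulation is necessarily total on $db_1$ (i.e., every fact of $db_1$ occurs in $F$).

Now, let $db_1$ and $db_2$ be two databases, and let $\mathbf{s}$ and $\mathbf{t}$ be facts. We say that $(db_1, \mathbf{s})$ is fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$, if there exists a fact simulation $F$ of $db_1 \cup \{\mathbf{s}\}$ in $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s}\,F\,\mathbf{t}$. Moreover, if $\bar a$ and $\bar b$ are tuples of data values, then $db_1, \bar a \preceq_f db_2, \bar b$ if $db_1, ans(\bar a) \preceq_f db_2, ans(\bar b)$ with $ans$ a relation symbol of the same arity as $\bar a$ and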
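$\bar b$ that does not occur in $db_1$ or $db_2$.

We require the following notions to establish that fact simulation is equivalent to guarded simulation. Let $\mathbf{s} \mapsto \mathbf{t}$ denote, for every $\mathbf{s} = s(a_1, \dots, a_k)$ and $\mathbf{t} = t(b_1, \dots, b_l)$, the relation $\{(a_i, b_i) \mid 1 \le i \le \min(k, l)\}$. When we are sure that $\mathbf{s} \mapsto \mathbf{t}$ is a function, we use common notation for functions, such as $(\mathbf{s} \mapsto \mathbf{t})(a)$ to denote the unique value associated to $a$ by the function $\mathbf{s} \mapsto \mathbf{t}$. Now define, for a guarded simulation $S$,
$F[S] := \{(\mathbf{s}, f(\mathbf{s})) \mid f\colon X \to Y \in S,\ \mathbf{s} \in db_1, \text{ and } val(\mathbf{s}) \subseteq X\}$.
Also define, for a fact simulation $F$,
$S[F] := \{\mathbf{s} \mapsto \mathbf{t} \mid (\mathbf{s}, \mathbf{t}) \in F\}$.
For example, the relation $(1, Amy, Lex) \mapsto (c, Ned, Ned)$ is an element of $S[F]$, for the fact simulation $F$ of Example 31.

The following proposition establishes the correspondence between guarded simulation and fact simulation.

Proposition 32.
1. If $S$ is a guarded simulation of $db_1$ in $db_2$ then $F[S]$ is a fact simulation of $db_1$ in $db_2$.
2. If $F$ is a fact simulation of $db_1$ in $db_2$ then $S[F]$ is a guarded simulation of $db_1$ in $db_2$.

It follows that fact simulation provides an alternative definition for guarded simulation, in the following sense.

Theorem 33. For databases $db_1$ and $db_2$ and tuples $\bar a$ and $\bar b$ it holds that $db_1, \bar a \preceq_g db_2, \bar b$ iff $db_1, \bar a \preceq_f db_2, \bar b$.

4.2 Approximate fact simulation

In applications of classical simulation in data management it is known that, if there are few nodes in graph $G$ that are similar, then the corresponding structural index of $G$ may be of the same size as $G$ itself, and hence be too large to act as a succinct summary of the structure of $G$ [18]. In such situations, it has been proposed to approximate simulations and to group nodes in the index with respect to these approximations instead of with respect to full simulation [18].

Towards a suitable approximation of guarded simulation, we introduce the following version of fact simulation.

Definition 34 (Approximate fact simulation). Let $db_1$ and $db_2$ be two databases. A depth-$k$ approximation of fact simulation of $db_1$ in $db_2$, or fact $k$-simulation for short, is a sequence $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ of binary relations such that $F_0$ consists of all pairs $(\mathbf{s}, \mathbf{t}) \in db_1 \times db_2$ with $rel(\mathbf{s}) = rel(\mathbf{t})$ and $eqtp(\mathbf{s}, \mathbf{s}) \subseteq eqtp(\mathbf{t}, \mathbf{t})$; and the following property holds for every $1 \le j \le k$ and all $\mathbf{s}$ and $\mathbf{t}$ with $\mathbf{s}\,F_j\,\mathbf{t}$.
For every $\mathbf{s}' \in db_1$ there exists $\mathbf{t}'$ in $db_2$ such that $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}'\,F_{j-1}\,\mathbf{t}'$ (Fact Forth).

We say that $(db_1, \mathbf{s})$ is $k$-fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$, if there exists a fact $k$-simulation $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ from $db_1 \cup \{\mathbf{s}\}$ to $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s}\,F_k\,\mathbf{t}$. The notion of $k$-fact similarity between $(db_1, \bar a)$ and $(db_2, \bar b)$ with $\bar a$ and $\bar b$ tuples is now defined in the obvious way. Observe that $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$ iff $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$ for every $k \ge 0$.

We now link approximate guarded simulation to indistinguishability by FACQs of bounded height. Here, the height of a FACQ is defined as follows. Recall that in graph theory, the distance between two connected nodes $u$ and $v$ in an undirected graph $G$ is the length of a shortest path between $u$ and $v$. The eccentricity of $u$ in $G$, denoted $ecc(u, G)$, is the maximum distance of $u$ to any other node to which it is connected.

Example 35. Consider, for instance, the join tree $T$ of Figure 3. The eccentricity of $R(a, b)$ in $T$ is 2 while the eccentricity of $R(g, h)$ in $T$ is 3.

Definition 36. Let $A$ be a set of atoms. When $A$ is acyclic, the eccentricity of atom $\mathbf{a} \in A$, denoted $ecc(\mathbf{a}, A)$, is the minimum eccentricity of $\mathbf{a}$ among all join trees $T$ for $A$. The height of a FACQ $Q$ is the eccentricity of $head(Q)$ in $body(Q^+)$. In other words, the height of $Q$ is the minimum height of any join tree $T$ for $body(Q^+)$, when considered as being rooted at $head(Q)$.

Example 37. Continuing Example 35, query $Q$ of Example 9 has a height of 3.

Proposition 38. Let $k \ge 0$ be a natural number. The following are equivalent.
(1) $db_1, \bar a \preceq^k_f db_2, \bar b$
(2) For all FACQs $Q$ of height $k$, if $\bar a \in Q(db_1)$ then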
$\bar b \in Q(db_2)$.

Note that Proposition 38 implies Proposition 18 (yet the converse is not true).

Closing remark. We close this section with the following important remark. It is obviously possible to define approximations of guarded simulation in an analogous way as approximations of fact simulation: a depth-$k$ approximation is a sequence $S_k \subseteq S_{k-1} \subseteq \dots \subseteq S_0$ of partial homomorphisms such that each $S_i$ satisfies the guarded forth property to $S_{i-1}$, for $i \ge 1$. While full guarded simulation coincides with fact simulation (cf. Theorem 33), their approximations do not. In particular, $\preceq^0_g$ is distinct from $\preceq^0_f$. Indeed, consider $db_1 = \{r(a, b), r(b, a)\}$ and $db_2 = \{r(1, 2)\}$. Note that there is no partial homomorphism from $db_1$ to $db_2$ with domain $\{a, b\}$. Therefore, $db_1$ is not guarded 0-simulated by $db_2$. Yet, $db_1 \times db_2$ is a fact 0-simulation of $db_1$ in $db_2$.

5. GUARDED STRUCTURAL INDEXING

Recall from the Introduction that in graph data management, a structural index is a compact representation of a data graph. Typically, this compact representation is obtained by grouping the nodes in the input graph that are similar or bisimilar. Structural characterizations of query invariance then enable efficient retrieval of the relevant nodes of the graph for various graph query languages.

In this section we analogously define guarded structural indexes as compact representations of relational data obtained by grouping facts according to guarded similarity.
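Grouping facts in this way presupposes that the (approximate) fact simulation relation of Definitions 30 and 34 can be computed. The following is a minimal Python sketch of a naive fixpoint computation, assuming facts are encoded as tuples whose first component is the relation symbol. The function names (`eqtp`, `forth`, `fact_k_simulation`, `largest_fact_simulation`) and the encoding are illustrative assumptions; this is not the efficient algorithm of the companion work [24, 25].

```python
from itertools import product


def eqtp(s, t):
    """Equality type of facts s and t (Definition 6), 1-based positions."""
    return {(i, j)
            for i, a in enumerate(s[1:], start=1)
            for j, b in enumerate(t[1:], start=1)
            if a == b}


def forth(s, t, db1, db2, prev):
    """Fact Forth: each s2 in db1 has a partner t2 in db2 related in prev."""
    return all(
        any(eqtp(s, s2).issubset(eqtp(t, t2)) and (s2, t2) in prev
            for t2 in db2)
        for s2 in db1)


def fact_k_simulation(db1, db2, k):
    """Return [F_0, ..., F_k], the largest sequence allowed by Definition 34."""
    f0 = {(s, t) for s, t in product(db1, db2)
          if s[0] == t[0] and eqtp(s, s).issubset(eqtp(t, t))}
    levels = [f0]
    for _ in range(k):
        prev = levels[-1]
        levels.append({(s, t) for (s, t) in prev
                       if forth(s, t, db1, db2, prev)})
    return levels


def largest_fact_simulation(db1, db2):
    """Refine F_0 until stable; an empty result means no fact simulation."""
    cur = fact_k_simulation(db1, db2, 0)[0]
    while True:
        nxt = {(s, t) for (s, t) in cur if forth(s, t, db1, db2, cur)}
        if nxt == cur:
            return cur
        cur = nxt


# The two databases from the closing remark of Section 4.
small1 = {("r", "a", "b"), ("r", "b", "a")}
small2 = {("r", 1, 2)}
f0, f1 = fact_k_simulation(small1, small2, 1)

# The databases of Figure 2.
db1 = {("Project", 1, "Amy", "Lex"), ("Project", 2, "Lex", "Amy"),
       ("Project", 3, "Sue", "Sue"), ("WorksOn", "Amy", 1),
       ("WorksOn", "Lex", 2), ("WorksOn", "Sue", 3),
       ("WorksOn", "Jeffrey", 3), ("WorksOn", "Cathy", 3)}
db2 = {("Project", "a", "Liv", "Rob"), ("Project", "b", "Rob", "Liv"),
       ("Project", "c", "Ned", "Ned"), ("Project", "d", "Ellen", "Fred"),
       ("Project", "e", "Fred", "Ellen"), ("WorksOn", "Liv", "a"),
       ("WorksOn", "Rob", "b"), ("WorksOn", "Ned", "c"),
       ("WorksOn", "Bob", "c"), ("WorksOn", "Ellen", "d"),
       ("WorksOn", "Fred", "e")}
sim = largest_fact_simulation(db1, db2)
```

On the two small databases the sketch gives `f0` equal to the full product, in line with the closing remark of Section 4, while `f1` comes out empty because r(a, b) and r(b, a) share values crosswise and r(1, 2) has no partner with that equality type. On Figure 2, `sim` relates Project(1, Amy, Lex) to both Project(a, Liv, Rob) and Project(c, Ned, Ned), but never relates Project(3, Sue, Sue) to a project whose manager and auditor differ.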
[...]

[6] A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC 1977, pages 77-90. ACM, 1977.
[7] H. Chen and V. Dalmau. Beyond hypertree width: Decomposition methods without decompositions. In CP, volume 3709, pages 167-181. Springer, 2005.
[8] R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3):514-550, 1983.
[9] W. Fan. Graph pattern matching revised for social network analysis. In ICDT 2012, pages 8-21. ACM, 2012.
[10] W. Fan, J. Li, X. Wang, and Y. Wu. Query preserving graph compression. In SIGMOD 2012, pages 157-168. ACM, 2012.
[11] G. H. L. Fletcher, D. Van Gucht, Y. Wu, M. Gyssens, S. Brenes, and J. Paredaens. A methodology for coupling fragments of XPath with structural indexes for XML documents. Inf. Syst., 34(7):657-670, 2009.
[12] J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716-752, 2002.
[13] G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. JCSS, 66(4):775-808, 2003.
[14] G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. TKDE, 19(10):1381-1403, 2007.
[15] E. Grädel, C. Hirsch, and M. Otto. Back and forth between guarded and modal logics. TOCL, 3(3):418-463, 2002.
[16] M. Gyssens, J. Paredaens, D. V. Gucht, and G. H. L. Fletcher. Structural characterizations of the semantics of XPath as navigation tool on a document. In PODS 2006, pages 318-327. ACM, 2006.
[17] C. Hirsch. Guarded logics: algorithms and bisimulation. PhD thesis, TU Aachen, 2002.
[18] R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for indexing paths in graph-structured data. In ICDE 2002, pages 129-140. IEEE, 2002.
[19] P. G. Kolaitis and M. Y. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS 1998, pages 205-213. ACM, 1998.
[20] D. Leinders, M. Marx, J. Tyszkiewicz, and J. Van den Bussche. The semijoin algebra and the guarded fragment. JOLLI, 14(3):331-343, 2005.
[21] A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC 2005, pages 95-103. Australian Computer Society, 2005.
[22] T. Milo and D. Suciu. Index structures for path expressions. In ICDT 1999, pages 277-295. Springer, 1999.
[23] M. Otto. Highly acyclic groups, hypergraph covers, and the guarded fragment. J. ACM, 59(1):5:1-5:40, 2012.
[24] F. Picalausa. Guarded structural indexes: theory and application to relational RDF databases. PhD thesis, Université Libre de Bruxelles, 2013.
[25] F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, volume 7295, pages 406-421. Springer, 2012.
[26] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. Technical report, W3C Recommendation, 2008.
[27] P. Ramanan. Covering indexes for XML queries: bisimulation - simulation = negation. In VLDB 2003, pages 165-176, 2003.
[28] B. Rossman. Homomorphism preservation theorems. J. ACM, 55(3):15:1-15:53, 2008.
[29] D. Sangiorgi. Introduction to bisimulation and coinduction. Cambridge University Press, 2012.
[30] T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured RDF data using structure indexes. TKDE, 25(9):2076-2089, 2013.
[31] O. Udrea, A. Pugliese, and V. S. Subrahmanian. GRIN: a graph based RDF index. In AAAI 2007, pages 1465-1470, 2007.
[32] M. Yannakakis. Algorithms for acyclic database schemes. In VLDB 1981, pages 82-94. IEEE, 1981.

APPENDIX

A. GUARDED BISIMULATION

The definition of guarded bisimulation due to Andréka et al. [2] is recalled here for completeness. Given two sets $A$ and $B$ of facts and atoms, a function $f\colon X \to Y$ is a partial isomorphism from $A$ to $B$ if it is bijective and $f(A|_X) = B|_Y$.

Definition 44 (Guarded bisimulation). Let $db_1$ and $db_2$ be databases. A guarded bisimulation from $db_1$ to $db_2$ is a nonempty set $I$ of finite partial isomorphisms from $db_1$ to $db_2$ such that the following forth and back conditions are satisfied.
(1) For every $f\colon X \to Y \in I$ and for every set $X'$ guarded in $db_1$, there exists a partial isomorphism $g\colon X' \to Y' \in I$ such that $g$ and $f$ agree on $X \cap X'$. (Guarded Bisimulation Forth)
(2) For every $f\colon X \to Y \in I$ and for every set $Y'$ guarded in $db_2$, there exists a partial isomorphism $g\colon X' \to Y' \in I$ such that $g^{-1}$ and $f^{-1}$ agree on $Y \cap Y'$. (Guarded Bisimulation Back)
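The notion of partial isomorphism used in Definition 44 can be checked directly from its definition. The following is a minimal Python sketch, assuming facts are encoded as tuples whose first component is the relation symbol and a function $f\colon X \to Y$ is encoded as a dictionary whose keys form $X$ and whose values form $Y$. The helper names (`restrict`, `image`, `is_partial_isomorphism`) are illustrative assumptions, and the data are the facts of Figure 2.

```python
def restrict(facts, terms):
    """A|X: the facts of A that are built only from terms in X."""
    return {a for a in facts if set(a[1:]).issubset(terms)}


def image(f, facts):
    """Point-wise extension of f to a set of facts over dom(f)."""
    return {(a[0], *(f[v] for v in a[1:])) for a in facts}


def is_partial_isomorphism(f, A, B):
    """Check that f: X -> Y is bijective and that f(A|X) equals B|Y."""
    X, Y = set(f), set(f.values())
    injective = len(X) == len(Y)  # onto Y by construction
    return injective and image(f, restrict(A, X)) == restrict(B, Y)


db1 = {("Project", 1, "Amy", "Lex"), ("Project", 2, "Lex", "Amy"),
       ("Project", 3, "Sue", "Sue"), ("WorksOn", "Amy", 1),
       ("WorksOn", "Lex", 2), ("WorksOn", "Sue", 3),
       ("WorksOn", "Jeffrey", 3), ("WorksOn", "Cathy", 3)}
db2 = {("Project", "a", "Liv", "Rob"), ("Project", "b", "Rob", "Liv"),
       ("Project", "c", "Ned", "Ned"), ("Project", "d", "Ellen", "Fred"),
       ("Project", "e", "Fred", "Ellen"), ("WorksOn", "Liv", "a"),
       ("WorksOn", "Rob", "b"), ("WorksOn", "Ned", "c"),
       ("WorksOn", "Bob", "c"), ("WorksOn", "Ellen", "d"),
       ("WorksOn", "Fred", "e")}

# Maps the terms of Project(1, Amy, Lex) onto those of Project(a, Liv, Rob).
f_good = {1: "a", "Amy": "Liv", "Lex": "Rob"}
# Collapses Amy and Lex onto Ned, so it is not injective.
f_collapse = {1: "c", "Amy": "Ned", "Lex": "Ned"}
```

With these inputs, `is_partial_isomorphism(f_good, db1, db2)` holds, whereas `f_collapse` is rejected: it is a partial homomorphism but not a bijection, which illustrates why simulation (built on partial homomorphisms) is a weaker requirement than bisimulation (built on partial isomorphisms).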