Uploaded On 2016-06-10

[Figure 1 (image not recoverable): graph G with nodes 1, 2, 3, 4 labeled prof, node 5 labeled phd, and nodes 6, 7 labeled stud; graph I with nodes {1}, {2,3}, {4} labeled prof, node {5} labeled phd, and node {6,7} labeled stud; edges labeled adv and sup.]

Figure 1: Graphs about academic relations between professors, PhD students, and bachelor students, with advisor-of and supervises relationships. Graph I is a simulation-based structural index for G.

[...] structural indexes are obtained by grouping together nodes in the input graph that are similar (respectively, bisimilar). These indexes are known to be covering for different fragments of the XPath query language [11, 22, 27]. That is, given a query in the fragment, its evaluation on the structural index will provide exactly the nodes that would be returned had the query been evaluated on the original data. In Figure 1, for example, the index I is actually a simulation-based index obtained by grouping together the similar nodes in G. (And, as Example 1 has already illustrated, certain queries can be immediately answered on I instead of G.) Variations of this idea underlying structural indexing have also been used in graph data management to compress graph-structured data sets [5, 10], as well as to aid in query processing [18, 25, 30] and data analytics [9].

Given the numerous successful applications of structural indexing in graph databases, one may ask the question: Is it possible to extend structural indexing from graph databases to arbitrary relational databases? In this paper and companion work [24, 25], we embark on a formal study of this question, and show that it has an affirmative answer, both from a theoretical and a practical perspective.

General methodology. Our study follows the methodology proposed by Fletcher et al. [11] for the design of covering structural indexes for a given target query language $\mathcal{Q}$. This methodology requires the development of the following three components.

(1) A language-independent structural characterization of query invariance, characterizing when data objects (in our setting: relational tuples) cannot be distinguished by any query in the target query language $\mathcal{Q}$.
(2) An efficient algorithm to group together data objects that cannot be distinguished by any query in the target language $\mathcal{Q}$.
(3) A data structure (i.e., the index) that exploits this grouping to support query answering by means of the index instead of reverting to the full database.

In this paper, we focus on the conjunctive queries as our target query language, and devote our study to the structural characterization required for component (1). Components (2) and (3) are developed in companion work [24, 25].

Actually, we will focus on those conjunctive queries that "select" tuples in the input database rather than compute new tuples. Our focus on this fragment of the conjunctive queries as the target language instead of all conjunctive queries is motivated by the fact that, at least for the purpose of obtaining succinct structural indexes, the class of all conjunctive queries is too large.

To clarify this claim, we note that in graph databases there is a known trade-off between the arity of queries in $\mathcal{Q}$ and the size of the corresponding structural indexes: the index size increases with the arity. In graph databases, for example, $\mathcal{Q}$ is usually a language of node-selecting (i.e., unary) queries. In this setting, the data objects in the methodology of Fletcher et al. are nodes; the structural characterization is given by (bi)similarity; and the structural index data structure is built from the groups of indistinguishable nodes, as illustrated in Example 1. The groups of indistinguishable nodes are necessarily disjoint. Therefore, there can be at most as many groups as there are nodes in the input graph, and, hence, the structural index is always guaranteed to be at most the size of the input graph (although usually much smaller).

Now consider the setting where $\mathcal{Q}$ is a class of $k$-ary graph queries ($k \ge 2$) instead. Milo and Suciu have shown that essentially the same approach as before can be used to build structural indexes for $\mathcal{Q}$ [22]. However, the data objects become $k$-tuples of nodes; the structural characterization of indistinguishability is a generalization of (bi)simulation to $k$-tuples; and the structural index is composed of the groups of indistinguishable $k$-tuples. Essentially, we are no longer building a summary of the input graph, but a summary of the possible output space of queries in $\mathcal{Q}$, which can be vastly larger than the input graph. In particular, since the number of $k$-tuples for $k \ge 3$ significantly exceeds the size of the input graph, the number of groups of indistinguishable $k$-tuples (and hence, the index) exceeds the size of the input graph in practice. Clearly, this defeats the purpose of the structural index as a succinct graph summary.

Since an analogous reasoning applies to the relational setting, we are therefore not interested in a structural characterization of indistinguishability that applies to all conjunctive queries (of arbitrary arity), but in a characterization that is applicable to those conjunctive queries that "select" tuples in the input database. In the literature, these conjunctive queries are known as the strict (or variable-guarded) conjunctive queries [12]. Formally, a rule-based conjunctive query is strict if all variables in the head occur together in a single atom in the body. (See also Section 2.)

Our focus on the strict conjunctive queries as the target query language implies that we will not be able to answer non-strict queries on the structural index directly. Nevertheless, we show in companion work that query processing of all conjunctive queries can benefit from the presence of these indexes [24, 25]. (See also Section 5.)

Overview of approach and main result. What is a good notion of indistinguishability by strict conjunctive queries? It is well known that all conjunctive queries (strict and non-strict) are invariant under homomorphisms (i.e., structure-preserving functions from databases to databases), in the following sense. (Footnote 1)

Footnote 1: For the formal development in this paper, it will be convenient to focus on the conjunctive queries that do not mention any constants. All results can be extended to account for the presence of constants, much in the same way as, e.g., the classical result on genericity in relational databases can be extended to C-genericity, preserving constants in the finite set C [3].

[...]
Since these concerns about index size can be transferred to the relational setting, it is hence useful to develop approximate versions of guarded simulation.

To this end, we introduce approximations of fact simulation analogously to how approximations of classical simulation are defined. These approximations are proved to be tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of bounded height. In companion work [24, 25] we show that these approximations can both be efficiently computed and used to engineer practical guarded simulation-based structural indexes for relational query engines operating on Semantic Web data.

Contributions and organization. In summary, our contributions are as follows. (1) We introduce guarded simulation as a variant of guarded bisimulation, and prove the characterization stated in Theorem 5 (Section 3). (2) We introduce fact simulation as an alternative definition of guarded simulation, and show that approximations of fact simulation are tightly linked to invariance of freely acyclic conjunctive queries of bounded height (Section 4). (3) We show how structural indexes based on (approximations of) fact simulations can be defined (Section 5).

We begin, however, in Section 2 by introducing the required background.

2. PRELIMINARIES

Atoms, facts, and databases. From the outset, we assume given a fixed universe $\mathcal{U}$ of atomic data values, a fixed universe $\mathcal{V}$ of variables, and a fixed set $\mathcal{S}$ of relation symbols, all infinite and pairwise disjoint. We call atomic data values and variables collectively terms. Every relation symbol $r \in \mathcal{S}$ is associated with a natural number called the arity of $r$. An atom (respectively a fact) is an expression of the form $r(a_1, \dots, a_k)$ with $r \in \mathcal{S}$ a relation symbol; $k$ the arity of relation symbol $r$; and each of $a_1, \dots, a_k$ a variable in $\mathcal{V}$ (respectively an atomic data value in $\mathcal{U}$). A relational database over $\mathcal{S}$ is a finite set $db$ of facts.

Notation. In what follows, we denote the set of all terms (respectively variables, respectively data values) occurring in a mathematical object $X$ (such as, e.g., an atom, a fact, or a set of atoms and facts) by $terms(X)$ (resp. $var(X)$, resp. $val(X)$). We write $rel(\mathbf{a})$ for the relation symbol $r$ of atom or fact $\mathbf{a} = r(a_1, \dots, a_k)$. We write $|\mathbf{a}|$ for the arity $k$ of $rel(\mathbf{a})$ and $\mathbf{a}.i$ for the $i$-th term $a_i$ in $\mathbf{a}$, provided $1 \le i \le |\mathbf{a}|$. We denote tuples $(a_1, \dots, a_k)$ as $\bar{a}$, and give the natural semantics to $|\bar{a}|$ and $\bar{a}.i$.

The restriction of a set $A$ of atoms or facts to a set of terms $X \subseteq \mathcal{U} \cup \mathcal{V}$, denoted $A|_X$, consists of all atoms or facts in $A$ built only from terms in $X$:

$A|_X := \{ \mathbf{a} \in A \mid terms(\mathbf{a}) \subseteq X \}$.

Functions $f \colon X \to Y$ with $X$ and $Y$ sets of terms are extended point-wise to atoms, facts, tuples of terms, and sets thereof. For instance, if $\mathbf{a} = r(a_1, \dots, a_k)$ and $terms(\mathbf{a}) \subseteq X$ then $f(\mathbf{a}) = r(f(a_1), \dots, f(a_k))$. We denote by $f|_Z$ the restriction of the domain of $f$ to the set $X \cap Z$ and, extending this notation to atoms and facts, denote by $f|_{\mathbf{a}}$ the restriction of the domain of $f$ to the set $X \cap terms(\mathbf{a})$.

We range over atoms by boldface letters drawn from the beginning of the alphabet ($\mathbf{a}, \mathbf{b}, \dots$) and over facts by boldface letters from the end of the alphabet ($\mathbf{r}, \mathbf{s}, \dots$).
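The notation above can be made concrete with a small sketch. The encoding of an atom or fact as a pair (relation symbol, tuple of terms) and the helper names `terms`, `restrict`, `apply` and `restrict_function` are assumptions made for illustration only; they do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed encoding: an atom or fact r(a1, ..., ak) is the pair (r, (a1, ..., ak)).
Atom = Tuple[str, Tuple[Hashable, ...]]


def terms(a: Atom) -> Set[Hashable]:
    """terms(a): the set of terms occurring in a."""
    return set(a[1])


def restrict(A: Iterable[Atom], X: Set[Hashable]) -> Set[Atom]:
    """A|_X: the atoms or facts of A built only from terms in X."""
    return {a for a in A if terms(a).issubset(X)}


def apply(f: Dict[Hashable, Hashable], a: Atom) -> Atom:
    """Point-wise extension f(r(a1, ..., ak)) = r(f(a1), ..., f(ak))."""
    rel, args = a
    return (rel, tuple(f[x] for x in args))


def restrict_function(f: Dict[Hashable, Hashable], Z: Set[Hashable]) -> Dict[Hashable, Hashable]:
    """f|_Z: f with its domain restricted to dom(f) intersected with Z."""
    return {x: y for x, y in f.items() if x in Z}


A = {("R", ("x", "y")), ("S", ("y", "z"))}
f = {"x": 1, "y": 2, "z": 3}
a = ("R", ("x", "y"))

restricted = restrict(A, {"x", "y"})      # S(y, z) mentions z and is dropped
image = apply(f, a)                       # the fact R(1, 2)
f_on_a = restrict_function(f, terms(a))   # f restricted to the terms of a
```

Representing functions as dictionaries keeps the domain explicit, which is convenient because the paper's partial homomorphisms are always considered together with their domain $X$.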
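Database db1

Project
      PID   Mgr   Auditor
s1    1     Amy   Lex
s2    2     Lex   Amy
s3    3     Sue   Sue

WorksOn
      Emp       Proj
t1    Amy       1
t2    Lex       2
t3    Sue       3
t4    Jeffrey   3
t5    Cathy     3

Database db2

Project
      PID   Mgr     Auditor
u1    a     Liv     Rob
u2    b     Rob     Liv
u3    c     Ned     Ned
u4    d     Ellen   Fred
u5    e     Fred    Ellen

WorksOn
      Emp     Proj
v1    Liv     a
v2    Rob     b
v3    Ned     c
v4    Bob     c
v5    Ellen   d
v6    Fred    e

Figure 2: Two company databases. For future reference, facts are labeled with identifiers ($s_1, s_2, \dots$). The dotted lines indicate a fact simulation (Section 4) between db1 and db2. [The dotted lines of the original figure are not recoverable.]

Definition 6. If $\mathbf{s}$ and $\mathbf{t}$ are two facts (resp., atoms), then the equality type of $\mathbf{s}$ and $\mathbf{t}$, denoted $eqtp(\mathbf{s}, \mathbf{t})$, is the set

$\{ (i, j) \mid \mathbf{s}.i = \mathbf{t}.j, \text{ with } 1 \le i \le |\mathbf{s}|,\ 1 \le j \le |\mathbf{t}| \}$.

The equality type between two facts hence records the positions on which the facts share a value. To illustrate, referring to the facts in the database db1 of Figure 2, we have $eqtp(s_1, t_1) = \{(1, 2), (2, 1)\}$.

Homomorphisms and isomorphisms. Let $A$ and $B$ be sets of facts and atoms. A function $f \colon X \to Y$ is a homomorphism from $A$ to $B$ if $terms(A) \subseteq X$ and $f(A) \subseteq B$. It is a partial homomorphism if $f(A|_X) \subseteq B$. It is an isomorphism if $f$ is bijective, $terms(A) \subseteq X$, and $f(A) = B$.

Conjunctive queries. A (rule-based) conjunctive query (CQ for short) $Q$ consists of a rule of the form

$Q \colon ans(\bar{x}) \leftarrow \mathbf{a}_1, \dots, \mathbf{a}_n$,

with $ans(\bar{x}), \mathbf{a}_1, \dots, \mathbf{a}_n$ atoms ($n \ge 0$). The set $\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$ is called the body of $Q$ and is denoted by $body(Q)$. The atom $ans(\bar{x})$ is called the head of $Q$ and is denoted by $head(Q)$. It is required that $var(head(Q)) \subseteq var(body(Q))$. We sometimes write $Q(\bar{x})$ to indicate that $\bar{x}$ is the tuple of variables in the head of $Q$.

A valuation is a partial function $\mu \colon \mathcal{V} \to \mathcal{U}$. A valuation $\mu$ is an embedding of a set of atoms $A$ in a database $db$ if it is a homomorphism from $A$ to $db$. A valuation $\mu$ is an embedding of a conjunctive query $Q$ in a database $db$ if it is an embedding of $body(Q)$ in $db$. The result of conjunctive query $Q(\bar{x})$ on database $db$ is the set

$Q(db) := \{ \mu(\bar{x}) \mid \mu \text{ is an embedding of } Q \text{ in } db \}$.

Example 7. Consider the following CQ $Q$:

$ans(emp) \leftarrow Project(pid, mgr, mgr),\ WorksOn(pid, emp)$.

When applied to the databases of Figure 2 it retrieves all the employees who work on a project that is managed and audited by the same person.

A union of conjunctive queries (UCQ for short) is a finite set $\varphi$ of CQs, all with the same head, say $ans(\bar{x})$, which is called the head of $\varphi$. The result of UCQ $\varphi$ on database $db$ is the set $\varphi(db) := \bigcup \{ Q(db) \mid Q \in \varphi \}$.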
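As an illustration of Definition 6 and of query evaluation by embeddings, the following sketch computes an equality type and evaluates the query of Example 7 on db1 of Figure 2. The tuple encoding of facts and the function names are assumptions for illustration. The sketch also writes the second atom as WorksOn(emp, pid), so that its argument order matches the column order (Emp, Proj) shown in Figure 2; that reordering is an assumption about the intended query, not something stated in the text.

```python
def eqtp(s, t):
    """Equality type: 1-based position pairs (i, j) with s.i = t.j."""
    return {(i, j)
            for i, x in enumerate(s[1], start=1)
            for j, y in enumerate(t[1], start=1)
            if x == y}


def embeddings(body, db):
    """Yield every valuation (as a dict) that maps all atoms of body into db."""
    def extend(mu, remaining):
        if not remaining:
            yield dict(mu)
            return
        (rel, args), rest = remaining[0], remaining[1:]
        for fact_rel, vals in db:
            if fact_rel != rel or len(vals) != len(args):
                continue
            candidate = dict(mu)
            # setdefault binds a fresh variable or returns its earlier binding.
            if all(candidate.setdefault(x, v) == v for x, v in zip(args, vals)):
                yield from extend(candidate, rest)
    yield from extend({}, list(body))


def evaluate(head_vars, body, db):
    """Q(db): the images of the head tuple under all embeddings of Q in db."""
    return {tuple(mu[x] for x in head_vars) for mu in embeddings(body, db)}


# Database db1 of Figure 2.
s1 = ("Project", (1, "Amy", "Lex"))
t1 = ("WorksOn", ("Amy", 1))
db1 = {
    s1, ("Project", (2, "Lex", "Amy")), ("Project", (3, "Sue", "Sue")),
    t1, ("WorksOn", ("Lex", 2)), ("WorksOn", ("Sue", 3)),
    ("WorksOn", ("Jeffrey", 3)), ("WorksOn", ("Cathy", 3)),
}

# Example 7, with WorksOn written in the column order of Figure 2 (assumption).
body = [("Project", ("pid", "mgr", "mgr")), ("WorksOn", ("emp", "pid"))]
answers = evaluate(("emp",), body, db1)
shared_positions = eqtp(s1, t1)
```

Only project 3 is managed and audited by the same person, so under this reading the answers are the three employees working on project 3.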
[Figure 3 (tree layout not recoverable): a tree whose nodes are the atoms R(a,b), S(b,c,e), R(b,d), S(c,e,f), R(a,k), R(g,h), R(g,i), R(h,j).]

Figure 3: A join tree for the query in Example 9.

An atom or fact $\mathbf{a}$ is boolean if it does not mention any term. A CQ is boolean if its head is. A CQ $Q$ is strict if all variables in the head occur together in a single atom in the body. To illustrate, the query from Example 7 is strict, but the following is not:

$ans(pid, emp, mgr) \leftarrow Project(pid, mgr, mgr),\ WorksOn(pid, emp)$.

Minimality. A CQ $Q$ is contained in a CQ $Q'$, denoted $Q \subseteq Q'$, if $Q(db) \subseteq Q'(db)$ for all databases $db$. $Q$ is equivalent to $Q'$, denoted $Q \equiv Q'$, if $Q \subseteq Q'$ and $Q' \subseteq Q$. A CQ $Q$ is minimal if there does not exist an equivalent conjunctive query with fewer atoms in the body. A UCQ $\varphi$ is minimal if all of its CQs are minimal, and, moreover, $Q \not\subseteq Q'$ for all distinct $Q, Q' \in \varphi$. Obviously, every UCQ has an equivalent one that is minimal.

Acyclicity. The acyclic conjunctive queries were recognized early in the history of database theory as an important subclass of the conjunctive queries that have a PTIME query evaluation problem under combined complexity [1, 32]. There are many equivalent definitions of when a conjunctive query is acyclic. Here, we will use two different versions: a definition based on join trees and a definition based on acyclic hypergraphs.

Definition 8 (Join tree). Let $A$ be a finite set of atoms. A join tree for $A$ is a tree $T$ (i.e., a connected acyclic undirected graph) whose nodes are the atoms in $A$ such that, whenever the same variable $x$ occurs in two atoms $\mathbf{a}$ and $\mathbf{b}$ in $A$, then $x$ occurs in each atom on the unique path linking $\mathbf{a}$ and $\mathbf{b}$. A join tree for a conjunctive query $Q$ is a join tree for $body(Q)$.

Example 9. Consider the following query:

$Q \colon ans(a, b) \leftarrow R(a, b),\ S(b, c, e),\ R(b, d),\ S(c, e, f),\ R(a, k),\ R(g, h),\ R(g, i),\ R(h, j)$.

A join tree for $Q$ is shown in Figure 3.

Definition 10. A conjunctive query is acyclic if it has a join tree. It is cyclic otherwise.

The query $Q$ from Example 9 is hence acyclic.

Hypergraph acyclicity. A hypergraph is a pair $(N, \mathcal{E})$, where $N$ is a set of nodes and $\mathcal{E}$ is a set of edges (also called hyperedges), which are arbitrary nonempty subsets of $N$. If $Q$ is a conjunctive query, we define the hypergraph $H(Q) = (N, \mathcal{E})$ associated to $Q$ as follows. The set of nodes $N$ consists of all variables occurring in $Q$. For each atom $\mathbf{a}$ in the body of $Q$, the set $\mathcal{E}$ contains a hyperedge consisting of all variables occurring in $\mathbf{a}$. It is well known that a conjunctive query is acyclic if and only if $H(Q)$ is acyclic. Here, acyclicity of a hypergraph, also referred to as $\alpha$-acyclicity by Fagin [8], is defined as follows.

A path from a node $s$ to a node $t$ in a hypergraph $(N, \mathcal{E})$ is a sequence of $k \ge 1$ edges $E_1, \dots, E_k \in \mathcal{E}$ such that $s \in E_1$, $t \in E_k$, and $E_i \cap E_{i+1} \ne \emptyset$ for every $1 \le i \le k-1$. Two nodes (or two edges) are connected if there is a path from one to the other. A set of nodes (or a set of edges) is connected if all of its pairs of nodes (resp. edges) are connected.

The reduction of the hypergraph $(N, \mathcal{E})$ is obtained by removing from $\mathcal{E}$ each edge that is a proper subset of another edge. A hypergraph is reduced if it is equal to its reduction.

Given a hypergraph $(N, \mathcal{E})$, the set of partial edges generated by a set of nodes $M \subseteq N$ is obtained by intersecting the edges in $\mathcal{E}$ with $M$. That is, the set of partial edges generated by $M$ is the reduction of $\{E \cap M \mid E \in \mathcal{E}\} - \{\emptyset\}$. A set $\mathcal{B}$ is said to be a node-generated set of partial edges if $\mathcal{B}$ is the set of partial edges generated by $M \subseteq N$, for some $M$.

Let $\mathcal{F}$ be a connected, reduced set of partial edges, and let $E$ and $F$ be in $\mathcal{F}$. Let $G = E \cap F$. We say that $G$ is an articulation set of $\mathcal{F}$ if the set of partial edges $\{H - G \mid H \in \mathcal{F}\} - \{\emptyset\}$ is not connected.

Definition 11 (Hypergraph acyclicity). A block of a reduced hypergraph is a connected, node-generated set of partial edges with no articulation set. A block is trivial if it contains fewer than two members. A reduced hypergraph is acyclic if all its blocks are trivial. A hypergraph is said to be acyclic if its reduction is.

Observe that no block can be formed from exactly two partial edges. Indeed, these two edges are either disconnected or their intersection forms an articulation set.

Example 12. Consider the conjunctive query

$Q_2 \colon ans() \leftarrow R(a, b, c),\ R(a, b, d),\ R(a, c, d),\ R(b, c, d)$.

Its hypergraph $H(Q_2)$ consists of the following edges:

$E_1 = \{a, b, c\}$, $E_2 = \{a, c, d\}$, $E_3 = \{a, b, d\}$, $E_4 = \{b, c, d\}$.

Note that $H(Q_2)$ itself equals the set of partial hyperedges of $H(Q_2)$ generated by the set $\{a, b, c, d\}$. This set is clearly connected and reduced. Furthermore, it has no articulation set, and it is not trivial. Therefore, $H(Q_2)$ itself forms a non-trivial block of $H(Q_2)$. Hence $H(Q_2)$ is cyclic, and so is $Q_2$.

3. STRUCTURAL CHARACTERIZATION

Guarded bisimulation is a generalization of classical bisimulation to relational databases introduced by Andréka et al. [2]. (A formal definition of guarded bisimulation is provided in Appendix A for completeness.) Analogously to modal bisimulation, guarded bisimulation is formulated by means of back and forth conditions. In this section, we introduce guarded simulation as a variant of guarded bisimulation without the back condition, and prove Theorem 5. Towards this, we start with the definition of free acyclicity.

3.1 Free Acyclicity

The extension of CQ $Q$, denoted by $Q^+$, is the CQ obtained by adding $head(Q)$ as an atom to the body.

[...]

[...] a so-called compact winning strategy for the existential $k$-cover game between two relational structures, for the special case where $k = 1$. Chen and Dalmau link the existence of winning strategies for the $k$-cover game to invariance by conjunctive queries of so-called coverwidth (also known as generalized hypertree width) at most $k$. Since it is known that the conjunctive queries of coverwidth 1 are exactly the ACQs (e.g., [7, 13]), it is not difficult to obtain the following from their results.

Proposition 18. The following are equivalent.
(1) $db_1, \bar{a} \preceq_g db_2, \bar{b}$.
(2) For all FACQs $Q$, if $\bar{a} \in Q(db_1)$ then $\bar{b} \in Q(db_2)$.

3.3 Characterizing invariance under guarded simulation

Proposition 18 implies that the FACQs are invariant under guarded simulation. It also implies that any FO definable query that is equivalent to a union of FACQs must be invariant under guarded simulation. To obtain Theorem 5, therefore, it remains to prove that any FO definable query that is invariant under guarded simulation is equivalent to a union of FACQs. We devote the rest of this section to this proof, which starts with the following observation.

Proposition 19. If $\varphi$ is an FO formula invariant under guarded simulation (on finite databases) then $\varphi$ is equivalent (in the
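finite) to a UCQ.

Proof. Every homomorphism gives rise to a guarded simulation. Indeed, if $h$ is a homomorphism from $db_1$ to $db_2$ that maps $\bar{a}$ to $\bar{b}$ then it is readily verified that the set

$S := \{h|_{\bar{a}}\} \cup \{h|_X \mid X \text{ guarded in } db_1\}$

is a guarded simulation from $db_1 \cup \{ans(\bar{a})\}$ to $db_2 \cup \{ans(\bar{b})\}$. Hence $db_1, \bar{a} \preceq_g db_2, \bar{b}$, since $h|_{\bar{a}} \in S$ maps $\bar{a}$ to $\bar{b}$. Then, since $\varphi$ is invariant under guarded simulations, it is also invariant under homomorphisms. By Rossman's theorem (Theorem 3), $\varphi$ is hence equivalent to a UCQ. □

Now fix throughout the remainder of this section an FO formula $\varphi(\bar{x})$ invariant under guarded simulation. By Proposition 19 we may assume w.l.o.g. that $\varphi$ is a UCQ. Furthermore, we may assume w.l.o.g. that this UCQ is minimal. Now assume for the purpose of contradiction that no union of FACQs expresses $\varphi$. Then in particular there exists some CQ $Q(\bar{x})$ in $\varphi$ that is not freely acyclic, i.e., $Q^+$ is cyclic. From $Q$ we will construct pairs $(\mathit{canondb}, \bar{a})$ and $(\mathit{unrolldb}, \bar{b})$ such that $\mathit{canondb}, \bar{a} \preceq_g \mathit{unrolldb}, \bar{b}$ and $\bar{a} \in \varphi(\mathit{canondb})$ but $\bar{b} \notin \varphi(\mathit{unrolldb})$. Then obviously, $\varphi$ is not invariant under guarded simulation, yielding the desired contradiction. The definition of $\mathit{canondb}$ and $\mathit{unrolldb}$ is as follows.

The canonical database. The database $\mathit{canondb}$ is simply what is normally called the "canonical database" (or "frozen" database) for $Q$ in the theory of conjunctive queries. Formally, fix for every variable $x \in var(Q)$ a unique data value $\hat{x} \in \mathcal{U}$ such that the function $freeze$ mapping $x \mapsto \hat{x}$ for all $x \in var(Q)$ is a bijection. Let $\mathit{canondb} := freeze(body(Q))$ and $\bar{a} := freeze(\bar{x})$. By construction, $freeze$ is an embedding of $Q$ in $\mathit{canondb}$. Therefore,

Lemma 20.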
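$\bar{a} \in Q(\mathit{canondb}) \subseteq \varphi(\mathit{canondb})$.

The unrolled database. Since $Q^+$ is cyclic the hypergraph $H(Q^+)$ contains a nontrivial block. Fix such a nontrivial block $B$, as well as a distinguished hyperedge $F \in B$. Let $\{x_1, \dots, x_n\}$ be the variables mentioned in $Q$. We fix a set $U = \{x_1^{\circ}, \dots, x_n^{\circ}, x_1^{\bullet}, \dots, x_n^{\bullet}\} \subseteq \mathcal{U}$ of pairwise distinct values. In what follows, we call $x_i^{\circ}$ the white colored version of $x_i$, and $x_i^{\bullet}$ the black colored version of $x_i$.

Let $var(B)$ denote the set of all variables that are mentioned in the hyperedges of block $B$. We define for every $V \subseteq var(B)$ the function $clr_V \colon var(Q) \to U$ by:

$clr_V(v) = v^{\circ}$ if $v \notin var(B)$ or $v \in V$,
$clr_V(v) = v^{\bullet}$ if $v \in var(B)$ and $v \notin V$.

Intuitively, $clr_V$ is a function that maps variables to values by "coloring" the variables. Variables not mentioned in $B$ are colored white, while a variable $v$ mentioned in $B$ is colored white if $v$ is in $V$, and black otherwise.

Definition 21 (Covering). Let $E$, $E'$, and $V$ be three sets of variables. We say that $E$ covers $E'$ w.r.t. $V$, denoted $E \sqsupseteq_V E'$, if $E \cap V \supseteq E' \cap V$. We abbreviate $E \sqsupseteq_{var(B)} E'$ by $E \sqsupseteq E'$ and write $E \sqsupset E'$ and $E \sqsupset_V E'$ to denote the corresponding strict relations.

Definition 22 (Maximum intersections). Let $B_{\sqsupset F}$ denote the set of all partial hyperedges $E \in B \setminus \{F\}$ that have a maximal intersection with $F$ among the hyperedges in $B \setminus \{F\}$. That is, $B_{\sqsupset F}$ consists of all $E \in B \setminus \{F\}$ for which there does not exist $E' \in B \setminus \{F\}$ with $E' \sqsupset_F E$. Let $M_\cap$ be the set of maximum intersections of partial hyperedges of $B \setminus \{F\}$ with $F$,

$M_\cap := \{E \cap F \mid E \in B_{\sqsupset F}\}$.

Note that, since $B$ is nontrivial, the cardinality of $M_\cap$ is at least 2, and all intersections in $M_\cap$ are nonempty. Also note that for any $A \in M_\cap$, we have $F \supsetneq A$ and hence $F \sqsupseteq A$.

Example 23. Consider the query

$Q_1 \colon ans() \leftarrow R(a, b, d),\ R(c, a, d),\ S(b, c, d, e),\ T(e, f),\ T(f, g)$.

It is readily verified that the set $B_1 = \{\{a, b, d\}, \{b, c, d\}, \{c, a, d\}\}$ forms a block of $H(Q_1^+)$. Consider the hyperedge $F = \{b, c, d\}$ of this block. Then $M_\cap = \{\{b, d\}, \{c, d\}\}$, resulting from the intersections with the hyperedges $\{a, b, d\}$ and $\{c, a, d\}$ respectively.

Next, consider the query $Q_2 \colon ans() \leftarrow R(a, b, c),\ R(a, b, d),\ R(a, c, d),\ R(b, c, d)$ from Example 12. It is readily verified that the set $B_2 = \{\{a, b, c\}, \{a, b, d\}, \{a, c, d\}, \{b, c, d\}\}$ forms a block of $H(Q_2^+)$ (cf., e.g., Example 12). Consider the hyperedge $F = \{a, b, d\}$ of this block. Then $M_\cap$ is the set $\{\{a, b\}, \{a, d\}, \{b, d\}\}$, resulting from the intersections with the hyperedges $\{a, b, c\}$, $\{a, c, d\}$, and $\{b, c, d\}$ respectively.

We now turn to the construction of $\mathit{unrolldb}$.

Definition 24 (Unrolled database). Define $\mathcal{F}$ to be the set of functions that contains [...]

[...] of each fact. Note in particular that it is not possible to embed $Q_1$ (resp. $Q_2$) into the unrolled database of $Q_1$ (resp. $Q_2$). Indeed, to construct such an embedding, we would essentially have to find an edge-label-preserving graph homomorphism of the graph in Figure 4(a) (resp. Figure 4(c)) to the graph in Figure 4(b) (resp. Figure 4(d)), which is readily verified to be impossible.

By definition of $Q^+$, $H(Q^+)$ contains a hyperedge $X$ with $var(\bar{x}) \subseteq X$. Now observe that, by construction, $\mathcal{F}$ contains for every hyperedge $X$ of $H(Q^+)$ a function $f$ with domain $X$. Fix $f \in \mathcal{F}$ with $var(\bar{x}) \subseteq dom(f)$ arbitrarily and let $\bar{b} = f(\bar{x})$. Let $freeze^{-1}$ denote the inverse of $freeze$. The following lemmas and propositions show that $(\mathit{canondb}, \bar{a})$ and $(\mathit{unrolldb}, \bar{b})$ have been constructed as desired.

Lemma 26. The set $S = \{f \circ freeze^{-1} \mid f \in \mathcal{F}\}$ is a guarded simulation of $\mathit{canondb}$ in $\mathit{unrolldb}$.

Proof sketch. It suffices to prove that each $f \in \mathcal{F}$ is a partial homomorphism from $body(Q)$ into $\mathit{unrolldb}$ and that $\mathcal{F}$ satisfies the guarded forth condition. Indeed, since $freeze^{-1}$ is an isomorphism from $\mathit{canondb}$ to $body(Q)$, $S$ will then be a set of partial homomorphisms from $\mathit{canondb}$ into $\mathit{unrolldb}$ that satisfy the guarded forth condition. Establishing that each $f \in \mathcal{F}$ is a partial homomorphism is straightforward; establishing the guarded forth condition is done by a technical case analysis. □

Proposition 27. $\mathit{canondb}, \bar{a} \preceq_g \mathit{unrolldb}, \bar{b}$.

Proof. Clearly $\bar{b} = f(freeze^{-1}(\bar{a}))$. Hence $\mathit{canondb}, \bar{a} \preceq_g \mathit{unrolldb}, \bar{b}$, since $S = \{g \circ freeze^{-1} \mid g \in \mathcal{F}\}$ is a guarded simulation of $\mathit{canondb}$ in $\mathit{unrolldb}$ by Lemma 26, and since $f \circ freeze^{-1} \in S$ maps $\bar{a} \mapsto \bar{b}$. □

Proposition 28.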
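$\bar{b} \notin Q(\mathit{unrolldb})$.

Proof sketch. The proof is by contradiction. The essential reasoning (glossing over many important details) is as follows. Let $ans(\bar{x})$ be the head of $Q$ and let $\mathit{unrolldb}^+$ denote $\mathit{unrolldb} \cup \{ans(\bar{b})\}$.

First, we show that if $\bar{b} \in Q(\mathit{unrolldb})$ then there must also exist an embedding $h$ of $Q^+$ in $\mathit{unrolldb}^+$ that maps $x \mapsto x^{\circ}$ or $x \mapsto x^{\bullet}$, for every $x \in var(Q)$. In particular, $x$ will not be mapped to a colored version of another variable. As a consequence, we can establish that $h$ maps each atom $\mathbf{a} \in body(Q^+)$ to a copy of $\mathbf{a}$ in $\mathit{unrolldb}^+$, and not to a copy of some other atom.

Then, since $F$ is a partial hyperedge of $H(Q^+)$ there exists some atom $\mathbf{a}$ in $Q^+$ that contains all variables in $F$. Since, by the first bullet, $h$ maps atoms in $body(Q^+)$ to their copies in $\mathit{unrolldb}^+$, we know in particular that $h(\mathbf{a})$ is a copy of $\mathbf{a}$. Then, since $\mathbf{a}$ contains all variables in $F$, there exists some $A \in M_\cap$ such that every variable in $A \subsetneq F$ is colored white in $h(\mathbf{a})$ and every variable in $F \setminus A$ is colored black in $h(\mathbf{a})$.

Since $A \in M_\cap$ there exists $E_1 \in B$ such that $A = E_1 \cap F$. Moreover, since $B$ is a block of $H(Q^+)$, $A$ cannot be an articulation set of $B$. As such, there must exist a path $E_1, \dots, E_n, F \in B$ that does not need to traverse any node in $A$. That is, $(E_i \cap E_{i+1}) \setminus A \ne \emptyset$ for $1 \le i \le n-1$, and $(E_n \cap F) \setminus A \ne \emptyset$. Now, it is possible to establish that $h(E_i)$ consists only of white colored variables, for all $1 \le i \le n$.

This yields the desired contradiction. Indeed, since $(E_n \cap F) \setminus A$ is non-empty there is some variable $x$ that is both in $E_n$ and $F$, but not in $A$. Since $x \in E_n$, $h$ must map $x \mapsto x^{\circ}$. On the other hand, since $x \in F \setminus A$, we have already established before that $h$ must map $x \mapsto x^{\bullet}$. □

Proposition 29. $\bar{b} \notin \varphi(\mathit{unrolldb})$.

Crux. We already know that $\bar{b} \notin Q(\mathit{unrolldb})$ by Proposition 28. Suppose, for the purpose of contradiction, that there is some other CQ $Q' \in \varphi$ such that $\bar{b} \in Q'(\mathit{unrolldb})$. In particular, there exists an embedding $h$ from $Q'$ into $\mathit{unrolldb}$ such that $h(\bar{x}) = \bar{b}$. Now, consider the function $decopy \colon im(h) \to var(Q)$ such that

$decopy(x^{\circ}) = x$ for every $x^{\circ} \in im(h)$,
$decopy(x^{\bullet}) = x$ for every $x^{\bullet} \in im(h)$.

Observe that by definition of $\mathit{unrolldb}$, $Q$ contains an atom $decopy(\mathbf{s})$ for each fact $\mathbf{s} \in \mathit{unrolldb}$ built over the image of $h$. Hence $(decopy \circ h)$ is a homomorphism of $body(Q')$ into $body(Q)$. Furthermore, $decopy(\bar{b}) = \bar{x}$ since $\bar{b} = f(\bar{x})$ for some $f \in \mathcal{F}$. By Chandra and Merlin's classical result [6], this implies that $Q$ is contained in $Q'$, contradicting the fact that $\varphi$ is minimal. □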
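The containment step used at the end of the proof of Proposition 29 rests on the Chandra and Merlin criterion: $Q \subseteq Q'$ holds exactly when $Q'$ returns the frozen head of $Q$ on the canonical database of $Q$. The sketch below illustrates that criterion for constant-free CQs. The encoding of a query as a pair (head variables, list of body atoms), the tagging scheme used by `freeze`, and all function names are assumptions for illustration, not the paper's construction.

```python
def freeze(x):
    """Hypothetical freezing: tag variable x so that it acts as a data value."""
    return ("frozen", x)


def canonical_db(body):
    """canondb := freeze(body(Q)), with atoms encoded as (relation, args)."""
    return {(rel, tuple(freeze(x) for x in args)) for rel, args in body}


def homomorphisms(body, facts):
    """Yield all variable assignments mapping every atom of body into facts."""
    def extend(mu, remaining):
        if not remaining:
            yield dict(mu)
            return
        (rel, args), rest = remaining[0], remaining[1:]
        for fact_rel, vals in facts:
            if fact_rel != rel or len(vals) != len(args):
                continue
            candidate = dict(mu)
            if all(candidate.setdefault(x, v) == v for x, v in zip(args, vals)):
                yield from extend(candidate, rest)
    yield from extend({}, list(body))


def contained_in(q, q_prime):
    """True iff q is contained in q_prime (constant-free CQs only)."""
    head, body = q
    head_p, body_p = q_prime
    if len(head) != len(head_p):
        return False
    target = tuple(freeze(x) for x in head)
    return any(tuple(mu[x] for x in head_p) == target
               for mu in homomorphisms(body_p, canonical_db(body)))


# Q asks for x on a 2-cycle of R; Q' asks for x with some outgoing R-edge.
Q = (("x",), [("R", ("x", "y")), ("R", ("y", "x"))])
Q_prime = (("x",), [("R", ("x", "y"))])

q_in_q_prime = contained_in(Q, Q_prime)
q_prime_in_q = contained_in(Q_prime, Q)
```

Here every answer of Q is an answer of Q', but not conversely, because the canonical database of Q' contains a single R-fact and therefore no 2-cycle.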
Since $\mathit{canondb}, \bar{a} \preceq_g \mathit{unrolldb}, \bar{b}$ and $\bar{a} \in \varphi(\mathit{canondb})$ but $\bar{b} \notin \varphi(\mathit{unrolldb})$, we have our desired contradiction: $\varphi$ is not invariant under guarded simulation. This finishes the proof of Theorem 5.

4. GUARDED VS FACT SIMULATION

We next present an alternate definition for guarded simulation, called fact simulation, and show that fact simulation naturally yields approximations that are tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of a specific bounded height.

4.1 Fact simulation

Definition 30. A fact simulation of database $db_1$ in database $db_2$ is a nonempty binary relation $F \subseteq db_1 \times db_2$ between the facts of $db_1$ and $db_2$ such that for all facts $\mathbf{s} \in db_1$ and $\mathbf{t} \in db_2$ with $\mathbf{s}\,F\,\mathbf{t}$:
(1) $\mathbf{s}$ and $\mathbf{t}$ carry the same relation symbol, i.e., $rel(\mathbf{s}) = rel(\mathbf{t})$;
(2) for all $\mathbf{s}' \in db_1$ there exists $\mathbf{t}' \in db_2$ with $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}'\,F\,\mathbf{t}'$.

Example 31. To illustrate, the dotted lines in Figure 2 show a fact simulation $F$ of database $db_1$ in $db_2$. Note that a fact simulation is necessarily total on $db_1$ (i.e., every fact of $db_1$ occurs in $F$). □

Now, let $db_1$ and $db_2$ be two databases, and let $\mathbf{s}$ and $\mathbf{t}$ be facts. We say that $(db_1, \mathbf{s})$ is fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$, if there exists a fact simulation $F$ of $db_1 \cup \{\mathbf{s}\}$ in $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s}\,F\,\mathbf{t}$. Moreover, if $\bar{a}$ and $\bar{b}$ are tuples of data values, then $db_1, \bar{a} \preceq_f db_2, \bar{b}$ if $db_1, ans(\bar{a}) \preceq_f db_2, ans(\bar{b})$ with $ans$ a relation symbol of the same arity as $\bar{a}$ and $\bar{b}$ that does not occur in $db_1$ or $db_2$.

We require the following notions to establish that fact simulation is equivalent to guarded simulation. Let $\mathbf{s} \to \mathbf{t}$ denote, for every $\mathbf{s} = s(a_1, \dots, a_k)$ and $\mathbf{t} = t(b_1, \dots, b_l)$, the relation $\{(a_i, b_i) \mid 1 \le i \le \min(k, l)\}$. When we are sure that $\mathbf{s} \to \mathbf{t}$ is a function, we use common notation for functions, such as $(\mathbf{s} \to \mathbf{t})(a)$ to denote the unique value associated to $a$ by the function $\mathbf{s} \to \mathbf{t}$. Now define, for a guarded simulation $S$,

$F[S] := \{(\mathbf{s}, f(\mathbf{s})) \mid f \colon X \to Y \in S,\ \mathbf{s} \in db_1, \text{ and } val(\mathbf{s}) \subseteq X\}$.

Also define, for a fact simulation $F$,

$S[F] := \{\mathbf{s} \to \mathbf{t} \mid (\mathbf{s}, \mathbf{t}) \in F\}$.

For example, the relation $(1, Amy, Lex) \to (c, Ned, Ned)$ is an element of $S[F]$, for the fact simulation $F$ of Example 31. The following proposition establishes the correspondence between guarded simulation and fact simulation.

Proposition 32.
1. If $S$ is a guarded simulation of $db_1$ in $db_2$ then $F[S]$ is a fact simulation of $db_1$ in $db_2$.
2. If $F$ is a fact simulation of $db_1$ in $db_2$ then $S[F]$ is a guarded simulation of $db_1$ in $db_2$.

It follows that fact simulation provides an alternative definition for guarded simulation, in the following sense.

Theorem 33. For databases $db_1$ and $db_2$ and tuples $\bar{a}$ and $\bar{b}$ it holds that $db_1, \bar{a} \preceq_g db_2, \bar{b}$ iff $db_1, \bar{a} \preceq_f db_2, \bar{b}$.

4.2 Approximate fact simulation

In applications of classical simulation in data management it is known that, if there are few nodes in graph $G$ that are similar, then the corresponding structural index of $G$ may be of the same size as $G$ itself, and hence be too large to act as a succinct summary of the structure of $G$ [18]. In such situations, it has been proposed to approximate simulations and to group nodes in the index with respect to these approximations instead of with respect to full simulation [18].

Towards a suitable approximation of guarded simulation, we introduce the following version of fact simulation.

Definition 34 (Approximate fact simulation). Let $db_1$ and $db_2$ be two databases. A depth-$k$ approximation of fact simulation of $db_1$ in $db_2$, or fact $k$-simulation for short, is a sequence $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ of binary relations such that $F_0$ consists of all pairs $(\mathbf{s}, \mathbf{t}) \in db_1 \times db_2$ with $rel(\mathbf{s}) = rel(\mathbf{t})$ and $eqtp(\mathbf{s}, \mathbf{s}) \subseteq eqtp(\mathbf{t}, \mathbf{t})$; and the following property holds for every $1 \le j \le k$ and all $\mathbf{s}$ and $\mathbf{t}$ with $\mathbf{s}\,F_j\,\mathbf{t}$.

For every $\mathbf{s}' \in db_1$ there exists $\mathbf{t}'$ in $db_2$ such that $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}'\,F_{j-1}\,\mathbf{t}'$ (Fact Forth).

We say that $(db_1, \mathbf{s})$ is $k$-fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$, if there exists a fact $k$-simulation $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ from $db_1 \cup \{\mathbf{s}\}$ to $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s}\,F_k\,\mathbf{t}$. The notion of $k$-fact similarity between $(db_1, \bar{a})$ and $(db_2, \bar{b})$ with $\bar{a}$ and $\bar{b}$ tuples is now defined in the obvious way.

Observe that $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$ iff $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$ for every $k \ge 0$.

We now link approximate guarded simulation to indistinguishability by FACQs of bounded height. Here, the height of a FACQ is defined as follows. Recall that in graph theory, the distance between two connected nodes $u$ and $v$ in an undirected graph $G$ is the length of a shortest path between $u$ and $v$. The eccentricity of $u$ in $G$, denoted $ecc(u, G)$, is the maximum distance of $u$ to any other node to which it is connected.

Example 35. Consider, for instance, the join tree $T$ of Figure 3. The eccentricity of $R(a, b)$ in $T$ is 2 while the eccentricity of $R(g, h)$ in $T$ is 3.

Definition 36. Let $A$ be a set of atoms. When $A$ is acyclic, the eccentricity of atom $\mathbf{a} \in A$, denoted $ecc(\mathbf{a}, A)$, is the minimum eccentricity of $\mathbf{a}$ among all join trees $T$ for $A$. The height of a FACQ $Q$ is the eccentricity of $head(Q)$ in $body(Q^+)$.

In other words, the height of $Q$ is the minimum height of any join tree $T$ for $body(Q^+)$, when considered as being rooted at $head(Q)$.

Example 37. Continuing Example 35, query $Q$ of Example 9 has a height of 3.

Proposition 38. Let $k \ge 0$ be a natural number. The following are equivalent.
(1) $db_1, \bar{a} \preceq^k_f db_2, \bar{b}$.
(2) For all FACQs $Q$ of height at most $k$, if $\bar{a} \in Q(db_1)$ then $\bar{b} \in Q(db_2)$.

Note that Proposition 38 implies Proposition 18 (yet the converse is not true).

Closing remark. We close this section with the following important remark. It is obviously possible to define approximations of guarded simulation in an analogous way as approximations of fact simulation: a depth-$k$ approximation is a sequence $S_k \subseteq S_{k-1} \subseteq \dots \subseteq S_0$ of partial homomorphisms such that each $S_i$ satisfies the guarded forth property to $S_{i-1}$, for $i \ge 1$. While full guarded simulation coincides with fact simulation (cf. Theorem 33), their approximations do not. In particular, $\preceq^0_g$ is distinct from $\preceq^0_f$. Indeed, consider $db_1 = \{r(a, b), r(b, a)\}$ and $db_2 = \{r(1, 2)\}$. Note that there is no partial homomorphism from $db_1$ to $db_2$ with domain $\{a, b\}$. Therefore, $db_1$ is not guarded 0-simulated by $db_2$. Yet, $db_1 \times db_2$ is a fact 0-simulation of $db_1$ in $db_2$.

5. GUARDED STRUCTURAL INDEXING

Recall from the Introduction that in graph data management, a structural index is a compact representation of a data graph. Typically, this compact representation is obtained by grouping the nodes in the input graph that are similar or bisimilar. Structural characterizations of query invariance then enable efficient retrieval of the relevant nodes of the graph for various graph query languages.

In this section we analogously define guarded structural indexes as compact representations of relational data obtained by grouping facts according to guarded similarity.

[...]

[6] A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC 1977, pages 77–90. ACM, 1977.
[7] H. Chen and V. Dalmau. Beyond hypertree width: Decomposition methods without decompositions. In CP, volume 3709, pages 167–181. Springer, 2005.
[8] R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3):514–550, 1983.
[9] W. Fan. Graph pattern matching revised for social network analysis. In ICDT 2012, pages 8–21. ACM, 2012.
[10] W. Fan, J. Li, X. Wang, and Y. Wu. Query preserving graph compression. In SIGMOD 2012, pages 157–168. ACM, 2012.
[11] G. H. L. Fletcher, D. Van Gucht, Y. Wu, M. Gyssens, S. Brenes, and J. Paredaens. A methodology for coupling fragments of XPath with structural indexes for XML documents. Inf. Syst., 34(7):657–670, 2009.
[12] J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716–752, 2002.
[13] G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. JCSS, 66(4):775–808, 2003.
[14] G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. TKDE, 19(10):1381–1403, 2007.
[15] E. Grädel, C. Hirsch, and M. Otto. Back and forth between guarded and modal logics. TOCL, 3(3):418–463, 2002.
[16] M. Gyssens, J. Paredaens, D. Van Gucht, and G. H. L. Fletcher. Structural characterizations of the semantics of XPath as navigation tool on a document. In PODS 2006, pages 318–327. ACM, 2006.
[17] C. Hirsch. Guarded logics: algorithms and bisimulation. PhD thesis, TU Aachen, 2002.
[18] R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for indexing paths in graph-structured data. In ICDE 2002, pages 129–140. IEEE, 2002.
[19] P. G. Kolaitis and M. Y. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS 1998, pages 205–213. ACM, 1998.
[20] D. Leinders, M. Marx, J. Tyszkiewicz, and J. Van den Bussche. The semijoin algebra and the guarded fragment. JOLLI, 14(3):331–343, 2005.
[21] A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC 2005, pages 95–103. Australian Computer
Society, 2005.
[22] T. Milo and D. Suciu. Index structures for path expressions. In ICDT 1999, pages 277–295. Springer, 1999.
[23] M. Otto. Highly acyclic groups, hypergraph covers, and the guarded fragment. J. ACM, 59(1):5:1–5:40, 2012.
[24] F. Picalausa. Guarded structural indexes: theory and application to relational RDF databases. PhD thesis, Université Libre de Bruxelles, 2013.
[25] F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, volume 7295, pages 406–421. Springer, 2012.
[26] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. Technical report, W3C Recommendation, 2008.
[27] P. Ramanan. Covering indexes for XML queries: bisimulation − simulation = negation. In VLDB 2003, pages 165–176, 2003.
[28] B. Rossman. Homomorphism preservation theorems. J. ACM, 55(3):15:1–15:53, 2008.
[29] D. Sangiorgi. Introduction to bisimulation and coinduction. Cambridge University Press, 2012.
[30] T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured RDF data using structure indexes. TKDE, 25(9):2076–2089, 2013.
[31] O. Udrea, A. Pugliese, and V. S. Subrahmanian. GRIN: a graph based RDF index. In AAAI 2007, pages 1465–1470, 2007.
[32] M. Yannakakis. Algorithms for acyclic database schemes. In VLDB 1981, pages 82–94. IEEE, 1981.

APPENDIX

A. GUARDED BISIMULATION

The definition of guarded bisimulation due to Andréka et al. [2] is recalled here for completeness. Given two sets $A$ and $B$ of facts and atoms, a function $f \colon X \to Y$ is a partial isomorphism from $A$ to $B$ if it is bijective and $f(A|_X) = B|_Y$.

Definition 44 (Guarded bisimulation). Let $db_1$ and $db_2$ be databases. A guarded bisimulation from $db_1$ to $db_2$ is a nonempty set $I$ of finite partial isomorphisms from $db_1$ to $db_2$ such that the following forth and back conditions are satisfied.

(Guarded Bisimulation Forth) For every $f \colon X \to Y \in I$ and for every set $X'$ guarded in $db_1$, there exists a partial isomorphism $g \colon X' \to Y' \in I$ such that $g$ and $f$ agree on $X \cap X'$.

(Guarded Bisimulation Back) For every $f \colon X \to Y \in I$ and for every set $Y'$ guarded in $db_2$, there exists a partial isomorphism $g \colon X' \to Y' \in I$ such that $g^{-1}$ and $f^{-1}$ agree on $Y \cap Y'$.
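The partial isomorphism condition recalled above, $f(A|_X) = B|_Y$ with $f$ bijective, is easy to check mechanically. The following is a minimal sketch under assumed encodings: facts are pairs (relation symbol, tuple of values), a function $f \colon X \to Y$ is a dictionary whose keys form $X$ and whose values form $Y$, and `is_partial_isomorphism` is a hypothetical helper name. The example reuses the databases $\{r(a, b), r(b, a)\}$ and $\{r(1, 2)\}$ from the closing remark of Section 4.

```python
def is_partial_isomorphism(f, A, B):
    """Check that f: X -> Y is bijective and that f(A|_X) = B|_Y."""
    X = set(f)
    Y = set(f.values())
    if len(Y) != len(X):
        return False  # two elements of X share an image, so f is not injective
    # A|_X and B|_Y: the facts built only from values in X, respectively Y.
    A_on_X = {(rel, args) for rel, args in A if set(args).issubset(X)}
    B_on_Y = {(rel, args) for rel, args in B if set(args).issubset(Y)}
    image = {(rel, tuple(f[x] for x in args)) for rel, args in A_on_X}
    return image == B_on_Y


db1 = {("r", ("a", "b")), ("r", ("b", "a"))}
db2 = {("r", (1, 2))}
db2_symmetric = {("r", (1, 2)), ("r", (2, 1))}

f = {"a": 1, "b": 2}
onto_db2 = is_partial_isomorphism(f, db1, db2)                  # r(2, 1) is missing
onto_symmetric = is_partial_isomorphism(f, db1, db2_symmetric)  # both facts match
single_value = is_partial_isomorphism({"a": 1}, db1, db2)       # both restrictions empty
```

The first call fails because the image of db1 under f contains r(2, 1), which db2 lacks; this is the same obstruction that rules out a partial homomorphism with domain {a, b} in the closing remark.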