Figure 1: Graphs about academic relations between professors, phd students, and b - PDF document



[Figure 1: Graphs about academic relations between professors, PhD students, and bachelor students, with advisor-of (adv) and supervises (sup) relationships. Graph G has nodes 1, 2, 3, 4 (prof), 5 (phd), and 6, 7 (stud). Graph I has nodes 1 (prof), {2,3} (prof), 4 (prof), 5 (phd), and {6,7} (stud). Graph I is a simulation-based structural index for G.]

[...] structural indexes are obtained by grouping together nodes in the input graph that are similar (respectively, bisimilar). These indexes are known to be covering for different fragments of the XPath query language [11, 22, 27]. That is, given a query in the fragment, its evaluation on the structural index will provide exactly the nodes that would be returned had the query been evaluated on the original data. In Figure 1, for example, the index I is actually a simulation-based index obtained by grouping together the similar nodes in G. (And, as Example 1 has already illustrated, certain queries can be immediately answered on I instead of G.) Variations of this idea underlying structural indexing have also been used in graph data management to compress [5, 10] graph-structured data sets, as well as aid in query processing [18, 25, 30], and data analytics [9].

Given the numerous successful applications of structural indexing in graph databases, one may ask the question: Is it possible to extend structural indexing from graph databases to arbitrary relational databases? In this paper and companion work [24, 25], we embark on a formal study of this question, and show that it has an affirmative answer, both from a theoretical and practical perspective.

General methodology. Our study follows the methodology proposed by Fletcher et al. [11] for the design of covering structural indexes for a given target query language $\mathcal{Q}$. This methodology requires the development of the following three components.
(1) A language-independent structural characterization of query invariance, characterizing when data objects (in our setting: relational tuples) cannot be distinguished by any query in the target query language $\mathcal{Q}$.
(2) An efficient algorithm to group together data objects that cannot be distinguished by any query in the target language $\mathcal{Q}$.
(3) A data structure (i.e., the index) that exploits this grouping to support query answering by means of the index instead of reverting to the full database.

In this paper, we focus on the conjunctive queries as our target query language, and devote our study to the structural characterization required for component (1). Components (2) and (3) are developed in companion work [24, 25]. Actually, we will focus on those conjunctive queries that "select" tuples in the input database rather than compute new tuples. Our focus on this fragment of the conjunctive queries as the target language instead of all conjunctive queries is motivated by the fact that, at least for the purpose of obtaining succinct structural indexes, the class of all conjunctive queries is too large.

To clarify this claim, we note that in graph databases there is a known trade-off between the arity of queries in $\mathcal{Q}$ and the size of the corresponding structural indexes: the index size increases with the arity. In graph databases, for example, $\mathcal{Q}$ is usually a language of node-selecting (i.e., unary) queries. In this setting, the data objects in the methodology of Fletcher et al. are nodes; the structural characterization is given by (bi)similarity; and the structural index data structure is built from the groups of indistinguishable nodes, as illustrated in Example 1. The groups of indistinguishable nodes are necessarily disjoint. Therefore, there can be at most as many groups as there are nodes in the input graph, and, hence, the structural index is always guaranteed to be at most the size of the input graph (although usually much smaller).

Now consider the setting where $\mathcal{Q}$ is a class of $k$-ary graph queries ($k \ge 2$) instead. Milo and Suciu have shown that essentially the same approach as before can be used to build structural indexes for $\mathcal{Q}$ [22]. However, the data objects become $k$-tuples of nodes; the structural characterization of indistinguishability is a generalization of (bi)simulation to $k$-tuples; and the structural index is composed of the groups of indistinguishable $k$-tuples. Essentially, we are no longer building a summary of the input graph, but a summary of the possible output space of queries in $\mathcal{Q}$, which can be vastly larger than the input graph. In particular, since the number of $k$-tuples for $k \ge 3$ significantly exceeds the size of the input graph, the number of groups of indistinguishable $k$-tuples (and hence, the index) exceeds the size of the input graph in practice. Clearly, this defeats the purpose of the structural index as a succinct graph summary.

Since an analogous reasoning applies to the relational setting, we are therefore not interested in a structural characterization of indistinguishability that applies to all conjunctive queries (of arbitrary arity), but in a characterization that is applicable to those conjunctive queries that "select" tuples in the input database. In the literature, these conjunctive queries are known as the strict (or variable-guarded) conjunctive queries [12]. Formally, a rule-based conjunctive query is strict if all variables in the head occur together in a single atom in the body. (See also Section 2.)

Our focus on the strict conjunctive queries as the target query language implies that we will not be able to answer non-strict queries on the structural index directly. Nevertheless, we show in companion work that query processing of all conjunctive queries can benefit from the presence of these indexes [24, 25]. (See also Section 5.)

Overview of approach and main result. What is a good notion of indistinguishability by strict conjunctive queries? It is well known that all conjunctive queries (strict and non-strict) are invariant under homomorphisms (i.e., structure preserving functions from databases to databases), in the following sense. (See footnote 1.)

Footnote 1: For the formal development in this paper, it will be convenient to focus on the conjunctive queries that do not mention any constants. All results can be extended to account for the presence of constants, much in the same way as, e.g., the classical result on genericity in relational databases can be extended to C-genericity, preserving constants in the finite set C [3].
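The strictness condition stated above (all head variables occur together in a single body atom) is easy to check mechanically. The following is a minimal sketch, not part of the paper: the tuple encoding of atoms and the function name `is_strict` are assumptions made for illustration, and a head without variables is treated as trivially strict.

```python
def is_strict(head_vars, body):
    """Check strictness of a rule-based conjunctive query.

    head_vars: tuple of variable names in the head (hypothetical encoding).
    body: list of atoms, each a tuple (relation, var_1, ..., var_k).
    """
    needed = set(head_vars)
    if not needed:
        # Assumption: a boolean head has no variables to guard.
        return True
    # Strict iff one single body atom mentions every head variable.
    return any(needed <= set(atom[1:]) for atom in body)


# Body shared by the two example queries used later in the paper.
BODY = [("Project", "pid", "mgr", "mgr"), ("WorksOn", "pid", "emp")]

print(is_strict(("emp",), BODY))               # head ans(emp)
print(is_strict(("pid", "emp", "mgr"), BODY))  # head ans(pid, emp, mgr)
```

The second query fails the check because no single atom contains `pid`, `emp`, and `mgr` together, which is exactly why it computes new tuples instead of selecting existing ones.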
Since these concerns about index size can be transferred to the relational setting, it is hence useful to develop approximate versions of guarded simulation.

To this end, we introduce approximations of fact simulation analogously to how approximations of classical simulation are defined. These approximations are proved to be tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of bounded height. In companion work [24, 25] we show that these approximations can both be efficiently computed and used to engineer practical guarded simulation-based structural indexes for relational query engines operating on Semantic Web data.

Contributions and organization. In summary, our contributions are as follows. (1) We introduce guarded simulation as a variant of guarded bisimulation, and prove the characterization stated in Theorem 5 (Section 3). (2) We introduce fact simulation as an alternative definition of guarded simulation, and show that approximations of fact simulation are tightly linked to invariance of freely acyclic conjunctive queries of bounded height (Section 4). (3) We show how structural indexes based on (approximations of) fact simulations can be defined (Section 5).

We begin, however, in Section 2 by introducing the required background.

2. PRELIMINARIES

Atoms, facts, and databases. From the outset, we assume given a fixed universe $\mathcal{U}$ of atomic data values, a fixed universe $\mathcal{V}$ of variables, and a fixed set $\mathcal{S}$ of relation symbols, all infinite and pairwise disjoint. We call atomic data values and variables collectively terms. Every relation symbol $r \in \mathcal{S}$ is associated with a natural number called the arity of $r$. An atom (respectively a fact) is an expression of the form $r(a_1, \dots, a_k)$ with $r \in \mathcal{S}$ a relation symbol; $k$ the arity of relation symbol $r$; and each of the $a_1, \dots, a_k$ a variable in $\mathcal{V}$ (respectively an atomic data value in $\mathcal{U}$). A relational database over $\mathcal{S}$ is a finite set $db$ of facts.

Notation. In what follows, we denote the set of all terms (respectively variables, respectively data values) occurring in a mathematical object $X$ (such as, e.g., an atom, fact, or set of atoms and facts) by $terms(X)$ (resp. $var(X)$, resp. $val(X)$). We write $rel(\mathbf{a})$ for the relation symbol $r$ of atom or fact $\mathbf{a} = r(a_1, \dots, a_k)$. We write $|\mathbf{a}|$ for the arity $k$ of $rel(\mathbf{a})$ and $\mathbf{a}.i$ for the $i$-th term $a_i$ in $\mathbf{a}$, provided $1 \le i \le |\mathbf{a}|$. We denote tuples $(a_1, \dots, a_k)$ as $\bar{a}$, and give the natural semantics to $|\bar{a}|$ and $\bar{a}.i$. The restriction of a set $A$ of atoms or facts to a set of terms $X \subseteq \mathcal{U} \cup \mathcal{V}$, denoted $A|_X$, consists of all atoms or facts in $A$ built only from terms in $X$: $A|_X := \{\mathbf{a} \in A \mid terms(\mathbf{a}) \subseteq X\}$.

Functions $f\colon X \to Y$ with $X$ and $Y$ sets of terms are extended point-wise to atoms, facts, tuples of terms, and sets thereof. For instance, if $\mathbf{a} = r(a_1, \dots, a_k)$ and $terms(\mathbf{a}) \subseteq X$ then $f(\mathbf{a}) = r(f(a_1), \dots, f(a_k))$. We denote by $f|_Z$ the restriction of the domain of $f$ to the set $X \cap Z$ and, extending this notation to atoms and facts, denote by $f|_{\mathbf{a}}$ the restriction of the domain of $f$ to the set $X \cap terms(\mathbf{a})$. We range over atoms by boldface letters drawn from the beginning of the alphabet ($\mathbf{a}, \mathbf{b}, \dots$) and facts by boldface letters from the end of the alphabet ($\mathbf{r}, \mathbf{s}, \dots$).
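To make the notation concrete, here is a small sketch of $terms(\cdot)$, the restriction $A|_X$, and the point-wise extension of a function to atoms. The encoding of an atom or fact as a tuple `(relation, term_1, ..., term_k)` and all helper names are assumptions introduced for this illustration only.

```python
def terms(atom):
    """Set of terms occurring in an atom or fact (relation symbol excluded)."""
    return set(atom[1:])


def restrict(atoms, xs):
    """A|_X: the atoms of A built only from terms in X."""
    xs = set(xs)
    return {a for a in atoms if terms(a) <= xs}


def apply_fn(f, atom):
    """Point-wise extension f(r(a_1..a_k)) = r(f(a_1)..f(a_k)); f is a dict."""
    return (atom[0],) + tuple(f[x] for x in atom[1:])


# A hypothetical set of two atoms r(x, y) and q(y, z).
A = {("r", "x", "y"), ("q", "y", "z")}

print(restrict(A, {"x", "y"}))                    # only r(x, y) survives
print(apply_fn({"x": 1, "y": 2}, ("r", "x", "y")))  # the fact r(1, 2)
```

Representing a function as a dictionary keeps its domain explicit, which mirrors the requirement $terms(\mathbf{a}) \subseteq X$ before $f(\mathbf{a})$ is defined: `apply_fn` raises `KeyError` when a term lies outside the domain.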
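Database db1

Project
      PID   Mgr     Auditor
s1    1     Amy     Lex
s2    2     Lex     Amy
s3    3     Sue     Sue

WorksOn
      Emp       Proj
t1    Amy       1
t2    Lex       2
t3    Sue       3
t4    Jeffrey   3
t5    Cathy     3

Database db2

Project
      PID   Mgr     Auditor
u1    a     Liv     Rob
u2    b     Rob     Liv
u3    c     Ned     Ned
u4    d     Ellen   Fred
u5    e     Fred    Ellen

WorksOn
      Emp     Proj
v1    Liv     a
v2    Rob     b
v3    Ned     c
v4    Bob     c
v5    Ellen   d
v6    Fred    e

Figure 2: Two company databases. For future reference, facts are labeled with identifiers (s1, s2, ...). The dotted lines indicate a fact simulation (Section 4) between db1 and db2.

Definition 6. If $\mathbf{s}$ and $\mathbf{t}$ are two facts (resp., atoms), then the equality type of $\mathbf{s}$ and $\mathbf{t}$, denoted $eqtp(\mathbf{s}, \mathbf{t})$, is the set $\{(i, j) \mid \mathbf{s}.i = \mathbf{t}.j, \text{ with } 1 \le i \le |\mathbf{s}|,\ 1 \le j \le |\mathbf{t}|\}$.

The equality type between two facts hence records the positions on which the facts share a value. To illustrate, referring to the facts in the database db1 of Figure 2, we have $eqtp(s_1, t_1) = \{(1, 2), (2, 1)\}$.

Homomorphisms and isomorphisms. Let $A$ and $B$ be sets of facts and atoms. A function $f\colon X \to Y$ is a homomorphism from $A$ to $B$ if $terms(A) \subseteq X$ and $f(A) \subseteq B$. It is a partial homomorphism if $f(A|_X) \subseteq B$. It is an isomorphism if $f$ is bijective, $terms(A) \subseteq X$, and $f(A) = B$.

Conjunctive queries. A (rule-based) conjunctive query (CQ for short) $Q$ consists of a rule of the form $Q\colon ans(\bar{x}) \leftarrow \mathbf{a}_1, \dots, \mathbf{a}_n$, with $ans(\bar{x}), \mathbf{a}_1, \dots, \mathbf{a}_n$ atoms ($n \ge 0$). The set $\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$ is called the body of $Q$ and is denoted by $body(Q)$. The atom $ans(\bar{x})$ is called the head of $Q$ and is denoted by $head(Q)$. It is required that $var(head(Q)) \subseteq var(body(Q))$. We sometimes write $Q(\bar{x})$ to indicate that $\bar{x}$ is the tuple of variables in the head of $Q$.

A valuation is a partial function $\mu\colon \mathcal{V} \to \mathcal{U}$. A valuation $\mu$ is an embedding of a set of atoms $A$ in a database $db$ if it is a homomorphism from $A$ to $db$. A valuation $\mu$ is an embedding of a conjunctive query $Q$ in a database $db$ if it is an embedding of $body(Q)$ in $db$. The result of conjunctive query $Q(\bar{x})$ on database $db$ is the set $Q(db) := \{\mu(\bar{x}) \mid \mu \text{ is an embedding of } Q \text{ in } db\}$.

Example 7. Consider the following CQ $Q\colon ans(emp) \leftarrow Project(pid, mgr, mgr), WorksOn(pid, emp)$. When applied to the databases of Figure 2 it retrieves all the employees who work on a project that is managed and audited by the same person.

A union of conjunctive queries (UCQ for short) is a finite set $\varphi$ of CQs, all with the same head, say $ans(\bar{x})$, which is called the head of $\varphi$. The result of UCQ $\varphi$ on database $db$ is the set $\varphi(db) := \bigcup \{Q(db) \mid Q \in \varphi\}$.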
[Figure 3: A join tree for the query in Example 9, with nodes R(a,b), S(b,c,e), R(b,d), S(c,e,f), R(a,k), R(g,h), R(g,i), and R(h,j).]

An atom or fact $\mathbf{a}$ is boolean if it does not mention any term. A CQ is boolean if its head is. A CQ $Q$ is strict if all variables in the head occur together in a single atom in the body. To illustrate, the query from Example 7 is strict, but the following is not: $ans(pid, emp, mgr) \leftarrow Project(pid, mgr, mgr), WorksOn(pid, emp)$.

Minimality. A CQ $Q$ is contained in a CQ $Q'$, denoted $Q \subseteq Q'$, if $Q(db) \subseteq Q'(db)$ for all databases $db$. $Q$ is equivalent to $Q'$, denoted $Q \equiv Q'$, if $Q \subseteq Q'$ and $Q' \subseteq Q$. A CQ $Q$ is minimal if there does not exist an equivalent conjunctive query with fewer atoms in the body. A UCQ $\varphi$ is minimal if all of its CQs are minimal, and, moreover, $Q \not\subseteq Q'$ for all distinct $Q, Q' \in \varphi$. Obviously, every UCQ has an equivalent one that is minimal.

Acyclicity. The acyclic conjunctive queries were recognized early in the history of database theory as an important subclass of the conjunctive queries that have a PTime query evaluation problem under combined complexity [1, 32]. There are many equivalent definitions of when a conjunctive query is acyclic. Here, we will use two different versions: a definition based on join trees and a definition based on acyclic hypergraphs.

Definition 8 (Join tree). Let $A$ be a finite set of atoms. A join tree for $A$ is a tree $T$ (i.e., a connected acyclic undirected graph) whose nodes are the atoms in $A$ such that, whenever the same variable $x$ occurs in two atoms $\mathbf{a}$ and $\mathbf{b}$ in $A$, then $x$ occurs in each atom on the unique path linking $\mathbf{a}$ and $\mathbf{b}$. A join tree for a conjunctive query $Q$ is a join tree for $body(Q)$.

Example 9. Consider the following query: $Q\colon ans(a, b) \leftarrow R(a,b), S(b,c,e), R(b,d), S(c,e,f), R(a,k), R(g,h), R(g,i), R(h,j)$. A join tree for $Q$ is shown in Figure 3.

Definition 10. A conjunctive query is acyclic if it has a join tree. It is cyclic otherwise.

The query $Q$ from Example 9 is hence acyclic.

Hypergraph acyclicity. A hypergraph is a pair $(N, E)$, where $N$ is a set of nodes and $E$ is a set of edges (also called hyperedges), which are arbitrary nonempty subsets of $N$. If $Q$ is a conjunctive query, we define the hypergraph $H(Q) = (N, E)$ associated to $Q$ as follows. The set of nodes $N$ consists of all variables occurring in $Q$. For each atom $\mathbf{a}$ in the body of $Q$, the set $E$ contains a hyperedge consisting of all variables occurring in $\mathbf{a}$. It is well known that a conjunctive query is acyclic if and only if $H(Q)$ is acyclic. Here, acyclicity of a hypergraph, also referred to as $\alpha$-acyclicity by Fagin [8], is defined as follows.

A path from a node $s$ to a node $t$ in a hypergraph $(N, E)$ is a sequence of $k \ge 1$ edges $E_1, \dots, E_k \in E$ such that $s \in E_1$, $t \in E_k$, and $E_i \cap E_{i+1} \neq \emptyset$ for every $1 \le i < k$. Two nodes (or two edges) are connected if there is a path from one to the other. A set of nodes (or a set of edges) is connected if all of its pairs of nodes (resp. edges) are connected. The reduction of the hypergraph $(N, E)$ is obtained by removing from $E$ each edge that is a proper subset of another edge. A hypergraph is reduced if it is equal to its reduction.

Given a hypergraph $(N, E)$, the set of partial edges generated by a set of nodes $M \subseteq N$ is obtained by intersecting the edges in $E$ with $M$. That is, the set of partial edges generated by $M$ is the reduction of $\{E' \cap M \mid E' \in E\} - \{\emptyset\}$. A set $B$ is said to be a node-generated set of partial edges if $B$ is the set of partial edges generated by $M$, for some $M \subseteq N$.

Let $\mathcal{F}$ be a connected, reduced set of partial edges, and let $E$ and $F$ be in $\mathcal{F}$. Let $G = E \cap F$. We say that $G$ is an articulation set of $\mathcal{F}$ if the set of partial edges $\{H - G \mid H \in \mathcal{F}\} - \{\emptyset\}$ is not connected.

Definition 11 (Hypergraph Acyclicity). A block of a reduced hypergraph is a connected, node-generated set of partial edges with no articulation set. A block is trivial if it contains fewer than two members. A reduced hypergraph is acyclic if all its blocks are trivial. A hypergraph is said to be acyclic if its reduction is.

Observe that no block can be formed from exactly two partial edges. Indeed, these two edges are either disconnected or their intersection forms an articulation set.

Example 12. Consider the conjunctive query $Q_2\colon ans() \leftarrow R(a,b,c), R(a,b,d), R(a,c,d), R(b,c,d)$. Its hypergraph $H(Q_2)$ consists of the following edges: $E_1 = \{a,b,c\}$, $E_2 = \{a,c,d\}$, $E_3 = \{a,b,d\}$, $E_4 = \{b,c,d\}$. Note that $H(Q_2)$ itself equals the set of partial hyperedges of $H(Q_2)$ generated by the set $\{a,b,c,d\}$. This set is clearly connected and reduced. Furthermore, it has no articulation set, and it is not trivial. Therefore, $H(Q_2)$ itself forms a non-trivial block of $H(Q_2)$. Hence $H(Q_2)$ is cyclic, and so is $Q_2$.

3. STRUCTURAL CHARACTERIZATION

Guarded bisimulation is a generalization of classical bisimulation to relational databases introduced by Andréka et al. [2]. (A formal definition of guarded bisimulation is provided in Appendix A for completeness.) Analogously to modal bisimulation, guarded bisimulation is formulated by means of back and forth conditions. In this section, we introduce guarded simulation as a variant of guarded bisimulation without the back condition, and prove Theorem 5. Towards this, we start with the definition of free acyclicity.

3.1 Free Acyclicity

The extension of CQ $Q$, denoted by $Q^+$, is the CQ obtained by adding $head(Q)$ as an atom to the body. [...]

[...] a so-called compact winning strategy for the existential $k$-cover game between two relational structures, for the special case where $k = 1$. Chen and Dalmau link the existence of winning strategies for the $k$-cover game to invariance by conjunctive queries of so-called coverwidth (also known as generalized hypertree width) at most $k$. Since it is known that the conjunctive queries of coverwidth 1 are exactly the ACQs (e.g., [7, 13]), it is not difficult to obtain the following from their results.

Proposition 18. The following are equivalent.
(1) $db_1, \bar{a} \preceq_g db_2, \bar{b}$.
(2) For all FACQs $Q$, if $\bar{a} \in Q(db_1)$ then $\bar{b} \in Q(db_2)$.

3.3 Characterizing invariance under guarded simulation

Proposition 18 implies that the FACQs are invariant under guarded simulation. It also implies that any FO definable query that is equivalent to a union of FACQs must be invariant under guarded simulation. To obtain Theorem 5, therefore, it remains to prove that any FO definable query that is invariant under guarded simulation is equivalent to a union of FACQs. We devote the rest of this section to this proof, which starts with the following observation.

Proposition 19. If $\varphi$ is a FO formula invariant under guarded simulation (on finite databases) then $\varphi$ is equivalent (in the finite) to a UCQ.

Proof. Every homomorphism gives rise to a guarded simulation. Indeed, if $h$ is a homomorphism from $db_1$ to $db_2$ that maps $\bar{a}$ to $\bar{b}$ then it is readily verified that the set $S := \{h|_{\bar{a}}\} \cup \{h|_X \mid X \text{ guarded in } db_1\}$ is a guarded simulation from $db_1 \cup \{ans(\bar{a})\}$ to $db_2 \cup \{ans(\bar{b})\}$. Hence $db_1, \bar{a} \preceq_g db_2, \bar{b}$, since $h|_{\bar{a}} \in S$ maps $\bar{a}$ to $\bar{b}$. Then since $\varphi$ is invariant under guarded simulations, it is also invariant under homomorphisms. By Rossman's theorem (Theorem 3), $\varphi$ is hence equivalent to a UCQ. □

Now fix throughout the remainder of this section an FO formula $\varphi(\bar{x})$ invariant under guarded simulation. By Proposition 19 we may assume w.l.o.g. that $\varphi$ is a UCQ. Furthermore, we may assume w.l.o.g. that this UCQ is minimal. Now assume for the purpose of contradiction that no union of FACQs expresses $\varphi$. Then in particular there exists some CQ $Q(\bar{x})$ in $\varphi$ that is not freely acyclic, i.e., $Q^+$ is cyclic. From $Q$ we will construct pairs $(canondb, \bar{a})$ and $(unrolldb, \bar{b})$ such that $canondb, \bar{a} \preceq_g unrolldb, \bar{b}$ and $\bar{a} \in \varphi(canondb)$ but $\bar{b} \notin \varphi(unrolldb)$. Then obviously, $\varphi$ is not invariant under guarded simulation, yielding the desired contradiction. The definition of $canondb$ and $unrolldb$ is as follows.

The canonical database. The database $canondb$ is simply what is normally called the "canonical database" (or "frozen" database) for $Q$ in the theory of conjunctive queries. Formally, fix for every variable $x \in var(Q)$ a unique data value $\dot{x} \in \mathcal{U}$ such that the function $freeze$ mapping $x \mapsto \dot{x}$ for all $x \in var(Q)$ is a bijection. Let $canondb := freeze(body(Q))$ and $\bar{a} := freeze(\bar{x})$. By construction, $freeze$ is an embedding of $Q$ in $canondb$. Therefore,

Lemma 20. $\bar{a} \in Q(canondb) \subseteq \varphi(canondb)$.

The unrolled database. Since $Q^+$ is cyclic the hypergraph $H(Q^+)$ contains a nontrivial block. Fix such a nontrivial block $B$, as well as a distinguished hyperedge $F \in B$. Let $\{x_1, \dots, x_n\}$ be the variables mentioned in $Q$. We fix a set $U = \{x_1^{\circ}, \dots, x_n^{\circ}, x_1^{\bullet}, \dots, x_n^{\bullet}\} \subseteq \mathcal{U}$ of pairwise distinct values. In what follows, we call $x_i^{\circ}$ the white colored version of $x_i$, and $x_i^{\bullet}$ the black colored version of $x_i$.

Let $var(B)$ denote the set of all variables that are mentioned in the hyperedges of block $B$. We define for every $V \subseteq var(B)$ the function $clr_V\colon var(Q) \to U$ by:
$clr_V(v) = v^{\circ}$ if $v \notin var(B)$ or $v \in V$;
$clr_V(v) = v^{\bullet}$ if $v \in var(B)$ and $v \notin V$.
Intuitively, $clr_V$ is a function that maps variables to values by "coloring" the variables. Variables not mentioned in $B$ are colored white, while a variable $v$ mentioned in $B$ is colored white if $v$ is in $V$, and black otherwise.

Definition 21 (Covering). Let $E$, $E'$, and $V$ be three sets of variables. We say that $E$ covers $E'$ w.r.t. $V$, denoted $E \sqsupseteq_V E'$, if $E \cap V \supseteq E' \cap V$. We abbreviate $E \sqsupseteq_{var(B)} E'$ by $E \sqsupseteq E'$ and write $E \sqsupset E'$ and $E \sqsupset_V E'$ to denote the corresponding strict relations.

Definition 22 (Maximum intersections). Let $B_{\sqsupset F}$ denote the set of all partial hyperedges $E \in B \setminus \{F\}$ that have a maximal intersection with $F$ among the hyperedges in $B \setminus \{F\}$. That is, $B_{\sqsupset F}$ consists of all $E \in B \setminus \{F\}$ for which there does not exist $E' \in B \setminus \{F\}$ with $E' \sqsupset_F E$. Let $M_\cap$ be the set of maximum intersections of partial hyperedges of $B \setminus \{F\}$ with $F$, $M_\cap := \{E \cap F \mid E \in B_{\sqsupset F}\}$.

Note that, since $B$ is nontrivial, the cardinality of $M_\cap$ is at least 2, and all intersections in $M_\cap$ are nonempty. Also note that for any $A \in M_\cap$, we have $F \supsetneq A$ and hence $F \sqsupseteq A$.

Example 23. Consider the query $Q_1\colon ans() \leftarrow R(a,b,d), R(c,a,d), S(b,c,d,e), T(e,f), T(f,g)$. It is readily verified that the set $B_1 = \{\{a,b,d\}, \{b,c,d\}, \{c,a,d\}\}$ forms a block of $H(Q_1^+)$. Consider the hyperedge $F = \{b,c,d\}$ of this block. Then $M_\cap = \{\{b,d\}, \{c,d\}\}$, resulting from the intersections with the hyperedges $\{a,b,d\}$ and $\{c,a,d\}$ respectively.

Next, consider the query $Q_2\colon ans() \leftarrow R(a,b,c), R(a,b,d), R(a,c,d), R(b,c,d)$ from Example 12. It is readily verified that the set $B_2 = \{\{a,b,c\}, \{a,b,d\}, \{a,c,d\}, \{b,c,d\}\}$ forms a block of $H(Q_2^+)$ (cf., e.g., Example 12). Consider the hyperedge $F = \{a,b,d\}$ of this block. Then $M_\cap$ is the set $\{\{a,b\}, \{a,d\}, \{b,d\}\}$, resulting from the intersections with the hyperedges $\{a,b,c\}$, $\{a,c,d\}$, and $\{b,c,d\}$ respectively.

We now turn to the construction of $unrolldb$.

Definition 24 (Unrolled database). Define $\mathcal{F}$ to be the set of functions that contains [...]

[...] of each fact. Note in particular that it is not possible to embed $Q_1$ (resp. $Q_2$) into the unrolled database of $Q_1$ (resp. $Q_2$). Indeed, to construct such an embedding, we would essentially have to find an edge-label-preserving graph homomorphism of the graph in Figure 4(a) (resp. Figure 4(c)) to the graph in Figure 4(b) (resp. Figure 4(d)), which is readily verified to be impossible.

By definition of $Q^+$, $H(Q^+)$ contains a hyperedge $X$ with $var(\bar{x}) \subseteq X$. Now observe that, by construction, $\mathcal{F}$ contains for every hyperedge $X$ of $H(Q^+)$ a function $f$ with domain $X$. Fix $f \in \mathcal{F}$ with $var(\bar{x}) \subseteq dom(f)$ arbitrarily and let $\bar{b} = f(\bar{x})$. Let $freeze^{-1}$ denote the inverse of $freeze$. The following lemmas and propositions show that $(canondb, \bar{a})$ and $(unrolldb, \bar{b})$ have been constructed as desired.

Lemma 26. The set $S = \{f \circ freeze^{-1} \mid f \in \mathcal{F}\}$ is a guarded simulation of $canondb$ in $unrolldb$.

Proof sketch. It suffices to prove that each $f \in \mathcal{F}$ is a partial homomorphism from $body(Q)$ into $unrolldb$ and that $\mathcal{F}$ satisfies the guarded forth condition. Indeed, since $freeze^{-1}$ is an isomorphism from $canondb$ to $body(Q)$, $S$ will then be a set of partial homomorphisms from $canondb$ into $unrolldb$ that satisfy the guarded forth condition. Establishing that each $f \in \mathcal{F}$ is a partial homomorphism is straightforward; establishing the guarded forth condition is done by a technical case analysis. □

Proposition 27. $canondb, \bar{a} \preceq_g unrolldb, \bar{b}$.

Proof. Clearly $\bar{b} = f(freeze^{-1}(\bar{a}))$. Hence $canondb, \bar{a} \preceq_g unrolldb, \bar{b}$ since $S = \{g \circ freeze^{-1} \mid g \in \mathcal{F}\}$ is a guarded simulation of $canondb$ in $unrolldb$ by Lemma 26, and since $f \circ freeze^{-1} \in S$ maps $\bar{a} \mapsto \bar{b}$. □

Proposition 28. $\bar{b} \notin Q(unrolldb)$.

Proof sketch. The proof is by contradiction. The essential reasoning (glossing over many important details) is as follows. Let $ans(\bar{x})$ be the head of $Q$ and let $unrolldb^+$ denote $unrolldb \cup \{ans(\bar{b})\}$.

First, we show that if $\bar{b} \in Q(unrolldb)$ then there must also exist an embedding $h$ of $Q^+$ in $unrolldb^+$ that maps $x \mapsto x^{\circ}$ or $x \mapsto x^{\bullet}$, for every $x \in var(Q)$. In particular, $x$ will not be mapped to a colored version of another variable. As a consequence, we can establish that $h$ maps each atom $\mathbf{a} \in body(Q^+)$ to a copy of $\mathbf{a}$ in $unrolldb^+$, and not to a copy of some other atom.

Then, since $F$ is a partial hyperedge of $H(Q^+)$ there exists some atom $\mathbf{a}$ in $Q^+$ that contains all variables in $F$. Since, by the first step, $h$ maps atoms in $body(Q^+)$ to their copies in $unrolldb^+$, we know in particular that $h(\mathbf{a})$ is a copy of $\mathbf{a}$. Then, since $\mathbf{a}$ contains all variables in $F$, there exists some $A \in M_\cap$ such that every variable in $A$ ($\subsetneq F$) is colored white in $h(\mathbf{a})$ and every variable in $F \setminus A$ is colored black in $h(\mathbf{a})$.

Since $A \in M_\cap$ there exists $E_1 \in B$ such that $A = E_1 \cap F$. Moreover, since $B$ is a block of $H(Q^+)$, $A$ cannot be an articulation set of $B$. As such, there must exist a path $E_1, \dots, E_n, F \in B$ that does not need to traverse any node in $A$. That is, $(E_i \cap E_{i+1}) \setminus A \neq \emptyset$ for $1 \le i < n$, and $(E_n \cap F) \setminus A \neq \emptyset$. Now, it is possible to establish that $h(E_i)$ consists only of white colored variables, for all $1 \le i \le n$.

This yields the desired contradiction. Indeed, since $(E_n \cap F) \setminus A$ is non-empty there is some variable $x$ that is both in $E_n$ and $F$, but not in $A$. Since $x \in E_n$, $h$ must map $x \mapsto x^{\circ}$. On the other hand, since $x \in F \setminus A$, we have already established before that $h$ must map $x \mapsto x^{\bullet}$. □

Proposition 29. $\bar{b} \notin \varphi(unrolldb)$.

Crux. We already know that $\bar{b} \notin Q(unrolldb)$ by Proposition 28. Suppose, for the purpose of contradiction, that there is some other CQ $Q' \in \varphi$ such that $\bar{b} \in Q'(unrolldb)$. In particular, there exists an embedding $h$ from $Q'$ into $unrolldb$ such that $h(\bar{x}) = \bar{b}$. Now, consider the function $decopy\colon im(h) \to var(Q)$ such that
$decopy(x^{\circ}) = x$ for every $x^{\circ} \in im(h)$,
$decopy(x^{\bullet}) = x$ for every $x^{\bullet} \in im(h)$.
Observe that by definition of $unrolldb$, $Q$ contains an atom $decopy(\mathbf{s})$ for each fact $\mathbf{s} \in unrolldb$ built over the image of $h$. Hence $(decopy \circ h)$ is a homomorphism of $body(Q')$ into $body(Q)$. Furthermore, $decopy(\bar{b}) = \bar{x}$ since $\bar{b} = f(\bar{x})$ for some $f \in \mathcal{F}$. By Chandra and Merlin's classical result [6], this implies that $Q$ is contained in $Q'$, contradicting the fact that $\varphi$ is minimal. □
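The proof above hinges on whether $Q^+$ is cyclic. As an illustration, free acyclicity can be tested on the example queries of this paper. The sketch below does not implement the block-based Definition 11; it swaps in the standard GYO reduction (repeatedly drop variables that occur in one hyperedge only, and hyperedges that are empty or contained in another), which is a well-known equivalent test for $\alpha$-acyclicity. The atom encoding and the function names are assumptions made for this example.

```python
def is_acyclic(edges):
    """GYO reduction: True iff the hypergraph reduces to nothing."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove variables that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: remove one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


def is_freely_acyclic(head_vars, body):
    """Q is freely acyclic iff Q+ (the body plus the head atom) is acyclic."""
    edges = [set(atom[1:]) for atom in body] + [set(head_vars)]
    return is_acyclic(edges)


# Example 9: ans(a,b) <- R(a,b), S(b,c,e), R(b,d), S(c,e,f), R(a,k), ...
Q9 = [("R", "a", "b"), ("S", "b", "c", "e"), ("R", "b", "d"),
      ("S", "c", "e", "f"), ("R", "a", "k"), ("R", "g", "h"),
      ("R", "g", "i"), ("R", "h", "j")]
# Example 23: Q1, and Example 12: Q2 (both boolean).
Q1 = [("R", "a", "b", "d"), ("R", "c", "a", "d"), ("S", "b", "c", "d", "e"),
      ("T", "e", "f"), ("T", "f", "g")]
Q2 = [("R", "a", "b", "c"), ("R", "a", "b", "d"),
      ("R", "a", "c", "d"), ("R", "b", "c", "d")]

print(is_freely_acyclic(("a", "b"), Q9))
print(is_freely_acyclic((), Q1))
print(is_freely_acyclic((), Q2))
```

For $Q_1$ the reduction strips the chain $T(e,f), T(f,g)$ and the variable $e$, then gets stuck on the three hyperedges $\{a,b,d\}$, $\{a,c,d\}$, $\{b,c,d\}$, which are exactly the members of the block $B_1$ of Example 23.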
Since $canondb, \bar{a} \preceq_g unrolldb, \bar{b}$ and $\bar{a} \in \varphi(canondb)$ but $\bar{b} \notin \varphi(unrolldb)$ we have our desired contradiction: $\varphi$ is not invariant under guarded simulation. This finishes the proof of Theorem 5.

4. GUARDED VS FACT SIMULATION

We next present an alternate definition for guarded simulation, called fact simulation, and show that fact simulation naturally yields approximations that are tightly linked to invariance of freely acyclic conjunctive queries whose join tree is of a specific bounded height.

4.1 Fact simulation

Definition 30. A fact simulation of database $db_1$ in database $db_2$ is a nonempty binary relation $F \subseteq db_1 \times db_2$ between the facts of $db_1$ and $db_2$ such that for all facts $\mathbf{s} \in db_1$ and $\mathbf{t} \in db_2$ with $\mathbf{s} \mathrel{F} \mathbf{t}$:
(1) $\mathbf{s}$ and $\mathbf{t}$ carry the same relation symbol, i.e., $rel(\mathbf{s}) = rel(\mathbf{t})$;
(2) for all $\mathbf{s}' \in db_1$ there exists $\mathbf{t}' \in db_2$ with $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}' \mathrel{F} \mathbf{t}'$.

Example 31. To illustrate, the dotted lines in Figure 2 show a fact simulation $F$ of database $db_1$ in $db_2$. Note that fact simulation is necessarily total on $db_1$ (i.e., every fact of $db_1$ occurs in $F$).

Now, let $db_1$ and $db_2$ be two databases, and let $\mathbf{s}$ and $\mathbf{t}$ be facts. We say that $(db_1, \mathbf{s})$ is fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$, if there exists a fact simulation $F$ of $db_1 \cup \{\mathbf{s}\}$ in $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s} \mathrel{F} \mathbf{t}$. Moreover, if $\bar{a}$ and $\bar{b}$ are tuples of data values, then $db_1, \bar{a} \preceq_f db_2, \bar{b}$ if $db_1, ans(\bar{a}) \preceq_f db_2, ans(\bar{b})$ with $ans$ a relation symbol of the same arity as $\bar{a}$ and $\bar{b}$ that does not occur in $db_1$ or $db_2$.

We require the following notions to establish that fact simulation is equivalent to guarded simulation. Let $\mathbf{s} \to \mathbf{t}$ denote, for every $\mathbf{s} = s(a_1, \dots, a_k)$ and $\mathbf{t} = t(b_1, \dots, b_l)$, the relation $\{(a_i, b_i) \mid 1 \le i \le \min(k, l)\}$. When we are sure that $\mathbf{s} \to \mathbf{t}$ is a function, we use common notation for functions, such as $(\mathbf{s} \to \mathbf{t})(a)$ to denote the unique value associated to $a$ by the function $\mathbf{s} \to \mathbf{t}$. Now define, for a guarded simulation $S$,
$F[S] := \{(\mathbf{s}, f(\mathbf{s})) \mid f\colon X \to Y \in S,\ \mathbf{s} \in db_1, \text{ and } val(\mathbf{s}) \subseteq X\}$.
Also define, for a fact simulation $F$,
$S[F] := \{\mathbf{s} \to \mathbf{t} \mid (\mathbf{s}, \mathbf{t}) \in F\}$.
For example, the relation $(1, Amy, Lex) \to (c, Ned, Ned)$ is an element of $S[F]$, for fact simulation $F$ of Example 31.

The following proposition establishes the correspondence between guarded simulation and fact simulation.

Proposition 32.
1. If $S$ is a guarded simulation of $db_1$ in $db_2$ then $F[S]$ is a fact simulation of $db_1$ in $db_2$.
2. If $F$ is a fact simulation of $db_1$ in $db_2$ then $S[F]$ is a guarded simulation of $db_1$ in $db_2$.

It follows that fact simulation provides an alternative definition for guarded simulation, in the following sense.

Theorem 33. For databases $db_1$ and $db_2$ and tuples $\bar{a}$ and $\bar{b}$ it holds that $db_1, \bar{a} \preceq_g db_2, \bar{b}$ iff $db_1, \bar{a} \preceq_f db_2, \bar{b}$.

4.2 Approximate fact simulation

In applications of classical simulation in data management it is known that, if there are few nodes in graph $G$ that are similar, then the corresponding structural index of $G$ may be of the same size as $G$ itself, and hence be too large to act as a succinct summary of the structure of $G$ [18]. In such situations, it has been proposed to approximate simulations and to group nodes in the index with respect to these approximations instead of with respect to full simulation [18]. Towards a suitable approximation of guarded simulation, we introduce the following version of fact simulation.

Definition 34 (Approximate fact simulation). Let $db_1$ and $db_2$ be two databases. A depth-$k$ approximation of fact simulation of $db_1$ in $db_2$, or fact $k$-simulation for short, is a sequence $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ of binary relations such that $F_0$ consists of all pairs $(\mathbf{s}, \mathbf{t}) \in db_1 \times db_2$ with $rel(\mathbf{s}) = rel(\mathbf{t})$ and $eqtp(\mathbf{s}, \mathbf{s}) \subseteq eqtp(\mathbf{t}, \mathbf{t})$; and the following property holds for every $1 \le j \le k$ and all $\mathbf{s}$ and $\mathbf{t}$ with $\mathbf{s} \mathrel{F_j} \mathbf{t}$: for every $\mathbf{s}' \in db_1$ there exists $\mathbf{t}'$ in $db_2$ such that $eqtp(\mathbf{s}, \mathbf{s}') \subseteq eqtp(\mathbf{t}, \mathbf{t}')$ and $\mathbf{s}' \mathrel{F_{j-1}} \mathbf{t}'$ (Fact Forth).

We say that $(db_1, \mathbf{s})$ is $k$-fact simulated by $(db_2, \mathbf{t})$, denoted $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$, if there exists a fact $k$-simulation $F_k \subseteq F_{k-1} \subseteq \dots \subseteq F_0$ from $db_1 \cup \{\mathbf{s}\}$ to $db_2 \cup \{\mathbf{t}\}$ with $\mathbf{s} \mathrel{F_k} \mathbf{t}$. The notion of $k$-fact similarity between $(db_1, \bar{a})$ and $(db_2, \bar{b})$ with $\bar{a}$ and $\bar{b}$ tuples is now defined in the obvious way. Observe that $db_1, \mathbf{s} \preceq_f db_2, \mathbf{t}$ iff $db_1, \mathbf{s} \preceq^k_f db_2, \mathbf{t}$ for every $k \ge 0$.

We now link approximate guarded simulation to indistinguishability by FACQs of bounded height. Here, the height of a FACQ is defined as follows. Recall that in graph theory, the distance between two connected nodes $u$ and $v$ in an undirected graph $G$ is the length of a shortest path between $u$ and $v$. The eccentricity of $u$ in $G$, denoted $ecc(u, G)$, is the maximum distance of $u$ to any other node to which it is connected.

Example 35. Consider, for instance, the join tree $T$ of Figure 3. The eccentricity of $R(a,b)$ in $T$ is 2 while the eccentricity of $R(g,h)$ in $T$ is 3.

Definition 36. Let $A$ be a set of atoms. When $A$ is acyclic, the eccentricity of atom $\mathbf{a} \in A$, denoted $ecc(\mathbf{a}, A)$, is the minimum eccentricity of $\mathbf{a}$ among all join trees $T$ for $A$. The height of a FACQ $Q$ is the eccentricity of $head(Q)$ in $body(Q^+)$. In other words, the height of $Q$ is the minimum height of any join tree $T$ for $body(Q^+)$, when considered as being rooted at $head(Q)$.

Example 37. Continuing Example 35, query $Q$ of Example 9 has a height of 3.

Proposition 38. Let $k \ge 0$ be a natural number. The following are equivalent.
(1) $db_1, \bar{a} \preceq^k_f db_2, \bar{b}$.
(2) For all FACQs $Q$ of height at most $k$, if $\bar{a} \in Q(db_1)$ then $\bar{b} \in Q(db_2)$.

Note that Proposition 38 implies Proposition 18 (yet the converse is not true).

Closing remark. We close this section with the following important remark. It is obviously possible to define approximations of guarded simulation in an analogous way as approximations of fact simulation: a depth-$k$ approximation is a sequence $S_k \subseteq S_{k-1} \subseteq \dots \subseteq S_0$ of sets of partial homomorphisms such that each $S_i$ satisfies the guarded forth property to $S_{i-1}$, for $i \ge 1$. While full guarded simulation coincides with fact simulation (cf. Theorem 33), their approximations do not. In particular, $\preceq^0_g$ is distinct from $\preceq^0_f$. Indeed, consider $db_1 = \{r(a,b), r(b,a)\}$ and $db_2 = \{r(1,2)\}$. Note that there is no partial homomorphism from $db_1$ to $db_2$ with domain $\{a, b\}$. Therefore, $db_1$ is not guarded 0-simulated by $db_2$. Yet, $db_1 \times db_2$ is a fact 0-simulation of $db_1$ in $db_2$.

5. GUARDED STRUCTURAL INDEXING

Recall from the Introduction that in graph data management, a structural index is a compact representation of a data graph. Typically, this compact representation is obtained by grouping the nodes in the input graph that are similar or bisimilar. Structural characterizations of query invariance then enable efficient retrieval of the relevant nodes of the graph for various graph query languages. In this section we analogously define guarded structural indexes as compact representations of relational data obtained by grouping facts according to guarded similarity. [...]

[6] A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. In STOC 1977, pages 77-90. ACM, 1977.
[7] H. Chen and V. Dalmau. Beyond hypertree width: Decomposition methods without decompositions. In CP, volume 3709, pages 167-181. Springer, 2005.
[8] R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3):514-550, 1983.
[9] W. Fan. Graph pattern matching revised for social network analysis. In ICDT 2012, pages 8-21. ACM, 2012.
[10] W. Fan, J. Li, X. Wang, and Y. Wu. Query preserving graph compression. In SIGMOD 2012, pages 157-168. ACM, 2012.
[11] G. H. L. Fletcher, D. Van Gucht, Y. Wu, M. Gyssens, S. Brenes, and J. Paredaens. A methodology for coupling fragments of XPath with structural indexes for XML documents. Inf. Syst., 34(7):657-670, 2009.
[12] J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716-752, 2002.
[13] G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. JCSS, 66(4):775-808, 2003.
[14] G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. TKDE, 19(10):1381-1403, 2007.
[15] E. Grädel, C. Hirsch, and M. Otto. Back and forth between guarded and modal logics. TOCL, 3(3):418-463, 2002.
[16] M. Gyssens, J. Paredaens, D. V. Gucht, and G. H. L. Fletcher. Structural characterizations of the semantics of XPath as navigation tool on a document. In PODS 2006, pages 318-327. ACM, 2006.
[17] C. Hirsch. Guarded logics: algorithms and bisimulation. PhD thesis, TU Aachen, 2002.
[18] R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for indexing paths in graph-structured data. In ICDE 2002, pages 129-140. IEEE, 2002.
[19] P. G. Kolaitis and M. Y. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS 1998, pages 205-213. ACM, 1998.
[20] D. Leinders, M. Marx, J. Tyszkiewicz, and J. V. den Bussche. The semijoin algebra and the guarded fragment. JOLLI, 14(3):331-343, 2005.
[21] A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC 2005, pages 95-103. Australian Computer Society, 2005.
[22] T. Milo and D. Suciu. Index structures for path expressions. In ICDT 1999, pages 277-295. Springer, 1999.
[23] M. Otto. Highly acyclic groups, hypergraph covers, and the guarded fragment. J. ACM, 59(1):5:1-5:40, 2012.
[24] F. Picalausa. Guarded structural indexes: theory and application to relational RDF databases. PhD thesis, Université Libre de Bruxelles, 2013.
[25] F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, volume 7295, pages 406-421. Springer, 2012.
[26] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. Technical report, W3C Recommendation, 2008.
[27] P. Ramanan. Covering indexes for XML queries: bisimulation - simulation = negation. In VLDB 2003, pages 165-176, 2003.
[28] B. Rossman. Homomorphism preservation theorems. J. ACM, 55(3):15:1-15:53, 2008.
[29] D. Sangiorgi. Introduction to bisimulation and coinduction. Cambridge University Press, 2012.
[30] T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured RDF data using structure indexes. TKDE, 25(9):2076-2089, 2013.
[31] O. Udrea, A. Pugliese, and V. S. Subrahmanian. GRIN: a graph based RDF index. In AAAI 2007, pages 1465-1470, 2007.
[32] M. Yannakakis. Algorithms for acyclic database schemes. In VLDB 1981, pages 82-94. IEEE, 1981.

APPENDIX

A. GUARDED BISIMULATION

The definition of guarded bisimulation due to Andréka et al. [2] is recalled here for completeness. Given two sets $A$ and $B$ of facts and atoms, a function $f\colon X \to Y$ is a partial isomorphism from $A$ to $B$ if it is bijective and $f(A|_X) = B|_Y$.

Definition 44 (Guarded bisimulation). Let $db_1$ and $db_2$ be databases. A guarded bisimulation from $db_1$ to $db_2$ is a nonempty set $I$ of finite partial isomorphisms from $db_1$ to $db_2$ such that the following forth and back conditions are satisfied.
(1) For every $f\colon X \to Y \in I$ and for every set $X'$ guarded in $db_1$, there exists a partial isomorphism $g\colon X' \to Y' \in I$ such that $g$ and $f$ agree on $X \cap X'$. (Guarded Bisimulation Forth)
(2) For every $f\colon X \to Y \in I$ and for every set $Y'$ guarded in $db_2$, there exists a partial isomorphism $g\colon X' \to Y' \in I$ such that $g^{-1}$ and $f^{-1}$ agree on $Y \cap Y'$. (Guarded Bisimulation Back)
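Returning to Definitions 30 and 34 of Section 4, the largest fact $k$-simulation can be computed by straightforward refinement: start from $F_0$ and repeatedly keep only the pairs that satisfy Fact Forth with respect to the previous level. The sketch below is a naive illustration under assumed encodings (facts as tuples keyed by the identifiers of Figure 2; the function names are hypothetical); it is not the efficient algorithm of the companion work [24, 25].

```python
def eqtp(s, t):
    """Equality type: 1-based position pairs (i, j) with s.i = t.j."""
    return {(i, j)
            for i, a in enumerate(s[1:], start=1)
            for j, b in enumerate(t[1:], start=1)
            if a == b}


def fact_k_simulations(db1, db2, k=None):
    """Return the levels [F_0, F_1, ...] of the largest fact k-simulation.

    With k=None the refinement runs to its fixpoint; the last level is then
    the largest fact simulation of db1 in db2, provided it is nonempty.
    """
    level = {(s, t) for s in db1 for t in db2
             if db1[s][0] == db2[t][0]
             and eqtp(db1[s], db1[s]) <= eqtp(db2[t], db2[t])}
    levels = [level]
    while k is None or len(levels) <= k:
        prev = levels[-1]
        # Fact Forth: every s2 needs a partner t2 related at the previous level.
        nxt = {(s, t) for (s, t) in prev
               if all(any((s2, t2) in prev
                          and eqtp(db1[s], db1[s2]) <= eqtp(db2[t], db2[t2])
                          for t2 in db2)
                      for s2 in db1)}
        if k is None and nxt == prev:
            break
        levels.append(nxt)
    return levels


# Figure 2 data.
DB1 = {"s1": ("Project", 1, "Amy", "Lex"), "s2": ("Project", 2, "Lex", "Amy"),
       "s3": ("Project", 3, "Sue", "Sue"), "t1": ("WorksOn", "Amy", 1),
       "t2": ("WorksOn", "Lex", 2), "t3": ("WorksOn", "Sue", 3),
       "t4": ("WorksOn", "Jeffrey", 3), "t5": ("WorksOn", "Cathy", 3)}
DB2 = {"u1": ("Project", "a", "Liv", "Rob"), "u2": ("Project", "b", "Rob", "Liv"),
       "u3": ("Project", "c", "Ned", "Ned"), "u4": ("Project", "d", "Ellen", "Fred"),
       "u5": ("Project", "e", "Fred", "Ellen"), "v1": ("WorksOn", "Liv", "a"),
       "v2": ("WorksOn", "Rob", "b"), "v3": ("WorksOn", "Ned", "c"),
       "v4": ("WorksOn", "Bob", "c"), "v5": ("WorksOn", "Ellen", "d"),
       "v6": ("WorksOn", "Fred", "e")}

largest = fact_k_simulations(DB1, DB2)[-1]
print(sorted(largest))

# Closing remark of Section 4.2: db1 = {r(a,b), r(b,a)}, db2 = {r(1,2)}.
REM1 = {"x1": ("r", "a", "b"), "x2": ("r", "b", "a")}
REM2 = {"y1": ("r", 1, 2)}
print(fact_k_simulations(REM1, REM2, k=1))
```

In the closing-remark example, level 0 is all of $db_1 \times db_2$, as stated in the text, while level 1 is already empty because $r(b,a)$ finds no partner that swaps the two positions of $r(1,2)$.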

By: pamella-moone