
Core Decomposition of Uncertain Graphs

Francesco Bonchi   Francesco Gullo   Andreas Kaltenbrunner   Yana Volkovich
Yahoo Labs, Spain        Barcelona Media - Innovation Centre, Spain
{bonchi,gullo}@yahoo-inc.com   {andreas.kaltenbrunner,yana.volkovich}@barcelonamedia.org

ABSTRACT
Core decomposition has proven to be a useful primitive for a wide range of graph analyses. One of its most appealing features is that, unlike other notions of dense subgraphs, it can be computed linearly in the size of the input graph. In this paper we provide an analogous tool for uncertain graphs, i.e., graphs whose

edges are assigned a probability of existence. The fact that core decomposition can be computed efficiently in deterministic graphs does not guarantee efficiency in uncertain graphs, where even the simplest graph operations may become computationally intensive. Here we show that core decomposition of uncertain graphs can be carried out efficiently as well. We extensively evaluate our definitions and methods on a number of real-world datasets and applications, such as influence maximization and task-driven team formation.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining; G.2.2 [Discrete Mathematics]: Graph Theory - Graph Algorithms

Keywords
uncertain graphs; dense subgraph; core decomposition

1. INTRODUCTION

Uncertain graphs, i.e., graphs whose edges are assigned a probability of existence (see an example in Figure 1), arise in several emerging applications [24, 14, 15]. For instance, in biological networks and protein-interaction networks vertices represent genes and/or proteins, while edges represent interactions among them. Since the interactions are derived through noisy and error-prone

laboratory experiments, the existence of each edge is uncertain [4, 26, 24]. In social networks uncertainty arises for various reasons [1]. Edge probabilities may represent the outcome of a link-prediction task [20] or the influence of one person on another, like in viral marketing [11]. Uncertainty can also be intentionally injected for privacy purposes [7].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD'14, August 24-27, 2014, New York, NY, USA.
Copyright 2014 ACM 978-1-4503-2956-9/14/08 ...$15.00.
http://dx.doi.org/10.1145/2623330.2623655.

Figure 1: An uncertain graph and its (k,η)-core decomposition for η = 0.04. Vertex 1 has core number 1, vertices 2 and 7 have core number 2, and vertices 3, 4, 5 and 6 have core number 3.

Finding dense subgraphs is a fundamental primitive in many graph-analysis tasks [21]. There exist many different definitions of what a dense subgraph is, e.g., cliques, n-cliques, n-clans, k-plexes, f-groups, n-clubs, lambda sets, most of which are NP-hard to compute or at least quadratic in the size of the input graph. In this respect, the notion of core decomposition is particularly appealing as (i) it can be computed in linear time [5], and (ii) it is related to many

other definitions of a dense subgraph (as discussed later). The k-core of a graph G is defined as a maximal subgraph in which every vertex is connected to at least k other vertices within that subgraph. The set of all k-cores of a graph G forms the core decomposition of G [25]. The fact that core decomposition can be performed in linear time in deterministic graphs does not guarantee efficiency in uncertain graphs. Indeed, in such graphs even the simplest tasks may become hard. As an example, consider the two-terminal-reachability problem, which asks whether two query vertices are

connected. In a deterministic graph the solution to this problem requires a simple scan of the graph. Instead, in uncertain graphs, computing the probability that two vertices are connected is a #P-complete problem [28]. Thus, a major question we aim at answering in this paper is: can the core decomposition of an uncertain graph be computed efficiently?

Related work and applications. Existing research on uncertain graphs has mainly focused on querying [15, 33, 24, 31] and mining, particularly on extracting frequent subgraphs [34] or subgraphs that are connected with high probability

[14], and clustering [22, 18]. Core decomposition of deterministic graphs has been exploited to analyse the nature of a network and discover dense substructures [2, 17]. It has been applied in many different domains, such as bioinformatics [30], software engineering [32], and social networks [17]. Core decomposition has also been used to speed up the computation of more complex definitions of a dense subgraph. For instance, it serves to find maximal cliques more efficiently [10], and it is at the basis of linear-time approximation algorithms for the densest-

subgraph problem [19] and the densest at-least-k-subgraph problem [3]. It is also used to approximate betweenness centrality [13]. A core-decomposition tool for uncertain graphs would thus provide a natural extension of all these applications to the context of uncertain graphs. Other direct applications of core decomposition of uncertain graphs include influence maximization and task-driven team formation, which we showcase in Sections 6 and 7, respectively. In influence maximization [16], the probability of an edge (u,v) represents the influence that u exerts on v,

i.e., the likelihood that some action/information propagates from u to v. The greedy algorithm [12] traditionally used to find the users that maximize the information spread over the network requires a number of Monte Carlo simulations that largely limits its efficiency. In Section 6 we show how our probabilistic core-decomposition tool can be used to speed up the influence-maximization process. In task-driven team formation, the input is a collaboration graph in which vertices are individuals and edges are associated with a probabilistic topic model representing the topic(s) of

past collaborations. A query is a pair (T,Q), where T is a set of terms describing a new task, and Q is a set of vertices. The goal is to find an answer set of vertices that forms a good team for the task described by T. The given query task T, along with the topic model, induces a (single) probability value p(u,v) for each edge (u,v), such that p(u,v) represents the likelihood that u and v collaborate on T. This gives rise to an uncertain graph to which one can naturally apply core decomposition in order to find the desired team (Section 7).

Challenges and contributions. In this paper we study

the problem of core decomposition of uncertain graphs, which, to the best of our knowledge, has never been considered so far. We introduce (Section 2) the notion of (k,η)-core as a maximal subgraph whose vertices have at least k neighbours in that subgraph with probability no less than η, where η ∈ [0,1] is a threshold defining the desired level of certainty of the output cores. Let the η-degree of a vertex v be the maximum degree k such that the probability for v to have at least that degree is no less than η. We design an algorithm for finding a (k,η)-core decomposition that iteratively removes the

vertex having the smallest η-degree, and we prove its correctness (Section 3). The proposed algorithm resembles the traditional algorithm for computing the core decomposition of a deterministic graph [5]; however, as usual when the attention shifts from the deterministic context to uncertain graphs, the adaptation of that algorithm is non-trivial. A major challenge is the capability of handling large graphs. Two main critical steps affect our algorithm: computing initial η-degrees and updating η-degrees whenever a vertex is removed from the graph. While the corresponding steps in the

deterministic case (i.e., computing and updating the degree of a vertex) are straightforward, performing them efficiently in uncertain graphs needs a great deal of attention; approaching them naïvely, indeed, may even lead to intractable (exponential) time complexity. We show how to overcome the exponential-time complexity by devising a novel yet efficient dynamic-programming method to compute η-degrees from scratch. We also exploit the same intuition underlying the dynamic-programming algorithm so as to efficiently update η-degrees after a vertex removal. As a result,

we show that computing a (k,η)-core decomposition takes O(mΔ) time, where m is the number of edges in the input uncertain graph and Δ is the maximum η-degree. As a further contribution, we devise a novel method to improve the efficiency of the proposed (k,η)-core-decomposition algorithm (Section 4). The idea is to exploit a fast-to-compute lower bound on the η-degree that can be used as a placeholder during the first iterations, being replaced with the actual η-degree only when the vertex at hand is selected and the graph has become smaller. Finally, we report

experiments on efficiency and numerical stability on real-world graphs (Section 5) and show our proposal at work in two real-life applications (Sections 6 and 7).

2. PROBLEM DEFINITION

Cores of deterministic graphs. Before focusing on uncertain graphs, we briefly recall the problem of computing cores of deterministic graphs. Let G = (V,E) be an undirected graph, where V is a set of vertices and E is a set of edges. For every vertex v ∈ V, let deg(v) and deg_C(v) denote the degree of v in G and in a subgraph C of G, respectively. Also, given a set of vertices C ⊆ V, let E_C denote the subset of edges induced

by C, i.e., E_C = {(u,v) ∈ E : u ∈ C, v ∈ C}.

Definition 1 (k-core). The k-core (or core of order k) of G is a maximal subgraph C = (C, E_C) such that ∀v ∈ C: deg_C(v) ≥ k. The core number (or core index) of a vertex v ∈ V, denoted c(v), is the highest order of a core that contains v. The set of all k-cores of G, for all k, is the core decomposition of G.

The notion of k-core is strictly related to the notion of k-shell, that is, the subgraph induced by the set of all vertices having core number equal to k. Note that neither k-cores nor k-shells are necessarily connected subgraphs. Also, while these two notions usually refer to subgraphs of the input graph,

in the remainder we slightly abuse notation and denote by k-core (or k-shell) both the subgraph C = (C, E_C) itself and the vertex set C that induces it. All k-shells of a graph G form a partition of the vertex set V, while all k-cores are nested into each other: C_{k+1} ⊆ C_k, for all k = 0, …, max_{v∈V} c(v). As a result, the core decomposition of G is unique and fully determined by the core number c(v) of all vertices in G: the k-core of G simply corresponds to (the subgraph induced by) the set of all vertices having core number no less than k. Batagelj and Zaveršnik [5] show how to compute the core decomposition of a graph in linear time (Algorithm 1). The

algorithm iteratively removes the smallest-degree vertex and sets the core number of the removed vertex accordingly. Vertices are thus required to be ordered based on their degree. Defining the initial vertex ordering and keeping vertices ordered during the execution of the algorithm take O(n) and O(1) time, respectively. The idea is to employ an n-dimensional vector D whose single cells D[d] store all vertices having degree equal to d in the current graph. The overall time complexity of the algorithm is hence O(m).
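The peeling scheme just described can be sketched in a few lines of Python (a minimal illustration of the standard bucket-based algorithm; function and variable names are ours, not the paper's):

```python
from collections import defaultdict

def core_numbers(n, edges):
    """Core number of every vertex of an undirected graph on vertices 0..n-1,
    computed by repeatedly peeling a vertex of minimum (current) degree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    c = {v: len(adj[v]) for v in range(n)}  # c[v]: current degree, later core number
    D = defaultdict(set)                    # D[d]: vertices of current degree d
    for v, d in c.items():
        D[d].add(v)
    removed = set()
    for k in range(n):
        while D[k]:
            v = D[k].pop()
            c[v] = k                        # v is peeled at order k
            removed.add(v)
            for u in adj[v]:
                if u not in removed and c[u] > k:
                    D[c[u]].remove(u)       # u loses neighbour v
                    c[u] -= 1
                    D[c[u]].add(u)
    return c
```

For example, on a triangle {0,1,2} with a pendant vertex 3 attached to vertex 0, the triangle vertices get core number 2 and the pendant gets core number 1.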


Algorithm 1 cores
Input: A graph G = (V,E).
Output: An n-dimensional vector c containing the core number of each v ∈ V.
1: D[d] ← ∅, for all d = 0,…,n−1
2: for all v ∈ V do
3:   c[v] ← deg(v)
4:   D[deg(v)] ← D[deg(v)] ∪ {v}
5: end for
6: for all k = 0,…,n−1 do
7:   while D[k] ≠ ∅ do
8:     pick and remove a vertex v from D[k]
9:     c[v] ← k
10:    for all u : (u,v) ∈ E, c[u] > k do
11:      move u from D[c[u]] to D[c[u]−1]
12:      c[u] ← c[u] − 1
13:    end for
14:    remove v from G
15:  end while
16: end for

Cores of uncertain graphs. Let 𝒢 = (V,E,p) be an uncertain graph, where p : E → (0,1] is a function that assigns a probability of existence to each edge. For the sake of brevity, we hereinafter denote the probability p(e) with p_e. For every vertex v ∈ V, let N_v = {(u,v) ∈ E} denote the set of edges incident to v, and d_v = |N_v| its size. To

define our notion of core decomposition of an uncertain graph, we resort to the well-known possible-world semantics, which has been recognized as a sound principle to define queries on probabilistic data [9]. Broadly, such a principle interprets the probabilistic data as a set of deterministic instantiations, called possible worlds, each of which is associated with its probability of being observed. In the context of uncertain graphs, the bulk of the literature assumes the probabilities of existence of the edges independent from one another [24, 14, 15]. Under this assumption, the

possible-world semantics interprets an uncertain graph with m edges as a set of 2^m possible deterministic graphs (worlds), each of which contains a subset of the edges of 𝒢. More precisely, an uncertain graph 𝒢 = (V,E,p) yields a set {G = (V,E_G)}_{E_G ⊆ E} of possible graphs, and the probability of observing a possible graph G = (V,E_G) ⊑ 𝒢 is:

Pr(G) = Π_{e ∈ E_G} p_e · Π_{e ∈ E∖E_G} (1 − p_e).   (1)

According to the possible-world semantics, answering a probabilistic query means deriving a probability distribution over all possible deterministic answers to the query, where the probability of an answer corresponds to the sum of the probabilities of

all worlds where A is the answer to q. As this answer distribution is usually too large and sparse to be explicitly interpreted or computed/stored, the general workaround adopted is to assign a score to each domain object based on its probability of being part of an answer to the probabilistic query, and return the objects having the highest scores as a final answer to the query [9]. We cast such a general framework to our context by defining the score of each vertex v to be part of a k-core as the probability that v has degree no less than k in 𝒢, i.e., Pr[deg(v) ≥ k].

Footnote: We consider undirected graphs for the sake of presentation and consistency with the literature on core decomposition. However, all our definitions/methods apply to directed graphs too, by simply replacing the notion of degree with either in-degree or out-degree. Indeed, in Section 6, where we focus on influence maximization, the graph is directed and we define probabilistic cores based on out-degree.

Then, we employ a classic threshold-based approach to decide which vertices should actually form a core based on their scores. As a result, the notion of probabilistic (k,η)-core we come up with is the

following:

Definition 2 (Probabilistic (k,η)-core). Given an uncertain graph 𝒢 = (V,E,p) and a threshold η ∈ [0,1], the probabilistic (k,η)-core of 𝒢 is a maximal subgraph C = (C, E_C, p) such that the probability that each vertex v ∈ C has degree no less than k in C is greater than or equal to η, i.e., ∀v ∈ C: Pr[deg_C(v) ≥ k] ≥ η.

The notion of η-core number immediately follows from the definition of (k,η)-core and is defined as the highest order of a (k,η)-core containing v. The problem we address in this work is the following.

Problem 1 (ProbCores). Given an uncertain graph 𝒢 and a probability threshold η ∈ [0,1], find the (k,η)-

core decomposition of 𝒢, that is, the set of all (k,η)-cores of 𝒢.

Our definition of core decomposition of an uncertain graph has the desirable feature of being unique, as formally shown in the next theorem.

Theorem 1. Given an uncertain graph 𝒢 and a probability threshold η, the (k,η)-core decomposition of 𝒢 is unique.

Proof. We prove the theorem by showing that 𝒢 cannot have more than one (k,η)-core, for all k. Assume that 𝒢 has two (k,η)-cores and denote them by H1 and H2, respectively. According to Definition 2, it holds that H1 is a maximal subgraph of 𝒢 such that ∀v ∈ H1: Pr[deg_H1(v) ≥ k] ≥ η, and the same happens

for H2. Combining the (k,η)-core conditions of H1 and H2 leads the subgraph H1 ∪ H2 to satisfy the (k,η)-core condition too, as ∀v ∈ H1: Pr[deg_H1(v) ≥ k] ≥ η together with ∀v ∈ H2: Pr[deg_H2(v) ≥ k] ≥ η clearly implies that ∀v ∈ H1 ∪ H2: Pr[deg_{H1∪H2}(v) ≥ k] ≥ η. This means that neither H1 nor H2 is maximal, thus contradicting the hypothesis. The theorem follows.

An example of (k,η)-core decomposition of an uncertain graph is provided in Figure 1.

3. COMPUTING PROBABILISTIC CORES

For a vertex v of the input uncertain graph 𝒢, the probability Pr[deg(v) ≥ k] can be expressed as:

Pr[deg(v) ≥ k] = Σ_{G ∈ 𝒢_{v,k}} Pr(G),   (2)

where 𝒢_{v,k} is the set of all possible graphs drawn

from 𝒢 where v has degree no less than k, i.e., 𝒢_{v,k} = {G ⊑ 𝒢 | deg_G(v) ≥ k}. It is easy to see that such a probability value is monotonically non-increasing with k, i.e., Pr[deg(v) ≥ 0] ≥ Pr[deg(v) ≥ 1] ≥ … ≥ Pr[deg(v) ≥ d_v]. Then, given a threshold η, for every vertex v in the graph there exists a value k* ∈ [0..d_v] such that Pr[deg(v) ≥ k] ≥ η for all k ≤ k*, and Pr[deg(v) ≥ h] < η for all h > k*. We call this value k* the η-degree of vertex v.

Definition 3 (η-degree). Given an uncertain graph 𝒢 = (V,E,p) and a threshold η ∈ [0,1], the η-degree η-deg(v) of a vertex v ∈ V is defined as η-deg(v) = max { k ∈ [0..d_v] : Pr[deg(v) ≥ k] ≥ η }.

Let also η-deg_C(v) be the η-degree of v in a subgraph C of 𝒢.
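On toy instances, Definition 3 can be checked directly against the possible-world semantics by enumerating all 2^m worlds per Equations (1)-(3); the helper below is ours and is exponential, so it serves only as an executable specification, not as a practical method:

```python
from itertools import product

def eta_degree_bruteforce(edges, v, eta):
    """eta-degree of vertex v: max k with Pr[deg(v) >= k] >= eta, where the
    probability is obtained by summing Pr(G) (Equation (1)) over all worlds.
    edges: dict mapping an edge (u, w) to its existence probability."""
    E = list(edges.items())
    d_v = sum(1 for (u, w), _ in E if v in (u, w))
    pmf = [0.0] * (d_v + 1)                 # pmf[h] = Pr[deg(v) = h]
    for world in product([0, 1], repeat=len(E)):
        pr, deg = 1.0, 0
        for keep, ((u, w), p) in zip(world, E):
            pr *= p if keep else 1.0 - p    # world probability, Equation (1)
            if keep and v in (u, w):
                deg += 1
        pmf[deg] += pr
    k, tail = 0, 1.0                        # tail = Pr[deg(v) >= k]
    while k < d_v and tail - pmf[k] >= eta:
        tail -= pmf[k]
        k += 1
    return k
```

With two incident edges of probability 0.5 each, Pr[deg ≥ 1] = 0.75 and Pr[deg ≥ 2] = 0.25, so the η-degree is 1 for η = 0.5 but 2 for η = 0.2.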


Algorithm 2 (k,η)-cores
Input: An uncertain graph 𝒢 = (V,E,p), a threshold η ∈ [0,1].
Output: An n-dimensional vector c containing the η-core number of each v ∈ V.
1: compute η-deg(v) for all v ∈ V
2: D[d] ← ∅, for all d = 0,…,n−1
3: for all v ∈ V do
4:   c[v] ← η-deg(v)
5:   D[η-deg(v)] ← D[η-deg(v)] ∪ {v}
6: end for
7: for all k = 0,…,n−1 do
8:   while D[k] ≠ ∅ do
9:     pick and remove a vertex v from D[k]
10:    c[v] ← k
11:    for all u : (u,v) ∈ E, c[u] > k do
12:      recompute η-deg(u)
13:      move u from D[c[u]] to D[η-deg(u)]
14:      c[u] ← η-deg(u)
15:    end for
16:    remove v from 𝒢
17:  end while
18: end for

Intuitively, the notion of η-degree gives an idea of the degree of a vertex given a specific threshold η. We exploit the notion of η-degree to adapt the cores

algorithm used for deterministic graphs to the context of uncertain graphs. The proposed algorithm, called (k,η)-cores (Algorithm 2), follows the same scheme as in the deterministic case, the main difference being the use of the η-degree. The soundness of the proposed algorithm is shown in the following theorem.

Theorem 2. Given an uncertain graph 𝒢 and a threshold η, Algorithm 2 provides the (k,η)-core decomposition of 𝒢.

Proof. For every v ∈ V and every subgraph C = (C, E_C, p) of 𝒢, it is easy to see that η-deg_C(v) ≤ η-deg(v), as the η-degree computation in C can rely only on a subset of the edges available in 𝒢.

This implies that η-deg_C(v) is a monotonic vertex property function [5], where, for every C ⊆ V and v ∈ C, a vertex property function on 𝒢 is a function f(v,C) : V × 2^V → R, and the monotonicity property holds if C1 ⊆ C2 implies that f(v,C1) ≤ f(v,C2). The proof is completed by the result by Batagelj and Zaveršnik [5], who show that, for a monotonic vertex property function f(v,C), the algorithm that repeatedly removes a vertex with the smallest f value gives the desired core decomposition.

Instead of computing/updating standard degrees, in the probabilistic case one thus needs to (i) compute all η-degrees at the beginning of the

algorithm (Line 1), and (ii) update the η-degree of each neighbour of the vertex currently being processed (Line 12). While computing/updating degrees in the deterministic case is straightforward, for the η-degrees such steps are non-trivial, as shown next.

Computing initial η-degrees. To show how to derive η-degrees from scratch, we first focus on the computation of Pr[deg(v) ≥ k] for a vertex v.

Footnote: That work states that the time complexity of such an algorithm is O(m · max(D, log n)), where D is the maximum degree. But this is a general result for vertex property functions that can be updated linearly

in the degree of a vertex. For any specific vertex property function, such as our η-degree, the complexity can be higher or lower.

Note that Pr[deg(v) ≥ k] is equal to the sum of the probabilities Pr[deg(v) = h] either for all h ∈ [k..d_v] or, equivalently, one minus the sum for all h ∈ [0..k−1]:

Pr[deg(v) ≥ k] = Σ_{h=k}^{d_v} Pr[deg(v) = h] = 1 − Σ_{h=0}^{k−1} Pr[deg(v) = h].   (3)

Furthermore, we observe that each individual Pr[deg(v) = h] can in turn be computed by considering all subsets S ⊆ N_v of edges of size h and summing over the probabilities that all and only the edges in these various S exist:

Pr[deg(v) = h] = Σ_{S ⊆ N_v, |S| = h} Π_{e ∈ S} p_e · Π_{e ∈ N_v∖S} (1 − p_e).   (4)

The sum in the above formula is over all

subsets S; thus, a naïve computation would lead to a time complexity exponential in the size of N_v. We can however manage this by rearranging the formula as Pr[deg(v) = i] = R(i, N_v) · Π_{e ∈ N_v} (1 − p_e), where R(i, N) = Σ_{S ⊆ N, |S| = i} Π_{e ∈ S} p_e/(1 − p_e). This rearrangement allows us to exploit the next recursive formula, which was originally introduced in [8] for sampling from a finite population with unequal probabilities and without replacement:

R(i, N) = (1/i) Σ_{j=1}^{i} (−1)^{j+1} T(j, N) · R(i−j, N),   (5)

where T(j, N) = Σ_{e ∈ N} ( p_e/(1 − p_e) )^j. Now, it is easy to see that Equation (5) allows for computing all individual Pr[deg(v) = h] values, for all h ∈ [0..k−1] (which, according to Equation (3), are needed to derive the desired Pr[deg(v) ≥ k]), in polynomial time, precisely in O(k · d_v) time.

A dynamic-programming method. Although the above way of computing Pr[deg(v) = h] solves a seemingly exponential-time problem, it still has weaknesses due to the recursive formula in Equation (5). Firstly, as the formula involves both products and sums of values p_e/(1 − p_e) that can be either very large (when p_e → 1) or very small (when p_e → 0), it may incur numerical-stability issues, which might make the computation of Pr[deg(v) = h] problematic when executed by a computer. Secondly, using such a formula, the η-degree of

a vertex when one of its incident edges is removed cannot be recomputed faster than by a from-scratch computation. For the above reasons, we propose here an alternative way of computing Pr[deg(v) = h]. Consider a vertex v and an edge e incident to v, and let 𝒢∖{e} denote the subgraph of 𝒢 where e is not present. The method is based on the following key observation: the event "v has degree h in 𝒢" implies that either "e exists and v has degree h−1 in 𝒢∖{e}" or "e does not exist and v has degree h in 𝒢∖{e}". This way, the probability for v to have degree h in the original graph 𝒢 can be computed as a linear combination of the probabilities

that v has degree either h−1 or h in the subgraph 𝒢∖{e}. The above reasoning can be generalised to every subgraph of 𝒢 and formally expressed in the next theorem (for which we omit a formal proof due to limited space).

Theorem 3. Given an uncertain graph 𝒢 = (V,E,p) and a vertex v ∈ V, let N_v = {e_1, …, e_{d_v}} be the set of all edges incident to v, ordered in some way. Also, given a subset


E′ ⊆ E, let deg_{E′}(v) denote the degree of v in the subgraph 𝒢′ = (V, E′, p). For all i ∈ [1..d_v−1] it holds that:

Pr[deg_{{e_1,…,e_{i+1}}}(v) = h] = p_{e_{i+1}} · Pr[deg_{{e_1,…,e_i}}(v) = h−1] + (1 − p_{e_{i+1}}) · Pr[deg_{{e_1,…,e_i}}(v) = h].   (6)

Theorem 3 provides a principled way to

efficiently compute Pr[deg(v) = h] based on the dynamic-programming paradigm. Particularly, we take an arbitrary ordering e_1, …, e_{d_v} of the edges incident to the vertex v currently under consideration and define a recursive formula that computes partial solutions relying only on the first i edges. The ultimate score (i.e., the actual value of Pr[deg(v) = h]) is available only when all the edges have been considered; this makes the overall computation independent from the specific ordering of the edges. Formally, let X(h,j) = Pr[deg_{{e_1,…,e_j}}(v) = h], for all h ∈ [0..d_v] and j ∈ [0..d_v]. We set the following base cases:

X(0,0) = 1,
X(h,0) = 0, for all h ∈ [1..d_v],
X(h,j) = 0, for all h ∈ [1..d_v], j ∈ [0..h−1],

while we exploit Equation (6) to compute the generic dynamic-programming recursive step as

X(h,j) = p_{e_j} · X(h−1, j−1) + (1 − p_{e_j}) · X(h, j−1),

for all j ∈ [1..d_v], h ∈ [0..j]. We need to compute all X values so as to get to X(h,i), which corresponds to the desired probability Pr[deg_{{e_1,…,e_i}}(v) = h]. This requires O(h·i) time. Moreover, one can notice that, along the way, the entire set of values {X(h′,i)}_{h′=0}^{h} (not just X(h,i)) is obtained, corresponding to the actual probability values {Pr[deg(v) = h′]}_{h′=0}^{h} when i = d_v. Thus, employing the proposed dynamic-programming method and setting i = d_v and h = k−1, the probability values Pr[deg(v) = 0], …, Pr[deg(v) = k−1], which are required for computing Pr[deg(v) ≥ k] according to Equation (3), can all be derived in O(k · d_v) time.

Thus, the dynamic-programming method just described has the same complexity as the method based on Equation (4). But, at the same time, it (i) alleviates the numerical-stability shortcomings, as the numbers involved in Equation (6) are all probabilities ≤ 1 (unlike the numbers p_e/(1 − p_e), which range in [0, ∞)), and (ii) can easily be employed for efficiently updating η-degrees when an edge is removed from the graph, as described next.

Time complexity. The η-degree of a vertex v can be computed incrementally. We start with k = 0 and Pr[deg(v) ≥ 0] = 1. Then, we increase k one by one and compute Pr[deg(v) ≥ k] as Pr[deg(v) ≥ k−1] − Pr[deg(v) = k−1]. We stop once Pr[deg(v) ≥ k] < η, and we set η-deg(v) = k−1. This way, we need to compute the probabilities Pr[deg(v) = h] only for h = 0, …, η-deg(v)+1, which, according to the findings reported above, leads to a time complexity of O(η-deg(v) · d_v). Clearly, in the worst case, such a complexity equals O(d_v²), but we expect in practice η-deg(v) to be reasonably lower than d_v, especially for those vertices having very large d_v and/or large enough η

values. Computing all η-degrees hence takes O(Σ_{v∈V} d_v · η-deg(v)) time. Denoting by Δ the maximum η-degree over all vertices in the graph, i.e., Δ = max_{v∈V} η-deg(v), the complexity can be more compactly expressed as O(Σ_{v∈V} d_v · Δ) = O(mΔ).

Updating η-degrees. We now consider the case where the η-degree of a vertex v needs to be updated because an edge incident to v has been removed. We recall that this is the other crucial step of our (k,η)-cores algorithm (Algorithm 2, Line 12). As anticipated, we can exploit Theorem 3 to avoid from-scratch recomputations. The problem can be reduced to (efficiently) updating the

probabilities Pr[deg(v) = 0], …, Pr[deg(v) = η-deg(v)], whose earlier values are available from the computation of the earlier η-degree. Once all these new probabilities are computed, the new η-degree can be derived by the same incremental process described in the previous paragraph "Time complexity". Let e ∈ N_v denote the edge to be removed, and let Pr[deg(v)|e = h], for all h ∈ [0 .. η-deg(v)], be a shorthand for the new probabilities Pr[deg_{N_v∖{e}}(v) = h] to be computed. Such Pr[deg(v)|e = h] values can be derived by rearranging Equation (6) as follows:

Pr[deg(v)|e = h] = ( Pr[deg(v) = h] − p_e · Pr[deg(v)|e = h−1] ) / (1 − p_e).   (7)
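The forward recursion behind Theorem 3 and the inverted update of Equation (7) can be sketched as follows (our illustration, with function names of our choosing; note that every intermediate value is a probability in [0,1]):

```python
def degree_pmf(probs):
    """Pr[deg(v) = h] for h = 0..d_v, built edge by edge via the recursion of
    Theorem 3: each new edge either materialises (degree + 1) or does not."""
    X = [1.0]                               # with no edges, Pr[deg = 0] = 1
    for p in probs:
        nxt = [0.0] * (len(X) + 1)
        for h, x in enumerate(X):
            nxt[h] += (1.0 - p) * x         # edge absent: degree unchanged
            nxt[h + 1] += p * x             # edge present: degree + 1
        X = nxt
    return X

def remove_edge(X, p):
    """New pmf after deleting one incident edge of probability p, obtained by
    inverting the recursion as in Equation (7): O(d_v) instead of from scratch."""
    Y = [0.0] * (len(X) - 1)
    prev = 0.0                              # Pr[deg | e = h-1], zero for h = 0
    for h in range(len(Y)):
        Y[h] = (X[h] - p * prev) / (1.0 - p)
        prev = Y[h]
    return Y
```

Deleting an edge and recomputing from scratch agree up to floating-point error: remove_edge(degree_pmf([0.5, 0.8]), 0.8) matches degree_pmf([0.5]).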

This way, one can set Pr[deg(v)|e = 0] = Pr[deg(v) = 0] / (1 − p_e), and apply Equation (7) to compute the remaining Pr[deg(v)|e = h] values, for all h ∈ [1 .. η-deg(v)]. Each probability Pr[deg(v)|e = h] takes constant time. Computing all the new probabilities, and, hence, updating the η-degree of v, globally takes O(η-deg(v)) time, thus improving upon the O(η-deg(v) · d_v) time of a from-scratch recomputation.

Overall running time of (k,η)-cores. We now analyse the overall time complexity of our (k,η)-cores algorithm. The initialisation phase (Lines 1-6) is dominated by the computation of the initial η-degrees for all vertices, which takes

O(mΔ) time (Δ is the maximum η-degree over all vertices). In the main cycle (Lines 7-18), as in the deterministic case, each vertex is visited only once and then removed from the graph. For each visited vertex v, the η-degree of all its neighbours has to be updated. As reported above, for a single neighbour u, this takes O(η-deg(u)) time. Thus, the main cycle globally takes O(Σ_{v∈V} Σ_{u:(u,v)∈E} η-deg(u)) = O(Σ_{u∈V} d_u · Δ) = O(mΔ). In conclusion, the running time of the (k,η)-cores algorithm is therefore O(mΔ).

4. SPEEDING-UP (k,η)-cores

In this section we show how to further speed up our (k,η)-cores algorithm. Our key

observation is that the main bottleneck of (k,η)-cores is the computation of the initial η-degrees (experimentally confirmed in Section 5): although this step is asymptotically as fast as updating η-degrees after a vertex removal, the latter is in practice faster, as it is performed on a graph that gets progressively smaller. In this regard, we derive a fast-to-compute lower bound on the η-degree and use it as a placeholder during the first iterations, replacing it with the actual η-degree only when the vertex at hand is going to be processed. This way, the initial η-degrees can

be computed only when actually needed and on a smaller graph, thus leading to the desired speed-up. In the following we provide the details of our lower bound on the η-degree and show how to efficiently update this bound after vertex removals. Then, we describe how to incorporate such findings into the enhanced algorithm.
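Assuming a lower-bound oracle that never exceeds the exact score, the lazy-evaluation strategy just described can be sketched with a priority queue in which cheap bounds act as placeholders and the exact score is computed only when a vertex reaches the front (a generic sketch of ours, not the paper's E-(k,η)-cores pseudocode, which additionally updates scores as the graph shrinks):

```python
import heapq

def lazy_min_order(vertices, lower_bound, exact_score):
    """Emit (vertex, exact_score) pairs in non-decreasing exact-score order.
    Requires lower_bound(v) <= exact_score(v); exact_score is evaluated
    lazily, only when v surfaces while still carrying a placeholder bound."""
    heap = [(lower_bound(v), False, v) for v in vertices]
    heapq.heapify(heap)
    order = []
    while heap:
        score, is_exact, v = heapq.heappop(heap)
        if is_exact:
            order.append((v, score))        # no pending entry can be smaller
        else:
            heapq.heappush(heap, (exact_score(v), True, v))  # refine and retry
    return order
```

When an exact entry is popped, every remaining entry (exact or placeholder) is at least as large, and placeholders underestimate their exact scores, so the popped vertex is indeed the minimum; vertices whose bound never reaches the front are never refined.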


Lower bound on the η-degree. We define our lower bound on the η-degree in terms of the regularised beta function I. Given a real number x ∈ [0,1] and two integers a and b, the regularized beta function I_x(a,b) is defined as the ratio between the incomplete beta function B(x; a,b) and the beta function B(a,b) [29]:

I_x(a,b) = B(x; a,b) / B(a,b),   where B(x; a,b) = ∫_0^x t^{a−1} (1 − t)^{b−1} dt.

Given a vertex v in the input graph, let p_min(v) denote the minimum probability on the edges incident to v, i.e., p_min(v) = min_{e∈N_v} p_e. The next lemma shows how the probability for v to have degree no less than k can be lower-bounded by using the regularised beta function.

Lemma 1. Given an uncertain graph 𝒢 = (V,E,p), for every vertex v ∈ V and for all k ∈ [0..d_v] it holds that Pr[deg(v) ≥ k] ≥ I_{p_min(v)}(k, d_v − k + 1).

Proof. Consider a vertex u having as many incident edges as v, and assume that each edge incident to u has probability p_min(v). It is easy to see that

Pr[deg(v) ≥ h] ≥ Pr[deg(u) ≥ h], for all h. Exploiting Equation (4) we get:

Pr[deg(u) = h] = Σ_{S ⊆ N_u, |S| = h} p_min(v)^h (1 − p_min(v))^{d_v−h} = (d_v choose h) · p_min(v)^h (1 − p_min(v))^{d_v−h}.

Combining such a result with Equation (3) we obtain:

Pr[deg(v) ≥ k] ≥ Pr[deg(u) ≥ k] = Σ_{h=k}^{d_v} (d_v choose h) · p_min(v)^h (1 − p_min(v))^{d_v−h} = I_{p_min(v)}(k, d_v − k + 1).

The lemma follows.

The desired lower bound on the η-degree can now immediately be derived by exploiting Lemma 1. We denote such a lower bound by lb(v) and formally state it in the next theorem.

Theorem 4. Given an uncertain graph 𝒢 = (V,E,p), for every vertex v ∈ V it holds that

η-deg(v) ≥ lb(v) = max { k ∈ [0..d_v] : I_{p_min(v)}(k, d_v − k + 1) ≥ η }.

The computation of the above lower bound is very fast. For a fixed

x, the values I_x(a,b) of the regularised beta function are monotonically non-increasing as a increases and/or b decreases. Therefore, the lower bounds on Pr[deg(v) ≥ k] are monotonically non-increasing as k increases, and one can thus perform binary search to derive the maximum k such that I_{p_min(v)}(k, d_v − k + 1) ≥ η, which, according to Theorem 4, corresponds to the lower bound lb(v). The computation of lb(v) requires a logarithmic (in the number d_v of edges incident to v) number of evaluations of I. Each evaluation of I can be computed in constant time using tables [23]. Thus, computing lb(v) for a vertex v takes O(log d_v) time. A major feature of

the lower bound lb is its fast from-scratch computation. Here we show that it can also be updated very efficiently (i.e., in constant time) when an edge is removed from the graph. To this end, we first need to report a couple of results. We start by showing that the η-degree of a vertex v can decrease at most by one when an edge incident to v is removed (Lemma 2).

Lemma 2. Given an uncertain graph 𝒢 = (V,E,p) and a vertex v ∈ V, let e be an edge incident to v and let 𝒢′ = (V, E∖{e}, p) be the subgraph of 𝒢 where e is missing. Also, let η-deg′(v) be the η-degree of v in 𝒢′. It holds that η-deg′(v) > η-deg(v) − 2.

Proof. For every k it holds that

Pr[deg(v) ≥ k] = Σ_{h≥k} Pr[deg(v) = h]
= Σ_{h≥k} ( p_e · Pr[deg′(v) = h−1] + (1 − p_e) · Pr[deg′(v) = h] )
= p_e · Pr[deg′(v) ≥ k−1] + (1 − p_e) · Pr[deg′(v) ≥ k]
≤ p_e · Pr[deg′(v) ≥ k−1] + (1 − p_e) · Pr[deg′(v) ≥ k−1] = Pr[deg′(v) ≥ k−1].

By the definition of η-degree we know that Pr[deg′(v) ≥ η-deg′(v) + 1] < η; thus, setting k = η-deg′(v) + 2 in the above inequality, we get Pr[deg(v) ≥ η-deg′(v) + 2] ≤ Pr[deg′(v) ≥ η-deg′(v) + 1] < η. Then, η-deg(v) < η-deg′(v) + 2, or, equivalently, η-deg′(v) > η-deg(v) − 2. The lemma follows.

Based on the above lemma, we can also prove that the lower bound lb(v) of a vertex v can decrease at most by one when an edge incident to v is removed.

Theorem 5. Given an uncertain graph 𝒢 = (V,E,p) and a vertex v ∈ V, let e be an edge incident to v and let 𝒢′ = (V, E∖{e}, p) be the subgraph

of G where e is missing. Also, let lb′(v) be the lower bound on the η-degree of v in G′. It holds that lb′(v) > lb(v) − 2.

Proof. Consider a vertex v′ having as many incident edges as v, and assume that each edge incident to v′ has probability p_min(v). It is easy to see that the η-degree deg_η(v′) of v′ equals the lower bound lb(v). Combining this with Lemma 2, we get lb′(v) = deg′_η(v′) > deg_η(v′) − 2 = lb(v) − 2. The theorem follows.

Theorem 5 can be exploited for safely updating lb in constant time. Let e denote again the edge incident to v to be removed, and let G′ be the subgraph of G where e is missing. Thus, lb(v) denotes the earlier lower bound of v, while lb′(v) denotes the new lower bound to be computed after e's removal. The idea is to compute (in constant time) just the value I_{p_min(v)}(lb(v), (d_v − 1) − lb(v) + 1) = I_{p_min(v)}(lb(v), d_v − lb(v)). Lemma 1 ensures that

Pr[deg′(v) ≥ lb(v)] ≥ I_{p_min(v)}(lb(v), d_v − lb(v)).

Thus, if I_{p_min(v)}(lb(v), d_v − lb(v)) is still ≥ η, then the lower bound has not changed, i.e., lb′(v) = lb(v). Otherwise, it means that the lower bound has decreased. According to Theorem 5, this decrease can be at most by one, hence we can safely set lb′(v) = lb(v) − 1.

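The bound of Theorem 4 and the one-step update of Theorem 5 can be checked numerically. The sketch below (function and variable names are illustrative, not from the paper's implementation) evaluates the binomial tail, which for k ≥ 1 equals the regularized incomplete beta function I_p(k, n − k + 1), and finds lb(v) by binary search:

```python
from math import comb

def binom_tail(n, k, p):
    """Pr[Bin(n, p) >= k]; for k >= 1 this equals I_p(k, n - k + 1)."""
    return sum(comb(n, h) * p**h * (1 - p)**(n - h) for h in range(k, n + 1))

def lb(d_v, p_min, eta):
    """max k in [0..d_v] with binom_tail(d_v, k, p_min) >= eta.
    The tail is non-increasing in k, so binary search applies (Theorem 4)."""
    lo, hi = 0, d_v                      # k = 0 always qualifies: tail = 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if binom_tail(d_v, mid, p_min) >= eta:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Removing an edge incident to v leaves d_v - 1 candidate edges (and
# p_min(v) can only stay equal or grow), so by Theorem 5 the bound
# drops by at most one:
for d in range(1, 25):
    assert lb(d - 1, 0.4, 0.6) >= lb(d, 0.4, 0.6) - 1
```

In a production implementation one would replace `binom_tail` with tabulated or library evaluations of the regularized incomplete beta function, as the text suggests, to get constant-time evaluations.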

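For comparison with the lower bound, the exact η-degree comes from the distribution of deg(v), a sum of independent Bernoulli variables (one per incident edge). The following is a sketch of the general dynamic-programming idea referenced in the text (not the paper's exact recurrence): edges are folded in one at a time, which is also what makes cheap updates possible when an edge is removed.

```python
def degree_distribution(probs):
    """dist[h] = Pr[deg(v) = h] for independent edge probabilities."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for h, q in enumerate(dist):
            nxt[h] += q * (1.0 - p)      # edge absent
            nxt[h + 1] += q * p          # edge present
        dist = nxt
    return dist

def eta_degree(probs, eta):
    """max k with Pr[deg(v) >= k] >= eta."""
    dist = degree_distribution(probs)
    tail = 0.0
    for h in range(len(dist) - 1, -1, -1):  # suffix sums of dist
        tail += dist[h]
        if tail >= eta:                      # first (largest) qualifying k
            return h
    return 0
```

For example, with two incident edges of probability 0.5 each and η = 0.5, Pr[deg ≥ 1] = 0.75 ≥ 0.5 but Pr[deg ≥ 2] = 0.25 < 0.5, so the η-degree is 1.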
A major shortcoming of updating lb as described above is that, for each vertex v, we would need to load/keep in memory O(d_v^2) values of I (i.e., all values I_{p_min(v)}(k, h − k + 1), h ∈ [0..d_v], k ∈ [0..h]). This would penalize too much both the time and the space complexity of the algorithm. However, this can be overcome by still relying on Theorem 5. The idea is to simply set lb′(v) = max{0, lb(v) − 1} every time an edge incident to v is removed, no matter whether I_{p_min(v)}(lb(v), d_v − lb(v)) ≥ η or not. Indeed, Theorem 5 guarantees that lb(v) − 1 is still a lower bound for deg′_η(v), even though possibly less tight. This way our algorithm requires only O(d_v) values of I for each vertex v, i.e., just the values I_{p_min(v)}(k, d_v − k + 1), k ∈ [0..d_v].

The E-(k,η)-cores algorithm. We now provide the details of our

enhanced (k,η)-cores (for short, E-(k,η)-cores) algorithm (pseudocode omitted for space reasons). The algorithm follows the scheme of the basic (k,η)-cores algorithm (Algorithm 2). The main difference is that, for each vertex v, the lower bound lb(v) is computed in the initialisation phase, rather than the exact η-degree. A dedicated set keeps track of the vertices for which the exact η-degree has not been computed yet; right after initialisation, it corresponds to the whole vertex set V. In the main cycle, vertices are processed based on their (lower bound on the) η-degree. When a vertex v is being processed, it is first checked whether its exact η-degree is already available. If not, the exact η-degree of v is computed and v is moved to the proper set of the degree vector, so that it can be processed in the correct (possibly later) iteration. Otherwise, if the exact η-degree of v is available, the η-core number of v is set and the η-degrees (either the exact values or the lower bounds) of all of v's neighbours are updated. The worst-case time complexity of E-(k,η)-cores is the same as that of the basic (k,η)-cores algorithm, i.e., O(mΔ). However, smaller running times are expected in practice due to the lazy computation/updating of η-degrees in reduced

versions of the input graph.

5. EXPERIMENTS

In this section we report quantitative experiments on efficiency and numerical stability of our (k,η)-cores and E-(k,η)-cores algorithms (Sections 3 and 4). For this task we use the following real-world uncertain graphs.

Flickr (www.flickr.com; |V| = 24125, |E| = 300836). We borrowed the dataset from [24], where the probability of an edge between two users is defined based on homophily, the principle that similar interests indicate social ties. In particular, [24] uses as a measure of homophily the Jaccard coefficient of the interest groups

shared by the two users.

DBLP (www.informatik.uni-trier.de/~ley/db/; |V| = 684911, |E| = 2284991). The dataset was borrowed from [24, 15]. Two authors are connected if they co-authored at least once, and the probability on an edge expresses the fact that the collaboration has not happened by chance: the more the collaborations, the larger the probability. Precisely, [24, 15] define the probability of each edge based on an exponential function of the number of collaborations.

(We implemented our code in Java and ran experiments on a 2.83GHz, 32GB Intel Xeon server.)

Table 1: Times (secs) of the proposed methods for computing (k,η)-core decomposition (precision 64 bits). The column "gain (%)" reports the gain of the E-(k,η)-cores algorithm over the (k,η)-cores algorithm.

              (k,η)-cores                  E-(k,η)-cores
   η     initial    main            initial    main
         η-degrees  cycle   total   η-degrees  cycle   total    gain (%)
Flickr
  0.1    15.45       8.88   24.33   14.41       7.98   22.39     7.99%
  0.3    13.73       7.89   21.61   12.90       7.22   20.12     6.89%
  0.5    12.56       7.33   19.89   11.86       6.71   18.57     6.62%
  0.7    11.45       6.64   18.09   10.82       6.14   16.96     6.25%
  0.9     9.86       5.72   15.58    9.34       5.32   14.66     5.87%
DBLP
  0.1    53.81      36.92   90.73   38.23      26.45   64.68    28.71%
  0.3    49.08      33.16   82.24   36.28      25.21   61.48    25.24%
  0.5    44.74      31.14   75.88   33.98      24.45   58.43    23.00%
  0.7    40.65      28.40   69.05   31.86      23.07   54.92    20.46%
  0.9    35.54      24.42   59.96   28.40      21.06   49.46    17.51%
BioMine
  0.1    4801       1549    6350    4388       1404    5792      8.78%
  0.3    4704       1542    6246    4333       1447    5780      7.46%
  0.5    4645       1538    6183    4281       1404    5685      8.05%
  0.7    4568       1523    6091    4240       1403    5643      7.35%
  0.9    4498       1478    5977    4151       1423    5575      6.72%

BioMine (biomine.org; |V| = 1008200, |E| = 6742939). A snapshot of the database of the BioMine project [26] containing biological interactions. Edges inherently come with probabilities. The probability of

any edge provides evidence that the interaction actually exists.

Efficiency. Table 1 reports the running times exhibited by our (k,η)-cores (left) and E-(k,η)-cores (right) algorithms on the selected datasets. Times are split by the two main phases: computing the initial η-degrees and running the main cycle. Both algorithms are very fast on Flickr and DBLP: they take on average around 20 and 60 seconds, respectively. On BioMine, which is much larger and denser, the time clearly increases. However, the time required by our algorithms on the latter dataset is in the order of one hour. This is

reasonable for networks of such size and testifies to the applicability of our methods to very large uncertain graphs. As expected, E-(k,η)-cores runs faster than the basic (k,η)-cores algorithm, allowing a reduction of the total time of up to around 30% (DBLP, η = 0.1). The gain is more evident on the larger datasets (i.e., DBLP and BioMine) and generally increases as η decreases. The latter finding is expected because the smaller η, the larger the η-degree of a vertex, and, thus, the better the chance for the lower bound to be tighter and lead to better pruning. Larger η-degrees for smaller η is

also the reason why times (for both phases and both algorithms) increase with smaller η.

Numerical stability. As discussed in Section 3, probabilities may lead to numerical instability. To prevent this, one can exploit native solutions provided by modern programming languages to enlarge the range and/or precision of the numerical representation. As a side effect, this slows down the overall computation, as larger precision implies slower arithmetic. Thus, the goal is to minimise the number of critical operations that may lead to numerical instability, so as to avoid resorting to a too large precision in order to achieve reasonable accuracy. As reported in Section 3, a major feature of the novel dynamic-programming method we employ in our algorithms to compute/update η-degrees is that it alleviates such numerical issues. We next provide experimental evidence of this. First, we report results obtained by varying the precision used for representing numbers (we consider 32, 64, 128, and 256 bits


Table 2: Accuracy of the (k,η)-core index for η = 0 w.r.t. the deterministic core index (ground truth) for different values of precision (bits).

dataset     pr=32     pr=64     pr=128    pr=256
avg absolute error
Flickr      6.17      5.12      3.4       2.26
DBLP        0.27      0.1       0.03      0.01
BioMine     2.18      1.25      0.41      0.14
% vertices with non-zero error
Flickr      31.69%    18.91%    11.92%    6.00%
DBLP        17.48%    2.27%     0.51%     0.18%
BioMine     1.51%     1.11%     0.47%     0.09%

as precision levels). We note that, for η = 0, the (k,η)-core decomposition of an uncertain graph G should ideally correspond to the core decomposition of the deterministic graph derived from G by ignoring probabilities. Thus, we measure accuracy by comparing, for each vertex, the 0-core number outputted by our algorithms with the core number returned by the

standard core-decomposition algorithm (Algorithm 1) on such a deterministic graph. Tables 2 and 3 show accuracy results (in terms of per-vertex average absolute error and percentage of vertices with a core number other than the exact one) and running times, respectively. We report times separately for (k,η)-cores and E-(k,η)-cores, while accuracy is the same for both. As expected, larger precision leads to better accuracy and worse efficiency. In particular, the results show a linear trend: doubling the precision, time doubles while errors get halved. We also compare the results of our algorithms when equipped

with the proposed dynamic-programming method to the results of our algorithms equipped with the method that computes/updates η-degrees using the formula in Equation (5). We denote our proposed combination "(k,η)-cores + dynamic-programming method" simply as (k,η)-cores, while we refer to the baseline combination "(k,η)-cores + Equation (5)-based method" as Eq 5. These results are summarised in Table 4 (precision 64 bits). Our method outperforms Eq 5 in terms of both average absolute error and percentage of vertices with non-zero error. In particular, the average absolute error of the Eq 5 method is reduced by

9% (Flickr), 41% (DBLP), and 40% (BioMine).

6. INFLUENCE MAXIMIZATION

The influence-maximization problem [16] has received a great deal of attention over the last decade. It requires finding a set S of vertices, with |S| = b, that maximizes the expected spread σ(S), i.e., the expected number of vertices that would be infected by a viral propagation started in S, under a certain probabilistic propagation model. The independent cascade model [16] is a widely-used propagation model; under this model, the problem of finding a set of vertices that maximizes the expected spread σ(S) is NP-hard. However, the submodularity of σ(S) allows the Greedy algorithm, which iteratively adds to S the vertex bringing the largest marginal gain to the objective function, to achieve a (1 − 1/e) approximation guarantee. Unfortunately, finding the maximum-marginal-gain vertex requires solving a #P-complete reliability problem. Hence, existing approaches usually apply sampling methods (e.g., Monte Carlo) to estimate the best seed vertex at each iteration of the algorithm.

(In our implementation, we use the BigDecimal Java API, which allows for representing numbers arbitrarily large and/or small, and with arbitrary user-defined precision, up to "unlimited" precision.)

Table 3: Times (secs) of the two proposed methods for computing (k,η)-core decomposition, for η = 0.1, for different values of precision (bits).

              (k,η)-cores                  E-(k,η)-cores
 prec.   initial    main            initial    main
 (bits)  η-degrees  cycle   total   η-degrees  cycle   total    gain (%)
Flickr
  32      6.96       3.83   10.79    6.63       3.73   10.36     3.94%
  64     15.23       8.89   24.12   14.08       7.94   22.02     8.72%
  128    25.55      14.48   40.03   23.69      12.92   36.62     8.53%
  256    34.35      22.13   56.48   31.95      19.68   51.63     8.59%
DBLP
  32     26.71      20.22   46.93   19.46      15.51   34.97    25.48%
  64     56.73      39.19   95.92   40.98      27.17   68.14    28.96%
  128    86.65      59.81   146.5   62.84      40.40   103.2    29.51%
  256    128.7      89.14   217.8   91.15      59.30   150.5    30.93%
BioMine
  32     2376       704     3080    2021       659     2681     12.97%
  64     5452       1693    7145    4738       1390    6128     14.24%
  128    9815       3146    12961   8153       2607    10760    16.98%
  256    13296      5055    18351   11274      4515    15789    13.96%

Table 4: Accuracy of the proposed method in terms of error w.r.t. the ground truth (precision 64 bits).

            avg absolute error       % vertices w. non-zero error
dataset     (k,η)-cores   Eq 5       (k,η)-cores   Eq 5
Flickr      5.12          5.62       18.91%        19.91%
DBLP        0.1           0.17       2.27%         4.42%
BioMine     1.25          2.07       1.11%         1.36%

This drastically affects the efficiency of the algorithm, thus limiting its applicability only to moderately-sized networks (the time complexity of the algorithm is O(sTnm), where s is the number of seeds and T is the number of Monte Carlo samples, with T ∈ [1000, 10000], usually). Optimizations of the basic algorithm have been defined which exploit the submodularity of σ to avoid unneeded computations [12], but the improvement achieved is typically not enough to handle large graphs (in the experiment that we show below, on a moderately sized graph a state-of-the-art algorithm such as Celf++

[12] could not finish after several weeks). Within this view, a useful application of our (k,η)-core decomposition is to provide a way to speed up the execution of the Greedy algorithm. The idea is simple: just reduce the input graph by keeping only the inner-most (k,η)-shells, and run the (optimized version of the) Greedy algorithm on such a reduced graph. The rationale here is that, as experimentally observed in [17], the core decomposition of the deterministic version of G is a direct indicator of the expected spread of a vertex: the higher the core index, the more likely the vertex is

an influential spreader. The finding in [17], however, exploits cores derived from a deterministic version of the input graph, thus completely ignoring its probabilistic nature. We conjecture that exploiting a notion of core decomposition defined ad hoc for uncertain graphs can only positively affect the behaviour observed in [17]. We next empirically show the correctness of our conjecture.

Experiments. We use a small directed graph from Twitter (|V| = 21882, |E| = 372005), and a set of propagations of URLs in the social graph, which we use as past evidence to learn the influence

probabilities (we employ the traditional method described in [11] for this). Each edge (u,v) expresses the fact that v is a follower of u, and the corresponding probability provides evidence that an action performed by u will be performed by v as well. The objective here is to show that running the standard Greedy influence-maximization algorithm on a reduced version of the graph given by the inner-most (k,η)-shells allows us to achieve high-quality results while keeping the running time small.

Table 5: Expected spread achieved by the proposed (k,η)-cores-based method vs. some baselines with varying output set size b.

                b = 10    b = 20    b = 30
(k,η)-cores     9570      9606      9610
out-degree      9014      9016      9130
η-degree        9019      9089      9125
exp-degree      9012      9093      9123
cores           9134      9192      9223

We test our method replacing the notion of degree with out-degree (given that the graph is directed) and setting η = 0.5. We obtain 8 cores and keep the three inner-most (k,η)-shells. This gives a reduced graph with 2064 vertices and 86142 edges. We run the optimized version of the Greedy algorithm defined in [12], i.e., the Celf++ algorithm, on such a reduced graph and take the seed vertices S outputted as our

result. For accuracy evaluation, we compute the expected spread σ(S) achieved by S on the whole graph (using Monte Carlo sampling with 10000 samples). As criteria for comparison, we use the top-b vertices ranked according to the following baseline ranking functions: (i) maximum out-degree (ignoring probabilities, as suggested in the seminal work on influence maximization [16]), (ii) maximum η-degree, (iii) maximum expected degree (computed by summing the probabilities on the edges outgoing from a vertex), and (iv) the vertices computed by running Celf++ on the graph reduced according to the deterministic core decomposition (ignoring probabilities). Note that we could not use the results of the direct execution of Celf++ on the whole graph due to its excessive running time (it could not finish in several weeks). The results reported in Table 5 (we vary b from 10 to 30) show how our (k,η)-cores-based method evidently outperforms all the baselines, allowing us to increase the spread by up to 590 (out-degree), 551 (η-degree), 558 (exp-degree), and 436 (cores). As far as efficiency is concerned, we report runtimes in the order of 4–5 hours (with b = 30), which are times clearly

affordable, in contrast to the unaffordable runtime of the direct execution of Celf++ on the whole graph.

7. TASK-DRIVEN TEAM FORMATION

In task-driven team formation we are given a collaboration graph G = (V, E, β), where vertices are individuals and edges are assigned a probabilistic topic model β, representing (a distribution on) the topics exhibited by past collaborations. The topic model can be produced by standard methods, such as the popular Latent Dirichlet Allocation (LDA) [6]. The input of LDA (or any other similar method) is (i) a number Z of topics, and (ii) for each edge

(u,v) ∈ E, a document D(u,v) representing all the past collaborations between u and v. The document D(u,v) is a bag of terms coming from a finite vocabulary Σ. The output is the topic model β, that is: for each edge (u,v) ∈ E and each topic z ∈ [1..Z], the probability β_{u,v}(z) = Pr[z | (u,v)] that the collaborations between u and v are on topic z, with Σ_{z=1}^{Z} β_{u,v}(z) = 1; and, for each term t ∈ Σ, a distribution over topics, i.e., for each topic z ∈ [1..Z], the probability Pr[z | t] that the term t has been generated by the topic z, with Σ_{z=1}^{Z} Pr[z | t] = 1. A task-driven team-formation query is a pair ⟨T, Q⟩, where T ⊆ Σ is a set of terms describing a

task, and Q ⊆ V is a set of vertices (possibly even a single vertex). The goal is to find an answer vertex set, containing Q, which is a good team to perform the task described by the terms in T. Being a good team means having a good affinity among the team members with respect to the given task. We report more formal details on this in the following. The query task T, together with the topic model β, induces a single probability value p(u,v) for each edge (u,v) ∈ E, such that p(u,v) represents the likelihood that T has been generated by a collaboration between u and v:

p(u,v) = Pr[T | (u,v)] = Σ_{z=1}^{Z} Pr[T | z] β_{u,v}(z).   (8)

Hence, given a

task T, the input collaboration graph yields an uncertain graph G = (V, E, p). This way, given G and a set of query vertices Q, the task of finding a good team for the query at hand directly translates into finding a subgraph of G that represents a good community for Q. Formally, the goal is to find a connected subgraph H = (V_H, E_H) of G that (i) contains all query vertices (Q ⊆ V_H), and (ii) maximizes a notion of density. In particular, as far as the density measure is concerned, the minimum degree has been widely recognized as a principled choice for this kind of problem. We therefore rely on this notion

of density and ask for the subgraph to maximize the minimum η-degree of a vertex in it. The resulting problem statement is:

Problem 2 (Task-Driven Team Formation). Given a collaboration graph G = (V, E, β) and a query ⟨T, Q⟩, let G = (V, E, p) be the uncertain graph derived from G and T as described in Equation (8). Given a threshold η ∈ [0, 1], we want to find a connected subgraph H = (V_H, E_H) of G induced by a set of vertices V_H ⊇ Q such that

V_H = argmax_{V_H ⊇ Q, H connected} min_{v ∈ V_H} deg_η(v).

Exploiting (k,η)-cores for team formation. We now show that Problem 2 can be optimally solved by resorting to our notion of (k,η)-core decomposition. This result is stated in the

next theorem (we omit the proof for space reasons).

Theorem 6. Given an uncertain graph G and a threshold η ∈ [0, 1], let C = {C_0, C_1, ..., C_{k*}} be the (k,η)-core decomposition of G (with C_{k*} ⊆ ··· ⊆ C_1 ⊆ C_0), and, given a set of query vertices Q, let C* be the smallest-sized core in C such that every v ∈ Q belongs to the same connected component of C*. Then, the solution to Problem 2 is given by the connected component of C* that contains Q.

Theorem 6 provides us with a principled way of solving Problem 2. The solution can be summarized as follows:

1. Given a collaboration graph G = (V, E, β) and a task-driven team-formation query ⟨T, Q⟩, derive the uncertain graph G = (V, E, p) (Equation (8)).

2. Compute the (k,η)-core decomposition of G.

As argued in [27], maximizing the minimum degree provides better evidence of the goodness of a community than, e.g., the maximization of the average degree, which is instead more suitable for dense-subgraph discovery. As Equation (8) can produce very small probabilities, in our implementation we prune G by removing edges with probability smaller than a threshold ε (ε = 10^{-16} in our experiments).

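Step 1 and the pruning just described can be sketched as follows. The data layout and names here are illustrative assumptions (the paper does not prescribe an implementation), and Pr[T | z] is treated as a precomputed input:

```python
EPS = 1e-16  # pruning threshold epsilon used in the experiments above

def edge_probability(task_given_topic, topic_given_edge):
    """p(u,v) = sum_z Pr[T | z] * Pr[z | (u,v)]  (Equation (8))."""
    return sum(pt * pz for pt, pz in zip(task_given_topic, topic_given_edge))

def task_uncertain_graph(edges, task_given_topic):
    """edges: {(u, v): [Pr[z | (u,v)] for each topic z]}.
    Returns the task-induced uncertain graph, pruning negligible edges."""
    graph = {}
    for (u, v), topics in edges.items():
        p = edge_probability(task_given_topic, topics)
        if p >= EPS:
            graph[(u, v)] = p
    return graph
```

The resulting dictionary of edge probabilities is exactly the uncertain graph on which step 2 then runs the (k,η)-core decomposition.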

Table 6: Three examples of task-driven team-formation queries and corresponding results.

⟨T = {gene, express}, Q = {H. V. Jagadish}⟩: Brian D. Athey, Giovanni Scardoni, Kathleen A. Stringer, Venkateshwar G. Keshamouni, Jing Gao, Terry E. Weymouth, Vasudeva Mahavisno, Charles F. Burant, Christopher W. Beecher, Maureen A. Sartor, Alla Karnovsky, Rork Kuick, Zach Wright, James D. Cavalcoli, Gilbert S. Omenn, H. V. Jagadish, Carlo Laudanna, Tim Hull, Barbara R. Mirel, V. Glenn Tarcea

⟨T = {xml, tree}, Q = {H. V. Jagadish, S. Muthukrishnan}⟩: S. Muthukrishnan, Lauri Pietarinen, H. V. Jagadish, Divesh Srivastava, Nick Koudas

⟨T = {auction, model}, Q = {S. Muthukrishnan}⟩: Uri Nadav, Noam Nisan, Jon Feldman, Panagiotis G. Ipeirotis, Vahab S. Mirrokni, Gagan Aggarwal, Tanmoy Chakraborty, Aranyak Mehta, Evdokia Nikolova, S. Muthukrishnan, Martin Pal, Clifford Stein, Eyal Even-Dar, Florin Constantin, Yishay Mansour

3. Visit the cores in C starting from the smallest-sized one (i.e., the inner-most core), until finding C*.

4. Return the connected component of C* containing Q as the solution to Problem 2.

Experiments. We consider task-driven team formation in the context of collaborations among computer-science researchers. We build a collaboration network from the DBLP database (www.informatik.uni-trier.de/~ley/db/): vertices are authors

and an edge connects two authors if they co-authored at least once. The resulting graph has |V| = 1089442 and |E| = 4144697. For each edge, we take the bag of words of the titles of all papers coauthored by the two authors (words are stemmed and stop-words are removed), and apply LDA to infer the topic model (we set Z = 100). In Table 6 we report the results of three task-driven team-formation queries. The first two queries share the query vertex H. V. Jagadish, but the first task is about gene expression while the second one is about xml: as expected, the two proposed teams are very

different. The third query shares with the second one the vertex S. Muthukrishnan; but, unlike the previous one, which is about xml (a database topic), the third query is about auction models (an algorithm-theory topic): the different teams proposed correctly reflect the difference in the tasks. It is worth noticing that the extraction of these teams, following the process described above and exploiting our efficient (k,η)-core decomposition, takes approximately 2–3 seconds on a commodity laptop.

8. CONCLUSIONS

In this paper we extend the graph tool of core decomposition to the context of uncertain graphs. We define the (k,η)-core concept, and we devise efficient algorithms for computing a (k,η)-core decomposition. As future work, we plan to investigate the relationship between (k,η)-cores and other definitions of (probabilistic) dense subgraphs, so as to exploit the former as a speeding-up preprocessing.

9. REFERENCES

[1] E. Adar and C. Re. Managing Uncertainty in Social Networks. IEEE Data Eng. Bull., 30(2):15–22, 2007.
[2] J. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. In NIPS, 2005.
[3] R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009.
[4] S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting Protein Complex Membership using Probabilistic Network Reliability. Genome Res., 14:1170–1175, 2004.
[5] V. Batagelj and M. Zaveršnik. Fast algorithms for determining (generalized) core groups in social networks. Advances in Data Analysis and Classification, 5(2):129–145, 2011.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
[7] P. Boldi, F. Bonchi, A. Gionis, and T. Tassa. Injecting Uncertainty in Graphs for Identity Obfuscation. PVLDB, 5(11):1376–1387, 2012.
[8] X. H. Chen, A. P. Dempster, and J. S. Liu. Weighted finite population sampling to maximize entropy. Biometrika, 81:457–469, 1994.
[9] N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, pages 864–875, 2004.
[10] D. Eppstein, M. Löffler, and D. Strash. Listing all maximal cliques in sparse graphs in near-optimal time. In ISAAC, 2010.
[11] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM, 2010.
[12] A. Goyal, W. Lu, and L. V. Lakshmanan. Celf++: optimizing the greedy algorithm for influence maximization in social networks. In WWW, pages 47–48, 2011.
[13] J. Healy, J. Janssen, E. E. Milios, and W. Aiello. Characterization of graphs using degree cores. In WAW, 2006.
[14] R. Jin, L. Liu, and C. C. Aggarwal. Discovering Highly Reliable Subgraphs in Uncertain Graphs. In KDD, 2011.
[15] R. Jin, L. Liu, B. Ding, and H. Wang. Distance-Constraint Reachability Computation in Uncertain Graphs. PVLDB, 4(9):551–562, 2011.
[16] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[17] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identifying influential spreaders in complex networks. Nature Physics, 6:888, 2010.
[18] G. Kollios, M. Potamias, and E. Terzi. Clustering large probabilistic graphs. TKDE, 25(2):325–336, 2013.
[19] G. Kortsarz and D. Peleg. Generating sparse 2-spanners. J. Algorithms, 17(2):222–236, 1994.
[20] D. Liben-Nowell and J. Kleinberg. The Link Prediction Problem for Social Networks. In CIKM, 2003.
[21] V. E. Lee, N. Ruan, R. Jin, and C. C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data, 2010.
[22] L. Liu, R. Jin, C. Aggarwal, and Y. Shen. Reliable clustering on uncertain graphs. In ICDM, 2012.
[23] K. Pearson. Tables of the Incomplete Beta-Function. Cambridge University Press, 1968.
[24] M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-Nearest Neighbors in Uncertain Graphs. PVLDB, 3(1):997–1008, 2010.
[25] S. B. Seidman. Network structure and minimum degree. Social Networks, 5(3):269–287, 1983.
[26] P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link Discovery in Graphs Derived from Biological Databases. In DILS, 2006.
[27] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, 2010.
[28] L. G. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM J. on Computing, 8(3):410–421, 1979.
[29] E. W. Weisstein. Binomial distribution. From MathWorld—A Wolfram Web Resource. Last visited on 16/5/2013. http://mathworld.wolfram.com/BinomialDistribution.html
[30] S. Wuchty and E. Almaas. Peeling the yeast protein network. Proteomics, 5(2):444–449, Feb. 2005.
[31] Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases. PVLDB, 5(9):800–811, 2012.
[32] H. Zhang, H. Zhao, W. Cai, J. Liu, and W. Zhou. Using the k-core decomposition to analyze the static structure of large-scale software systems. The Journal of Supercomputing, 53(2):352–369, 2010.
[33] L. Zou, P. Peng, and D. Zhao. Top-K Possible Shortest Path Query over a Large Uncertain Graph. In WISE, 2011.
[34] Z. Zou, H. Gao, and J. Li. Discovering Frequent Subgraphs over Uncertain Graph Databases under Probabilistic Semantics. In KDD, 2010.
