1. Top-k Graph Summarization on Hierarchical DAGs
  Xuliang Zhu, Xin Huang, Byron Choi, Jianliang Xu
  Hong Kong Baptist University, Hong Kong, China
2. Outline
  Motivations
  Related Work
  kDAG-Problem
  Algorithms
  Experiments
  Conclusions
3. Motivations
  Hierarchical DAGs are everywhere (e.g., Disease Ontology, ImageNet, Wikipedia Categories)
  Node weights
    Disease Ontology: the occurrences of diseases
    ImageNet: the number of pictures
    Wikipedia Categories: the views under the categories
  [Figures: Disease Ontology, Wikipedia Categories, ImageNet]
4. Motivations
  Top-k summarization
    Massive terminologies and complex structures
    Difficult to understand or visualize
  An example (k = 4)
    disease: general concept
    COVID-19: the most important disease
    cancer & flu: two categories of multiple diseases with large weights
  [Figure: Disease Ontology and its top-4 summarization]
5. Related Work
  Aggregate method [X Jing et al. 2014]
    Selects the top-k vertices with maximum aggregate value.
    Lacks diversity
  GVDO Model
    Tree
  [Figure: Disease Ontology, summarized by our method and by the aggregate method]
6. Related Work
  Graph summarization
    Graph stream summarization [X Gou et al. ICDE’19]
    RDF graph summarization [S Cebiric et al. PVLDB’15]
    Summarization for keyword search [G Fakas et al. SIGMOD’15]
  Top-k diversification
    Top-k maximum clique [L Yuan et al. VLDBJ’16]
    Top-k diversified subgraph [Z Yang et al. SIGMOD’16]
7. Outline
  Motivations
  Related Work
  kDAG-Problem
  Algorithms
  Experiments
  Conclusions
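8. kDAG-Problem
  Summary Score
    [Equation not preserved in the extracted slide text]
    Here, S is the selected set, V is the set of all vertices, and feq(u) is the weight of vertex u.
    The score is determined by the vertex weights and the distances from the selected vertices.
  kDAG-problem
    Find the set [formula not preserved in the extracted slide text]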
9. kDAG-Problem Analysis
  Diversity
    The selected vertices are not similar.
  Small Scale
    The number of selected vertices is small.
  Large Coverage
    The selected vertices can reach many important vertices.
  High Correlation
    The representative correlation of the selected vertices to an important vertex is high.
  The kDAG-problem is NP-hard
    Reduction from the 3-SAT problem
10. Outline
  Motivations
  Related Work
  kDAG-Problem
  Algorithms
  Experiments
  Conclusions
11-16. Greedy+
  Marginal gain
    Add the vertex with the maximum marginal gain into S greedily.
  [Figure, animated over slides 11-16: running example DAG with vertices 1-7 and their weights, and the marginal gain of each vertex, updated as vertices are selected]
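The greedy loop on slides 11-16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide's summary-score formula is not preserved in this text, so `summary_score` is a hypothetical stand-in in which each vertex contributes its weight divided by one plus the hop distance to its closest selected ancestor. The toy DAG, its weights, and all function names are likewise made up for illustration.

```python
from collections import deque


def distances_from(source, children):
    """BFS hop distances from `source` to itself and all of its descendants."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in children.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summary_score(selected, children, weight):
    """Stand-in score (an assumption, NOT the slide's formula): every vertex
    contributes weight / (1 + distance to its closest selected ancestor)."""
    best = {}
    for s in selected:
        for v, d in distances_from(s, children).items():
            best[v] = max(best.get(v, 0.0), weight[v] / (1 + d))
    return sum(best.values())


def greedy_top_k(children, weight, k):
    """Repeatedly add the vertex with the largest marginal gain to S."""
    selected, current = [], 0.0
    candidates = set(weight)
    for _ in range(min(k, len(candidates))):
        gains = {
            v: summary_score(selected + [v], children, weight) - current
            for v in candidates
        }
        pick = max(sorted(gains), key=gains.get)  # ties: smallest id wins
        selected.append(pick)
        current += gains[pick]
        candidates.remove(pick)
    return selected, current


# Hypothetical toy DAG (parent -> children) with made-up weights.
children = {"root": ["a", "b"], "a": ["c", "d"], "b": ["d", "e"]}
weight = {"root": 1, "a": 2, "b": 2, "c": 12, "d": 6, "e": 4}

selected, score = greedy_top_k(children, weight, k=2)
print(selected, score)
```

On this toy input the first pick is the heavy leaf `c`; the second pick is `b` rather than `a`, because `a` mostly covers what `c` already covers. Under the stand-in score, that is the diversity effect the marginal gain is meant to capture.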
17. Greedy+
  Monotone
  Submodular
  (1 – 1/e)-approximation guarantee
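For reference, the guarantee on slide 17 matches the standard result for greedy maximization of a monotone submodular set function under a cardinality constraint. The notation below is an assumption, not taken from the slides: g denotes the summary score with g(∅) = 0, S_k is the set returned by the greedy procedure after k picks, and S* is an optimal set of size k.

```latex
% Monotone: adding vertices never lowers the score.
g(S) \le g(T) \qquad \text{for all } S \subseteq T \subseteq V

% Submodular: marginal gains shrink as the selected set grows.
g(S \cup \{v\}) - g(S) \;\ge\; g(T \cup \{v\}) - g(T)
\qquad \text{for all } S \subseteq T \subseteq V,\ v \in V \setminus T

% Greedy guarantee under the cardinality constraint |S| \le k.
g(S_k) \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr)\, g(S^{*})
\;\ge\; \bigl(1 - \tfrac{1}{e}\bigr)\, g(S^{*})
```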
18-22. EXT-Greedy
  Extract a subtree based on the Greedy+ answer.
  Apply the optimal method [1] on the subtree.
  [Figure, animated over slides 18-22: running example; Greedy+ answer; subtree extraction; optimal solution in the subtree; optimal solution in the original DAG]
  [1] X Zhu et al. Ontology-based Graph Visualization for Summarized View. 2020. arXiv:2008.03053
23-27. k-PCGS
  Compress DAG
    Compress the DAG into a new graph by removing useless vertices.
  Pruning Candidates
    Prune from the candidate set every vertex whose upper bound is smaller than the top-k lower bound.
  [Figure, animated over slides 23-27: compressed DAG; pruned candidates (black) and remaining candidates (blue); lower bounds (green) and upper bounds (red)]
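The pruning rule on slides 23-27 can be sketched as a small Python function. This is an illustration under assumptions, not the authors' code: "top-k lower bound" is read here as the k-th largest lower bound among the candidates, the function name `prune_candidates` is hypothetical, and the bound values in the example are made up rather than read from the figure. How the bounds themselves are computed is not covered.

```python
import heapq


def prune_candidates(lower, upper, k):
    """Keep the candidates whose upper bound reaches the k-th largest lower bound.

    `lower` and `upper` map each candidate vertex to its lower / upper bound.
    """
    if k <= 0 or not lower:
        return set()
    top = heapq.nlargest(k, lower.values())
    # With fewer than k candidates there is no k-th lower bound, so keep all.
    threshold = top[-1] if len(top) == k else float("-inf")
    return {v for v in upper if upper[v] >= threshold}


# Hypothetical bounds for five candidates.
lower = {1: 22, 2: 0, 3: 12, 4: 0, 5: 3}
upper = {1: 22, 2: 20, 3: 24, 4: 0, 5: 6}

kept = prune_candidates(lower, upper, k=2)
print(sorted(kept))
```

With k = 2 the threshold is 12, the second-largest lower bound. Candidates 4 and 5 are pruned because their upper bounds (0 and 6) cannot reach it, while candidate 2 survives despite a lower bound of 0 because its upper bound is 20.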
28. Complexity Comparison
  Method      Time Complexity               Space Complexity
  Greedy+     O(nkm)                        O(m)
  EXT-Greedy  O(nkm + nk^3 h)               O(nk^2 h)
  k-PCGS      O(mh + n log n + k|C|m')      O(m)
  Here, n is the number of vertices, m is the number of edges, and h is the height of the DAG (the length of the longest shortest path).
  |C| is smaller than n and m' is smaller than m.
29. Outline
  Motivations
  Related Work
  kDAG-Problem
  Algorithms
  Experiments
  Conclusions
30. Datasets
  Dataset  Hierarchical DAG                             Weight                           n           m
  LATT     Medical Entity Dictionary                    Number of queries from patients  4,226       8,190
  LNUR     Medical Entity Dictionary                    Number of queries from nurses    4,226       8,190
  ANIM     "Anime" catalog in Wikipedia                 Number of views                  15,135      24,498
  IMAGE    ImageNet (WordNet)                           Number of pictures               73,298      74,732
  YAGO3    Wikipedia + WordNet                          Synthetic                        5,699,254   17,512,754
  UCI      Social Network (only for efficiency test)    Synthetic                        58,790,783  92,208,196
31. Evaluation metrics
  Summary Score
  Average Distance
  1-hop Weighted Coverage
32. Methods compared
  FEQ [X Jing et al. 2014]
    Select k nodes with the highest weight.
  CAGG [X Jing et al. 2014]
    Aggregate method with a contribution ratio.
  LASP [G Fakas et al. SIGMOD’15]
    Add the largest averaged score path greedily.
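Of the three baselines on slide 32, FEQ is simple enough to sketch directly from its one-line description. The function name `feq_top_k` and the example weights are hypothetical; CAGG and LASP are not sketched because the slide does not give enough detail about them.

```python
import heapq


def feq_top_k(weight, k):
    """FEQ-style baseline: return the k vertices with the highest weight."""
    return heapq.nlargest(k, weight, key=weight.get)


# Made-up weights, for illustration only.
weight = {"disease": 1, "flu": 5, "cancer": 7, "COVID-19": 9}
print(feq_top_k(weight, 2))
```

Because this baseline ranks vertices by weight alone and ignores the DAG structure, it can select vertices that sit close together in the hierarchy.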
33. Effectiveness Evaluations
  [Figure: average distance of different models]
  Our kDAG model outperforms state-of-the-art models, achieving the smallest average distance.
34. Efficiency Evaluations
  [Figure: efficiency evaluation of the Greedy+, EXT-Greedy, k-PCGS, and CAGG methods on ImageNet]
  k-PCGS runs fastest among all methods, and EXT-Greedy is the slowest.
35. Efficiency Evaluations
  [Figure: the size of the candidate set]
  k-PCGS prunes at least 90% of the candidates in the worst case and more than 99% in most cases.
36. Outline
  Motivations
  Related Work
  kDAG-Problem
  Algorithms
  Experiments
  Conclusions
37. Conclusions
  We study the top-k graph summarization problem on large hierarchical DAGs.
  We propose a novel model, the kDAG-problem, and prove its NP-hardness.
  We develop three algorithms to tackle the kDAG-problem:
    Greedy+: a baseline greedy method.
    EXT-Greedy: a more effective method.
    k-PCGS: a more efficient method.
38. Thank you for watching!
  Contact us: csxlzhu@comp.hkbu.edu.hk