Sensitivity Analysis & - PowerPoint Presentation

Uploaded On 2023-08-23

Presentation Transcript

1. Sensitivity Analysis & Explanations for Robust Query Evaluation in Probabilistic Databases
Bhargav Kanagal, Jian Li & Amol Deshpande

2. Managing Uncertain Data using Probabilistic Databases
Uncertain, incomplete & noisy data is generated by a variety of data sources
Probabilistic databases have been used effectively to (1) concisely store such data and (2) efficiently execute queries over uncertain data
A number of systems have been developed, with a lot of research in recent years:
  Mystiq, Lahar [UW], MayBMS [Cornell], Trio [Stanford], MCDB [U.F], PrDB [U.Md]

3. However, Query Evaluation Model..
Example queries:
  List all 'reputed' car sellers in '12345' who offer 'Hondas'
  What is the average price of a Honda car in '12345'?
No intuition is provided to the user about the query results
Probabilities are computed using Bayesian inference, similarity metrics, sentiment analysis and so on
Questions a user may ask:
  (1) Why is '239' in the result, and why does this tuple have such a high probability value? (Explanations)
  (2) I am unsure about the probability of x1. Will the actual result be much different even if it was 0.4? (Sensitivity Analysis)
  (3) I resolved the uncertainty of tuple x1 by asking an expert. What is the new result? (Incremental Query Re-evaluation)

4. In this paper,
We propose an interactive model for query evaluation where a user can mark a query for further analysis:
  (1) Providing explanations for query results
  (2) Listing sensitive input tuples for a given query result
  (3) Allowing a user to update input probabilities and quickly recomputing results

5. Outline
Introduction
Problem Formulation
Sensitivity Analysis
Explanations
Analysis for Conjunctive queries
Analysis for Aggregation queries
Results

6. Problem Formulation
Independent tuple-uncertain probabilistic database
Queries:
  Value queries: Result is a set of numerical values
    Conjunctive queries, Aggregation queries
  Set queries: Result is a set of categorical items
    Top-k queries, probabilistic threshold queries

7. Formal problem: Sensitivity Analysis
Influence (of tuple (t, p) on query result R):
  Change p to p + ∆p, determine ∆R
  Influence = ∆R / ∆p
Value queries: ∆R = |Rnew - Rold|
  Example: Conjunctive queries (difference in output probabilities)
Set queries: ∆R = |(Rnew \ Rold) ∪ (Rold \ Rnew)|
  Example: Top-k queries (additional + missing tuples)
Problem: Given query q, determine the top-l influential variables for q
Ensuing problem: Re-evaluate query results when input probabilities are modified
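
The definitions on this slide can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it approximates influence with a finite difference, and the names `value_influence`, `set_delta`, `top_l_influential` and the example lineage are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set

Probs = Dict[str, float]

def value_influence(query: Callable[[Probs], float], probs: Probs,
                    tuple_id: str, delta_p: float = 1e-3) -> float:
    """Influence = |R_new - R_old| / delta_p for a value query.

    Assumes probs[tuple_id] + delta_p is still a valid probability.
    """
    perturbed = dict(probs)
    perturbed[tuple_id] = probs[tuple_id] + delta_p
    return abs(query(perturbed) - query(probs)) / delta_p

def set_delta(r_old: Set[str], r_new: Set[str]) -> int:
    """Delta R for a set query: additional plus missing result items."""
    return len((r_new - r_old) | (r_old - r_new))

def top_l_influential(query: Callable[[Probs], float], probs: Probs, l: int) -> List[str]:
    """Rank input tuples by influence and keep the top l."""
    ranked = sorted(probs, key=lambda t: value_influence(query, probs, t), reverse=True)
    return ranked[:l]

def example_query(p: Probs) -> float:
    # Hypothetical lineage x1 AND (y1 OR y2) over independent tuples.
    return p["x1"] * (1.0 - (1.0 - p["y1"]) * (1.0 - p["y2"]))

probs = {"x1": 0.5, "y1": 0.5, "y2": 0.2}
print(top_l_influential(example_query, probs, 2))
print(set_delta({"a", "b", "c"}, {"a", "b", "d"}))
```

The finite difference is only a stand-in for the derivative; each call re-evaluates the whole query, which is the cost the later slides avoid.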

8. Influence -- Related work
Connection to Re & Suciu 2008 (Probabilistic Databases)
  Defined influence for Boolean conjunctive queries
  Our derivative-based definition is applicable to all queries and subsumes the previous definition
Connection to Meliou et al. 2011 (Causality in DB)
  Notion of responsibility is closely related to influence
[Re & Suciu 2008, Meliou et al. 2011]

9. Formal problem: Explanations
Contribution (of tuple set S on query result R):
  Set the probabilities of the tuples in S to 0, determine ∆R
Problem: Given a query q and l, determine the set V, |V| <= l, with maximum contribution
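
To make the problem statement concrete, here is a brute-force sketch that tries every tuple set of size at most l. It is exponential and only illustrates the definition, not the efficient algorithms from the talk; the function names and the example lineage `(a AND b) OR c` are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Tuple

Probs = Dict[str, float]

def contribution(query: Callable[[Probs], float], probs: Probs,
                 tuple_set: Iterable[str]) -> float:
    """Change in the result when the probabilities of tuple_set are set to 0."""
    chosen = set(tuple_set)
    zeroed = {t: (0.0 if t in chosen else p) for t, p in probs.items()}
    return abs(query(probs) - query(zeroed))

def best_explanation(query: Callable[[Probs], float], probs: Probs,
                     l: int) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively search all sets V with |V| <= l for maximum contribution."""
    best_set: Tuple[str, ...] = ()
    best_value = 0.0
    for size in range(1, l + 1):
        for subset in combinations(sorted(probs), size):
            value = contribution(query, probs, subset)
            if value > best_value:  # ties keep the first set found
                best_set, best_value = subset, value
    return best_set, best_value

def example_query(p: Probs) -> float:
    # Hypothetical lineage (a AND b) OR c over independent tuples.
    return 1.0 - (1.0 - p["a"] * p["b"]) * (1.0 - p["c"])

probs = {"a": 0.9, "b": 0.9, "c": 0.5}
print(best_explanation(example_query, probs, 2))
```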

10. Sensitivity Analysis ≠ Explanations
Influence is about change/derivative, contribution is about actual value
A high probability tuple can have large contribution, but low influence
Example: E[SUM]
  Sensitivity Analysis: sort by score values
  Explanations: sort by ai·pi values (score times probability)
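
A small sketch of the E[SUM] example, assuming E[SUM] = Σ ai·pi over independent tuples, so that the derivative with respect to pi is the score ai and the single-tuple contribution is ai·pi. The tuple ids, scores and probabilities below are made up for illustration.

```python
# Hypothetical tuples: id -> (score a_i, probability p_i).
tuples = {"t1": (100.0, 0.05), "t2": (40.0, 0.9), "t3": (60.0, 0.5)}

# Influence on E[SUM] is the derivative w.r.t. p_i, which is the score a_i.
influence = {t: a for t, (a, p) in tuples.items()}
# Contribution of a single tuple is the value lost when p_i is set to 0: a_i * p_i.
contribution = {t: a * p for t, (a, p) in tuples.items()}

by_influence = sorted(tuples, key=influence.get, reverse=True)
by_contribution = sorted(tuples, key=contribution.get, reverse=True)

print(by_influence)     # ranking used for sensitivity analysis
print(by_contribution)  # ranking used for explanations
```

Here the low-probability, high-score tuple `t1` ranks first by influence but last by contribution, which is the distinction the slide draws.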

11. Outline
Introduction
Problem Formulation
Conjunctive queries
Sensitivity Analysis
Aggregation
Analysis for Aggregation queries
Results

12. Sensitivity Analysis -- Conjunctive/SPJ Queries
First, evaluate the lineage of the output tuple
Subsequently, compute the probability of the formula
Example: List all "reputed" car sellers in "12345" who offer "Honda" cars
  SELECT SellerId
  FROM Location, CarAds, Reputed
  WHERE reputation = 'good' AND city = 'Mumbai'
    AND Location.SellerId = CarAds.SellerId
    AND CarAds.SellerId = Reputed.SellerId
Probability of x1 ∧ z1 ∧ (y1 ∨ y2)
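
As a sanity check on the second step, the probability of the lineage x1 ∧ z1 ∧ (y1 ∨ y2) can be computed in closed form under tuple independence and compared against enumeration of all possible worlds. This is a sketch; the probability values are assumed, since the slide's tables are not in the transcript.

```python
from itertools import product
from typing import Dict

def lineage(w: Dict[str, bool]) -> bool:
    """Lineage of the output tuple: x1 AND z1 AND (y1 OR y2)."""
    return w["x1"] and w["z1"] and (w["y1"] or w["y2"])

def prob_by_enumeration(probs: Dict[str, float]) -> float:
    """Sum the weights of all possible worlds in which the lineage holds."""
    names = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for n in names:
            weight *= probs[n] if world[n] else 1.0 - probs[n]
        if lineage(world):
            total += weight
    return total

def prob_closed_form(p: Dict[str, float]) -> float:
    """Independence lets AND multiply and OR use 1 - prod(1 - p)."""
    return p["x1"] * p["z1"] * (1.0 - (1.0 - p["y1"]) * (1.0 - p["y2"]))

probs = {"x1": 0.9, "z1": 0.8, "y1": 0.5, "y2": 0.4}  # assumed values
print(prob_by_enumeration(probs), prob_closed_form(probs))
```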

13. Computing Influence?
Observation: Linearity property
  Given a Boolean formula λ(x1, x2, ..., xn), P(λ) is a linear function of each p(xi) treated individually
  P(λ) = ci pi + ci'
Intuition: Shannon expansion
However, the problem of computing all influences is NP-hard
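
The slide's formula for the Shannon expansion is not in the transcript. The standard expansion on xi, assuming xi is independent of the other variables, gives the linear form directly; here P(λ | xi = 1) denotes the probability of λ with xi fixed to true.

```latex
\begin{align*}
P(\lambda) &= p_i \, P(\lambda \mid x_i = 1) + (1 - p_i) \, P(\lambda \mid x_i = 0) \\
           &= \underbrace{\bigl[ P(\lambda \mid x_i = 1) - P(\lambda \mid x_i = 0) \bigr]}_{c_i} \, p_i
              + \underbrace{P(\lambda \mid x_i = 0)}_{c_i'}
\end{align*}
```

Under this reading, the derivative of P(λ) with respect to pi is ci, so the influence of xi does not depend on the current value of pi.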

14. Read-once formulas
Each node corresponds to a Boolean formula

15. Read-once formulas (Sensitivity)
The derivatives at AND and OR nodes follow from the chain rule
One-pass algorithm to compute all influences
Each node x stores deriv(x), which is the derivative of the root w.r.t. x
Update equations:
  AND: deriv(x) = deriv(parent(x)) * Prob(sibling(x))
  OR: deriv(x) = deriv(parent(x)) * (1 - Prob(sibling(x)))
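
The update equations can be sketched as a bottom-up probability pass followed by a single top-down derivative pass. This is a minimal illustration assuming a binary tree with independent leaves; the `Node` class, function names and example probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    op: str                         # "leaf", "and", or "or"
    name: str = ""                  # variable name, used for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prob: float = 0.0               # probability of this sub-formula
    deriv: float = 0.0              # derivative of the root w.r.t. this node

def compute_probs(node: Node, probs: Dict[str, float]) -> float:
    """Bottom-up pass: store the probability of every sub-formula."""
    if node.op == "leaf":
        node.prob = probs[node.name]
    else:
        l = compute_probs(node.left, probs)
        r = compute_probs(node.right, probs)
        node.prob = l * r if node.op == "and" else 1.0 - (1.0 - l) * (1.0 - r)
    return node.prob

def compute_derivs(node: Node, deriv: float = 1.0,
                   out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Top-down pass applying the AND/OR update equations once per node."""
    if out is None:
        out = {}
    node.deriv = deriv
    if node.op == "leaf":
        out[node.name] = deriv
    elif node.op == "and":
        compute_derivs(node.left, deriv * node.right.prob, out)
        compute_derivs(node.right, deriv * node.left.prob, out)
    else:  # "or"
        compute_derivs(node.left, deriv * (1.0 - node.right.prob), out)
        compute_derivs(node.right, deriv * (1.0 - node.left.prob), out)
    return out

def leaf(name: str) -> Node:
    return Node("leaf", name)

# x1 AND (z1 AND (y1 OR y2)), with assumed probabilities.
root = Node("and", left=leaf("x1"),
            right=Node("and", left=leaf("z1"),
                       right=Node("or", left=leaf("y1"), right=leaf("y2"))))
probs = {"x1": 0.9, "z1": 0.8, "y1": 0.5, "y2": 0.4}
compute_probs(root, probs)
influences = compute_derivs(root)
print(influences)
```

Each node is visited once per pass, so all influences are obtained in time linear in the size of the formula.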

16. Non-Read-once formulas (Sensitivity)
Expand the Boolean formula into an XOR of read-once formulas (DTREE, Olteanu et al. 2010)
Compute derivatives separately and aggregate them together
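
A hand-worked illustration of the idea, not the DTREE algorithm itself: the majority formula (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z) is not read-once, but a single Shannon expansion on x splits it into two mutually exclusive branches, x ∧ (y ∨ z) and ¬x ∧ (y ∧ z), each read-once. The formula choice, names and probabilities are assumptions for the example.

```python
from itertools import product
from typing import Dict, Tuple

def majority(x: bool, y: bool, z: bool) -> bool:
    return (x and y) or (x and z) or (y and z)

def brute_force_prob(p: Dict[str, float]) -> float:
    """Reference value: enumerate all 8 possible worlds."""
    total = 0.0
    for x, y, z in product([False, True], repeat=3):
        if majority(x, y, z):
            w = (p["x"] if x else 1 - p["x"]) \
                * (p["y"] if y else 1 - p["y"]) \
                * (p["z"] if z else 1 - p["z"])
            total += w
    return total

def expanded_prob_and_derivs(p: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Probability and derivatives aggregated over the two exclusive branches."""
    px, py, pz = p["x"], p["y"], p["z"]
    p1 = 1 - (1 - py) * (1 - pz)   # branch x = 1: read-once formula y OR z
    p0 = py * pz                   # branch x = 0: read-once formula y AND z
    prob = px * p1 + (1 - px) * p0
    derivs = {
        "x": p1 - p0,
        # Per-branch derivatives, weighted by the branch probabilities.
        "y": px * (1 - pz) + (1 - px) * pz,
        "z": px * (1 - py) + (1 - px) * py,
    }
    return prob, derivs

probs = {"x": 0.3, "y": 0.6, "z": 0.5}
prob, derivs = expanded_prob_and_derivs(probs)
print(prob, derivs)
```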

17. Explanations (Conjunctive Queries)
Requires us to compute the set of at most l tuples that can reduce the probability the most
The problem is NP-hard; however, we provide efficient solutions for read-once lineages
Denote: OPT(λ, k) = smallest possible probability obtained by setting k probabilities in λ to 0
OPT is computed with separate recurrences for AND and OR nodes
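
The AND and OR recurrences themselves are missing from the transcript. One plausible reconstruction, stated here as an assumption rather than the authors' formulation, splits the budget k between the two children and combines their optimal values, which is valid because both AND and OR are monotone in each child's probability. The tuple-based tree encoding and the "at most k" reading of OPT are also assumptions.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: ("leaf", name) or ("and" | "or", left, right).
Tree = Union[Tuple[str, str], Tuple[str, tuple, tuple]]

def opt_table(node: Tree, probs: Dict[str, float], k: int) -> List[float]:
    """table[j] = smallest probability of node with at most j leaf probabilities set to 0."""
    if node[0] == "leaf":
        # Budget 0 keeps the probability; any positive budget can zero the leaf.
        return [probs[node[1]]] + [0.0] * k
    left = opt_table(node[1], probs, k)
    right = opt_table(node[2], probs, k)
    table = []
    for j in range(k + 1):
        if node[0] == "and":
            best = min(left[a] * right[j - a] for a in range(j + 1))
        else:  # "or"
            best = min(1.0 - (1.0 - left[a]) * (1.0 - right[j - a])
                       for a in range(j + 1))
        table.append(best)
    return table

# (a AND b) OR (c AND d), with assumed probabilities.
formula = ("or", ("and", ("leaf", "a"), ("leaf", "b")),
                 ("and", ("leaf", "c"), ("leaf", "d")))
probs = {"a": 0.9, "b": 0.9, "c": 0.5, "d": 0.5}
table = opt_table(formula, probs, 2)
print(table)
```

Recording the minimizing split at each node would recover the actual tuple set; the sketch returns only the optimal probabilities.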

18. Outline
Introduction
Problem Formulation
Analysis for Conjunctive queries
Analysis for Aggregation queries
Results

19. Aggregation queries (E[MAX])
MAX: SELECT E[MAX(A)] FROM S
(1) Sort tuples by score
(2) Recurrence relation
Linearity property
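
The recurrence is not in the transcript. A common formulation, assumed here, sorts tuples by descending score and uses E[i..n] = pi·ai + (1 - pi)·E[i+1..n], with the maximum of an empty relation taken to be 0 and scores taken to be non-negative. The function name and data are illustrative.

```python
from typing import List, Tuple

def expected_max(tuples: List[Tuple[float, float]]) -> float:
    """E[MAX] over independent tuples given as (score, probability).

    Assumes non-negative scores and that an empty world has maximum 0.
    """
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    n = len(ordered)
    suffix = [0.0] * (n + 1)   # suffix[i] = E[MAX] over tuples i..n-1
    for i in range(n - 1, -1, -1):
        a, p = ordered[i]
        # Either tuple i exists and is the maximum, or we fall through to the rest.
        suffix[i] = p * a + (1.0 - p) * suffix[i + 1]
    return suffix[0]

print(expected_max([(10.0, 0.5), (20.0, 0.5)]))
```

Each step is linear in pi, which is the linearity property the slide refers to.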

20. Influence - MAX queries
Exploiting linearity, each influence is just a lookup: O(1) time
Overall time: O(n) extra
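
One way to realize the O(1) lookup, sketched under the same assumptions as the E[MAX] recurrence (descending score order, empty maximum 0): the derivative with respect to pi is prefix(i)·(ai - E[i+1..n]), where prefix(i) is the probability that no higher-scored tuple exists. The names below are hypothetical.

```python
from typing import Dict, List, Tuple

def max_influences(tuples: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """tuples: (tuple_id, score, probability). Returns dE[MAX]/dp for every tuple."""
    ordered = sorted(tuples, key=lambda t: t[1], reverse=True)
    n = len(ordered)
    suffix = [0.0] * (n + 1)            # suffix[i] = E[MAX] over tuples i..n-1
    for i in range(n - 1, -1, -1):
        _, a, p = ordered[i]
        suffix[i] = p * a + (1.0 - p) * suffix[i + 1]
    influences = {}
    prefix = 1.0                        # probability that no higher-scored tuple exists
    for i, (tid, a, p) in enumerate(ordered):
        influences[tid] = prefix * (a - suffix[i + 1])   # O(1) per tuple
        prefix *= 1.0 - p
    return influences

print(max_influences([("t1", 10.0, 0.5), ("t2", 20.0, 0.5)]))
```

The two linear passes add O(n) work on top of evaluating E[MAX] itself.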

21. Explanations -- MAX queries
Need to compute the set of at most l tuples that reduces the maximum the most
Denote: OPT(i, j) = smallest possible value for max[i, n], obtained by setting j probability values to 0
Overall time: O(n * l) extra
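
The recurrence for OPT(i, j) is not shown in the transcript. A plausible dynamic program, given as an assumption, either zeroes tuple i and spends one unit of budget, or keeps it and applies the E[MAX] recurrence to the optimal suffix. It assumes descending score order, non-negative scores, and reads j as "at most j".

```python
from typing import List, Tuple

def min_expected_max(tuples: List[Tuple[float, float]], l: int) -> List[float]:
    """Returns [OPT(0, j) for j in 0..l] for tuples given as (score, probability)."""
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    n = len(ordered)
    # opt[i][j] = smallest E[MAX] over tuples i..n-1 with at most j probabilities zeroed.
    opt = [[0.0] * (l + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        a, p = ordered[i]
        for j in range(l + 1):
            keep = p * a + (1.0 - p) * opt[i + 1][j]
            drop = opt[i + 1][j - 1] if j > 0 else float("inf")
            opt[i][j] = min(keep, drop)
    return opt[0]

print(min_expected_max([(20.0, 0.5), (10.0, 0.5), (5.0, 0.9)], 2))
```

The table has (n + 1)·(l + 1) cells filled in constant time each, matching the O(n * l) bound on the slide.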

22. Outline
Introduction
Problem Formulation
Analysis for Conjunctive queries
Analysis for Aggregation queries
Results

23. Sensitivity Analysis is essential & critical
Query: Top-3 by probability
Very sensitive to input tuple probability (.4 - .8)
More intuitive to show such a graph

24. Sensitivity Analysis has low overhead
TPC-H queries (conjunctive)
Computing just probability
Computing both probability & sensitivity values

25. Computing Explanations & re-evaluation is efficient
Explanation computation is efficient
Incremental re-evaluation is efficient by orders of magnitude

26. Thank you

27. Re-Evaluation (Boolean Conjunctive Queries)
2 problems:
  First, update the query result
  Then, update the influences
Exactly one probability updated: use the derivative
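
For the single-update case, linearity means the stored derivative gives the new result exactly, with no re-evaluation of the lineage. A minimal sketch, assuming the lineage x1 ∧ z1 ∧ (y1 ∨ y2) and made-up probabilities; updating the stored influences of the other tuples is not shown.

```python
from typing import Dict

def lineage_prob(p: Dict[str, float]) -> float:
    """P(x1 AND z1 AND (y1 OR y2)) under tuple independence."""
    return p["x1"] * p["z1"] * (1.0 - (1.0 - p["y1"]) * (1.0 - p["y2"]))

def incremental_update(old_result: float, deriv: float,
                       old_p: float, new_p: float) -> float:
    """P(lambda) is linear in the updated probability, so this step is exact."""
    return old_result + deriv * (new_p - old_p)

probs = {"x1": 0.9, "z1": 0.8, "y1": 0.5, "y2": 0.4}   # assumed values
old_result = lineage_prob(probs)

# Influence of x1 as it would have been stored by the sensitivity pass.
deriv_x1 = probs["z1"] * (1.0 - (1.0 - probs["y1"]) * (1.0 - probs["y2"]))

new_result = incremental_update(old_result, deriv_x1, probs["x1"], 0.4)
recomputed = lineage_prob({**probs, "x1": 0.4})
print(new_result, recomputed)
```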

28. Sensitivity Analysis ≠ Explanations
Influence is about change/derivative, contribution is about actual value
A high probability tuple can have large contribution, but low influence
Example: Conjunctive queries
  Tuple x1 has high probability: low influence & high contribution
  Tuple y1 has low probability: high influence & low contribution

29. Formal problem: Re-evaluation
Re-evaluate query results when input probabilities are modified
Exploit previous computation
Challenge: Incremental computation must be more efficient than re-computing the result from scratch