A simple method for multi-relational outlier detection
Sarah Riahi and Oliver Schulte
School of Computing Science, Simon Fraser University, Vancouver, Canada
With tools that you probably have around the house (lab).
System Flow
Flach, P. A. (1999), 'Knowledge representation for inductive learning', in Symbolic and Quantitative Approaches to Reasoning and Uncertainty, Springer, pp. 160–167.
Complete Database + Model → Parameter Learning Algorithm → Population Parameter Values
Complete Database → restrict to target individual → Individual Profile
Individual Profile + Model → Parameter Learning Algorithm → Individual Parameter Values
Population Parameter Values vs. Individual Parameter Values → vector norm → outlier score
Input: model, database, target individual.
Output: an outlier score.
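The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A formula is represented as a hypothetical dict of feature=value conditions, and each database row is a hypothetical (individual, match, features) triple. Parameter learning is replaced by simple relative frequencies, whereas the slides use the parameters of a learned Markov Logic Network. The vector norm is the average L1 distance mentioned in the summary.

```python
# Hypothetical representation (not from the slides):
#   formula = dict of feature -> required value, read as a conjunction
#   row     = (individual, match, {feature: value})

def satisfies(formula, features):
    """True if every feature=value condition of the formula holds in this row."""
    return all(features.get(f) == v for f, v in formula.items())

def learn_parameters(model, rows):
    """Stand-in parameter learner: relative frequency of each formula over the rows.
    (Assumption: the slides learn Markov Logic Network parameters instead.)"""
    if not rows:
        return [0.0] * len(model)
    return [sum(satisfies(formula, feats) for _, _, feats in rows) / len(rows)
            for formula in model]

def outlier_score(model, database, target):
    """Average L1 distance between population and individual parameter vectors."""
    population = learn_parameters(model, database)
    profile = [row for row in database if row[0] == target]  # restrict to target individual
    individual = learn_parameters(model, profile)
    return sum(abs(p - q) for p, q in zip(population, individual)) / len(model)

# Toy data with hypothetical players "A", "B", "C"
model = [{"ShotEff": "low"}, {"ShotEff": "high", "shotsOnTarget": "high"}]
database = [
    ("A", 1, {"ShotEff": "low", "shotsOnTarget": "low"}),
    ("A", 2, {"ShotEff": "low", "shotsOnTarget": "high"}),
    ("B", 1, {"ShotEff": "high", "shotsOnTarget": "high"}),
    ("B", 2, {"ShotEff": "high", "shotsOnTarget": "high"}),
    ("C", 1, {"ShotEff": "low", "shotsOnTarget": "low"}),
    ("C", 2, {"ShotEff": "high", "shotsOnTarget": "high"}),
]
print(outlier_score(model, database, "A"))  # 0.5
print(outlier_score(model, database, "C"))  # 0.0
```

Player C's formula frequencies match the population exactly, so C scores 0. Players A and B each satisfy only one of the two formulas, so they score higher.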
Example
Model = Markov Logic Network learned for Premier League Season 2011-2012.
Formula | Estimated population parameter | Estimated parameter for P = van Persie
SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 0.02 | 0.56
SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 3.55 | 0.36
... (331 formulas) | ... | ...
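As a worked illustration of the vector-norm step, the two formulas shown in the table can be plugged into an average L1 distance (the norm suggested in the summary). This covers only the two displayed rows. The actual score would average over all 331 formulas.

```latex
\frac{|0.02 - 0.56| + |3.55 - 0.36|}{2} = \frac{0.54 + 3.19}{2} = \frac{3.73}{2} = 1.865
```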
Evaluation: Synthetic Data
Two features. Designed so that outliers are easy to distinguish from normals (sanity check).
- Normals have a strong correlation, outliers none.
- Outliers have a strong correlation, normals none.
- Correlations are the same, but marginals are very different.
Bayesian Network Representation
(a) Normal = Striker: F1 = ShotEfficiency, F2 = Match_Result, with F2 depending on F1.
    P(F1=1) = 50%, P(F2=0|F1=0) = 90%, P(F2=1|F1=1) = 90%
    Outlier = Midfielder: F1 = TackleEfficiency, F2 = Match_Result, independent.
    P(F1=1) = 50%, P(F2=1) = 50%
(b) Normal = Striker: F1 = ShotEfficiency, F2 = Match_Result, independent.
    P(F1=1) = 50%, P(F2=1) = 50%
    Outlier = Midfielder: F1 = TackleEfficiency, F2 = Match_Result, with F2 depending on F1.
    P(F1=1) = 50%, P(F2=0|F1=0) = 90%, P(F2=1|F1=1) = 90%
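A minimal sketch of how data for the two panels could be sampled. It assumes binary features and one (F1, F2) row per player per match. The function names and the matches-per-player count are hypothetical, and the third scenario (same correlations, different marginals) is not covered.

```python
import random

def sample_correlated(rng):
    # P(F1=1) = 0.5; F2 equals F1 with probability 0.9,
    # i.e. P(F2=0|F1=0) = P(F2=1|F1=1) = 0.9
    f1 = int(rng.random() < 0.5)
    f2 = f1 if rng.random() < 0.9 else 1 - f1
    return f1, f2

def sample_independent(rng):
    # P(F1=1) = P(F2=1) = 0.5, no dependency between F1 and F2
    return int(rng.random() < 0.5), int(rng.random() < 0.5)

def generate(n_normal, n_outlier, matches=20, normals_correlated=True, seed=0):
    """Return {player_id: (label, [(F1, F2) per match])}.
    normals_correlated=True mimics panel (a); False mimics panel (b)."""
    rng = random.Random(seed)
    normal_fn = sample_correlated if normals_correlated else sample_independent
    outlier_fn = sample_independent if normals_correlated else sample_correlated
    players = {}
    for i in range(n_normal + n_outlier):
        label, fn = ("normal", normal_fn) if i < n_normal else ("outlier", outlier_fn)
        players[i] = (label, [fn(rng) for _ in range(matches)])
    return players

def agreement(players, label):
    """Fraction of match rows with F1 == F2 among players with the given label."""
    rows = [r for lab, rs in players.values() if lab == label for r in rs]
    return sum(f1 == f2 for f1, f2 in rows) / len(rows)
```

The agreement rate makes the difference visible. It is about 0.9 for the correlated group and about 0.5 for the independent group.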
Results
AD = Breunig, M.; Kriegel, H.-P.; Ng, R. T. & Sander, J. (2000), 'LOF: Identifying Density-Based Local Outliers', ACM SIGMOD.
LOG = Riahi, F.; Schulte, O. & Liang, Q. (2014), 'A Proposal for Statistical Outlier Detection in Relational Structures', AAAI-StarAI Workshop on Statistical-Relational AI.
Metric = Area Under Curve (AUC).
ELD = average L1-norm.
KLD = average difference.
AD = use single-feature marginals only (unit clauses).
LOG = outlier score = log-likelihood.
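The results are reported as area under the ROC curve. A minimal sketch of how AUC can be computed from outlier scores and ground-truth labels, assuming label 1 marks an outlier and 0 a normal. It uses the pairwise (Mann-Whitney) formulation, which runs in quadratic time but is fine for small evaluations.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a randomly chosen outlier (label 1) gets a higher
    outlier score than a randomly chosen normal (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one normal")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every outlier is ranked above every normal. 0.5 is what a random ranking achieves on average.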
Case Study: Single Features
Which formulas/rules influence the outlier score the most? (Interpretability)
Which unit clauses influence the outlier score the most?
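An average-L1 score is a sum of per-formula terms, so each term can be read off directly to answer the interpretability question. A sketch, assuming parameter vectors aligned with a list of formulas. For single features, these formulas would be unit clauses. The function name and the toy inputs are hypothetical.

```python
def top_contributors(formulas, population, individual, k=3):
    """Rank formulas by their share of an average-L1 outlier score:
    formula i contributes |population[i] - individual[i]| / n."""
    n = len(formulas)
    contributions = [(abs(p - q) / n, f)
                     for f, p, q in zip(formulas, population, individual)]
    contributions.sort(key=lambda c: c[0], reverse=True)  # largest contribution first
    return contributions[:k]

# Hypothetical unit clauses and parameter values
clauses = ["ShotEff=high", "TackleEff=low", "DribbleEff=medium"]
ranked = top_contributors(clauses, [0.5, 0.2, 0.3], [0.1, 0.25, 0.3], k=2)
print([name for _, name in ranked])  # ['ShotEff=high', 'TackleEff=low']
```

The contributions sum to the full score when k covers all formulas, so the ranking is an exact decomposition rather than an approximation.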
Case Study: Correlations
Which formulas/rules influence the outlier score the most? (Interpretability)
Which associations influence the outlier score the most?
Related to exception mining (Novak et al. 2009).
Individual | Rule | Confidence (individual) | Confidence (class)
Edin Dzeko | ShotEff = high AND TackleEff = medium → DribbleEff = low | 50% | 38%
van Persie | ShotEff = high AND TimePlayed = high → ShotsOnTarget = high | 70% | 50%
Confidence = conditional probability.
Novak, P. K.; Webb, G. I. & Wrobel, S. (2009), 'Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining', Journal of Machine Learning Research.
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study', Expert Systems with Applications.
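Confidence is the conditional probability of a rule's head given its body. A sketch, assuming each of a player's matches is a dict of discretized features. The representation and function name are hypothetical. Applying it to one player's rows and then to all rows of the player's class gives the two confidence columns compared in the table.

```python
def confidence(rows, body, head):
    """Confidence of body -> head = P(head | body), estimated as
    (#rows satisfying body and head) / (#rows satisfying body).
    Returns None when no row satisfies the body."""
    covered = [r for r in rows if all(r.get(f) == v for f, v in body.items())]
    if not covered:
        return None
    hits = sum(all(r.get(f) == v for f, v in head.items()) for r in covered)
    return hits / len(covered)

# Hypothetical match rows for one player
rows = [
    {"ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "high"},
    {"ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
    {"ShotEff": "low", "TimePlayed": "high", "ShotsOnTarget": "high"},
]
body = {"ShotEff": "high", "TimePlayed": "high"}
head = {"ShotsOnTarget": "high"}
print(confidence(rows, body, head))  # 0.5
```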
Distribution Divergence Perspective
Halpern, J. Y. (1990), 'An analysis of first-order logics of probability', AI Journal.
de Raedt, L. (2008), Logical and Relational Learning, Springer, Ch. 9.
Joint value assignment | Frequency for random striker | Frequency for P = van Persie
SavesMade(P,M)=low AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 22% | 10%
SavesMade(P,M)=low AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 30% | 62%
… | … | …
Outlier score = dissimilarity measure between a random individual and the target individual. In our work, the dissimilarity measure is a distribution divergence; other distance-type metrics could be used as well.
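A sketch of two dissimilarity measures between the target individual's distribution and the random individual's distribution over joint value assignments. Both distributions are assumed to be given as dicts from assignment to probability. Two choices here are assumptions, not necessarily the ones made in this work: the direction KL(individual ‖ population) and the eps guard against zero probabilities.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)) over joint value assignments x.
    Terms with p(x) = 0 contribute nothing; eps guards against q(x) = 0."""
    return sum(px * math.log(px / max(q.get(x, 0.0), eps))
               for x, px in p.items() if px > 0)

def l1_distance(p, q):
    """Alternative distance-type metric: L1 distance between the two distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical two-assignment distributions
individual = {"low/low/low": 1.0}
population = {"low/low/low": 0.5, "low/high/high": 0.5}
print(kl_divergence(individual, population))  # log 2, about 0.693
print(l1_distance(individual, population))    # 1.0
```

Unlike the L1 distance, the KL divergence is asymmetric. Swapping its arguments changes the score.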
Propositionalization for Outlier Detection
Lippi, M.; Jaeger, M.; Frasconi, P. & Passerini, A. (2011), 'Relational information gain', Machine Learning 83(2), 219–239.
Player | SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | (331 more)
Wayne Rooney | 13% | 10% | ...
van Persie | 50% | 62% | ...
… | ... | ... | ...
- Construct a 331-dimensional attribute vector for each individual, with one frequency/count value for each formula.
- This gives a pseudo-i.i.d. data view, like n-grams.
- Apply standard single-table analysis methods.
- Could also use learned weights instead of sufficient statistics.
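A minimal sketch of the propositionalization step followed by one standard single-table detector. Several things are assumptions: formulas are conjunctions of feature=value conditions, every individual has at least one row, and there are more than k individuals. Distance to the k-th nearest neighbour is swapped in for illustration as the single-table method; the slide does not name one. `math.dist` requires Python 3.8 or later.

```python
import math

def propositionalize(individual_rows, formulas):
    """One attribute per formula: the fraction of an individual's rows satisfying it.
    individual_rows: {name: [row dict, ...]}; formulas: list of feature->value dicts."""
    table = {}
    for name, rows in individual_rows.items():
        table[name] = [sum(all(r.get(f) == v for f, v in formula.items()) for r in rows)
                       / len(rows)
                       for formula in formulas]
    return table

def knn_outlier_scores(table, k=1):
    """Single-table outlier score: Euclidean distance to the k-th nearest neighbour."""
    names = list(table)
    scores = {}
    for a in names:
        dists = sorted(math.dist(table[a], table[b]) for b in names if b != a)
        scores[a] = dists[k - 1]
    return scores

# Hypothetical players and formulas
formulas = [{"ShotEff": "high"}, {"TackleEff": "high"}]
individual_rows = {
    "A": [{"ShotEff": "high", "TackleEff": "low"}, {"ShotEff": "high", "TackleEff": "low"}],
    "B": [{"ShotEff": "high", "TackleEff": "low"}, {"ShotEff": "low", "TackleEff": "low"}],
    "C": [{"ShotEff": "low", "TackleEff": "high"}, {"ShotEff": "low", "TackleEff": "high"}],
}
table = propositionalize(individual_rows, formulas)
scores = knn_outlier_scores(table)
print(max(scores, key=scores.get))  # C
```

Any other single-table detector could consume `table` in place of the k-NN score, which is the point of the propositionalized view.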
Propositionalization Results
LowCor = Normals have low correlation.
HighCor = Normals have high correlation.
Summary
- Outlier detection based on a statistical-relational model.
- Basic idea: compare the individual profile to the entire population.
- Leverage parameter learning:
  - Learn parameter values for the individual.
  - Learn parameter values for the entire population.
  - Outlier score = parameter vector difference, e.g. average L1-distance.
- Leverage relational distance between individuals:
  - In our work, distance ≈ distribution divergence.
  - Outlier score = divergence between the individual distribution and the population distribution.
- Another approach: model-based propositionalization for outlier detection.
  - Attribute values = frequency counts for patterns in the model structure.