A simple method for multi-relational outlier detection

Uploaded On 2016-06-10



A simple method for multi-relational outlier detection

Sarah Riahi and Oliver Schulte

School of Computing Science, Simon Fraser University, Vancouver, Canada

With tools that you probably have around the lab.


System Flow

Flach, P. A. (1999), 'Knowledge representation for inductive learning', in 'Symbolic and Quantitative Approaches to Reasoning and Uncertainty', Springer, pp. 160–167.

Complete database → Parameter Learning Algorithm (with the Model) → Population parameter values
Complete database → restrict to target individual → Individual profile → Parameter Learning Algorithm (with the Model) → Individual parameter values
Population parameter values vs. individual parameter values → vector norm → outlier score

Input: Model, database, target individual.
Output: an outlier score.
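The system flow above can be sketched in a few lines of Python, under simplifying assumptions: the "model" is a hypothetical list of formulas, each a conjunction of feature=value conditions over one (player, match) grounding; "parameter learning" is replaced by the relative frequency of each formula (a sufficient-statistics stand-in, not Markov Logic Network weight learning); and the vector norm is the average L1-distance. The names `Formula`, `formula_frequencies` and `outlier_score` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each formula is a conjunction of (feature, value) conditions
# that must all hold for one (player, match) grounding.
Formula = Tuple[Tuple[str, str], ...]
Row = Dict[str, str]  # one (player, match) grounding with its feature values


def formula_frequencies(model: List[Formula], rows: List[Row]) -> List[float]:
    """Stand-in for parameter learning: fraction of groundings satisfying each formula."""
    if not rows:
        raise ValueError("no groundings to learn from")
    return [
        sum(all(row.get(f) == v for f, v in formula) for row in rows) / len(rows)
        for formula in model
    ]


def outlier_score(model: List[Formula], database: List[Row], target: str) -> float:
    """Average L1-distance between population and individual parameter vectors."""
    population = formula_frequencies(model, database)                # complete database
    profile = [row for row in database if row["player"] == target]  # restrict to target individual
    individual = formula_frequencies(model, profile)
    return sum(abs(p - i) for p, i in zip(population, individual)) / len(model)


model = [
    (("ShotsOnTarget", "low"), ("ShotEff", "low")),
    (("ShotsOnTarget", "high"), ("ShotEff", "high")),
]
database = [
    {"player": "A", "ShotsOnTarget": "high", "ShotEff": "high"},
    {"player": "A", "ShotsOnTarget": "high", "ShotEff": "high"},
    {"player": "B", "ShotsOnTarget": "low", "ShotEff": "low"},
    {"player": "B", "ShotsOnTarget": "low", "ShotEff": "low"},
]
print(outlier_score(model, database, "A"))  # → 0.5
```

Swapping `formula_frequencies` for a real parameter learner (e.g. learned weights) leaves the rest of the flow unchanged.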

Example

Model = Markov Logic Network learned for Premier League Season 2011-2012

Formula | Estimated population parameters | Estimated parameters for P = van Persie
SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 0.02 | 0.56
SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 3.55 | 0.36
... (331 formulas) | ... | ...
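As a worked illustration only, applying the average L1-distance to just the two formulas whose estimates are listed above (rather than all 331) gives:

```latex
\mathrm{ELD}_{2} = \tfrac{1}{2}\bigl(|0.02 - 0.56| + |3.55 - 0.36|\bigr) = \tfrac{1}{2}(0.54 + 3.19) = 1.865
```

This partial value is not van Persie's full outlier score, which would average over every formula in the model.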

Evaluation: Synthetic Data

Two features.
Designed so that outliers are easy to distinguish from normals (sanity check).

Normals have a strong correlation; outliers have none.
Outliers have a strong correlation; normals have none.
Correlations are the same, but the marginals are very different.

Bayesian Network Representation

(a) Normals correlated, outliers independent.
Normal = Striker: F1 = ShotEfficiency, F2 = Match_Result; P(F1=1) = 50%, P(F2=0 | F1=0) = 90%, P(F2=1 | F1=1) = 90%.
Outlier = MidFielder: F1 = TackleEfficiency, F2 = Match_Result; P(F1=1) = 50%, P(F2=1) = 50%.

(b) Normals independent, outliers correlated.
Normal = Striker: F1 = ShotEfficiency, F2 = Match_Result; P(F1=1) = 50%, P(F2=1) = 50%.
Outlier = MidFielder: F1 = TackleEfficiency, F2 = Match_Result; P(F1=1) = 50%, P(F2=0 | F1=0) = 90%, P(F2=1 | F1=1) = 90%.
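The two Bayesian networks above can be sampled directly. A minimal sketch, assuming each player contributes `n_matches` independent (F1, F2) samples; the function names, player names and seed are hypothetical, and the third synthetic setting (same correlations, different marginals) is not covered.

```python
import random
from typing import Dict, List, Tuple


def sample_player(correlated: bool, n_matches: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Sample (F1, F2) values for one player's matches."""
    rows = []
    for _ in range(n_matches):
        f1 = int(rng.random() < 0.5)  # P(F1=1) = 50%
        if correlated:
            # P(F2=1|F1=1) = P(F2=0|F1=0) = 90%: F2 copies F1 with probability 0.9
            f2 = f1 if rng.random() < 0.9 else 1 - f1
        else:
            f2 = int(rng.random() < 0.5)  # P(F2=1) = 50%, independent of F1
        rows.append((f1, f2))
    return rows


def make_dataset(panel: str, n_normals: int, n_outliers: int,
                 n_matches: int, seed: int = 0) -> Dict[str, List[Tuple[int, int]]]:
    """Panel 'a': strikers (normals) correlated; panel 'b': midfielders (outliers) correlated."""
    if panel not in ("a", "b"):
        raise ValueError("panel must be 'a' or 'b'")
    rng = random.Random(seed)
    normals_correlated = panel == "a"
    data = {}
    for i in range(n_normals):
        data[f"striker_{i}"] = sample_player(normals_correlated, n_matches, rng)
    for i in range(n_outliers):
        data[f"midfielder_{i}"] = sample_player(not normals_correlated, n_matches, rng)
    return data
```

In a correlated player, F1 and F2 agree in roughly 90% of matches; in an independent one, in roughly 50%.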

Results

AD = Breunig, M.; Kriegel, H.-P.; Ng, R. T. & Sander, J. (2000), 'LOF: Identifying Density-Based Local Outliers', in 'ACM SIGMOD'.
LOG = Riahi, F.; Schulte, O. & Liang, Q. (2014), 'A Proposal for Statistical Outlier Detection in Relational Structures', AAAI-StarAI Workshop on Statistical-Relational AI.

Metric = Area Under Curve (AUC).
ELD = average L1-norm.
KLD = average difference.
AD = use single-feature marginals only (unit clauses).
LOG = outlier score = log-likelihood.
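The AUC metric can be computed directly from outlier scores and ground-truth labels. A minimal sketch, assuming the ROC curve, label 1 for a true outlier, and higher scores meaning more outlying; it uses the pairwise (Mann–Whitney) formulation, counting ties as one half. The function name `auc` is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random outlier (label 1) outscores a random normal (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one normal")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # → 0.75
```

The double loop is quadratic in the number of individuals, which is acceptable for evaluation sets of this size.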


Case Study: Single Features

Which formulas/rules influence the outlier score the most? (Interpretability.)
Which unit clauses influence the outlier score the most?
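With the average L1-distance score, one simple way to answer these questions is to split the score into per-formula terms and rank formulas by their contribution; a unit clause is just a formula with a single condition. A minimal sketch: the function name `top_contributors` is hypothetical, and the two entries reuse the van Persie estimates from the Example slide only to illustrate the ranking.

```python
from typing import Dict, List, Tuple


def top_contributors(population: Dict[str, float], individual: Dict[str, float],
                     k: int = 5) -> List[Tuple[str, float]]:
    """Rank formulas by their share of the average L1-distance outlier score."""
    n = len(population)
    contributions = {f: abs(population[f] - individual[f]) / n for f in population}
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)[:k]


population = {
    "SavesMade=med AND shotsOnTarget=low AND ShotEff=low": 0.02,
    "SavesMade=med AND shotsOnTarget=high AND ShotEff=high": 3.55,
}
van_persie = {
    "SavesMade=med AND shotsOnTarget=low AND ShotEff=low": 0.56,
    "SavesMade=med AND shotsOnTarget=high AND ShotEff=high": 0.36,
}
for formula, share in top_contributors(population, van_persie):
    print(f"{share:.3f}  {formula}")
```

The contributions sum to the outlier score, so each share reads directly as "how much of the score this formula explains".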

Novak, P. K.; Webb, G. I. & Wrobel, S. (2009), 'Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining', Journal of Machine Learning Research.
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study', Expert Systems with Applications.

Case Study: Correlations

Which formulas/rules influence the outlier score the most? (Interpretability.)

Which associations influence the outlier score the most?

Related to exception mining (Novak et al. 2009).

Individual | Rule | Individual confidence | Class confidence
Edin Dzeko | ShotEff = high AND TackleEff = medium → DribbleEff = low | 50% | 38%
Van Persie | ShotEff = high AND TimePlayed = high → ShotsOnTarget = high | 70% | 50%

Confidence = conditional probability.
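Because confidence is a conditional probability, comparing an individual with its class means computing P(head | body) twice: once over the individual's matches and once over all matches in the class. A minimal sketch, assuming match-level rows stored as dictionaries; the rows, players "X" and "Y", and the helper names are made up for illustration and do not reproduce the Dzeko or van Persie figures.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def confidence(rows: List[Row], body: Dict[str, str], head: Dict[str, str]) -> Optional[float]:
    """Confidence of body -> head, i.e. P(head | body); None if the body never holds."""
    def holds(row: Row, cond: Dict[str, str]) -> bool:
        return all(row.get(k) == v for k, v in cond.items())

    covered = [row for row in rows if holds(row, body)]
    if not covered:
        return None
    return sum(holds(row, head) for row in covered) / len(covered)


rows = [
    {"player": "X", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "high"},
    {"player": "X", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
    {"player": "Y", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
    {"player": "Y", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
]
body = {"ShotEff": "high", "TimePlayed": "high"}
head = {"ShotsOnTarget": "high"}
individual = confidence([r for r in rows if r["player"] == "X"], body, head)
class_level = confidence(rows, body, head)
print(individual, class_level)  # → 0.5 0.25
```

A large gap between the two confidences flags the rule as an association on which the individual departs from its class.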

Distribution Divergence Perspective

Halpern, J. Y. (1990), 'An analysis of first-order logics of probability', Artificial Intelligence.
De Raedt, L. (2008), Logical and Relational Learning, Springer, Ch. 9.

Joint value assignment | Frequency for random striker | Frequency for P = van Persie
SavesMade(P,M)=low AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 22% | 10%
SavesMade(P,M)=low AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 30% | 62%
... | ... | ...

Outlier score = dissimilarity measure between a random individual and the target individual. In our work, the dissimilarity measure is a distribution divergence.

Other distance-type metrics could be used as well.
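One concrete divergence is the Kullback–Leibler divergence between the target individual's distribution over joint value assignments and the population's. A minimal sketch, assuming both distributions are dictionaries over the same assignments; putting the individual distribution first is an assumption here, since KL divergence is not symmetric. The toy numbers are illustrative, not the frequencies in the table above.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; p = individual distribution, q = population distribution."""
    total = 0.0
    for assignment, p_val in p.items():
        if p_val == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        q_val = q.get(assignment, 0.0)
        if q_val == 0.0:
            return math.inf  # individual puts mass where the population has none
        total += p_val * math.log(p_val / q_val)
    return total


individual = {"low/low/low": 0.5, "low/high/high": 0.5}
population = {"low/low/low": 0.9, "low/high/high": 0.1}
print(kl_divergence(individual, population))  # ln(5/3) ≈ 0.511
```

Returning infinity for unseen assignments is the textbook definition; a practical system would likely smooth the population frequencies instead.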

Propositionalization for Outlier Detection

Lippi, M.; Jaeger, M.; Frasconi, P. & Passerini, A. (2011), 'Relational information gain', Machine Learning 83(2), 219–239.

Players | SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | (331 more)
Wayne Rooney | 13% | 10% | ...
van Persie | 50% | 62% | ...
... | ... | ... | ...

Construct a 331-dimensional attribute vector for each individual.
One frequency/count value for each formula.
Pseudo-i.i.d. data view.
Like n-grams.
Apply standard single-table analysis methods.
Could also use learned weights instead of sufficient statistics.
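The propositionalization steps above can be sketched as follows: each individual becomes one row of per-formula frequencies, and a single-table outlier detector is then run on those rows. As the detector, the sketch swaps in the distance to the k-th nearest neighbour (a simpler relative of LOF, chosen here for brevity and not taken from the slides). The `Formula` encoding, function names and toy database are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Formula = Tuple[Tuple[str, str], ...]  # hypothetical: conjunction of (feature, value) conditions
Row = Dict[str, str]


def propositionalize(model: List[Formula], database: List[Row]) -> Dict[str, List[float]]:
    """One attribute vector per individual: frequency of each formula in its groundings."""
    by_player: Dict[str, List[Row]] = {}
    for row in database:
        by_player.setdefault(row["player"], []).append(row)
    return {
        player: [
            sum(all(r.get(f) == v for f, v in formula) for r in rows) / len(rows)
            for formula in model
        ]
        for player, rows in by_player.items()
    }


def knn_outlier_scores(vectors: Dict[str, List[float]], k: int = 1) -> Dict[str, float]:
    """Standard single-table score: Euclidean distance to the k-th nearest neighbour."""
    if not 1 <= k < len(vectors):
        raise ValueError("k must be at least 1 and smaller than the number of individuals")
    scores = {}
    for name, vec in vectors.items():
        dists = sorted(math.dist(vec, other) for o, other in vectors.items() if o != name)
        scores[name] = dists[k - 1]
    return scores


model = [(("ShotEff", "high"),), (("TackleEff", "high"),)]
database = [
    {"player": p, "ShotEff": s, "TackleEff": t}
    for p, s, t in [
        ("p1", "high", "low"), ("p1", "high", "low"),
        ("p2", "high", "low"), ("p2", "low", "low"),
        ("p3", "high", "low"), ("p3", "high", "low"),
        ("p4", "low", "high"), ("p4", "low", "high"),
    ]
]
vectors = propositionalize(model, database)  # e.g. p2 -> [0.5, 0.0]
scores = knn_outlier_scores(vectors, k=1)
print(max(scores, key=scores.get))  # → p4
```

Any other single-table detector can replace `knn_outlier_scores` without touching the propositionalization step.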

Propositionalization Results


LowCor = Normals have low correlation.
HighCor = Normals have high correlation.

Summary

Outlier detection based on a statistical-relational model.
Basic idea: compare the individual profile to the entire population.
Leverage parameter learning:
  Learn parameter values for the individual.
  Learn parameter values for the entire population.
  Outlier score = parameter vector difference, e.g. average L1-distance.
Leverage relational distance between individuals:
  In our work, distance ≈ distribution divergence.
  Outlier score = divergence between the individual distribution and the population distribution.
Another approach: model-based propositionalization for outlier detection.
  Attribute values = frequency counts for patterns in the model structure.
