# Chapter 6 - Basic Similarity Topics


Slide1

Chapter 6 - Basic Similarity Topics

Case-based reasoning

Slide2

Introduction

Common term in everyday language, where two objects usually are considered similar if they look or sound similar

Similarity is a core concept within CBR

From a CBR perspective: «Two problems are similar if they have similar solutions»

Not as clearly defined as the term equality

Accepted that similarity is subjective and requires approximate rather than exact reasoning

Slide3

Similarity and case representation

Similarity measures are defined to compare objects (cases)

The measures operate on the case representation

Similarity is the essential function used for retrieval and the link between case representation and retrieval

Only attribute-value case representations and attribute-based similarity measures are considered here

Slide4

The mathematics of similarity

Two influencing factors:

Fuzzy sets offer a background for modeling inexact expressions. They deal not with classical yes-or-no answers but with answers of a vague character

Metrics are used in mathematics whenever approximations (rather than exact solutions) are involved. This makes them suitable for modeling similarity

Similarity measures may inherit and benefit from properties of these two factors. Examples of such properties are symmetry, transitivity, etc.

Slide5

Two mathematical models of similarity

Similarity as a relation:

Qualitative measure comparing different similarities

Example: two objects are more similar to each other than two other objects

R(x,y,z) ⇔ «x is at least as similar to y as x is to z»

Allows the definition of the nearest-neighbour concept

The nearest neighbour of x is the y for which the R-relation above holds for all z

Example of k-NN where k=3

Slide6

Two mathematical models of similarity

Similarity as a function:

Makes similarity quantitative by expressing how similar two objects are

Assigns a number/degree of similarity to pairs of objects

Def.: A similarity measure for a problem space P is a function sim: P × P → [0,1]

Example of similarity functions and how they may be compared

sim (x,y) ≥ sim (x,z) ⇔ «x is at least as similar to y as x to z»
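
The comparison above is all that nearest-neighbour retrieval needs. The following is a minimal sketch, not taken from the slides: `toy_sim` is a hypothetical similarity on numbers in [0, 10], and `k_most_similar` simply ranks cases by their similarity to a query.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def k_most_similar(query: T, cases: Sequence[T],
                   sim: Callable[[T, T], float], k: int = 3) -> List[T]:
    """Return the k cases with the highest sim(query, case), best first."""
    return sorted(cases, key=lambda case: sim(query, case), reverse=True)[:k]


def toy_sim(x: float, y: float) -> float:
    """Hypothetical similarity for values in [0, 10]: 1 if equal, 0 at distance 10."""
    return 1.0 - abs(x - y) / 10.0


cases = [1.0, 4.0, 5.0, 7.0, 9.5]
print(k_most_similar(5.2, cases, toy_sim, k=3))  # [5.0, 4.0, 7.0]
```

With k=1 the function returns the nearest neighbour in the sense of the relation R: the case y for which sim(x,y) ≥ sim(x,z) holds for all other cases z.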

Slide7

Distances

A proxy for similarities: both look at the same objects from different points of view

In most situations we can freely choose between distances and similarities

It is possible to convert between similarities and distances. However, such a transformation may not necessarily conserve the exact numerical similarity/distance values
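
One possible conversion is sketched below. The formula sim = 1 / (1 + d) is an assumption chosen for illustration, not a transformation given in the slides; it maps distances in [0, ∞) to similarities in (0, 1].

```python
def distance_to_similarity(d: float) -> float:
    """Map a distance d >= 0 to a similarity in (0, 1] (assumed formula)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


def similarity_to_distance(s: float) -> float:
    """Inverse of the mapping above, defined for 0 < s <= 1."""
    if not 0.0 < s <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    return 1.0 / s - 1.0


for d in (0.0, 1.0, 3.0):
    print(d, distance_to_similarity(d))
```

The numerical values change under this mapping (a distance of 3 becomes a similarity of 0.25), but the ordering is preserved in reverse: a smaller distance always yields a larger similarity.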

Slide8

Types of similarity measures

Counting similarities

Metric similarities

Transformation similarities

Structure-oriented similarities

Information-oriented similarities

Relevance-oriented similarities

Dynamic-oriented similarities

Slide9

Types of similarity measures

Counting similarities

Measures similarity by counting certain occurrences in the representation

Count the number of family members for tax purposes

Example: Hamming measures
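
As an illustration of a counting measure, here is a minimal sketch of a Hamming-style similarity. Normalising the count of matching positions by the length is an assumption made here so that the result falls in [0,1].

```python
from typing import Sequence


def hamming_similarity(x: Sequence, y: Sequence) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if len(x) == 0:
        return 1.0  # two empty sequences are treated as identical
    matches = sum(a == b for a, b in zip(x, y))
    return matches / len(x)


print(hamming_similarity("10110", "10011"))  # 3 of 5 positions agree -> 0.6
```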

Slide10

Types of similarity measures

Metric similarities

Applicable to attributes with numerical values

Arise as variations of Euclidean metrics

Typically distance functions that represent a travel view
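
A minimal sketch of two such distance functions on numerical attribute vectors follows. The Manhattan (city-block) distance is included here as one example of a variation; it is not named in the slides.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """City-block distance: the sum of per-attribute absolute differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(manhattan_distance([0, 0], [3, 4]))  # 7
```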

Slide11

Types of similarity measures

Transformation similarities

The measure counts the number of operations required to transform one object into another

Example: Levenshtein distance. Uses insertion, deletion and modification as possible change actions and counts the number of changes required
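
The Levenshtein distance can be computed with the standard dynamic-programming scheme. The sketch below assumes unit cost for every insertion, deletion and modification.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and modifications turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # keep or modify
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```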

Slide12

Types of similarity measures

Structure-oriented similarities

The structure in which the knowledge is presented plays a role, e.g. object-oriented representation

Refers mainly to attributes that have symbolic attribute values from which the attribute-based structure is built

Slide13

Types of similarity measures

Information-oriented similarities

Information and knowledge play an essential role

Often used for texts, which are considered similar if they provide similar information to the user

Slide14

Types of similarity measures

Relevance-oriented similarities

Weight the importance of different aspects contributing to similarity

Not a type in itself; rather, it may be used in combination with the other types

Slide15

Types of similarity measures

Dynamic-oriented similarities

Consider and compare dynamic processes

Slide16

Local-global principle of similarity

Useful when dealing with complex structures

The principle: Each object is constructed from atomic parts, by some construction process.

Possible to compare the atomic parts by using local measures, before comparing the more complex structure.

Determine the influence that each local part should have on the global measure by assigning a weight to each part

Determining the weights is a difficult problem
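
A common way to realise the principle is a weighted average of local similarities. The sketch below assumes this form; the attribute names, value range and weights are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict

LocalSim = Callable[[Any, Any], float]


def numeric_local_sim(value_range: float) -> LocalSim:
    """Build a local measure for a numeric attribute with the given value range."""
    def sim(a: float, b: float) -> float:
        return max(0.0, 1.0 - abs(a - b) / value_range)
    return sim


def symbolic_local_sim(a: Any, b: Any) -> float:
    """Local measure for a symbolic attribute: 1 if equal, 0 otherwise."""
    return 1.0 if a == b else 0.0


def global_similarity(x: Dict[str, Any], y: Dict[str, Any],
                      local_sims: Dict[str, LocalSim],
                      weights: Dict[str, float]) -> float:
    """Weighted average of the local similarities, one per attribute."""
    total_weight = sum(weights.values())
    weighted = sum(weights[attr] * local_sims[attr](x[attr], y[attr])
                   for attr in weights)
    return weighted / total_weight


# Hypothetical cases with one numeric and one symbolic attribute.
local_sims = {"price": numeric_local_sim(10000.0), "colour": symbolic_local_sim}
weights = {"price": 3.0, "colour": 1.0}
case_a = {"price": 5000, "colour": "red"}
case_b = {"price": 7000, "colour": "red"}
print(global_similarity(case_a, case_b, local_sims, weights))  # about 0.85
```

Here the price difference lowers the local price similarity to 0.8, and because price carries three times the weight of colour, the global value lands near 0.85 rather than midway between the two local values.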

Slide17

Virtual attributes

A problem with the local-global principle arises when there are dependencies between the attributes that influence similarity

Example: bank loans

Reliability for getting a loan depends on both income and spending

Assigning weights to the attributes independently of each other makes little sense

Introduce additional attributes that reflect the dependencies explicitly

Such attributes are defined in terms of the given attributes and are called virtual attributes

Allows a simpler similarity measure
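
The bank-loan example can be sketched as follows. Defining the virtual attribute as income minus spending is an assumption made for illustration; the slides only state that reliability depends on both. The attribute name `disposable` and the value range are likewise hypothetical.

```python
from typing import Dict


def add_disposable_income(case: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the case extended with the virtual attribute 'disposable'."""
    enriched = dict(case)
    enriched["disposable"] = case["income"] - case["spending"]
    return enriched


def sim_disposable(x: Dict[str, float], y: Dict[str, float],
                   value_range: float = 5000.0) -> float:
    """Local similarity on the virtual attribute only."""
    diff = abs(x["disposable"] - y["disposable"])
    return max(0.0, 1.0 - diff / value_range)


a = add_disposable_income({"income": 4000.0, "spending": 3500.0})
b = add_disposable_income({"income": 2000.0, "spending": 1500.0})
c = add_disposable_income({"income": 4000.0, "spending": 1000.0})

print(sim_disposable(a, b))  # 1.0: different incomes, same disposable amount
print(sim_disposable(a, c))  # 0.5: same income, very different disposable amount
```

Applicants a and c share the same income yet differ in what is left after spending, which a measure that weights income and spending separately would struggle to express.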

Slide18

Which similarity measure should be used?

Some influencing factors for the choice are:

Case representation

Size of case base

Efficiency needed for retrieval

Number of values in the domain of the attributes

Useful guidelines:

Try to ensure compatibility between case representation and the similarity measure

If possible, apply the local-global principle for complex structures

Slide19

Summary

Link between case representation and retrieval

There is no clear definition of the concept and there exists a variety of different types of measures

Similarity measures are heavily influenced by mathematics. Two mathematical ways to represent similarity are as a function or as a relation

The local-global principle may also apply to similarity measures

Which type of similarity measure should be used depends on the objects to be compared

Slide20

Few comparisons, missing an overview of the differences between the different types of similarity measures

Mainly descriptive presentation, making it difficult to distinguish between the different measures

What are the implications of choosing one type of measure over another?

In a later chapter?