Measuring Distance - PowerPoint Presentation

lindy-dunigan
Uploaded On 2016-03-25

Presentation Transcript

Slide1

Measuring Distance

Input for Multidimensional Scaling and Clustering

Slide2

Distances and Similarities

Both are ways of measuring how similar two objects are

Distances increase as objects are less similar. The distance of an object to itself is 0

Similarities increase as objects are more similar. The similarity of an object to itself is the maximum value for the similarity measure

Slide3

Distance Examples

Mileage between two towns measured as straight-line (Euclidean) distance (“as the crow flies”), as driving distance, or as great-circle (spherical) distance

Instead of geographic locations, we can treat measurements such as the length, width, and thickness of an artifact as defining its position

Slide4
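The idea of treating an artifact's measurements as its position, described above, can be sketched in R. The data frame `artifacts` and its values are hypothetical and are not from the presentation; the sketch only shows that the straight-line distance computed by hand agrees with what `dist()` returns.

```r
# Hypothetical measurements (mm) for three artifacts; not from the slides.
artifacts <- data.frame(
  Length    = c(42.5, 40.1, 55.0),
  Width     = c(15.2, 17.7, 20.3),
  Thickness = c( 5.8,  6.7,  7.1),
  row.names = c("A", "B", "C")
)

# Straight-line (Euclidean) distance between artifacts A and B, by hand:
# square the difference on each measurement, sum, and take the square root.
diff_ab <- unlist(artifacts["A", ]) - unlist(artifacts["B", ])
by_hand <- sqrt(sum(diff_ab^2))

# dist() computes the same quantity for every pair of rows.
d <- dist(artifacts)

print(by_hand)
print(d)
```

Each row plays the role of a town and each measurement plays the role of a map coordinate.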

Similarity Examples

The number of characteristics two objects have in common (cultural traits, genes, presence/absence traits)

Similarity measures can be converted to distances by subtracting each similarity from the maximum possible similarity

Slide5
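The conversion from similarity to distance described above can be sketched as follows. The matrix `shared` is a hypothetical count of traits two objects have in common, out of an assumed 10 recorded traits; neither the data nor the names come from the presentation.

```r
# Hypothetical similarity matrix: number of traits (out of 10) shared by
# each pair of objects. An object shares all 10 traits with itself.
shared <- matrix(c(10,  7,  2,
                    7, 10,  4,
                    2,  4, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z")))

max_sim <- 10  # maximum possible similarity (assumed number of traits)

# Subtract each similarity from the maximum to get a distance.
# as.dist() keeps the lower triangle as a "dist" object.
d <- as.dist(max_sim - shared)
print(d)
```

After the conversion, the distance of each object to itself is 0, and the least similar pair (X and Z) has the largest distance.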

Interval/Ratio Measures

Manhattan Distance (or City Block, 1-norm)

Euclidean Distance (2-norm) and Squared Euclidean Distance

Minkowski Distance (p-norm)

Chebyshev Distance (Maximum Distance, infinity norm)

Slide6

Definitions

p          Distance
1          Manhattan
2          Euclidean
p          Minkowski
Infinity   Chebyshev, Maximum

Slide7
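The table above relates each distance to a value of p. The formula itself did not survive in this transcript, so the sketch below uses the usual textbook definition of the Minkowski distance, (sum of |x_i - y_i|^p)^(1/p), with the maximum absolute difference as the limiting case for infinite p. The helper `minkowski()` and the vectors `x` and `y` are illustrative names, not part of the presentation.

```r
# Minkowski (p-norm) distance between two numeric vectors.
# p = 1 gives Manhattan, p = 2 gives Euclidean, p = Inf gives Chebyshev.
minkowski <- function(x, y, p) {
  if (is.infinite(p)) {
    max(abs(x - y))
  } else {
    sum(abs(x - y)^p)^(1 / p)
  }
}

x <- c(1, 5, 2)
y <- c(4, 1, 2)

manhattan <- minkowski(x, y, 1)    # 3 + 4 + 0 = 7
euclidean <- minkowski(x, y, 2)    # sqrt(9 + 16 + 0) = 5
chebyshev <- minkowski(x, y, Inf)  # max(3, 4, 0) = 4

# The same values from base R's dist()
xy <- rbind(x, y)
print(c(manhattan, as.numeric(dist(xy, method = "manhattan"))))
print(c(euclidean, as.numeric(dist(xy, method = "euclidean"))))
print(c(chebyshev, as.numeric(dist(xy, method = "maximum"))))
print(c(minkowski(x, y, 3), as.numeric(dist(xy, method = "minkowski", p = 3))))
```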

Counts

Ecologists use counts of species between plots to analyze compositional changes in community structure

Bray-Curtis compares the number of specimens and number of overlapping species

Slide8

Definitions

Bray-Curtis Dissimilarity

Note: If samples j and k are percentages, then the denominator becomes 200.

Slide9
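The formula on the Bray-Curtis definitions slide is missing from this transcript. The sketch below assumes the commonly used form, the sum of absolute differences in counts divided by the sum of all counts in both samples, and uses it to illustrate the note about percentages. The function `bray_curtis()` and the plot counts are hypothetical.

```r
# Bray-Curtis dissimilarity between samples j and k (assumed standard form):
# sum(|x_ij - x_ik|) / sum(x_ij + x_ik), summing over species i.
bray_curtis <- function(xj, xk) {
  sum(abs(xj - xk)) / sum(xj + xk)
}

# Hypothetical species counts in two plots
plot_j <- c(10, 0, 5,  5)
plot_k <- c( 5, 5, 0, 10)

bc_counts <- bray_curtis(plot_j, plot_k)

# Convert each plot to percentages; each then sums to 100,
# so the denominator is 100 + 100 = 200.
pct_j <- 100 * plot_j / sum(plot_j)
pct_k <- 100 * plot_k / sum(plot_k)
denominator <- sum(pct_j + pct_k)
bc_percent  <- bray_curtis(pct_j, pct_k)

print(bc_counts)
print(denominator)
print(bc_percent)
```

The vegan package's `vegdist()` offers a Bray-Curtis method for whole community matrices; the hand-written function here is only meant to make the arithmetic visible.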

Ordinal Measures

There are few measures specifically for rank data, but rank correlation coefficients (Spearman, Kendall) can be used

Slide10
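One way to use a rank correlation as a measure for rank data, as suggested above, is sketched here. The `ranks` matrix is hypothetical, and converting the correlation to a distance as 1 minus rho is an assumption of this sketch rather than something the presentation specifies.

```r
# Hypothetical rank data: each row is an object, each column a ranked attribute.
ranks <- rbind(A = c(1, 2, 3, 4, 5),
               B = c(2, 1, 3, 5, 4),
               C = c(5, 4, 3, 2, 1))

# cor() correlates columns, so transpose to correlate objects (rows).
rho <- cor(t(ranks), method = "spearman")

# Assumed conversion: 0 for identical rankings, 2 for exactly reversed rankings.
d <- as.dist(1 - rho)

print(round(rho, 3))
print(d)
```

Using `method = "kendall"` in the same call gives Kendall's tau instead.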

Dichotomies

Can use interval/ratio measures

Numerous options based on 2x2 table

Many similarity measures based on weighting of presence/presence and absence/absence

Subtract from 1 to create distances

Slide11

Definitions

           Present   Absent
Present    a         b
Absent     c         d

Simple Matching Coefficient: (a+d)/(a+b+c+d)

Jaccard’s Coefficient (asymmetric binary): a/(a+b+c)

Phi and Yule’s Q measures of association

ade4 and proxy have many different options for dichotomies

Slide12
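The two coefficients defined above can be computed directly from the cells of the 2x2 table. The presence/absence vectors `x` and `y` are hypothetical; the cell counts are named `n11`, `n10`, `n01`, and `n00` (corresponding to a, b, c, and d) to avoid masking R's `c()` function.

```r
# Hypothetical presence (1) / absence (0) of eight traits on two objects
x <- c(1, 1, 0, 1, 0, 0, 1, 0)
y <- c(1, 0, 0, 1, 1, 0, 0, 0)

n11 <- sum(x == 1 & y == 1)  # a: present in both
n10 <- sum(x == 1 & y == 0)  # b: present in x only
n01 <- sum(x == 0 & y == 1)  # c: present in y only
n00 <- sum(x == 0 & y == 0)  # d: absent in both

simple_matching <- (n11 + n00) / (n11 + n10 + n01 + n00)
jaccard         <- n11 / (n11 + n10 + n01)

# dist(method = "binary") returns a distance that ignores joint absences;
# it equals 1 minus the Jaccard coefficient.
binary_dist <- as.numeric(dist(rbind(x, y), method = "binary"))

print(c(simple_matching = simple_matching, jaccard = jaccard,
        binary_dist = binary_dist))
```

Simple matching counts joint absences as agreement, while Jaccard ignores them, which is why the two values differ whenever d is greater than 0.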

Nominal Variables

Similarity can be measured with chi-square based measures

Convert to multiple dichotomies

E.g. Temper: Sand, Silt, Gravel becomes three variables: TSand, TSilt, TGravel

Then use measures for dichotomies/metric variables

Slide13
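The conversion of a nominal variable into multiple dichotomies described above might look like this in base R. The `temper` values are hypothetical; the column names follow the TSand, TSilt, TGravel pattern from the slide.

```r
# Hypothetical temper values for four sherds
temper <- factor(c("Sand", "Silt", "Gravel", "Sand"),
                 levels = c("Sand", "Silt", "Gravel"))

# One 0/1 column per category
dummies <- sapply(levels(temper), function(lv) as.integer(temper == lv))
colnames(dummies) <- paste0("T", levels(temper))
rownames(dummies) <- paste0("sherd", seq_along(temper))
print(dummies)

# The dichotomies can then go into a measure for presence/absence data,
# here the binary (Jaccard) distance from dist().
d <- dist(dummies, method = "binary")
print(d)
```

With a single nominal variable the binary distance is 0 for sherds with the same temper and 1 otherwise.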

Multiple Types

Gower’s Index is the only one that computes a similarity index using variables with different levels of measurement. Score each variable as follows, then take the mean across the variables:

Presence/Absence – Jaccard

Categorical – 1 if the same, 0 if not

Interval/Ratio/Ranks – 1 minus the absolute difference divided by the range

Slide14
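A minimal sketch of the averaging step described above, for one categorical and one interval variable. The data frame `sherds` and the helper `gower_pair()` are hypothetical, and the sketch leaves out the presence/absence case (where joint absences are skipped, as in Jaccard). For real analyses, `daisy()` in the cluster package with `metric = "gower"` returns the corresponding dissimilarities.

```r
# Hypothetical data with mixed levels of measurement
sherds <- data.frame(
  Temper    = c("Sand", "Silt", "Sand"),
  Thickness = c(4, 8, 6),
  stringsAsFactors = FALSE
)

# Gower similarity between rows i and j (simplified: no presence/absence
# variables and no missing values).
gower_pair <- function(i, j, data) {
  scores <- sapply(names(data), function(v) {
    col <- data[[v]]
    if (is.numeric(col)) {
      # interval/ratio: 1 - |difference| / range of the variable
      1 - abs(col[i] - col[j]) / diff(range(col))
    } else {
      # categorical: 1 if the same, 0 if not
      as.numeric(col[i] == col[j])
    }
  })
  mean(scores)
}

s12 <- gower_pair(1, 2, sherds)  # different temper, thickness at opposite extremes
s13 <- gower_pair(1, 3, sherds)  # same temper, thickness differs by half the range

print(c(s12 = s12, s13 = s13))
```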

Issues

Weighting – how to weight variables with different variances – standardization, weighting

Correlations between variables – how (and whether) to take correlations into account (Mahalanobis Distances)

Slide15
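The weighting issue raised above can be illustrated by comparing distances before and after standardization. The data frame `x` is hypothetical; `scale()` centers each column and divides it by its standard deviation, which is one common choice of standardization and not necessarily the one the presenter had in mind.

```r
# Hypothetical data: Length varies far more than Weight
x <- data.frame(
  Length = c(40, 42, 41, 60),
  Weight = c(1.0, 3.0, 1.1, 1.2),
  row.names = c("a", "b", "c", "d")
)

# Raw Euclidean distances are dominated by the high-variance variable
raw_d <- dist(x)

# Standardize to z-scores so each variable has mean 0 and sd 1
z <- scale(x)
std_d <- dist(z)

print(round(raw_d, 2))
print(round(std_d, 2))
```

After standardization each variable contributes on the same scale, which amounts to giving the variables equal weight.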

Distance Matrix

For simple analyses, dist() in base R provides euclidean, maximum, manhattan, canberra, binary (Jaccard), and minkowski

Many other measures are available in other packages. See packages ade4, amap, cluster, ecodist, labdsv, proxy, and vegan

Slide16
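A short sketch of calling `dist()` with each of the methods listed above. The matrix `x` is hypothetical. `dist()` returns a lower-triangle "dist" object; `as.matrix()` converts it to a full square matrix when that is easier to index.

```r
# Hypothetical data: three samples measured on three variables
x <- rbind(s1 = c(2, 0, 5),
           s2 = c(1, 3, 5),
           s3 = c(0, 0, 4))

methods <- c("euclidean", "maximum", "manhattan",
             "canberra", "binary", "minkowski")

# One distance object per method (minkowski uses its default p = 2 here)
results <- lapply(methods, function(m) dist(x, method = m))
names(results) <- methods

print(results$manhattan)

# Full square matrix, indexable by row and column labels
manhattan_full <- as.matrix(results$manhattan)
print(manhattan_full["s1", "s2"])
```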

# Load Darl
# Rcmdr to create scatterplot matrix
> Euclid <- dist(Darl[,2:5])
> Euclid
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 11.437657
35-2866  5.380520  6.542935
36-3619 14.621217  3.682391  9.570266
36-3520 15.309148  4.068169 10.163661  1.757840
36-3036  7.760155  4.442972  2.495997  7.195832  7.860662
> scatterplot(Width~Length, reg.line=lm, smooth=FALSE, spread=FALSE,
    pch=16, id.n=6, boxplots=FALSE, ellipse=TRUE, grid=FALSE, data=Darl)
# colMeans() gives the per-column means; mean() on a data frame no longer does
> mahalanobis(Darl[,2:3], colMeans(Darl[,2:3]), cov=cov(Darl[,2:3]))
  35-3043   35-2871   35-2866   36-3619   36-3520   36-3036
2.2577596 1.8173684 0.4641912 2.9652763 1.7527347 0.7426699

Slide17
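The `mahalanobis()` call above measures each object against the centroid rather than against every other object, and base R documents its result as the squared Mahalanobis distance. The sketch below reproduces that calculation by hand on hypothetical data; the data frame `x` is not the Darl data set.

```r
# Hypothetical measurements for six objects
x <- data.frame(Length = c(40, 45, 50, 55, 60, 52),
                Width  = c(15, 18, 17, 22, 21, 25))

center <- colMeans(x)  # centroid: one mean per column
S      <- cov(x)       # sample covariance matrix

# Squared Mahalanobis distance of each row from the centroid
d2 <- mahalanobis(x, center, S)

# The same quantity for the first row, written out:
# (x - center)' S^{-1} (x - center)
dev1    <- unlist(x[1, ]) - center
by_hand <- as.numeric(t(dev1) %*% solve(S) %*% dev1)

print(d2)
print(by_hand)
```

Taking `sqrt(d2)` gives distances on the original (unsquared) scale.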
Slide18

> install.packages("ecodist")
> library(ecodist)
> Mahal <- distance(Darl[,2:3], method="mahalanobis")
> Mahal
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 4.9367446
35-2866 0.6900956 2.8905096
36-3619 8.5903617 7.5849187 4.7250487
36-3520 6.8826044 0.6084649 3.6631704 4.9720621
36-3036 2.4467510 4.8835727 0.8163226 1.9192663 4.3901066

Slide19

# Rcmdr
> .PC <- princomp(~Length+Weight, cor=TRUE, data=Darl)
> Darl$PC1 <- .PC$scores[,1]
> Darl$PC2 <- .PC$scores[,2]
# Typed commands
> PCDist <- dist(Darl[,6:7])
> PCDist
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 2.5498737
35-2866 2.1968323 1.1918768
36-3619 3.7858013 1.2539806 1.9883494
36-3520 4.2220041 1.8034110 2.1957351 0.7029308
36-3036 2.6677120 0.9201698 0.5717135 1.4339465 1.6290415
> scatterplot(PC2~PC1, reg.line=FALSE, smooth=FALSE, spread=FALSE,
    grid=FALSE, boxplots=FALSE, pch=16, ellipse=TRUE, id.n=6, span=0.5,
    data=Darl)
[1] "35-3043" "35-2866" "36-3619" "36-3520" "35-2871" "36-3036"

Slide20