Main Project total points: 500 - PowerPoint Presentation

danika-pritchard
Uploaded On 2018-01-10

Presentation Transcript

Slide1

Main Project total points: 500

200/500 = 40% finished by March 27

Introduction, Background, Partial Results/Discussion, Acknowledgement, Author contribution, funding/conflicts, References

250/500 = 50% finished by April 5

400/500 = 80% finished by April 17

500/500 = 100% finished by April 26

Slide2

But you may want to focus on Euclidean distance AFTER normalizing:

From databasics3900.r:

> # one way to normalize data
> scaledata2 <- scale(data2)   # scales data so that mean = 0, sd = 1
> colMeans(scaledata2)   # faster version of apply(scaledata2, 2, mean); shows that mean of each column is 0
 Sepal.Length   Sepal.Width  Petal.Length   Petal.Width
-4.480675e-16  2.035409e-16 -2.844947e-17 -3.714621e-17
> apply(scaledata2, 2, sd)   # shows that standard deviation of each column is 1
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
           1            1            1            1

-------------------------------------------------------------------------------------------

library(dplyr)       # assumed source of select() and tbl_df()
library(TDAmapper)   # assumed source of mapper1D()

P <- select(tbl_df(scaledata2), Petal.Length)   # Choose filter

m1 <- mapper1D(   # Apply mapper
    distance_matrix = dist(data.frame(scaledata2)),
    filter_values = P,
    num_intervals = 10,
    percent_overlap = 50,
    num_bins_when_clustering = 10)

# save data to current working directory as a text file
write.table(scaledata2, "data.txt", sep=" ", row.names = FALSE, col.names = FALSE)

See LABS/ directory for useful R files

Slide3

https://writingcenter.uiowa.edu/#services

Slide4

Mini-presentations in class:

End of March or beginning of April.

Over anything related to your project.

5 – 10 minutes/person.

Visit the speaking center before your presentation. Submit a summary of the visit: what did you learn?

Slide5

https://speakingcenter.uiowa.edu/about-us

Slide6

Modified from http://www.garrreynolds.com/preso-tips/design/

1. Keep it Simple

Lots of white space is good: The less clutter you have on your slide, the more powerful your visual message will become.

2. Limit bullet points & text

3. Limit transitions & builds (animation)

Only use animations that illustrate a point. Don’t use unnecessary animations.

4. Use high-quality graphics

5. Have a visual theme, but avoid using PowerPoint templates

6. Use appropriate charts

7. Use color well

8. Choose your fonts well

Use the same font set throughout your entire slide presentation, and use no more than two complementary sans-serif fonts (e.g., Arial and Arial Bold).

9. Use video or audio when appropriate.

10. Organize your talk: Spend time in the slide sorter (or print out your slides at least 6 to a page).

Slide7

Photograph, drawing, diagram, or graph supporting the headline message (no bulleted list)

Call-out(s), if needed: no more than two lines

In an assertion-evidence slide, the headline is a sentence that succinctly states the slide’s main message

PowerPoint Template: http://writing.engr.psu.edu/AE_template_PSU.ppt

Slide8

http://www.bigdata.uni-frankfurt.de/wp-content/uploads/2015/10/Evaluating-Ayasdi’s-Topological-Data-Analysis-For-Big-Data_HKim2015.pdf

Slide9

We are not (currently) covering persistent homology, including barcodes.

Slide10

Slide11

Quiz 9 Reading (10 points; Due 4/6 at 7:00 AM) over first page:

Slide12

Introduction

The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.

Slide13

Introduction

The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.

Slide14

http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

A) Data Set. Example: Point cloud data representing a hand.

B) Function $f : \text{Data Set} \to \mathbb{R}$. Example: x-coordinate, $f : (x, y, z) \mapsto x$

Put data into overlapping bins. Example: $f^{-1}(a_i, b_i)$

Cluster each bin & create network.

Vertex = a cluster of a bin.

Edge = nonempty intersection between clusters

Slide15

Introduction

The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.

Slide16

Different types of data sets

Slide17

Creating overlapping bins

Slide18

Filter function: eccentricity

Slide19

knn distance with k = 5, 3 intervals, 50% overlap

20% overlap

Slide20

knn distance with k = 5

3 intervals, 50% overlap
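
A minimal sketch of how "3 intervals, 50% overlap" could be turned into actual bins. It assumes that the overlap percentage is the share of each interval's length shared with the next interval, which is one common convention and not necessarily the one used to draw these slides. The function names `overlapping_intervals` and `bins_containing` are hypothetical.

```python
def overlapping_intervals(lo, hi, num_intervals, percent_overlap):
    """Cover [lo, hi] with equal-length intervals.

    Assumption: percent_overlap is the share of an interval's length that it
    shares with the next interval (0 <= percent_overlap < 100).
    """
    overlap = percent_overlap / 100.0
    # Total span = one full length plus (num_intervals - 1) non-overlapping steps.
    length = (hi - lo) / (num_intervals - (num_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(num_intervals)]


def bins_containing(value, cover):
    """Indices of every interval in the cover that contains value."""
    return [i for i, (a, b) in enumerate(cover) if a <= value <= b]


# Filter values rescaled to [0, 1], cut into 3 intervals with 50% overlap.
cover = overlapping_intervals(0.0, 1.0, num_intervals=3, percent_overlap=50)
print(cover)                        # [(0.0, 0.5), (0.25, 0.75), (0.5, 1.0)]
print(bins_containing(0.3, cover))  # [0, 1]
```

A filter value in an overlap region lands in two bins at once, and those shared points are what later produce edges between clusters.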

Slide21

knn distance with k = 5, 50% overlap

3 intervals

5 intervals

10 intervals

100 intervals

Slide22

knn distance with k = 50, 50% overlap

3 intervals

5 intervals

10 intervals

100 intervals

Slide23

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 0

5 intervals, 50% Overlap

Slide24

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

5 intervals, 50% Overlap

Slide25

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

20 intervals, 20% Overlap

Slide26

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

20 intervals, 50% Overlap

Slide27

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

20 intervals, 80% Overlap

Slide28

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

20 intervals, 80% Overlap, Balanced

Slide29

Introduction

The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.

Ex: 1.) f(x) = ||x||

2.) g(x1, …, xn-1) = xn

3.) DSGA decomposition of the original tumor vector into the Normal component (its linear model fit onto the Healthy State Model) and the Disease component (the vector of residuals).

http://www.pnas.org/content/early/2011/04/07/1102826108

Slide30

Some quantitative analysis is also possible

Slide31

3.2.2.2 Insight by Ranked Variables

Going back to the Titanic example, the result of the KS-statistic shows that the variable “Sex” is the most strongly related to passengers' deaths. We could generally assume that men conceded the places in lifeboats to women. Furthermore, it is feasible to deduce the subtle reasons for the death of each group. The passengers in group A died for two reasons: they were men and their cabin class was low. The passengers in group B died because they were men. Finally, the passengers in group C died because they were staying in third class, even though most of them were women.

Slide32

http://diglib.eg.org/handle/10.2312/SPBG.SPBG07.091-100

Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition

Singh, Gurjeet; Memoli, Facundo; Carlsson, Gunnar

Slide33
Slide34

We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.

Slide35

We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.

Slide36

Building blocks for a simplicial complex

0-simplex = vertex = $v$

1-simplex = edge = $\{v_1, v_2\}$

Note that the boundary of this edge is $v_2 + v_1$

2-simplex = triangle = $\{v_1, v_2, v_3\}$

Note that the boundary of this triangle is the cycle $e_1 + e_2 + e_3 = \{v_1, v_2\} + \{v_2, v_3\} + \{v_1, v_3\}$

Slide37

Creating a simplicial complex

1.) Next add 1-dimensional edges (1-simplices).

Note: These edges must connect two vertices.

I.e., the boundary of an edge is two vertices

Slide38

Creating a simplicial complex

1.) Next add 1-dimensional edges (1-simplices).

Note: These edges must connect two vertices.

I.e., the boundary of an edge is two vertices

Slide39

Creating a simplicial complex

n.) Add n-dimensional n-simplices, $\{v_1, v_2, \ldots, v_{n+1}\}$.

Boundary of an n-simplex = a cycle consisting of (n-1)-simplices.

Slide40

Example: Triangulating the disk.

disk = $\{ x \in \mathbb{R}^2 : \|x\| \le 1 \}$

Slide41

Example: Triangulating the circle.

disk = $\{ x \in \mathbb{R}^2 : \|x\| \le 1 \}$

Fist image from http://openclipart.org/detail/1000/a-raised-fist-by-liftarn
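
The simplices and boundaries used in this part of the deck can be sketched in a few lines. The sketch assumes mod-2 coefficients, so a face that appears twice cancels. This matches the unsigned sums such as e1 + e2 + e3 written on the slides, but it is an assumption, and the function names are hypothetical. It shows that the boundary of the triangle {v1, v2, v3} is a triangulated circle, and that the boundary of that cycle is empty.

```python
def boundary(simplex):
    """Faces obtained by dropping one vertex from a simplex (given as a set)."""
    verts = sorted(simplex)
    if len(verts) == 1:
        return []  # a single vertex has an empty boundary
    return [frozenset(verts[:i] + verts[i + 1:]) for i in range(len(verts))]


def boundary_mod2(chain):
    """Boundary of a sum of simplices with mod-2 coefficients.

    A face that appears an even number of times cancels out.
    """
    result = set()
    for simplex in chain:
        for face in boundary(simplex):
            result ^= {face}  # toggle membership: add if absent, cancel if present
    return result


triangle = frozenset({"v1", "v2", "v3"})      # a 2-simplex
circle = boundary_mod2([triangle])            # three edges forming a cycle
print(sorted(sorted(edge) for edge in circle))
print(boundary_mod2(circle))                  # every vertex appears twice, so nothing is left
```

The three edges returned for `circle` are one way to triangulate the circle: a hollow triangle.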

Slide42

Example: Triangulating the sphere.

sphere = $\{ x \in \mathbb{R}^3 : \|x\| = 1 \}$

Slide43

http://mangrovetds.sourceforge.net/documentation/html/torus_complex.png

Triangulation of a torus

Slide44

Distance Matrix Eigenvector, Mean Centered Distance Matrix

Order of eigenvector: 1

20 intervals, 80% Overlap

Slide45

http://www.math.wsu.edu/math/faculty/bkrishna/AbsAlgTopo_PSB2016/Slides_Topology.pdf

Slide46

http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

Create overlapping bins:

Slide47

http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

Create overlapping bins:

Slide48

We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.

Resolution means ????

Which choice refers to resolution?

Slide49

The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real valued coordinate functions on a data set,

Slide50

Dimensionality Reduction:

Given dataset $D \subset \mathbb{R}^N$

Want: embedding $f: D \to \mathbb{R}^n$ where $n \ll N$ which “preserves” the structure of the data.

Many reduction methods: $f_1: D \to \mathbb{R}$, $f_2: D \to \mathbb{R}$, …, $f_n: D \to \mathbb{R}$

$(f_1, f_2, \ldots, f_n): D \to \mathbb{R}^n$

Many are linear, $M: \mathbb{R}^N \to \mathbb{R}^n$, $Mx = y$

But there are also non-linear dimensionality reduction algorithms.
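
As a rough illustration of the linear case $Mx = y$, the sketch below builds a 1 x 3 matrix M from the leading principal direction of a tiny made-up data set in R^3 and applies it to every point. The data values, the function names, and the use of power iteration to find the direction are illustrative assumptions, not part of the original slides.

```python
import math


def mean_center(rows):
    """Subtract the column means so every coordinate has mean 0."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_direction(cov, iterations=200):
    """Power iteration: unit eigenvector for the largest eigenvalue of cov."""
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def apply_linear_map(M, x):
    """Compute y = Mx for a matrix M given as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


# Hypothetical data in R^3 lying close to the line through (1, 2, 0).
data = [[1.0, 2.0, 0.1], [2.0, 4.1, -0.1], [3.0, 5.9, 0.0],
        [4.0, 8.0, 0.1], [5.0, 10.0, -0.1]]
centered = mean_center(data)
M = [top_direction(covariance(centered))]   # 1 x 3 matrix: R^3 -> R^1
reduced = [apply_linear_map(M, x) for x in centered]
print(M[0])      # roughly proportional to (1, 2, 0)
print(reduced)   # one coordinate per point
```

Each row of M plays the role of one coordinate function $f_i$, so stacking n rows gives a map into $\mathbb{R}^n$.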

Slide51

Example: Principal component analysis (PCA)

http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png

Slide52

https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction

Slide53

$f_1: D \to \mathbb{R}$, $f_2: D \to \mathbb{R}$, …, $f_n: D \to \mathbb{R}$

$(f_1, f_2, \ldots, f_n): D \to \mathbb{R}^n$

Slide54

circle courtesy of knotplot.com

Goal: $f: D \to S^1$ which “preserves” the structure of the data.

Slide55

circle courtesy of knotplot.com

Slide56

The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real valued coordinate functions on a data set, but other notions of geometric representation (e.g., the Reeb graph [Ree46]) are often useful and reflect interesting information more directly.

Slide57

http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

https://en.wikipedia.org/wiki/Reeb_graph
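
The pipeline summarized on Slide14 (filter function, overlapping bins, cluster each bin, connect clusters that share points) can be sketched end to end. This is a minimal illustration under stated assumptions, not the implementation behind the figures: the clustering step is single linkage at a fixed scale `eps`, the overlap percentage is the share of each interval shared with the next one, and the names `single_linkage` and `mapper_1d` are hypothetical. The test data are 40 points on a circle with the x-coordinate as filter.

```python
import math
from itertools import combinations


def single_linkage(indices, points, eps):
    """Cluster the given point indices: connected components of the graph
    that links two points whenever their distance is at most eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_1d(points, filter_values, num_intervals, percent_overlap, eps):
    """Return (vertices, edges): vertices are clusters of bins, and edges join
    clusters with a nonempty intersection."""
    lo, hi = min(filter_values), max(filter_values)
    overlap = percent_overlap / 100.0
    length = (hi - lo) / (num_intervals - (num_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    vertices = []
    for k in range(num_intervals):
        a, b = lo + k * step, lo + k * step + length
        in_bin = [i for i, f in enumerate(filter_values) if a <= f <= b]
        vertices.extend(single_linkage(in_bin, points, eps))
    edges = [(s, t) for s, t in combinations(range(len(vertices)), 2)
             if vertices[s] & vertices[t]]
    return vertices, edges


# 40 points on the unit circle; the filter is the x-coordinate.
points = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
filter_values = [x for x, _ in points]
vertices, edges = mapper_1d(points, filter_values, num_intervals=3,
                            percent_overlap=50, eps=0.2)
print(len(vertices), "vertices;", len(edges), "edges")
```

With these settings the middle bin splits into a top arc and a bottom arc, so the resulting network is a 4-cycle, which recovers the loop in the data.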