Slide1
Main Project total points: 500
200/500 = 40% finished by March 27
Introduction, Background,
Partial Results/Discussion,
Acknowledgement, Author contribution,
funding/conflicts, References
250/500 = 50% finished by April 5
400/500 = 80% finished by April 17
500/500 = 100% finished by April 26
Slide2
But you may want to focus on Euclidean distance AFTER normalizing:
From databasics3900.r:
> # one way to normalize data
> scaledata2 <- scale(data2)   # scales data so that mean = 0, sd = 1
> colMeans(scaledata2)         # faster version of apply(scaled.dat, 2, mean)
                               # shows that mean of each column is 0
 Sepal.Length   Sepal.Width  Petal.Length   Petal.Width
-4.480675e-16  2.035409e-16 -2.844947e-17 -3.714621e-17
> apply(scaledata2, 2, sd)     # shows that standard deviation of each column is 1
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
           1            1            1            1
--------------------------------------------------
P <- select(tbl_df(scaledata2), Petal.Length)   # Choose filter
m1 <- mapper1D(                                 # Apply mapper
    distance_matrix = dist(data.frame(scaledata2)),
    filter_values = P,
    num_intervals = 10,
    percent_overlap = 50,
    num_bins_when_clustering = 10)
# save data to current working directory as a text file
write.table(scaledata2, "data.txt", sep=" ", row.names = FALSE, col.names = FALSE)
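To see why normalizing matters for Euclidean distance, the sketch below compares column spreads and one pairwise distance before and after scale(). It assumes data2 is the four numeric columns of the built-in iris data, which the column names suggest but the slides do not state.

```r
# Assumption: data2 is the four numeric columns of the built-in iris data set.
data2 <- iris[, 1:4]
scaledata2 <- scale(data2)

# Column standard deviations before and after scaling.
# Before scaling, columns with a larger spread dominate Euclidean distance.
raw_sd    <- apply(data2, 2, sd)
scaled_sd <- apply(scaledata2, 2, sd)

# Euclidean distance between the first two flowers, raw vs. normalized.
d_raw    <- as.matrix(dist(data2))[1, 2]
d_scaled <- as.matrix(dist(scaledata2))[1, 2]

print(round(raw_sd, 3))
print(round(scaled_sd, 3))
print(c(raw = d_raw, scaled = d_scaled))
```

After scaling, every column contributes on the same footing, so no single measurement drives the distance matrix passed to mapper1D.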
See LABS/ directory for useful R files
Slide3
https://writingcenter.uiowa.edu/#services
Slide4
Mini-presentations in class:
End of March or beginning of April.
Over anything related to your project.
5 – 10 minutes/person.
Visit speaking center before presentation. Submit summary of visit – what did you learn.
Slide5
https://speakingcenter.uiowa.edu/about-us
Slide6
Modified from http://www.garrreynolds.com/preso-tips/design/
1. Keep it Simple
Lots of white space is good: The less clutter you have on your slide, the more powerful your visual message will become.
2. Limit bullet points & text
3. Limit transitions & builds (animation)
Only use animations that illustrate a point. Don’t use unnecessary animations.
4. Use high-quality graphics
5. Have a visual theme, but avoid using PowerPoint templates
6. Use appropriate charts
7. Use color well
8. Choose your fonts well
Use the same font set throughout your entire slide presentation, and use no more than two complementary sans-serif fonts (e.g., Arial and Arial Bold).
9. Use video or audio when appropriate.
10. Organize your talk: Spend time in the slide sorter (or print out your slides at least 6 to a page).
Slide7
In an assertion-evidence slide, the headline is a sentence that succinctly states the slide’s main message
Photograph, drawing, diagram, or graph supporting the headline message (no bulleted list)
Call-out(s), if needed: no more than two lines
PowerPoint Template: http://writing.engr.psu.edu/AE_template_PSU.ppt
Slide8
http://www.bigdata.uni-frankfurt.de/wp-content/uploads/2015/10/Evaluating-Ayasdi’s-Topological-Data-Analysis-For-Big-Data_HKim2015.pdf
Slide9
We are not (currently) covering persistent homology including barcodes
Slide10
Slide11
Quiz 9 Reading (10 points; Due 4/6 at 7:00 AM) over first page:
Slide12
Introduction
The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.
Slide13
Introduction
The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.
Slide14
http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html
A) Data Set
Example: Point cloud data representing a hand.
B) Function f : Data Set → R
Example: x-coordinate, f : (x, y, z) → x
C) Put data into overlapping bins.
Example: f^(-1)(a_i, b_i)
D) Cluster each bin & create network.
Vertex = a cluster of a bin.
Edge = nonempty intersection between clusters
Slide15
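The Mapper steps (data set, filter function, overlapping bins, clustering, network) can be sketched in base R as follows. This is a toy illustration, not the TDAmapper implementation: the circle data, the 4 intervals, the 50% overlap, single-linkage clustering and the cut height of 0.5 are all assumptions chosen for the example.

```r
# Toy Mapper sketch in base R. Illustrative only; it is not the TDAmapper code.
set.seed(1)
theta <- runif(200, 0, 2 * pi)
X <- cbind(x = cos(theta), y = sin(theta))   # A) data set: points on a circle
f <- X[, "x"]                                # B) filter: x-coordinate

# C) Overlapping bins: 4 intervals covering range(f), 50% overlap (assumed values).
num_intervals <- 4
p <- 0.5
len   <- diff(range(f)) / (num_intervals - (num_intervals - 1) * p)
start <- min(f) + (0:(num_intervals - 1)) * len * (1 - p)
end   <- start + len
end[num_intervals] <- max(f)                 # guard against rounding error

# D) Cluster each bin (single linkage, cut at an assumed height of 0.5).
clusters <- list()
for (i in seq_len(num_intervals)) {
  idx <- which(f >= start[i] & f <= end[i])
  if (length(idx) == 0) next
  if (length(idx) == 1) {
    labels <- 1L                             # hclust needs at least 2 points
  } else {
    tree <- hclust(dist(X[idx, , drop = FALSE]), method = "single")
    labels <- cutree(tree, h = 0.5)
  }
  for (lab in unique(labels)) {
    clusters[[length(clusters) + 1]] <- idx[labels == lab]
  }
}

# Network: one vertex per cluster, an edge when two clusters share a point.
k <- length(clusters)
adj <- matrix(0L, k, k)
for (a in seq_len(k)) {
  for (b in seq_len(k)) {
    if (a < b && length(intersect(clusters[[a]], clusters[[b]])) > 0) {
      adj[a, b] <- 1L
      adj[b, a] <- 1L
    }
  }
}
print(k)
print(adj)
```

For points sampled densely from a circle, one would expect the resulting graph to be a cycle, mirroring the shape of the data.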
Introduction
The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.
Slide16
Different types of data sets
Slide17
Creating overlapping bins
Slide18
Filter function: eccentricity
Slide19
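Eccentricity and k-nearest-neighbor distance are both filter functions that can be computed from the distance matrix alone. The sketch below is a minimal version under stated assumptions: eccentricity is taken with exponent 1 (mean distance to all points), and "knn distance" is read as the distance to the k-th nearest neighbor; the slides do not say which variants were used. The helper name knn_distance and the toy data are hypothetical.

```r
# Toy data: four points on a line (hypothetical, chosen so results are easy to check).
X <- matrix(c(0, 1, 3, 7), ncol = 1)
D <- as.matrix(dist(X))            # pairwise Euclidean distances

# Eccentricity with exponent 1: mean distance from each point to all points.
# Points far from the "center" of the data get large values.
eccentricity <- rowMeans(D)

# k-nn distance: distance from each point to its k-th nearest neighbor.
# sort(row)[k + 1] skips the zero distance from a point to itself.
knn_distance <- function(D, k) {
  unname(apply(D, 1, function(row) sort(row)[k + 1]))
}

print(unname(eccentricity))
print(knn_distance(D, 1))
```

Either vector could be passed as the filter values of a Mapper run in place of a coordinate such as Petal.Length.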
knn distance with k = 5, 3 intervals, 50% overlap
20% overlap
Slide20
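How the number of intervals and the percent overlap determine the bins can be made concrete with a small helper. The function cover_intervals below is hypothetical (not part of TDAmapper) and uses one common convention: equal-length intervals where the overlap is a percentage of the interval length.

```r
# Hypothetical helper (not part of TDAmapper): cover [lo, hi] with
# num_intervals equal-length intervals whose neighbors overlap by
# percent_overlap percent of the interval length.
cover_intervals <- function(lo, hi, num_intervals, percent_overlap) {
  p <- percent_overlap / 100
  # Solve num_intervals * len - (num_intervals - 1) * p * len = hi - lo for len.
  len  <- (hi - lo) / (num_intervals - (num_intervals - 1) * p)
  step <- len * (1 - p)            # shift between consecutive interval starts
  start <- lo + (0:(num_intervals - 1)) * step
  data.frame(start = start, end = start + len)
}

iv50 <- cover_intervals(0, 1, 3, 50)
iv20 <- cover_intervals(0, 1, 3, 20)
print(iv50)
print(iv20)
```

With 3 intervals on [0, 1], 50% overlap gives intervals of length 0.5, while 20% overlap gives shorter intervals that share less of the filter range.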
knn distance with k = 5, 3 intervals, 50% overlap
[Figure: overlapping intervals marked with brackets]
Slide21
knn distance with k = 5, 50% overlap
3 intervals, 5 intervals, 10 intervals, 100 intervals
Slide22
knn distance with k = 50, 50% overlap
3 intervals, 5 intervals, 10 intervals, 100 intervals
Slide23
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 0
5 intervals, 50% Overlap
Slide24
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
5 intervals, 50% Overlap
Slide25
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
20 intervals, 20% Overlap
Slide26
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
20 intervals, 50% Overlap
Slide27
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
20 intervals, 80% Overlap
Slide28
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
20 intervals, 80% Overlap -- Balanced
Slide29
Introduction
The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.
Ex:
1.) f(x) = ||x||
2.) g(x1, …, xn-1) = xn
3.) DSGA decomposition of the original tumor vector into the Normal component, its linear model fit onto the Healthy State Model, and the Disease component, the vector of residuals.
http://www.pnas.org/content/early/2011/04/07/1102826108
Slide30
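The first two example functions on a data set, the norm f(x) = ||x|| and the last coordinate, are simple to evaluate row by row. A minimal sketch on a made-up three-point data set (the matrix X and the variable names are assumptions for illustration):

```r
# Made-up data set: each row is one point in R^3.
X <- rbind(c(3, 4, 0),
           c(0, 0, 2),
           c(1, 2, 2))

# 1.) f(x) = ||x||: Euclidean norm of each row.
f_norm <- sqrt(rowSums(X^2))

# 2.) The last coordinate of each point, viewed as a function on the data set.
g_last <- X[, ncol(X)]

print(f_norm)
print(g_last)
```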
Some quantitative analysis is also possible
Slide31
3.2.2.2 Insight by Ranked Variables
Going back to the Titanic example, the results of the KS statistic show that the variable “Sex” is the most strongly related to passenger death. We could generally assume that men conceded their places in the lifeboats to women. Furthermore, it is feasible to deduce the subtler reasons for the deaths in each group. The passengers in group A died for two reasons: they were men and their cabin class was low. The passengers in group B died because they were men. Finally, the passengers in group C died because they were staying in third class, even though most of them were women.
Slide32
http://diglib.eg.org/handle/10.2312/SPBG.SPBG07.091-100
Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition
Singh, Gurjeet; Memoli, Facundo; Carlsson, Gunnar
Slide33
Slide34
We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.
Slide35
We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.
Slide36
Building blocks for a simplicial complex
0-simplex = vertex = v
1-simplex = edge = {v1, v2}
Note that the boundary of this edge is v2 + v1
2-simplex = triangle = {v1, v2, v3}
Note that the boundary of this triangle is the cycle e1 + e2 + e3 = {v1, v2} + {v2, v3} + {v1, v3}
Slide37
Creating a simplicial complex
1.) Next add 1-dimensional edges (1-simplices).
Note: These edges must connect two vertices.
I.e., the boundary of an edge is two vertices
Slide38
Creating a simplicial complex
1.) Next add 1-dimensional edges (1-simplices).
Note: These edges must connect two vertices.
I.e., the boundary of an edge is two vertices
Slide39
Creating a simplicial complex
n.) Add n-dimensional n-simplices, {v1, v2, …, vn+1}.
Boundary of an n-simplex = a cycle consisting of (n-1)-simplices.
Slide40
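The statement that the boundary of an n-simplex consists of (n-1)-simplices can be checked by listing faces: drop one vertex at a time. The helper boundary_faces below is a hypothetical illustration that represents a simplex as a character vector of vertex names and ignores orientation, as if coefficients were taken mod 2.

```r
# Hypothetical helper: the (n-1)-dimensional faces of a simplex,
# obtained by removing each vertex in turn. A single vertex has no faces.
boundary_faces <- function(simplex) {
  if (length(simplex) < 2) return(list())
  lapply(seq_along(simplex), function(i) simplex[-i])
}

triangle <- c("v1", "v2", "v3")
edges <- boundary_faces(triangle)       # the three edges of the triangle
print(edges)

# Boundary of the boundary: every vertex appears an even number of times,
# so the edges form a cycle (their boundary vanishes mod 2).
vertex_counts <- table(unlist(lapply(edges, boundary_faces)))
print(vertex_counts)
```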
disk = { x in R^2 : ||x|| ≤ 1 }
Example: Triangulating the disk.
Slide41
Example: Triangulating the circle.
disk = { x in R^2 : ||x|| ≤ 1 }
Fist image from http://openclipart.org/detail/1000/a-raised-fist-by-liftarn
Slide42
sphere = { x in R^3 : ||x|| = 1 }
Example: Triangulating the sphere.
Slide43
http://mangrovetds.sourceforge.net/documentation/html/torus_complex.png
Triangulation of a torus
Slide44
Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1
20 intervals, 80% Overlap
Slide45
http://www.math.wsu.edu/math/faculty/bkrishna/AbsAlgTopo_PSB2016/Slides_Topology.pdf
Slide46
http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html
Create overlapping bins:
Slide47
http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html
Create overlapping bins:
Slide48
We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.
Resolution means ????
Which choice refers to resolution?
Slide49
The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real valued coordinate functions on a data set,
Slide50
Dimensionality Reduction:
Given dataset D ⊂ R^N
Want: embedding f: D → R^n where n << N which “preserves” the structure of the data.
Many reduction methods:
f1: D → R, f2: D → R, …, fn: D → R
(f1, f2, …, fn): D → R^n
Many are linear, M: R^N → R^n, Mx = y
But there are also non-linear dimensionality reduction algorithms.
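As one instance of a linear reduction Mx = y, principal component analysis picks the rows of M to be the leading principal directions. The sketch below uses base R's prcomp on the iris measurements (N = 4, n = 2); the choice of data set and of n = 2 is an assumption made for illustration.

```r
# Assumed example data: D is the 150 iris flowers viewed as points in R^4.
X  <- as.matrix(iris[, 1:4])
Xs <- scale(X)                          # center and scale each column

# PCA: the columns of pca$rotation are the principal directions.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# M is the 2 x 4 matrix of the first two directions; y = M x for each point x.
M <- t(pca$rotation[, 1:2])
Y <- Xs %*% t(M)                        # 150 x 2 matrix of reduced coordinates

print(dim(M))
print(head(Y, 3))
```

The columns of Y are the coordinate functions f1 and f2 on the data set, so PCA is also an example of the tuple (f1, f2, …, fn): D → R^n.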
Slide51
Example: Principal component analysis (PCA)
http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png
Slide52
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
Slide53
f1: D → R, f2: D → R, …, fn: D → R
(f1, f2, …, fn): D → R^n
Slide54
circle courtesy of knotplot.com
Goal: f : D → S^1 which “preserves” the structure of the data.
Slide55
circle courtesy of knotplot.com
Slide56
The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real valued coordinate functions on a data set, but other notions of geometric representation (e.g., the Reeb graph [Ree46]) are often useful and reflect interesting information more directly.
Slide57
http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html
https://en.wikipedia.org/wiki/Reeb_graph