Slide1
The Location of Papers in Topic Space-Time
© Imperial College London
http://knowescape.org/
“Identification, location and temporal evolution of topics” meeting, part of COST Action TD1210 KNOWeSCAPE, Budapest, August 29-30, 2016.
Work done with:
James CLOUGH
pronounced
CLUFF
Tim Evans
Centre for Complexity Science
Social and Cultural Analytics Lab
Slide2
Time & Networks
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide3
Networks
Network = Vertices + Edges
Slide4
Timed Edges = Temporal Edge Networks
Communication Networks
Email
Phone
Letters
Focus of much recent work [Holme & Saramäki 2012, Masuda & Lambiotte 2016]
Slide5
Timed Vertices = Temporal Vertex Networks
Each vertex represents one event occurring at one time
Time constrains flows
Directed edges point in one direction only
No loops
Slide6
Timed Vertices = Temporal Vertex Networks
Each vertex represents one event occurring at one time
Directed Acyclic Graph (DAG): Computer Science
Poset (partially ordered set): Discrete Mathematics
[Figure: two vertices i and j joined by a directed edge]
Slide7
Applications - Flows
Flow of Ideas & Innovations: Citation Networks (papers, patents, court judgements)
Flow of Projects: management of tasks, critical paths
Flow of Mathematical Logic: spreadsheet formulae
Flow of Space-Time Causality: Causal Set approach to quantum gravity
Slide8
My Thesis
Constrained Network
New Network Methods
Slide9
Causality and Networks
Time Constraint on Vertices
New Methods for Temporal Vertex Networks
Transitive Reduction [Clough, Gollings, Loach, Evans 2015]
Longest Path
Dimension
Geometry
Slide10
Time & Networks
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide11
Networks and Geometry
Standard tools for data and network analysis use Euclidean space, where the distance $L$ between points $x$ and $y$ is $L^2 = \sum_i (x_i - y_i)^2$, e.g.
Wireless Networks, RGG (random geometric graphs) etc.
Network Visualisations
Dimensional Reduction of Data: MDS (Multidimensional Scaling), PCA (Principal Component Analysis)
[Figure: Karate Club network, Evans 2010]
Slide12
MDS Clustering
Standard clustering tool using the usual Euclidean distance measures
[Figure: KDD Cup arXiv hep-th 1992-2003 data; abstract text similarity and MDS; colours show 8 k-means clusters]
Slide13
Clusters vs Coordinates
The absence of visually distinct clusters suggests that continuous coordinates are better than discrete categories
[Figure: KDD Cup arXiv hep-th 1992-2003 data; abstract text similarity and MDS; colours show 8 k-means clusters]
Slide14
Networks and Geometry
Recent work goes beyond simple Euclidean space
“Flat” Euclidean: standard MDS
Sphere: e.g. clustering population, air travel
Hyperbolic: e.g. Internet Backbone [Boguñá, Krioukov, Claffy, 2009]
Slide15
Networks, Geometry and Time
Very little clustering work includes time
Riemannian Spaces (curvature 0,+1,-1)
Non-Riemannian Spaces (curvature 0,+1,-1)
Space-times: de Sitter, Anti-de Sitter, Minkowski
Slide16
Minkowski Space
My approach is to use the simplest space-time: Minkowski space in D = (d+1) space-time dimensions
[Clough & Evans, Physica A 2016 (arXiv:1408.1274); ISSI 2015 Proceedings, arXiv:1507.01388; arXiv:1602.03103]
Slide17
Minkowski Space
Measure spatial distance in terms of the time needed to reach that point, c.f. light years
Information flows into the forward light cone only
[Figure: light cone of a point x on TIME and SPACE axes, showing the future time-like region, the past time-like region and the two space-like regions, with a second point y]
Slide18
Causal Set - Null Model
PPP (Poisson Point Process) in D-dimensional space
Take one dimension as time
Add an edge from point x to point y if y is later than x (time difference > 0) and the pair has a time-like separation
The resulting network is transitively complete
Standard model used in
Discrete maths, e.g. Bollobás & Brightwell 1991
Causal set quantum gravity, e.g. see Dowker 2006
[Figure: light cone of x showing the Future, Past and Space-like regions, with a second point y]
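The null model on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `causal_set`, the unit box used as the sprinkling region and the fixed number of points `n` (a Poisson process conditioned on its point count) are all choices made here for the example.

```python
import random

def causal_set(n, d_space=1, seed=0):
    """Sprinkle n points in the unit box of (1 + d_space) dimensions and
    link every pair with a time-like separation (illustrative sketch)."""
    rng = random.Random(seed)
    # Each point is (t, x_1, ..., x_d); sorting puts earlier times first.
    points = sorted(tuple(rng.random() for _ in range(1 + d_space))
                    for _ in range(n))
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            dt = points[j][0] - points[i][0]
            dr2 = sum((a - b) ** 2
                      for a, b in zip(points[i][1:], points[j][1:]))
            # Edge i -> j if j is later (dt > 0) and inside i's light cone.
            if dt > 0 and dt * dt > dr2:
                edges.add((i, j))
    return points, edges
```

Because every time-like pair is linked, the edge set returned is the transitively complete DAG; the speed of light is set to 1 so that time and space share units.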
Slide19
Time & Networks
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide20
Dimension for Social DAG Spacetime
The formulae used are derived for the Causal Set Null Model = PPP (Poisson Point Process) in Minkowski space
e.g. volume ~ (length scale)^Dimension
Do they work for social DAG networks?
Slide21
Comparison of Data Sources
[Figure: two panels of MM (Myrheim-Meyer) dimension against N, the number of nodes in the interval, with # runs also shown. String Theory (hep-th arXiv): D=2. Particle Phenomenology (hep-ph arXiv): D=3.]
[Clough & Evans 2016, arXiv:1408.1274]
Slide22
Dimensions
Data                            Dimension
hep-th (String Theory)          2
hep-ph (Particle Physics)       3
quant-ph (quantum physics)      3
astro-ph (astrophysics)         3.5
US Patents                      >4
US Supreme Court Judgments      3 (short times), 2 (long times)
String theory appears to be a narrow field
[Clough & Evans 2016, arXiv:1408.1274]
Slide23
Time & Networks
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide24
Lorentzian MDS
Create Lorentzian distance matrix
Length of longest path = time-like separation
Space-like distances from the longest path between the first common neighbours in the forward and backward light cones
Not unique for D>2 but seems to work for reasonable N
[Figure: diagram with space and time axes]
[Clough & Evans, 2016, arXiv:1602.03103]
Slide25
Lorentzian MDS
Lorentzian distance matrix
Eigenvector of the largest negative eigenvalue = time direction
Eigenvectors of the (D-1) large positive eigenvalues = space directions
[Clough & Evans, 2016, arXiv:1602.03103]
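The eigenvalue statement on this slide can be motivated by the same double-centring step used in classical MDS. The derivation below is a sketch under an assumed sign convention (time-like squared separations negative, space-like positive); the symbols $M$, $J$, $B$, $X$ and $\eta$ are introduced here for illustration and may differ from the notation of arXiv:1602.03103.

```latex
% Assumed convention: M_{ij} is the squared interval between events i and j,
% negative for time-like pairs and positive for space-like pairs.
M_{ij} = -(t_i - t_j)^2 + \sum_{a=1}^{D-1} (x_i^a - x_j^a)^2

% Double centring, exactly as in classical MDS (N points):
B = -\tfrac{1}{2}\, J M J, \qquad J = I - \tfrac{1}{N}\,\mathbf{1}\mathbf{1}^{\mathsf T}

% If the points embed exactly, with X the N x D matrix of centred coordinates:
B = X \eta X^{\mathsf T}, \qquad \eta = \mathrm{diag}(-1, +1, \dots, +1)

% Coordinates recovered from the eigen-decomposition B v_k = \lambda_k v_k:
X_{ik} = \sqrt{|\lambda_k|}\,(v_k)_i
```

Under this convention $B$ has one negative eigenvalue, whose eigenvector supplies the time coordinate, and $D-1$ positive eigenvalues, whose eigenvectors supply the space coordinates.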
Slide26
DAG Null Model Distance Matrix Eigenvalues
[Figure: four panels of eigenvalue against eigenvalue rank for a PPP in Minkowski spacetime; each panel has one time eigenvalue, with no space, 1 space, 2 space and 3 space eigenvalues respectively]
Slide27
Lorentzian MDS works!
[Figure: original causal set beside its 1+1 embedding, i.e. the LMDS coordinates for 1+1 dimensions; a subset of 20 of the 200 points is shown for clarity]
Slide28
Real Citation Networks
Embed (i.e. find coordinates for) papers in citation networks from
arXiv hep-th - 91% success
arXiv hep-ph - 88% success
These citation networks are more embeddable in Minkowski spacetime than you would otherwise expect
Slide29
arXiv hep-th 1992-2003
Based on the top 200 most cited hep-th papers
Node size = citations
[Clough & Evans, 2016, arXiv:1602.03103]
Slide30
arXiv hep-th 1992-2003
[Figure: publication date scale running from 1990 to 2004]
Slide31
arXiv hep-th 1992-2003
Colours represent 8 k-means clusters based on abstract word similarity
Slide32
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide33
Conclusions
Temporal Networks come in two types: time on vertices or time on edges
Constraints on networks require new tools in network analysis: Transitive Reduction, Dimension, Longest Path
Geometry: Generalised MDS to Lorentzian Spacetime
Reliable
Might help to cluster papers
Tim Evans, Imperial College London
http://netplexity.org
Work done with: James Clough, Jamie Gollings, Tamar Loach, Sophia Goldberg, Hannah Anthony, Joana Teixeira, Chon Lei
Slide34
Bibliography
Albert, M. H. & Frieze, A. M., Random graph orders, Order, 1989, 6, 19-30.
Bombelli, L., Ph.D. thesis, Syracuse University, 1987.
Brightwell, G. & Gregory, R., Structure of random discrete spacetime, Phys. Rev. Lett., 1991, 66, 260-263.
Bruggemann, R.; Halfon, E.; Welzl, G.; Voigt, K. & Steinberg, C., Applying the Concept of Partially Ordered Sets on the Ranking of Near-Shore Sediments by a Battery of Tests, J. Chem. Inf. Model., American Chemical Society (ACS), 2001, 41, 918-925.
Bruggemann, R.; Münzer, B. & Halfon, E., An algebraic/graphical tool to compare ecosystems with respect to their pollution – the German river “Elbe” as an example - I: Hasse-diagrams, Chemosphere, 1994, 28, 863-872.
Clough, J. R.; Gollings, J.; Loach, T. V. & Evans, T. S., Transitive reduction of citation networks, J. Complex Networks, 2015, 3, 189-203 [10.1093/comnet/cnu039 arXiv:1310.8224].
Clough, J. R. & Evans, T. S., What is the dimension of citation space?, Physica A, 2016, 448, 235-247 [10.1016/j.physa.2015.12.053 arXiv:1408.1274].
Clough, J. R. & Evans, T. S., Time and Citation Networks, in "Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July, 2015", ISBN 978-975-518-381-7; ISSN 2175-1935 [arXiv:1507.01388].
Clough, J. R. & Evans, T. S., Embedding Graphs in Lorentzian Spacetime, arXiv:1602.03103.
Evans, T. S., Complex Networks, Contemporary Physics, 2004, 45, 455-474 [10.1080/00107510412331283531 arXiv:cond-mat/0405123].
Evans, T. S., Clique Graphs and Overlapping Communities, J. Stat. Mech., 2010, P12037 [10.1088/1742-5468/2010/12/P12037 arXiv:1009.0638].
Evans, T. S. & Lambiotte, R., Line Graphs, Link Partitions and Overlapping Communities, Phys. Rev. E, 2009, 80, 016105 [10.1103/PhysRevE.80.016105 arXiv:0903.2181].
Evans, T. S. & Lambiotte, R., Line Graphs of Weighted Networks for Overlapping Communities, Eur. Phys. J. B, 2010, 77, 265-272 [10.1140/epjb/e2010-00261-8 arXiv:0912.4389].
Expert, P.; Evans, T. S.; Blondel, V. D. & Lambiotte, R., Uncovering space-independent communities in spatial networks, PNAS, 2011, 108, 7663-7668 [10.1073/pnas.1018962108 arXiv:1012.3409].
Garfield, E.; Sher, I. H. & Torpie, R. J., The use of citation data in writing the history of science, DTIC Document, 1964.
Holme, P. & Saramäki, J., Temporal Networks, Physics Reports, 2012, 519, 97-125.
Hummon, N. P. & Doreian, P., Connectivity in a citation network: The development of DNA theory, Social Networks, 1989, 11, 39-63.
Masuda, N. & Lambiotte, R., A Guide To Temporal Networks, World Scientific, 2016, ISBN 9781786341143.
Myrheim, J., Statistical geometry, Technical report, CERN preprint TH-2538, 1978.
Meyer, D. A., The Dimension of Causal Sets, PhD thesis, MIT, 1988.
Meyer, D. A., Dimension of causal sets, 2006.
Reid, D. D., Manifold dimension of a causal set: Tests in conformally flat spacetimes, Phys. Rev. D, 2003, 67, 024034.
Winkler, P., Random orders, Order, 1985, 1, 317-331.
Tim Evans
Centre for Complexity Science
http://netplexity.org
Slide36
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide37
Transitive Reduction (Hasse Diagrams)
Remove all edges not needed for causal links
= Remove links provided all pairs of connected points remain connected
= Remove links implied by transitivity
Uniquely defined because of causal structure
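Transitive reduction of a DAG can be sketched in plain Python as follows. The function name `transitive_reduction` and the edge-set representation are choices made for this example, and the brute-force reachability search is meant for small graphs only; it is not the implementation used in the cited paper.

```python
def transitive_reduction(edges):
    """Return the edges (u, v) of a DAG that are not implied by a
    longer path from u to v (illustrative sketch for small graphs)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    def reachable_from(start):
        # All vertices reachable from start by a path of length >= 1.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced = set()
    for u in succ:
        # Anything reachable from a successor of u is reachable from u
        # by a path of length >= 2, so a direct edge to it is redundant.
        indirect = set()
        for w in succ[u]:
            indirect |= reachable_from(w)
        reduced |= {(u, v) for v in succ[u] if v not in indirect}
    return reduced
```

For example, with edges A→B, B→C and A→C the edge A→C is implied by the path through B and is dropped.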
Slide38
Transitive Completion
Add all edges implied by causal links
= Add an edge if there is a path between a pair of points
= Edge for any pair of vertices related by the partial order
Uniquely defined because of causal structure
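Transitive completion is the opposite operation and can be sketched the same way. Again the name `transitive_completion` and the edge-set representation are assumptions made for this illustration.

```python
def transitive_completion(edges):
    """Return the DAG edge set with an edge (u, v) added whenever
    v is reachable from u (illustrative sketch for small graphs)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    completed = set()
    for u in succ:
        # Depth-first search collecting every vertex in the future of u.
        seen, stack = set(), list(succ[u])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        completed.update((u, v) for v in seen)
    return completed
```

A chain A→B→C→D becomes the six edges linking every earlier vertex to every later one.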
Slide39
Real DAG data
Edges represent some interaction, e.g. citations
Not simply a causal relationship
[Figure: scale marked 0.0, 0.5 and 1.0 running from the transitively reduced network (Hasse diagram) through a real network to the transitively complete network]
Slide40
Two Types of Measurement
Causally Invariant Measures
Only see causal structure
Not sensitive to the fraction of completion between TR and TC
e.g. Longest Path
Measures sensitive to the TR to TC completion fraction
e.g. degree = citation counts
Slide41
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide42
Dimension
Assume an embedding of the DAG in Minkowski space of D dimensions
Information flows into the forward light cone only
Find Dimension D
[Figure: light cone of x showing the future time-like, past time-like and space-like regions, with a second point y]
Slide43
Causal Set Model
PPP (Poisson Point Process) in D-dimensional space
Take one dimension as time
Add an edge from point x to point y if y is later than x (time difference > 0) and the pair has a time-like separation
The resulting network is transitively complete
Standard model used in
Discrete maths, e.g. Bollobás & Brightwell 1991
Causal set quantum gravity, e.g. see Dowker 2006
[Figure: light cone of x showing the Future, Past and Space-like regions, with a second point y]
Slide44
Longest Path and Causal Sets
Longest Path is the best approximation to the space-time geodesics in the Causal Set Models of Minkowski space [Brightwell & Gregory, 1991]
Also conjectured to be true of Causal Sets for general (curved) space-times
Slide45
Longest Path in Causal Sets
[Figure: Poisson Point Process in 1+1 dimensional Minkowski space, showing the geodesic, the longest path (length 9), the shortest path (length 4) and the edges of the light cones]
Slide46
Box Counting Direct
Choose an interval at random
Interval = set of points on paths from a given source node to a specified sink node, with source and sink chosen uniformly at random from the vertex set
Measure Volume = N, the number of points in the interval (N>200 best)
Measure length of interval = L, the longest path between source and sink
Repeat many times
Fit the dependence of L on N to find D and m
[Lei, Teixeira, Clough, Evans, 2016 unpublished]
[Figure: interval between a source and a sink]
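The fitting step can be sketched as a straight-line fit on log-log axes. The functional form $L \approx m\,N^{1/D}$ is an assumption here: the slide's own formula did not survive extraction, and this form is taken from the large-N longest-path scaling discussed on the next slide. The function name `fit_dimension` is hypothetical.

```python
import math

def fit_dimension(samples):
    """Least-squares fit of log L = log m + (1/D) log N.

    samples: iterable of (N, L) pairs, one per sampled interval.
    Returns (D, m) under the assumed scaling L = m * N**(1/D).
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(length) for _, length in samples]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # estimate of 1/D
    intercept = mean_y - slope * mean_x   # estimate of log m
    return 1.0 / slope, math.exp(intercept)
```

With only a handful of small intervals the fit is very noisy, which is why the slide recommends intervals with N>200 and many repetitions.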
Slide47
Longest Path L Scaling for Large N
Bollobás & Brightwell (1991) give the large-N scaling of L for Minkowski (Cone) space
Greedy Path case
Finite size effects unknown?
Slide48
Silly Example
N=19, L=8
N=6, L=4
N=4, L=4
Fit (Excel!) to find D and m
[Figure: Causal Set in 1+1 Minkowski space, PPP + connect timelike pairs]
Slide49
Midpoint Method (box counting)
Choose an interval: a random pair of points and all N nodes lying on paths between those two points
Find the midpoint such that the two sub-intervals are of roughly equal size
The sizes of the sub-intervals relative to N then give an estimate of D
[Bombelli 1988, Reid 2003]
[Figure: Causal Set in 1+1 Minkowski space, PPP + connect timelike pairs, with N=18, N_1=6, N_2=4 and the values D = 1.61 and 2.16]
Slide50
Myrheim-Meyer Dimension Estimator
Uses the number of causally connected pairs, S_2, in an interval with N nodes in a D dimensional causal set
Assumes Minkowski space.
Causally invariant (works if not a transitively complete network).
[Myrheim 1978, Meyer 1988, Reid 2003]
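A minimal numerical sketch of the estimator follows. It assumes the commonly quoted form of the Myrheim-Meyer relation, $\langle S_2 \rangle / N^2 = \Gamma(D+1)\,\Gamma(D/2) / \bigl(4\,\Gamma(3D/2)\bigr)$, where $S_2$ is the number of causally connected pairs among the $N$ points of an interval. The slide's own equation was lost in extraction, so this form is an assumption, although it does reproduce the D=2.0 of the N=4, S_2=4 example on the next slide. The names `mm_fraction` and `mm_dimension` are hypothetical.

```python
import math

def mm_fraction(d):
    """Assumed expected S_2 / N^2 in d-dimensional Minkowski space."""
    return math.gamma(d + 1) * math.gamma(d / 2) / (4 * math.gamma(3 * d / 2))

def mm_dimension(n_points, s2, lo=1.0, hi=20.0):
    """Solve mm_fraction(D) = S_2 / N^2 for D by bisection.

    mm_fraction decreases as D grows, so a larger fraction of related
    pairs means a smaller dimension.
    """
    target = s2 / n_points ** 2
    if not mm_fraction(hi) <= target <= mm_fraction(lo):
        raise ValueError("S_2 / N^2 is outside the range covered by [lo, hi]")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mm_fraction(mid) > target:
            lo = mid   # fraction still too large: dimension must be bigger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since only reachability between pairs is counted, the estimate is the same for a network and for its transitive reduction or completion.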
Slide51
Example of Myrheim-Meyer Method
N=4 internal points
S_2=4 causally connected pairs
D=2.0
Slide52
Degree after Transitive Reduction
Not an effective measure as the dependence on dimension is too weak
Results too noisy
N.B. in 2D can find the whole degree distribution, which is Poisson in ln(N) plus corrections
[Bombelli et al 1987; Eichhorn & Mizera 2014; Clough & TSE 2016]
[Clough & TSE 2016]
Slide53
Dimension Measures for Social DAG
Results derived for Causal sets = PPP in Minkowski space
Do they work for social DAG networks?
Slide54
Comparison of Dimension Measures
[Figure: two panels for hep-th arXiv, MM dimension and midpoint dimension, each plotted against N, the number of nodes in the interval, with # runs / 5000 also shown]
[Clough & Evans 2016, arXiv:1408.1274]
Slide57
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide58
Transitive Reduction & Innovation
Conjecture: Transitive Reduction removes unnecessary and indirect influences on the innovation trail
Links left after transitive reduction are sufficient for innovation
Remaining links are a “lower bound” on the necessary links
Slide59
Citation Counts
Citations of academic papers are sometimes made for poor reasons:
Cite your own paper
Cite a standard paper
Just copy a citation from another paper? (80%? [Simkin and Roychowdhury 2003; Goldberg, Anthony, TSE 2015])
Transitive Reduction removes poor citations
It may also remove some useful ones …
[Figure: New Paper, Recent Paper and Standard Paper linked by citations]
Slide60
Citation Counts
Conjecture: Transitive Reduction removes poor citations
Assess papers with “reduced degree” = degree after transitive reduction - a causally invariant measure
Suggestion: Reduced Degree < Paper Quality < Citation Count
[Figure: New Paper, Recent Paper and Standard Paper linked by citations]
Slide61
Degree Distribution before and after Transitive Reduction – arXiv:hep-th
[Figure: frequency against citation count, before TR and after TR]
Lose 80% of edges
hep-ph similar, as Simkin & Roychowdhury, Goldberg et al.
[Clough et al. 2015, arXiv:1310.8224]
Slide62
Degree Distribution before and after Transitive Reduction – US Supreme Court
[Figure: frequency against # citations, before TR and after TR]
Lose 73% of edges
Similar to hep-th
[Clough et al. 2015, arXiv:1310.8224]
Slide63
Degree Distribution before and after Transitive Reduction – US Patents
[Figure: frequency against # citations, before TR and after TR]
Lose 15% of edges
Very different to arXiv and court judgments
[Clough et al. 2015, arXiv:1310.8224]
Slide64
arXiv hep-th repository
[Figure: citation count before TR against citation count after TR, with the line where the two are equal]
Winner (806 to 77)
Loser (1641 to 3)
[Clough et al. 2015, arXiv:1310.8224]
Slide65
Transitive Reduction and Citation Networks
Shows key differences between citation networks from different fields: a new test for models
Finds large differences in “reduced degree” between papers of similar citation counts: an alternative recommendation system
[Clough et al. 2015, arXiv:1310.8224]
Slide66
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide67
Path Length
The length of a path is the number of links on that path.
[Figure: path of length 6]
Slide68
Distance Between Vertices
The distance between vertices may be set equal to the length of the shortest path.
[Figure: distance between vertices = 4]
Slide69
Longest Path
Longest paths have no meaning for most networks, as such paths typically visit a large fraction of the network
[Figure: longest path of length 9]
Slide70
Time and Longest Path
Longest paths are well defined when a time is defined - you never go backwards
[Figure: longest path of length 5, drawn against a TIME axis]
Slide71
Longest Path and DAG
Longest Paths are unchanged by transitive reduction
They only use links which are essential for the causal structure
Causally Invariant
[Figure: DAG drawn against a TIME axis]
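Because a DAG has no loops, the longest path between two vertices can be found by a simple memoised recursion. The sketch below is illustrative only: the function name `longest_path_length` and the edge-set input are assumptions, and the recursion is suitable for small graphs.

```python
def longest_path_length(edges, source, sink):
    """Number of links on the longest path from source to sink in a DAG,
    or None if sink cannot be reached (illustrative sketch)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    memo = {}

    def longest_from(node):
        if node == sink:
            return 0
        if node not in memo:
            # Longest continuation through each successor that reaches sink.
            lengths = [longest_from(nxt) for nxt in succ.get(node, [])]
            lengths = [length + 1 for length in lengths if length is not None]
            memo[node] = max(lengths) if lengths else None
        return memo[node]

    return longest_from(source)
```

Shortcut edges such as A→C alongside A→B→C never lie on a longest path, which is why the result is the same before and after transitive reduction.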
Slide72
Time & Networks
Causally Invariant Measures
Transitive Reduction & Innovation
Longest Path & Centrality
Netometry – Networks & Geometry
Dimension
Coordinate Assignment
Summary
Slide73
Centrality
Network centrality measures try to describe the importance of a vertex in a network, e.g. degree, (shortest path) betweenness, …
[Figure: network with size = degree and darker = higher betweenness]
Slide74
Centrality Measures for DAGs
It seems that we should use the longest path, not the shortest path, for DAGs
Slide75
Small Example: DNA Citation Network
40 key events (mostly single papers) in the development of the theory of DNA
Links are both direct citations and citations via an intermediary paper with a common author.
[Asimov 1962; Garfield, Sher, Torpie, 1964; Hummon & Doreian 1989]
Slide76
DNA Citation Network
[Figure: DNA citation network drawn against a TIME axis, from Fig 1 of Hummon & Doreian 1989; labels include Miescher 1871, Watson & Crick ‘53, 27, Nobel, 32 and Ochoa ‘55]
Slide77
Shortest Path Betweenness for DNA Network
Size = Shortest Path Betweenness
Colour = Longest Path Betweenness
Slide78
Longest Path Betweenness for DNA Network
Size = Longest Path Betweenness
Colour = ditto
Slide79
Centrality for DAGs
Longest Path Centrality shows promise
Experimenting with other approaches
Comparing with “main path analysis” of the bibliometrics field [Hummon & Doreian 1989]
Slide80
Rankings & Cube Space
Slide81
Cube Box Space
Each point has D coordinates
Connect from x to y iff $x_i < y_i$ for every coordinate $i = 1, \dots, D$
[Figure: example with D=2]
[Bollobás & Brightwell 1991]
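A cube (box) space DAG can be sketched as below. The connection rule used, an edge from x to y when every coordinate of x is smaller than the matching coordinate of y, is the usual definition of this partial order and is assumed here because the slide's own condition did not survive extraction. The function name `cube_space_edges` is hypothetical.

```python
def cube_space_edges(points):
    """Edges (i, j) between point indices where point i is smaller than
    point j in every coordinate (assumed cube-space rule)."""
    edges = set()
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            # Require strict inequality in all D coordinates.
            if i != j and all(a < b for a, b in zip(x, y)):
                edges.add((i, j))
    return edges
```

The same rule suits multiple rankings: treating each ranking as one coordinate, an edge appears only when one item beats another on every measure.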
Slide82
Box Space Representation of Journal Rankings
Node = Journal
Directed Edge if journal has better IF, EF & CA
Height = minimum longest path distance to root
Top 1000 journals used
Top 40 shown
Hasse diagram technique [Bruggemann et al. 1994]
[Figure: ranking diagram with the direction of worse journals marked]
Slide83
DAG Models
Slide84
DAG Models
Causal Set Models
PPP in space with at least one time direction
Minkowski (Cone) space
Cube Space (useful for multiple rankings)
[e.g. Bollobás & Brightwell 1991, Clough et al 2015, Clough & TSE 2016]
Growing Network Models - Growth = Time
[e.g. Price 1965, Barabási-Albert 1999, Chung-Lu 2000, …]
More realistic citation models, see Goldberg, Anthony, TSE 2015 and refs therein
Any Network Model + Vertex Order
Random Partial Orders, e.g. Winkler 1985, Albert & Frieze 1989
Slide85
Spatial Coordinates
Similarities to network layout problems
However our work shows data needs D>2
We exploit the space-time structure for speed
Slide86
Optimisation Approach
Find coordinates which minimise a cost function whose terms depend on the relation between each pair of vertices:
edge from i to j (future timelike)
edge from j to i (past timelike)
no edge between i and j (spacelike)
Slide87
Time & Networks
Transitive Reduction
Dimension
Summary
Longest Path
Geometry
Further Material
Slide88
Basic Statistics Before and After Transitive Reduction
Network            hep-th             US patents              US Supreme Court
                   before   after     before     after        before   after
# nodes            27383    27383     3764094    3764094      25376    25376
# edges            351237   62257     16510997   13996169     216198   59032
Clustering         0.249    0         0.0757     0            0.163    0
Mean in-degree     12.82    2.27      4.39       3.71         8.52     2.33
Median in-degree   4        2         2          2            5        2
1st Q in-degree    1        1         0          0            0        0
3rd Q in-degree    12       3         6          6            11       3
Gini coef          0.729    0.481     0.684      0.67         0.62     0.51
Slide89
Degree Distribution, Null Models and TR
Slide90
Degree by year of Publication – hep-th
Slide91
Degree by year of Publication – US Patents
Slide92
Degree by year of Publication – US Supreme Court
Slide93
Midpoint Dimension for US Supreme Court
M.-M. dimension similar, larger errors
Slide94
US Patents, Myrheim-Meyer Dimension
Midpoint similar but slightly worse
Slide95
# 2-Chains S_2: Cone vs Cube Space Models
[Figure: comparison of the two models plotted against dimension D]
Identical at D=2 as expected
Cube space gives larger dimension estimates for D>2, about 7% bigger if D~3
Slide96
Time & Networks
Transitive Reduction
Dimension
Longest Path
Finding Coordinates
Geometry
Summary
Slide97
Finding Coordinates
Slide98
Old Material
Slide99
Timed Vertices + Edges
Temporal Edge Networks
e.g. Communication Networks – Email, phone, letters
as in the review by Holme & Saramäki, Temporal Networks, 2012
Slide101
Vertices + Timed Edges
Temporal Vertex Networks
Each vertex represents an event happening at one time
Innovations
Patents
Research Papers
Court Judgments
Citation Networks
Slide102
Timed Vertices = Temporal Vertex Networks
Each vertex represents an event occurring at one time
Time constrains flows
Edges point in one direction only
No Loops
Slide103
Key Properties of Temporal Vertex Networks
Directed: e.g. citation from newer to older paper (or the reverse if you prefer)
Acyclic: cannot cite a newer paper
Directed Acyclic Graph = DAG
Slide104
Key Properties of Temporal Vertex Networks
Edges of a DAG define a partial order of the set of vertices - a poset
There are many ways to order the vertices (total or linear orders)
A
C
D
B
E
A
B
D
C
E
Slide105DAGS and Flows
DAGs represent many different types of Flowse.g.
B,C A
Flow of Ideas – Innovations, Citation Networkspapers, patents, court judgementsFlow of ProjectsManagement of Tasks, Critical paths
Flow of Mathematical LogicSpreadsheet formulae© Imperial College LondonPage 105
A
C
D
B
E
Slide106
Innovations
Each vertex represents a discovery: Patent, Academic Paper, Law Judgment
Information is now copied from one node to all connected events in the future
Different process from a random walk
Similar to epidemics but different network
Slide107Applications
Flow of Ideas: Innovations, Citation Networkspapers, patents, court judgements
Flow of Projectsmanagement of tasks, critical pathsFlow of Mathematical Logic
spreadsheet formulaeFlow of Space-Time CausalityCausal Set approach to quantum gravity
© Imperial College LondonPage 107Spread like epidemics but very different network
Slide110Temporal Edge/Vertex Network Relationship
Types related by
Line Graph transformation
© Imperial College LondonPage
110
Slide111Temporal Vertex -> Edge Network
Directed Line Graph transformation© Imperial College London
Page 111
A
C
D
B
E
1
1
2
3
4
5
6
7
2
3
4
5
6
7
Slide112Temporal Edge -> Vertex Network
(Directed) Line Graph transformation© Imperial College London
Page 112
3
2
4
1
B
C
D
A
D
C
B
E
1
1
1
2
3
4
C
D
Time of email
Email
Chain
Email
Person
Person
Slide113Temporal Spread vs. Temporal Events
Spreading on a geographical network is very different from temporal events network© Imperial College London
Page 113
1
2
1
1
0
y
x
Arrival time is shortest path
distance from source
A
C
D
B
E
A
C
D
B
E
DAG
Slide114Temporal Spread vs. Temporal Events
Same geographical network,different source vertex
© Imperial College LondonPage
114
0
12
1
1
y
x
Arrival time is shortest path
distance from source
A
C
D
B
E
A
C
D
B
E
DAG
Slide115Temporal Spread vs. Temporal Events
Same network, same process, different source vertex different DAG
© Imperial College London
Page 115
A
C
D
B
E
A
C
D
B
E
DAG, source E
A
C
D
B
E
DAG, source C
or
Slide116Temporal Edge/Vertex Network Relationship
Types related by
Line Graph transformation
© Imperial College LondonPage
116
[c.f. Evans and
Lambiotte
, 2009, 2010;
Ahn
et al. 2010]
Slide117Networks and Geometry
Recent work combinesnetworks and isotropic,
homogeneous geometriesSpatial:
(curvature 0,+1,-1)
Internet Backbone
[Boguna,
Krioukov, Claffy, 2009]
Space-time:
(curvature 0,+1,-1)
[Krioukov
et al 2013]Only uses 2 dimensional space / space-times
© Imperial College London
Page 117
Slide118
Questions for “Netometry” = Networks + Geometry
Dimension of space?
Only 1 spatial dimension is used in curved space work, but we normally find more than 1 space dimension
Assigning spatial coordinates
Define similarity/difference between topics
Do we need Curvature?
How do you measure it?
Locally flat so perhaps find post-hoc
Non-metric spaces
Cube Box spaces [Bollobás & Brightwell 1991]
Slide121Shortest PathsThe
length of a path is the number of links on that path.© Imperial College London
Page 121
Slide122Shortest PathsThe
length of a path is the number of links on that path.© Imperial College London
Page 122
Slide123Shortest PathsThe
distance between vertices is the length of the shortest path© Imperial College London
Page 123
Slide124Shortest PathsThe length
of a path is the number of links on that path.The distance between vertices is the length of the shortest pathThe Betweenness of a vertex is the number of shortest paths passing through that vertex
© Imperial College London
Page 124
Slide125CentralityNetwork centrality measures try to describe the importance of a network.
© Imperial College London
Page 125
Size
DegreeDarker = Higher
Betweenness
Slide126Path LengthThe
length of a path is the number of links on that path.© Imperial College London
Page 126
Path Length
6
Slide127Distance Between VerticesThe
distance between vertices is the length of the shortest path.
© Imperial College LondonPage
127
Distance between vertices = 4
Slide128Vertex Betweenness
The Betweenness of a vertex is the number of shortest paths passing through that vertex
© Imperial College London
Page 128
Slide129Longest Path
Longest paths have little meaning for most networks as paths typically visit a large fraction of network© Imperial College London
Page 129
Longest Path
Length 9
Slide130Longest Path and DAGLongest Path is the best approximation to the space-time geodesics in the Causal Set Models of
Minkowskii space [Brightwell
& Gregory, 1991]Also conjectured to be true of Causal Sets for general (curved) space-times
© Imperial College LondonPage
130
Slide131Longest Path and DAG
Longest Path (9)
Geodesic
Shortest
Path (4)
Edges of Light
Cones
© Imperial College London
Page
131
(Poisson Point Process in
1+1 dimension
Minkowskii
space)
Slide132
Box Space Representation of Journal Rankings
Node = Journal
Directed Edge if journal has better IF, EF & CA
Height = minimum longest path distance to root
Top 1000 journals used
Top 40 shown
[Figure: ranking diagram with axis label "worse"]
Slide133
Conclusions
Temporal Networks come in two types: time on vertices or time on edges.
Constraints on networks require new tools in network analysis: Transitive Reduction, Dimension, Longest Path, Geometry.
Significant differences revealed between different fields and different types of data.
Tim Evans, Imperial College London
http://netplexity.org
Work done with: James Clough, Jamie Gollings, Tamar Loach
Slide134
Time & Networks
Transitive Reduction
Dimension
Longest Path
Geometry
Summary
Slide135
Time and Longest Path
Longest paths are well defined when a time is defined: you never go backwards.
[Figure: network drawn along a TIME axis, Longest Path Length 5]
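Because edges never go backwards in time, the longest path in a DAG can be found in a single pass over the vertices in a topological order. The sketch below uses Kahn's algorithm for the ordering. The input format (a dictionary of successor lists) and the example graph are assumptions made for illustration.

```python
from collections import deque

def longest_path(dag):
    """Return one longest path (list of vertices) in a DAG given as {vertex: [successors]}."""
    indeg = {v: 0 for v in dag}
    for v in dag:
        for w in dag[v]:
            indeg[w] = indeg.get(w, 0) + 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    length = {v: 0 for v in indeg}       # longest path (in links) ending at v
    parent = {v: None for v in indeg}
    seen = 0
    while queue:
        v = queue.popleft()              # all predecessors of v are already done
        seen += 1
        for w in dag.get(v, []):
            if length[v] + 1 > length[w]:
                length[w] = length[v] + 1
                parent[w] = v
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if seen != len(indeg):
        raise ValueError("graph has a cycle, so no time ordering exists")
    end = max(length, key=length.get)
    path = []
    while end is not None:               # walk back along the stored parents
        path.append(end)
        end = parent[end]
    return path[::-1]

# Hypothetical example DAG
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"], "E": []}
path = longest_path(dag)
```

If the graph contains a cycle the function raises an error, which mirrors the point made on the slide: without a time ordering the longest path is not a useful quantity.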
Slide136
Longest Path and DAG
Longest Path is the best approximation to the space-time geodesics in the Causal Set Models of Minkowski space [Brightwell & Gregory, 1991].
Also conjectured to be true of Causal Sets for general (curved) space-times.
Slide137
Longest Path and DAG
[Figure: Poisson Point Process in 1+1 dimensional Minkowski space, showing the Longest Path (9), the Geodesic, the Shortest Path (4) and the Edges of Light Cones]
Slide138
Small Example: DNA Citation Network
40 Key events (mostly single papers) in the development of the theory of DNA.
Links are both direct citations and citations via an intermediary paper with a common author.
[Asimov 1962; Garfield, Sher, Torpie, 1964; Hummon & Doreian 1989]
Slide139
DNA Citation Network
[Figure: Fig 1 of Hummon & Doreian 1989, drawn along a TIME axis; labels include Miescher 1871, Watson & Crick ‘53, 27, Nobel, 32 and Ochoa ‘55]
Slide140
Shortest Path Betweenness for DNA Network
Size = Shortest Path Betweenness
Colour = Longest Path Betweenness
Slide141
Longest Path Betweenness for DNA Network
Size = Longest Path Betweenness
Colour = ditto
Slide142
Centrality for DAGs
Longest Path Centrality shows promise.
Experimenting with other approaches.
Comparing with “main path analysis” of bibliometrics field [Hummon & Doreian 1989].
Slide143
Temporal Edge/Vertex Network Relationship
Types related by Line Graph transformation
[c.f. Evans and Lambiotte, 2009, 2010; Ahn et al. 2010]
Slide144
Temporal Vertex -> Edge Network
Directed Line Graph transformation
[Figure: DAG on vertices A to E with edges numbered 1 to 7, and its line graph on vertices 1 to 7]
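A directed line graph turns every edge of the original network into a vertex, and links edge e to edge f whenever e ends at the vertex where f starts. The sketch below shows the idea on a hypothetical labelled DAG, which is not the one drawn on the slide.

```python
def directed_line_graph(edges):
    """edges: {label: (tail, head)}. Returns {label: [labels of edges that follow it]}."""
    starts_at = {}
    for label, (tail, _head) in edges.items():
        starts_at.setdefault(tail, []).append(label)
    # Edge e = (u, v) points to every edge that starts at v
    return {label: sorted(starts_at.get(head, []))
            for label, (_tail, head) in edges.items()}

# Hypothetical labelled DAG
edges = {1: ("A", "B"), 2: ("A", "C"), 3: ("B", "D"), 4: ("C", "D"), 5: ("D", "E")}
line = directed_line_graph(edges)
```

If the original network is acyclic, the line graph is acyclic too, since any cycle of edges would trace out a cycle of vertices.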
Slide145
Temporal Edge -> Vertex Network
(Directed) Line Graph transformation
[Figure: network of Persons (A to E) joined by Emails, each labelled with the Time of email, and the corresponding Email Chain network in which each Email is a vertex]
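For timed edges such as emails, the same transformation can take the times into account. In the sketch below each email becomes a vertex, and email i points to email j when the receiver of i sends j at a strictly later time. This linking rule is an assumption about what the slide's "Email Chain" means, and the data are invented.

```python
def email_event_dag(emails):
    """emails: list of (sender, receiver, time). Returns {email index: [later linked emails]}."""
    links = {i: [] for i in range(len(emails))}
    for i, (_send_i, recv_i, t_i) in enumerate(emails):
        for j, (send_j, _recv_j, t_j) in enumerate(emails):
            # j continues the chain if i's receiver sends it afterwards
            if send_j == recv_i and t_j > t_i:
                links[i].append(j)
    return links

# Hypothetical email log: (sender, receiver, time)
emails = [("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("B", "D", 1), ("D", "A", 4)]
chain = email_event_dag(emails)
```

Because every link goes from an earlier email to a strictly later one, the result is a DAG even though the underlying contact network between people contains a cycle.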
Slide146
Temporal Spread vs. Temporal Events
Spreading on a geographical network is very different from a temporal events network.
Arrival time is shortest path distance from source.
[Figure: geographical network in the x-y plane on vertices A to E, labelled with arrival times 0, 1, 1, 1, 2, and the resulting DAG]
Slide147
Temporal Spread vs. Temporal Events
Same geographical network, different source vertex.
Arrival time is shortest path distance from source.
[Figure: geographical network in the x-y plane on vertices A to E, labelled with arrival times 0, 1, 1, 1, 2, and the resulting DAG]
Slide148
Temporal Spread vs. Temporal Events
Same underlying network, same process, different source vertex, different DAG.
[Figure: network on vertices A to E, with the DAG for source E or the DAG for source C]
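The dependence on the source can be illustrated with a short sketch: compute arrival times as shortest path distances from the source, then point each edge from the earlier to the later arrival. The network below is hypothetical, and dropping edges between vertices with equal arrival times is an assumption of this sketch.

```python
from collections import deque

def arrival_times(adj, source):
    """Arrival time = shortest path distance from the source (breadth-first search)."""
    time = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in time:
                time[w] = time[u] + 1
                queue.append(w)
    return time

def spreading_dag(adj, source):
    """Directed edges (u, w) with u reached strictly before w."""
    time = arrival_times(adj, source)
    return sorted((u, w) for u in adj for w in adj[u]
                  if u in time and w in time and time[u] < time[w])

# Hypothetical underlying network on vertices A to E
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

dag_from_e = spreading_dag(adj, "E")
dag_from_c = spreading_dag(adj, "C")
```

The two results contain the same undirected links but with different orientations, so the same network and process give different DAGs for different sources.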
Slide149
Minkowski Space
[Figure: diagram with time and space axes, showing the Future time-like region]
Slide150
Analysis of DAGs
Paths and Dimensions in Temporal Networks
Joana Teixeira
Third year project
Partner: Chon Lei (Jacky)
Supervisor: Dr Tim Evans
Slide151
2. Results - Theoretical Model
Dimension Calculation
Longest Path is related to Dimension [1] via a relation (equation not recovered) between:
the number of nodes in the interval
the length of the longest path
D, the dimension of the network
the maximal chain constant (limits presented in [1])
[Figure: plot against Number of nodes; 2D Minkowski, iterated 500 times; columns labelled (1) and (2)]
Longest: 1.970, 0.006, 1.808, 0.021
Greedy: 2.057, 0.011, 1.722, 0.029
[1] Bollobás, Brightwell (1991)
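The slide's own formula did not survive, so the following sketch rests on a stated assumption: that to leading order the longest path length L grows with the number of nodes N as L ≈ m N^(1/D), where m is the maximal chain constant. Under that assumption the constant cancels when two sizes are compared, and D can be estimated from the slope of log L against log N. The sketch uses light-cone coordinates, in which a causal interval of 1+1 dimensional Minkowski space is a square and the longest chain is a longest increasing subsequence. The sizes, sample count and function names are arbitrary choices, not those of the project.

```python
import math
import random
from bisect import bisect_left

def longest_chain(n, rng):
    """Number of points on the longest chain among n uniform points in the unit square.

    In light-cone coordinates (u, v), p precedes q when both u and v increase."""
    pts = sorted((rng.random(), rng.random()) for _ in range(n))  # sort by u
    tails = []                      # patience sorting on v
    for _u, v in pts:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

def estimate_dimension(n_small, n_large, samples=20, seed=1):
    """Estimate D from the assumed scaling L ~ N**(1/D) using two network sizes."""
    rng = random.Random(seed)

    def mean_length(n):
        return sum(longest_chain(n, rng) for _ in range(samples)) / samples

    l_small, l_large = mean_length(n_small), mean_length(n_large)
    return math.log(n_large / n_small) / math.log(l_large / l_small)

d_est = estimate_dimension(500, 8000)
```

For finite N the estimate is expected to sit somewhat below 2 because of sub-leading corrections to the scaling, which is one reason a fitting function with correction terms can be preferable to a simple two-point slope.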
Slide152
2. Results - Theoretical Model
Dimension Calculation
[Figure: two plots of Path length, one against Number of nodes and one against 1/N; 2D Minkowski, iterated 500 times]
Fitting Function: (not recovered)
Limits for Minkowski space given in [1], 2D: 1.596, 2.718
Values obtained from our model for large N; D=2, 2D: 1.571, 0.006, 1.901, 0.004
[1] Bollobás, Brightwell (1991)