Slide1
Graph Databases and Java
Chuck Olson
Software Engineer
October 2015
colson@anl.gov
Slide2
Outline
- Assumptions
- What is a graph and what are they good for?
- What is a graph database?
- What is Neo4J and how does one use it?
- Case: Subway Model Results Compilation
- Questions
Slide3
Audience Assumptions
Working knowledge of:
- Java
- Relational databases
Slide4
What is a graph?
- Collection of nodes and edges
- Edges can be directed (or not)
- Edges can represent many things
[Figure: example graph with nodes Chuck, Jim, Jay, Gary, and Coco, joined by five "Knows" edges and one "Annoys" edge]
Slide5
What is a graph?
Slide6
Transportation Example
[Figure: graph of cities (Denver, Los Angeles, Chicago, New York, Dallas) with edges labeled by times 18:00, 20:00, 13:00, 15:00, 12:00, 16:00, 17:00]
Slide7
What are graphs good for?
- Often map more directly to the structure of some object-oriented problems.
- Work best for storing “richly connected” data
- Many algorithms exist to extract useful information
  - Dijkstra’s shortest path
  - Minimum spanning tree (Kruskal and others)
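As an illustration of the first algorithm named above, here is a minimal sketch of Dijkstra's shortest path in plain Java (9+), with no graph database involved. The city names echo the transportation example, but the edge weights (hours) and the class and method names are assumptions made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: the edge weights below are invented, not data from the talk.
class ShortestPathSketch {

    /** Returns the minimum total weight from source to every reachable node. */
    static Map<String, Integer> dijkstra(Map<String, Map<String, Integer>> graph, String source) {
        Map<String, Integer> dist = new HashMap<>();
        // Queue entries are (node, distance so far), ordered by distance.
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        queue.add(Map.entry(source, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            String node = current.getKey();
            if (dist.containsKey(node)) {
                continue; // already settled via a path that is no longer
            }
            dist.put(node, current.getValue());
            for (Map.Entry<String, Integer> edge
                    : graph.getOrDefault(node, Map.of()).entrySet()) {
                if (!dist.containsKey(edge.getKey())) {
                    queue.add(Map.entry(edge.getKey(),
                            current.getValue() + edge.getValue()));
                }
            }
        }
        return dist;
    }

    /** Hypothetical directed graph: city -> (neighbor -> travel hours). */
    static Map<String, Map<String, Integer>> sampleGraph() {
        Map<String, Map<String, Integer>> g = new HashMap<>();
        g.put("Los Angeles", Map.of("Denver", 2, "Dallas", 3));
        g.put("Denver", Map.of("Chicago", 3));
        g.put("Dallas", Map.of("New York", 5));
        g.put("Chicago", Map.of("New York", 2));
        return g;
    }

    public static void main(String[] args) {
        Map<String, Integer> dist = dijkstra(sampleGraph(), "Los Angeles");
        System.out.println(dist.get("New York")); // prints 7
    }
}
```

With these made-up weights, the cheapest route to New York goes through Denver and Chicago (2 + 3 + 2 = 7) rather than through Dallas (3 + 5 = 8).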
Slide8
What is a graph database?
A NoSQL database that stores nodes and edges, and provides a mechanism to easily query information from it.
- Can contain nodes of different types
- Can have free-form attributes within nodes
- Can have edges (relationships) of different types
- Can have attributes attached to edges (distance, cost, relationship)
- Query mechanism
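The features listed above can be pictured with a small in-memory model. The following is a hedged sketch in plain Java, not the API of any graph database: the class names, the `connect` and `neighbors` helpers, and the `distanceKm` value are all hypothetical, chosen only to show typed nodes, free-form attributes, typed edges with their own attributes, and a trivial query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; this is not Neo4J's API.
class PropertyGraphSketch {

    static class Node {
        final String label;                                      // node type, e.g. "Station"
        final Map<String, Object> properties = new HashMap<>();  // free-form attributes
        final List<Edge> outgoing = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    static class Edge {
        final String type;                                       // relationship type
        final Node from;
        final Node to;
        final Map<String, Object> properties = new HashMap<>();  // e.g. distance, cost

        Edge(String type, Node from, Node to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    /** Adds a directed, typed edge and returns it so attributes can be set. */
    static Edge connect(Node from, Node to, String type) {
        Edge edge = new Edge(type, from, to);
        from.outgoing.add(edge);
        return edge;
    }

    /** A trivial "query": targets of edges of the given type leaving a node. */
    static List<Node> neighbors(Node from, String type) {
        List<Node> result = new ArrayList<>();
        for (Edge edge : from.outgoing) {
            if (edge.type.equals(type)) {
                result.add(edge.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node a = new Node("Station");
        a.properties.put("name", "State St");
        Node b = new Node("Station");
        b.properties.put("name", "Lake St");
        Edge track = connect(a, b, "TRACKS_TO");
        track.properties.put("distanceKm", 0.5); // made-up value
        System.out.println(neighbors(a, "TRACKS_TO").get(0).properties.get("name")); // prints Lake St
    }
}
```

A real graph database adds persistence, indexing, transactions, and a query language on top of this kind of structure.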
Slide9
Why would I ever use one?
- Easier to find solutions to certain problems by framing data graphically.
- “The right tool for the job”
Slide10
Neo4J
- Open source (GPLv3 for Community Edition)
- V1.0 released in 2010
- Written in Java and Scala
- Managed by Neo Technology
- Uses the Property Graph Model
- Embedded or server
- Fully transactional
- Set of jar files ~30MB
- Query language: Cypher
Slide11
How do you use Neo4J?
Creating a database
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseBuilder;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Location of database
String dbPath = "/Users/chuck/myneodb";
GraphDatabaseFactory factory = new GraphDatabaseFactory();
GraphDatabaseBuilder builder = factory.newEmbeddedDatabaseBuilder(dbPath);
GraphDatabaseService dbService = builder.newGraphDatabase();
Slide12
How do you use Neo4J?
Creating fixed node and edge types
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;

// Node types
public enum NodeLabel implements Label { Station }

// Relationship types
public enum RelType implements RelationshipType { TRACKS_TO, ROUTE_TO, AIRWAY_TO }
Slide13
How do you use Neo4J?
Adding nodes to a database
// Note: in the Neo4J 2.x embedded API these calls must run inside a transaction:
// try (Transaction tx = dbService.beginTx()) { ...; tx.success(); }

// Create Station node
Node node1 = dbService.createNode(NodeLabel.Station);
// Set properties on the Station
node1.setProperty("number", "100");
node1.setProperty("name", "State St");
// Add another
Node node2 = dbService.createNode(NodeLabel.Station);
node2.setProperty("number", "101");
node2.setProperty("name", "Lake St");
Slide14
How do you use Neo4J?
Adding edges to a database
// Create edge from node1 to node2
Relationship edge = node1.createRelationshipTo(node2, RelType.ROUTE_TO);
// Set props on the edge
edge.setProperty("route", "State St Subway");
edge.setProperty("line", "Red");
// Create another edge of a different type.
edge = node1.createRelationshipTo(node2, RelType.TRACKS_TO);
Slide15
How do you use Neo4J?
Querying the database
import java.util.Iterator;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;

// Returns station numbers of all stations in graph.
String queryText = "MATCH (stn:Station) RETURN stn.number";
ExecutionEngine engine = new ExecutionEngine(dbService);
ExecutionResult result = engine.execute(queryText);
Iterator<String> stnIt = result.columnAs("stn.number");
// Print results
while (stnIt.hasNext())
    System.out.println(stnIt.next());
Slide16
Case: Studying Subways
Slide17
Case: Studying Subways
Questions we might want to ask:
- “Find all the stations that have air connectivity paths to station X that are less than K km”
- “Find all the train routes that go through all stations that are N stops from station X”
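The second question rests on a hop-limited traversal: first find the stations N stops from station X. A minimal sketch of that step in plain Java (9+), without a graph database, is a breadth-first search that collects stations whose fewest-stops distance from the start is exactly N. The station numbers and track links in `sampleTracks`, and all class and method names, are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: the network below is hypothetical, not the model from the talk.
class StopsAwaySketch {

    /** Stations whose fewest-stops distance from start is exactly n. */
    static Set<String> stationsExactlyNStopsAway(
            Map<String, List<String>> tracks, String start, int n) {
        Map<String, Integer> hops = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        Set<String> result = new TreeSet<>();
        hops.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String station = queue.remove();
            int d = hops.get(station);
            if (d == n) {
                result.add(station);
                continue; // no need to look further out than n stops
            }
            for (String next : tracks.getOrDefault(station, List.of())) {
                if (!hops.containsKey(next)) {
                    hops.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return result;
    }

    /** Records an undirected track link between two stations. */
    static void link(Map<String, List<String>> tracks, String a, String b) {
        tracks.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        tracks.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Hypothetical network: 100 - 101 - 200 - 201, with 201 also linked to 202 and 300. */
    static Map<String, List<String>> sampleTracks() {
        Map<String, List<String>> tracks = new HashMap<>();
        link(tracks, "100", "101");
        link(tracks, "101", "200");
        link(tracks, "200", "201");
        link(tracks, "201", "202");
        link(tracks, "201", "300");
        return tracks;
    }

    public static void main(String[] args) {
        System.out.println(stationsExactlyNStopsAway(sampleTracks(), "200", 2));
        // prints [100, 202, 300]
    }
}
```

Hand-written traversals like this grow quickly once edge types, distances, and route membership enter the picture, which is the kind of work a graph query language is meant to absorb.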
Slide18
Case: Studying Subways
Slide19
Case: Studying Subways
Relational attempt…
Stations
- Number (id)
- Name
- Lat
- Lon
Segments
- SegmentId (id)
- StationFromNumber
- StationToNumber
- Length
- SegmentType
SegmentTypes
- SegmentTypeId (id)
- TypeName
LineSegments
- LineId (id)
- SegmentId
- SegmentIndex
Lines
- LineId (id)
- LineName
- LineDirection
Slide20
Case: Studying Subways
Graph attempt…
[Figure: graph schema. Node labeled with Station Name and Station Number. Edge types: ROUTE_TO, TRACKS_TO, AIRWAY_TO. Edge labels shown: Route Name, Distance, Distance]
Slide21
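Case: Studying Subways
[Figure: example subway graph. Stations (number): St. Paul’s (100), Bank (101), Cannon Street (200), Monument (201), Tower Hill (202), Tower Gateway (300). Edge labels: Red, Green, Yellow, Green, Yellow, White/Blue; distances 1 km, .7 km, .2 km, 1.8 km, .1 km]
Slide22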
Case: Studying Subways
Answering the question: returns all stations 2 track segments from station 200 (Cannon Street)
MATCH p=(fromStn:Station)-[edge:TRACKS_TO*2..2]-(toStn:Station {number:'200'})
WHERE fromStn.number <> toStn.number
RETURN DISTINCT fromStn, toStn, fromStn.number
Slide23
Drawbacks
- No standard query language like SQL. Vendor-specific.
- Query language learning curve.
- Lack of built-in visualization tools.
Slide24
For Further Reading…
- Ian Robinson, Jim Webber, Emil Eifrem. Graph Databases, 2nd Edition. O’Reilly and Associates.
- Rik Van Bruggen. Learning Neo4J. Packt Publishing.
- http://www.neo4j.com
- http://www.analytics-driven.com
Slide25
Questions