GraphDatabases
This information relates specifically to Neo4j; YMMV for other graph databases.
A graph database has several types of data:
- nodes - graph data records
- relationships - connections between nodes
- properties - named key-value pairs on nodes or relationships
- labels - mechanism to group nodes together
A node can have properties, and so can a relationship; a label cannot. Similar nodes can have different sets of properties. Property values include strings, numbers, booleans, and arrays of these. A node may have multiple labels.
- Relationships always have a direction
- Relationships always have a type
- Relationships always have a start node
- Relationships always have an end node
- Relationships form patterns of data
- Relationships are data records
- Relationships can contain properties
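The elements above can be sketched as a small in-memory model in Python. The `Node` and `Relationship` classes are hypothetical illustrations of the property-graph concepts, not part of any Neo4j driver API. The sample values come from the social graph example later on this page, and the extra `Surfer` label is an assumption added only to show a node with two labels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph record with zero or more labels and a dict of properties."""
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from `start` to `end`; it may carry properties."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


emil = Node(1, {"Person"}, {"name": "Emil", "from": "Sweden", "klout": 99})
johan = Node(2, {"Person"}, {"name": "Johan", "from": "Sweden", "learn": "surfing"})
# Hypothetical second label, showing that a node may carry several labels.
ally = Node(5, {"Person", "Surfer"}, {"name": "Allison", "hobby": "surfing"})

# A relationship always has a type, a start node and an end node.
knows = Relationship("KNOWS", emil, johan, {"since": 2001})
```

Both `Person` nodes share a label but carry different property keys, which the model allows.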
Most often, relationships have quantitative properties, like weight, rating, strength, or distance. Even though they are directed, relationships can be navigated regardless of direction.
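Direction-agnostic navigation can be illustrated with a minimal Python sketch over hypothetical `(start, type, end)` triples. The `direction` parameter is an illustrative stand-in for choosing `->`, `<-`, or an undirected `-` in a Cypher pattern.

```python
# Hypothetical directed relationships stored as (start, type, end) triples.
rels = [
    ("Emil", "KNOWS", "Johan"),
    ("Emil", "KNOWS", "Ian"),
    ("Johan", "KNOWS", "Rik"),
]


def neighbors(node, rel_type, direction="both"):
    """Return nodes connected to `node` by `rel_type`.

    "out" follows start -> end, "in" follows end -> start,
    and "both" ignores the stored direction.
    """
    result = []
    for start, t, end in rels:
        if t != rel_type:
            continue
        if direction in ("out", "both") and start == node:
            result.append(end)
        if direction in ("in", "both") and end == node:
            result.append(start)
    return result


print(neighbors("Johan", "KNOWS", "out"))  # ['Rik']
print(neighbors("Johan", "KNOWS", "in"))   # ['Emil']
print(neighbors("Johan", "KNOWS"))         # ['Emil', 'Rik']
```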
The only rule: no broken links. You cannot delete a node without also deleting the relationships it participates in.
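The no-broken-links rule can be sketched with a tiny in-memory graph. The `Graph` class and its `delete_node` / `detach_delete` methods are hypothetical. `detach_delete` mirrors the idea behind Cypher's `DETACH DELETE`, which removes a node together with its relationships.

```python
class GraphError(Exception):
    pass


class Graph:
    """Tiny in-memory graph that enforces the 'no broken links' rule (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        self.rels = []  # (start, type, end) triples

    def add_node(self, n):
        self.nodes.add(n)

    def relate(self, start, rel_type, end):
        if start not in self.nodes or end not in self.nodes:
            raise GraphError("both endpoints must exist")
        self.rels.append((start, rel_type, end))

    def delete_node(self, n):
        # Refuse to delete a node that still participates in a relationship,
        # because that would leave a dangling relationship behind.
        if any(n in (s, e) for s, _, e in self.rels):
            raise GraphError(f"{n} still has relationships")
        self.nodes.discard(n)

    def detach_delete(self, n):
        # Remove the node's relationships first, then the node itself.
        self.rels = [r for r in self.rels if n not in (r[0], r[2])]
        self.delete_node(n)


g = Graph()
for name in ("Emil", "Johan"):
    g.add_node(name)
g.relate("Emil", "KNOWS", "Johan")

try:
    g.delete_node("Johan")
except GraphError as err:
    print(err)  # Johan still has relationships

g.detach_delete("Johan")
print(g.nodes, g.rels)  # {'Emil'} []
```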
Graph relationships naturally form paths, so querying/traversing the graph involves following those paths.
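Following paths can be sketched as a breadth-first search that treats each relationship as traversable in either direction. The edge list reuses the KNOWS relationships from the social graph example later on this page. The `shortest_path` helper is illustrative, not a Neo4j API.

```python
from collections import deque

# Directed KNOWS relationships as (start, end) pairs.
rels = [
    ("Emil", "Johan"), ("Emil", "Ian"),
    ("Johan", "Ian"), ("Johan", "Rik"),
    ("Ian", "Johan"), ("Ian", "Allison"),
    ("Rik", "Allison"),
]


def shortest_path(start, goal):
    """Breadth-first search that follows relationships in either direction."""
    adjacency = {}
    for a, b in rels:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adjacency.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal not reachable


print(shortest_path("Emil", "Allison"))  # ['Emil', 'Ian', 'Allison']
```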
When transitioning from a relational model, use the following guidelines:
- A row is a node.
- A table name is a label.
- Adopt a design-for-queryability mindset.
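The first two guidelines can be sketched as a conversion of hypothetical relational rows into node records. The `rows_to_nodes` helper is illustrative. It also drops `None` (SQL NULL) columns, because Neo4j does not store null property values and nodes need not share a property set.

```python
# Hypothetical rows from a relational "Person" table; Johan has no klout value (NULL).
person_rows = [
    {"id": 1, "name": "Emil", "from": "Sweden", "klout": 99},
    {"id": 2, "name": "Johan", "from": "Sweden", "klout": None},
]


def rows_to_nodes(table_name, rows):
    """Turn each row into a node labelled with the table name."""
    return [
        {
            "labels": {table_name},
            # Skip NULL columns: a node simply omits properties it does not have.
            "properties": {k: v for k, v in row.items() if v is not None},
        }
        for row in rows
    ]


nodes = rows_to_nodes("Person", person_rows)
print(nodes[1]["properties"])  # {'id': 2, 'name': 'Johan', 'from': 'Sweden'}
```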
A graph model describes relationships in more detail than a relational model: the name of a relationship already indicates its nature. Additionally, in a graph model, the data can be normalized without sacrificing performance.
- Define a statement describing a connection between two entities.
- Identify each unique conceptual identity in the statement as a node.
- Extract label names by identifying the role of each node.
- Connect the nodes with a relationship by describing their interactions.
- Draw the data model.
- Start asking pertinent questions about your data to identify properties of the node or relationship.
- Create a simple dataset to validate your assumptions.
- Translate your questions into queries.
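The steps above can be walked through in a small Python sketch, using the statement "Emil knows Johan" as an assumed example. The dataset and the `known_by` / `known_since` helpers are hypothetical stand-ins for Cypher queries.

```python
# Statement (assumed example): "Emil knows Johan".
# Each conceptual identity becomes a node; its role ("Person") becomes its label.
nodes = {"Emil": {"Person"}, "Johan": {"Person"}, "Ian": {"Person"}}

# The interaction becomes a typed, directed relationship. Asking
# "since when?" surfaced a property that belongs on the relationship.
relationships = [
    ("Emil", "KNOWS", "Johan", {"since": 2001}),
    ("Emil", "KNOWS", "Ian", {}),
]


def known_by(person):
    """Question: whom does `person` know?"""
    return sorted(end for start, rel_type, end, _ in relationships
                  if start == person and rel_type == "KNOWS")


def known_since(a, b):
    """Question: since when has `a` known `b`? Returns None if unrecorded."""
    for start, rel_type, end, props in relationships:
        if (start, rel_type, end) == (a, "KNOWS", b):
            return props.get("since")
    return None


print(known_by("Emil"))              # ['Ian', 'Johan']
print(known_since("Emil", "Johan"))  # 2001
```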
Use labels to group nodes into sets. Queries can limit their scope by using these labels instead of searching the entire graph. After identifying the roles of the objects, extract the label names.
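Label scoping can be sketched with a label-to-node lookup table, so a query starts from one set of nodes instead of scanning everything. The data (including the `City` node) and the `label_index` structure are hypothetical and only approximate the idea, not Neo4j's internal storage.

```python
from collections import defaultdict

# Hypothetical nodes: (node id, labels, properties).
nodes = [
    (1, {"Person"}, {"name": "Emil"}),
    (2, {"Person"}, {"name": "Johan"}),
    (3, {"City"}, {"name": "Stockholm"}),
]

# label -> set of node ids, built once so queries can skip unrelated nodes.
label_index = defaultdict(set)
for node_id, labels, _ in nodes:
    for label in labels:
        label_index[label].add(node_id)

by_id = {node_id: props for node_id, _, props in nodes}


def find_by_label(label):
    """Return the names of nodes carrying `label`, touching only those nodes."""
    return sorted(by_id[i]["name"] for i in label_index.get(label, ()))


print(find_by_label("Person"))  # ['Emil', 'Johan']
```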
When refactoring a graph database schema, the usual approach is to add new nodes and relationships rather than add new properties to existing ones. The rationale is data safety: changing properties might change the results of existing queries. Graph databases are naturally additive. Always model for the questions you want answered, and create new nodes and relationships that describe them.
Avoid storing entities in relationships; keep relationship properties focused on how the entities are related rather than on what they are. Be careful about the nouns you use.
Neo4j Browser is a command-driven, web-based client. Use it to run ad-hoc graph queries or to prototype a simple Neo4j-based application. You can export any query results, and it provides visualization mechanisms. It is built on top of the REST API.
- :help shows the built-in help
- edit multi-line with <shift-enter>
- execute a query with <ctrl-enter>
- :play start|intro|concepts|graphs opens the corresponding built-in guide
- :clear clears the result stream
- :play sysinfo shows monitoring information
- :help history shows the command history of the browser
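When querying with Cypher, we frequently start with bound nodes, which are well-known starting points in the graph. Older Cypher versions used the START clause to look these up in the underlying indexes before exploring the rest of the graph; current versions no longer support START, and a MATCH with a label and a property predicate (which Neo4j can serve from an index) fills the same role.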
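The bound-node idea can be sketched in Python: look up the starting node through an index on a property, then expand along its relationships. The in-memory `name_index` and `knows_out` structures and the `outgoing_friends` helper are hypothetical stand-ins, not Neo4j's actual storage design.

```python
# Hypothetical node store and outgoing KNOWS relationships, keyed by node id.
nodes = {1: {"name": "Emil"}, 2: {"name": "Johan"}, 3: {"name": "Ian"}}
knows_out = {1: [2, 3], 2: [3]}

# Hypothetical index on the `name` property: value -> list of node ids.
name_index = {}
for node_id, props in nodes.items():
    name_index.setdefault(props["name"], []).append(node_id)


def outgoing_friends(name):
    """Find bound nodes via the index, then expand along outgoing KNOWS relationships."""
    result = set()
    for start_id in name_index.get(name, []):       # index lookup: no full scan
        for end_id in knows_out.get(start_id, []):  # expand from the bound node
            result.add(nodes[end_id]["name"])
    return sorted(result)


print(outgoing_friends("Emil"))  # ['Ian', 'Johan']
```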
A simple example that creates the first node of a small social graph:
CREATE (ee:Person { name: "Emil", from: "Sweden", klout: 99 })
- CREATE creates the data
- () indicates a node
- ee:Person gives the new node a variable ee and a label Person
- {} adds properties to the node
A simple example to find the node representing Emil:
MATCH (ee:Person) WHERE ee.name = "Emil" RETURN ee;
- MATCH specifies a pattern of nodes and relationships
- (ee:Person) is a single node pattern with label Person, which assigns matches to the variable ee
- WHERE constrains the results
- ee.name = "Emil" compares the name property to the value "Emil"
- RETURN requests particular results
You can create many nodes and relationships at the same time:
MATCH (ee:Person) WHERE ee.name = "Emil"
CREATE (js:Person { name: "Johan", from: "Sweden", learn: "surfing" }),
(ir:Person { name: "Ian", from: "England", title: "author" }),
(rvb:Person { name: "Rik", from: "Belgium", pet: "Orval" }),
(ally:Person { name: "Allison", from: "California", hobby: "surfing" }),
(ee)-[:KNOWS {since: 2001}]->(js),(ee)-[:KNOWS {rating: 5}]->(ir),
(js)-[:KNOWS]->(ir),(js)-[:KNOWS]->(rvb),
(ir)-[:KNOWS]->(js),(ir)-[:KNOWS]->(ally),
(rvb)-[:KNOWS]->(ally)
A pattern can be used to find Emil's friends:
MATCH (ee:Person)-[:KNOWS]-(friends)
WHERE ee.name = "Emil" RETURN ee, friends
- MATCH describes the pattern from known nodes to found nodes
- (ee:Person) starts the pattern with a Person (qualified by WHERE)
- -[:KNOWS]- matches KNOWS relationships in either direction
- (friends) binds the results to this variable
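The undirected `-[:KNOWS]-` match can be mimicked in Python over the KNOWS relationships created above. The `friends` helper is a hypothetical illustration. Like Cypher, it yields one result per matching relationship, so a person connected by two relationships appears twice.

```python
# KNOWS relationships from the CREATE example, as (start, end) name pairs.
knows = [
    ("Emil", "Johan"), ("Emil", "Ian"),
    ("Johan", "Ian"), ("Johan", "Rik"),
    ("Ian", "Johan"), ("Ian", "Allison"),
    ("Rik", "Allison"),
]


def friends(name):
    """Mimic (ee)-[:KNOWS]-(friends): match KNOWS relationships in either direction."""
    rows = []
    for start, end in knows:
        if start == name:
            rows.append(end)
        elif end == name:
            rows.append(start)
    return rows


print(friends("Emil"))   # ['Johan', 'Ian']
print(friends("Johan"))  # ['Emil', 'Ian', 'Rik', 'Ian']
```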
Pattern matching can also make recommendations. For example, Johan is learning to surf, so he may want to find a new friend who already does:
MATCH (js:Person)-[:KNOWS]-()-[:KNOWS]-(surfer)
WHERE js.name = "Johan" AND surfer.hobby = "surfing"
RETURN DISTINCT surfer
- () uses empty parentheses to ignore the nodes in between
- DISTINCT is needed because more than one path will match the pattern
- surfer will contain Allison, a friend of a friend who surfs
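The recommendation query can be mimicked in Python to show why `DISTINCT` is needed. The `surfer_paths` helper is hypothetical. It follows the Cypher rule that one match may not reuse the same relationship twice, and it runs over the people and KNOWS relationships created above.

```python
people = {
    "Emil": {"from": "Sweden", "klout": 99},
    "Johan": {"from": "Sweden", "learn": "surfing"},
    "Ian": {"from": "England", "title": "author"},
    "Rik": {"from": "Belgium", "pet": "Orval"},
    "Allison": {"from": "California", "hobby": "surfing"},
}
# KNOWS relationships as (start, end); the list position acts as relationship identity.
knows = [
    ("Emil", "Johan"), ("Emil", "Ian"),
    ("Johan", "Ian"), ("Johan", "Rik"),
    ("Ian", "Johan"), ("Ian", "Allison"),
    ("Rik", "Allison"),
]


def other_end(rel, node):
    start, end = rel
    return end if start == node else start


def surfer_paths(name):
    """Return every (via, surfer) match of (name)-[:KNOWS]-()-[:KNOWS]-(surfer)."""
    matches = []
    for i, r1 in enumerate(knows):
        if name not in r1:
            continue
        via = other_end(r1, name)
        for j, r2 in enumerate(knows):
            if j == i or via not in r2:  # never reuse the first relationship
                continue
            surfer = other_end(r2, via)
            if people[surfer].get("hobby") == "surfing":
                matches.append((via, surfer))
    return matches


paths = surfer_paths("Johan")
print(len(paths))                     # 3
print(sorted({s for _, s in paths}))  # ['Allison']
```

Three paths reach Allison (via Ian over either of the two Johan/Ian relationships, and via Rik), so the final set comprehension plays the role of `DISTINCT`.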