Interacting with Neo4j from Pharo Smalltalk
Introduction
There are various ways to persist data with Pharo Smalltalk, but you may not be aware that a graph database is one of the strongest candidates.
Smalltalk is an environment where objects live in memory, so as application development progresses, richly structured graphs of objects are created in the image. When you try to save these objects in a relational database, the impedance mismatch between tables and objects arises. Even with the power of OR-mappers such as Glorp, complex mappings are very difficult to maintain.
On the other hand, a graph database allows you to persist the graph structure created by objects without awkward conversions. You don't have to worry about complicated mappings.
In this article, I would like to introduce a high-level graph database client library called SCypherGraph. If you use SCypherGraph, you can easily access Neo4j, a very popular graph database, from Pharo.
Loading SCypherGraph into Pharo
To install SCypherGraph, evaluate the following code in Pharo's Playground[^1].
Metacello new
baseline: 'SCypherGraph';
repository: 'github://mumez/SCypherGraph:main/src';
load.
[^1]: Note that Playground opens with Control + o + w on Windows. On Mac, you can use the Command key instead.
This will download the prerequisite libraries and SCypherGraph itself from GitHub and install them into Pharo.
Preparing Neo4j
From the Neo4j Download Center, select "Community Server", then download and extract the archive file.
After adding the bin directory to your path, execute neo4j console (neo4j.bat console on Windows). Neo4j is written in Java, so JDK 11 or later is required to run it.
neo4j console
Accessing localhost:7474 with a browser will bring up a web-based administration page. You will need to set an administrator password the first time. (I am assuming version 4.2.3. The UI may change slightly in later versions.)
By default, you can log in with:
Username: neo4j
Password: neo4j
However, if you press the "Connect" button, you will be prompted to change the login password immediately.
This time, let's change it as follows:
Password: neoneo
Populating Sample Data to Neo4j
After changing the password, the command pane will be displayed at the top. Let's populate the sample data called "Movie Graph" as a starting point.
First, enter the :use neo4j command after the $ prompt. You can evaluate it by pressing the play icon button (or Control + Enter).
:use neo4j
This switches the default database from system to neo4j. You need to change the default database because sample data cannot be entered into the system database.
Then open the tutorial. In the command pane, execute as follows:
:play movie-graph
A multi-page tutorial will appear directly below. If you go to the second page and click the play button in the upper left corner, the Cypher code (Cypher is Neo4j's graph query language) will be pasted into the top command pane.
Let's run the pasted Cypher for populating the data.
The input data is now displayed graphically.
Accessing Graph Data with SCypherGraph
Now, let's connect to Neo4j with SCypherGraph and extract the data of "Movie Graph". Write the following code[^2] in Pharo's Playground and "print it". (Select all of the code and press Control + p.)
db := SgGraphDb new.
db settings username: 'neo4j'; password: 'neoneo'.
db allLabels. "print it"
[^2]: If you want to change the IP address or port number for some reason, you can set it with db settings targetUri: 'bolt://127.0.0.1:7687'.
SgGraphDb >> allLabels retrieves the list of node labels in the DB. At the moment, there are only 'Movie' and 'Person' labels.
Getting Nodes
I would like to display all nodes labeled as 'Movie'. Open Transcript with Control + o + t, then append the following code to Playground and "Do it". (Select the code and press Control + d.)
(db nodesLabeled: 'Movie')
do: [ :each | self traceCr: each properties ].
With SgGraphDb >> nodesLabeled:, you can retrieve a group of nodes with a specific label. You can see that each Movie node has various information as properties, such as release year and tagline. (Since the results are iterated with do:, traceCr: prints each node on its own line in the Transcript.)
There are too many Movie nodes, so let's specify a condition with where:. Add the following and try "Inspect it". (Control + i)
matrix := (db nodesLabeled: 'Movie' where: [:each | each @ 'title' = 'The Matrix']) first.
matrix properties.
With the where: block, the nodes are filtered to those whose 'title' equals 'The Matrix'. The result is returned as an OrderedCollection, but since it contains only one node, we can just take the first one.
The message properties returns the node's properties, and Inspector displays them.
Hmm, "The Matrix" is a movie that is over 20 years old!
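The same where: form can also filter on other properties. The following is a minimal sketch, assuming that the Movie Graph sample data is loaded and that numbers can be compared with = inside the where: block just as strings can (the article only shows a string comparison, so treat that as an assumption). The variable names moviesOf1999 and titles are made up for this example.

```smalltalk
"Connect with the credentials set earlier in this article"
db := SgGraphDb new.
db settings username: 'neo4j'; password: 'neoneo'.

"Assumption: = also accepts a number on the right-hand side"
moviesOf1999 := db nodesLabeled: 'Movie' where: [ :each | each @ 'released' = 1999 ].

"Keep only the titles of the matching nodes"
titles := moviesOf1999 collect: [ :each | each @ 'title' ].
titles. "print it"
```

Collecting a single property keeps the printed result short when the full property list is not needed.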
Getting Relationships
In a graph, nodes are connected by relationships. Neo4j has the notion of type and direction for relationships. You can also set properties on a relationship.
Let's take a look at the relationships that are directed towards 'The Matrix'.
matrix inRelationships. "print it"
You can see that there are different types of relationships such as ACTED_IN, DIRECTED, and PRODUCED. It seems that people such as actors, directors, and producers are involved.
On the other hand, there are no outgoing relationships from 'The Matrix'.
matrix outRelationships. "print it"
Now, let's specify a few more conditions and list the names of the people who are involved as actors (ACTED_IN).
(matrix inRelationshipsTyped: 'ACTED_IN')
collect: [:each | each endNode @ 'name']. "print it"
The end side of the relationship is a Person node, which has the property 'name'. By sending the collect: message, we obtain the result as an array of strings.
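The same message works for the other relationship types seen above. Here is a small sketch that prints the names for each type to the Transcript. It uses only messages already shown in this article, and it assumes the Movie Graph sample data is loaded. The loop and the variable names type and names are just for illustration.

```smalltalk
"Connect and fetch The Matrix node, as in the previous steps"
db := SgGraphDb new.
db settings username: 'neo4j'; password: 'neoneo'.
matrix := (db nodesLabeled: 'Movie' where: [ :each | each @ 'title' = 'The Matrix' ]) first.

"For each relationship type, print the names found on the other side"
#('ACTED_IN' 'DIRECTED' 'PRODUCED') do: [ :type |
	| names |
	names := (matrix inRelationshipsTyped: type)
		collect: [ :each | each endNode @ 'name' ].
	self traceCr: type , ': ' , names printString ]. "do it"
```

Note that each type issues its own query, so this is convenient for exploration rather than tuned for speed.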
Let's add where: to filter the result further. The first block argument is the start node. The second one is the relationship between the nodes. The last one is the end node. These can be used to specify the search conditions.
The ACTED_IN relationship has a property 'roles'. One person sometimes plays two or more roles in a movie, so it is reasonable for this property to be an array. Let's find the person who played 'Neo'.
(matrix inRelationshipsTyped: 'ACTED_IN' where: [ :start :rel :end | (rel @ 'roles') = #('Neo') ])
collect: [ :each | each endNode properties ]. "inspect it"
Looking at the inspector, it turns out that 'Neo' was played by 'Keanu Reeves', born in 1964.
Updating Graph Data with SCypherGraph
Now that we have successfully retrieved nodes and relationships, let's try updating the database next.
Creating Nodes
First, let's create new nodes labeled 'Genre'. This time, we create just two genre nodes, 'SF' and 'Action'. Each node has two properties, a 'name' and a short 'description'.
sf := db mergeNodeLabeled: 'Genre' properties: {'name'->'SF'. 'description'->'Science Fiction'}. "inspect it"
action := db mergeNodeLabeled: 'Genre' properties: {'name'->'Action'. 'description'->'Exciting Actions'}. "inspect it"
As shown above, you can create a node with SgGraphDb >> mergeNodeLabeled:properties:. The resulting node will be returned immediately.
You can also create a node with createNodeLabeled:properties:, but this will create a new node every time you run it. With mergeNodeLabeled:properties:, if a matching node already exists in the database, it will be retrieved. A node is created only if no such node exists. So, you usually use the merge expression.
You should now have two node inspectors open. You can evaluate Smalltalk expressions in the pane at the bottom of the inspector, so type self properties and try "print it". The property values of the node will be displayed.
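To see the difference between merge and create for yourself, you can run the merge twice and then count the matching nodes. This is only a sketch built from the messages introduced so far. The variable names first, second and sfNodes are made up for this example, and the count in the comment is what the description above implies, not a verified output.

```smalltalk
"Connect with the credentials set earlier in this article"
db := SgGraphDb new.
db settings username: 'neo4j'; password: 'neoneo'.

"Run the same merge twice"
first := db mergeNodeLabeled: 'Genre' properties: {'name'->'SF'. 'description'->'Science Fiction'}.
second := db mergeNodeLabeled: 'Genre' properties: {'name'->'SF'. 'description'->'Science Fiction'}.

"Count the 'SF' genre nodes that now exist in the database"
sfNodes := db nodesLabeled: 'Genre' where: [ :each | each @ 'name' = 'SF' ].
sfNodes size. "print it; if merge behaves as described, there should still be only one"
```

Repeating the experiment with createNodeLabeled:properties: should instead make the count grow on every run, which is why merge is the safer default.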
Creating Relationships
Next, I would like to connect a genre node and a movie node with a relationship.
The Matrix looks like a science fiction and action movie, so it should be connected to the two genre nodes.
The relationship type is HAS_GENRE. We'll use SgNode >> relateOneTo:typed:properties: as shown below.
matrixToSf := matrix relateOneTo: sf typed: 'HAS_GENRE' properties: {'score'-> 6}.
matrixToAction := matrix relateOneTo: action typed: 'HAS_GENRE' properties: {'score'-> 7}. "do it"
I also added a property called 'score' to indicate how strongly the movie belongs to each genre.
Relationships can also be created with SgNode >> relateTo:typed:properties:, but this will create a new relationship each time it is executed. There is no point in having duplicate relationships from The Matrix to the same SF node, so relateOneTo: is suitable this time. As with the merge operation for nodes, if the same kind of relationship already exists, it will be returned.
For confirmation, let's extract the node properties from the generated relationship.
{matrixToSf startNode @ 'title'. matrixToSf endNode @ 'name'}. "print it"
As expected, 'The Matrix' is related to 'SF'.
Let's list the relationships outgoing from Matrix. They were empty earlier.
matrix outRelationships. "print it"
Now you can confirm the two relationships.
Running Raw Cypher Queries
Behind the scenes of SCypherGraph, the graph manipulation language Cypher is dynamically generated and sent to Neo4j. The nice thing about SCypherGraph is that you can read and update graph data by simply sending messages to objects without knowing Cypher.
However, in practice, there are situations where you want to be aware of Cypher for performance tuning. Even if you end up writing a slightly long and complicated Cypher query, fetching only the elements you need in a single query reduces the number of queries executed, which is advantageous for performance.
Therefore, SCypherGraph also allows you to pass raw Cypher queries to Neo4j for optimized execution.
Let's try it with a simple Cypher first.
db runCypher: 'UNWIND range(1, 10) AS n RETURN n*n'. "inspect it"
range generates a list of numbers, UNWIND expands the list into individual rows, and RETURN returns the squared values.
This will open an SbCypherResult inspector. You can see the actual values by writing self fieldValues in the bottom pane and choosing "print it".
SCypherGraph also supports executing Cypher with parameters. The parts named $from and $to in the following query are the parameters. The argument values are given by arguments:.
db runCypher: 'UNWIND range($from, $to) AS n RETURN n*n'
arguments: {'from'->2. 'to'->5}. "inspect it"
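Parameters work the same way when the query touches the graph itself. As a small sketch, assuming the Movie Graph sample data is loaded, the following asks for the titles of movies released in a given year. The parameter name $year and the variable name result are chosen for this example.

```smalltalk
"Connect with the credentials set earlier in this article"
db := SgGraphDb new.
db settings username: 'neo4j'; password: 'neoneo'.

"The release year is passed as a parameter instead of being embedded in the query string"
result := db
	runCypher: 'MATCH (m:Movie) WHERE m.released = $year RETURN m.title ORDER BY m.title'
	arguments: {'year'->1999}.
result fieldValues. "print it"
```

Passing values as parameters keeps the query text constant, so the same string can be reused for different years without concatenating strings by hand.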
Dynamically Generating Cypher with SCypher
What if you need a more complex Cypher? SCypherGraph internally uses a library called SCypher, which can generate flexible Cypher queries dynamically by sending fluent messages.
Let's consider a slightly more complicated query example. Suppose we would like to get a list of actors who co-starred with an actor whose name begins with 'Tom' in movies released in 2000.
Written as raw Cypher, it looks like this.
MATCH (p:Person)-[act1:ACTED_IN]->(m:Movie {released:2000})<-[act2:ACTED_IN]-(o:Person)
WHERE (p.name STARTS WITH 'Tom')
RETURN p.name, o.name, m.title ORDER BY p.name
MATCH specifies the pattern of connections between nodes and relationships. WHERE adds the detailed condition that the name property value starts with 'Tom'. With RETURN, only the necessary information, such as the actors' names and the movie title, is extracted.
If the search pattern is fixed, it would be enough to embed this Cypher in the source code. However, as the number of search variations increases, such a hard-coded approach becomes difficult to maintain.
Now, let's create the above Cypher dynamically with SCypher.
m := 'm' asCypherObject. "Movie"
p := 'p' asCypherObject. "Person"
o := 'o' asCypherObject. "Other Person"
"A pattern in which two actors are connected in a movie released in 2000"
pathPattern := (p node: 'Person') - ('act1' asCypherObject rel: 'ACTED_IN' ) -> (m node: 'Movie' props: {'released'->2000}) <- ('act2' asCypherObject rel: 'ACTED_IN' ) - (o node: 'Person').
"A condition whether an actor starts with the name specified in the parameter"
actorNameParam := 'actorName' asCypherParameter.
where := (p @ 'name') starts: actorNameParam.
"Specifying to return the result in the order of actor's name, co-star's name, movie title"
return := (p @ 'name'), (o @ 'name'), (m @ 'title').
"Assembling Cypher queries"
query := CyQuery match: pathPattern where: where return: return orderBy: (p @ 'name') skip: 0 limit: 100. "print it"
It's longer than the Cypher I wrote manually, but you can see that pathPattern, where and return are now variables. So the query is much easier to rearrange.
In addition, although these are Smalltalk-style message sends, the objects in the messages are almost a one-to-one match with Cypher elements. Therefore, they are easy to remember and can be converted smoothly.
If you select all of the code above and "print it", you can see that the following Cypher is created.
MATCH (p:Person)-[act1:ACTED_IN]->(m:Movie {released:2000})<-[act2:ACTED_IN]-(o:Person)
WHERE (p.name STARTS WITH $actorName)
RETURN p.name, o.name, m.title ORDER BY p.name SKIP 0 LIMIT 100
The query includes parameters, but it's almost the same as the hand-written Cypher shown earlier.
Let's run it with SgGraphDb >> runCypher:arguments:.
result := db runCypher: query arguments: { actorNameParam -> 'Tom' }.
(result fieldValues groupedBy: [ :each | each at: 1 ]). "inspect it"
For ease of viewing in Inspector, we use groupedBy: to group by actor's name. The result is that 'Tom Cruise' co-starred with 8 people in 'Jerry Maguire' and 'Tom Hanks' co-starred with 1 person in 'Cast Away'.
That is not very surprising. Let's change the where condition a little.
where := ((p @ 'born') > 1970) and: ((o @ 'born') > 1970). "do it"
Now that you have changed the where condition, you need to regenerate the query. Select both query := ... and result := ... and "inspect it".
With this, we obtain pairs of co-stars who were both under the age of 30 back in 2000.
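Because only the where part changes between these variations, one way to avoid re-evaluating the query := ... line by hand is to wrap the assembly in a block. This is a sketch, not something SCypher provides: buildQuery is a plain Smalltalk block made up for this example, and everything inside it uses only the SCypher messages shown above.

```smalltalk
"Identifiers, pattern and return clause, as defined earlier in this section"
m := 'm' asCypherObject.
p := 'p' asCypherObject.
o := 'o' asCypherObject.
pathPattern := (p node: 'Person') - ('act1' asCypherObject rel: 'ACTED_IN') -> (m node: 'Movie' props: {'released'->2000}) <- ('act2' asCypherObject rel: 'ACTED_IN') - (o node: 'Person').
return := (p @ 'name'), (o @ 'name'), (m @ 'title').

"Hypothetical helper: assembles the same query around whatever condition is passed in"
buildQuery := [ :whereClause |
	CyQuery match: pathPattern where: whereClause return: return orderBy: (p @ 'name') skip: 0 limit: 100 ].

"Two variations that share the pattern and the return clause"
byName := buildQuery value: ((p @ 'name') starts: 'actorName' asCypherParameter).
byBirthYear := buildQuery value: (((p @ 'born') > 1970) and: ((o @ 'born') > 1970)).
byBirthYear. "print it"
```

Each variation is then a single expression, and the generated Cypher can be checked with "print it" before it is passed to runCypher:arguments:.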
What Kind of Application is using SCypherGraph?
Although SCypherGraph itself has not been used in production yet, Allstocker.com, a used construction equipment trading service, has been using Neo4reSt for years. Neo4reSt is a kind of ancestor of SCypherGraph, and Allstocker uses it to implement advanced searches. Neo4reSt calls Neo4j's legacy REST API. SCypherGraph, on the other hand, uses Neo4j's native Bolt binary protocol. In my benchmark, SCypherGraph was about three times faster than Neo4reSt. Therefore, Allstocker is considering moving to SCypherGraph in the near future.
In Conclusion
SCypherGraph allows you to access Neo4j and manipulate the graph data in a very intuitive way. For more complex use cases, dynamically generated Cypher queries can be sent directly to Neo4j, allowing for a high degree of search flexibility.
Have fun with SCypherGraph!