4. How to query the Neo4j database with Neomodel.

At this point in the tutorial, you should have already created and set up the paradise_papers_search Django project (Part 1) and learned how to integrate the Neomodel OGM into it (Part 2). In Part 3 you created the fetch_api Django app and learned how a Neo4j graph database is modeled using Neomodel. You ended up with a group of Python class definitions that represent the nodes, properties and relationships in the Paradise Paper Graph Database (PPGDB).

Current project structure:

paradise_papers_search/
├── paradise_papers_search/
│   └── +
├── fetch_api/
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations/
│   │   └── __init__.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── address.py
│   │   ├── entity.py
│   │   ├── intermediary.py
│   │   ├── officer.py
│   │   └── other.py
│   ├── tests.py
│   ├── urls.py
│   └── views.py
└── manage.py

Now, how do you actually query a graph database from inside your Django project or apps?

4.1. Using your models

Using your models is pretty standard. You usually just import the ones you need and use them, for example, in your Django views. Before we get to that, we need to learn the Neomodel Query API. This API will allow us to express queries to the database without having to write them in plain Cypher.

We will learn to query using the Paradise Paper models we defined earlier. To do that, we will first use an instance of the Python interpreter.

4.1.1. The project python interpreter:

Let’s open the Python interpreter through our Django project’s manage.py, which will import our project settings (remember that you set the DATABASE_URL there; it is needed to connect to the database). It will also try to use ipython or bpython if available.

First, open a console if you have not done so already.

Then make sure you are in the paradise_papers_search root directory where you created the Django project (where the manage.py module is).

Run the command:

python manage.py shell

With the python interpreter in hand, we can import our models and start to use them as soon as we execute the following python import commands:

from fetch_api.models import Entity
from fetch_api.models import Intermediary
from fetch_api.models import Officer
from fetch_api.models import Address
from fetch_api.models import Other

Each of the models we just imported maps to a specific structure of a node label, property keys and relationship types in the PPGDB. Now we are ready to start exploring the Paradise Paper Graph Database through the Neomodel Query API.

4.2. Neomodel Query API

Each of your models has some properties and methods (inherited from StructuredNode or DjangoNode) that help us express queries to the Neo4j database.

4.2.1. NodeSet and Nodes Neomodel objects

A NodeSet object represents a set of nodes matching common query parameters or filters.

A Node object is an instance of one of our models. That means we can access all the properties and methods defined on the model class. Each instance represents a single node in the database.

The <Model>.nodes class property of each model stores a NodeSet object. Each time we access this .nodes property we get a brand new nodeset object, which means we get a nodeset without any filters applied. Initially, before applying any filters, this nodeset represents all the nodes mapped under a model (nodes labeled with the same class name). For instance, Entity.nodes contains all the nodes with the label Entity in the database.

Later we will see how we can apply filters in order to match a specific subset of nodes.

4.2.2. Length of a NodeSet

If we want to count all the Entity nodes that are stored in the database, we just call the Python len function on the Entity.nodes nodeset.

Example:

len(Entity.nodes)

When we call len(Entity.nodes), Neomodel generates a Cypher query that counts all the nodes with the label Entity. That query is then executed in the Neo4j database and we get back the count. The Cypher query string that Neomodel generates behind the scenes is:

MATCH (n:Entity) RETURN COUNT(n)

Note

We are not retrieving all the nodes from the database and then counting them. The actual counting is done by the Neo4j database engine, which is faster.

Another example, to get a count of all the nodes that exist in the PPGDB database:

len(Entity.nodes) \
+ len(Officer.nodes) \
+ len(Intermediary.nodes) \
+ len(Address.nodes) \
+ len(Other.nodes)

If the nodeset is filtered, only the nodes that fulfill the filters will be counted.

4.2.3. Fetching nodes

In order to retrieve the nodes, read their properties and relationships, an actual cypher query needs to be executed by Neomodel. This is handled completely by Neomodel and we just need to use its query API.

A call to the NodeSet method .all() returns all the nodes of a nodeset. However, this can result in an expensive query, because Neomodel will try to retrieve all the nodes at once. It is recommended to use .all() only when the nodeset is small. We can reduce its size by filtering the nodeset, as we will see later.

It is better to fetch the nodes from a nodeset in batches. NodeSet objects support the same indexing and slicing operators as normal Python lists.

To get the first element of the Entity.nodes nodeset, we can reference its index:

Entity.nodes[0]

To get a subset of nodes, we can use the Python slice syntax. This is convenient for writing code that retrieves the nodes in batches. For example, to get the first 10 nodes in a list:

Entity.nodes[0:10]

Note

Neomodel generates and executes a Cypher query that retrieves only the nodes we ask for, so we are never pulling all the nodes from the database at once. For example, a slice such as Entity.nodes[10:20] would produce a query string along the lines of MATCH (n:Entity) RETURN n SKIP 10 LIMIT 10
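
To make the batching idea concrete, here is a minimal sketch of a helper that walks any sliceable object in fixed-size windows. The name iter_batches is hypothetical and is not part of Neomodel. The demo runs on a plain Python list standing in for a nodeset; passing a real nodeset relies on the assumption that it accepts the same slice syntax described above, and that has not been checked here against a live database.

```python
def iter_batches(source, batch_size):
    """Yield successive lists of at most batch_size items from a sliceable source."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 0
    while True:
        # Each slice asks only for the next window of items.
        batch = list(source[start:start + batch_size])
        if not batch:
            # An empty window means we have walked past the end.
            break
        yield batch
        start += batch_size


# Stand-in data: a plain list plays the role of a nodeset here.
sample = list(range(25))
batch_sizes = [len(batch) for batch in iter_batches(sample, 10)]
print(batch_sizes)  # [10, 10, 5]
```

In the project shell, something like iter_batches(Entity.nodes, 10) would be the analogous call, with each window expected to translate into its own SKIP/LIMIT query.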

4.2.4. Finding nodes

If we know exactly which node we are looking for, for instance because we have its node_id or the exact value of its name property, we can use the .get() or .get_or_none() nodeset methods. The difference is that when there is no match, the first one raises a DoesNotExist exception and the second one returns None.

To get the node whose node_id is 160380 in a given nodeset:

Entity.nodes.get_or_none(node_id=160380)
Entity.nodes.get(node_id=160380)

Warning

These methods will raise a MultipleNodesReturned exception if the property value used to get the node is not unique, that is, if it matches more than one node.

4.2.5. Filtering nodes

Most of the time we will want to get a subset of nodes that fulfill a specified condition. For example, getting all the Entity nodes whose name property contains a specific word.

In order to filter the nodes in a nodeset, we use the NodeSet method .filter(). The filter method borrows the Django filter format: a property name followed by a double underscore and an operator.

To get Entity nodes which name property has the word “financial”, we use the operator contains:

Entity.nodes.filter(name__contains='financial')

The above statement returns a filtered nodeset. To actually retrieve the data, see the Fetching nodes section. For more operators, refer to this page: http://neomodel.readthedocs.io/en/latest/queries.html#node-sets-and-filtering
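
To illustrate how a filter keyword such as name__contains is read, here is a small sketch that splits a lookup string into its property name and operator. This is only an illustration of the naming convention, not Neomodel's actual implementation; the function name split_lookup and the "exact" default for plain equality are assumptions made for this example.

```python
def split_lookup(lookup, default_operator="exact"):
    """Split a lookup such as 'name__contains' into ('name', 'contains')."""
    prop, sep, operator = lookup.rpartition("__")
    if not sep:
        # No double underscore: the whole string is the property name.
        return lookup, default_operator
    return prop, operator


print(split_lookup("name__contains"))       # ('name', 'contains')
print(split_lookup("sourceID__icontains"))  # ('sourceID', 'icontains')
print(split_lookup("node_id"))              # ('node_id', 'exact')
```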

4.3. Creating some utils to search the PPGDB

The purpose of this tutorial is to show you how we can use Neomodel with Django. In order to do that, we will build an app that searches the Paradise Paper Graph Database. What we have learned so far is enough for our purpose.

We will create some utility functions that will help us search the PPGDB. Later, we will import and use these helper functions in our Django views to fetch data from the database.

To start coding, first let’s create a new Python module under our fetch_api/ directory. Name the file utils.py.

Now, as we will want to query the Neo4j database, we will import our models. Put the import statements below at the start of utils.py:

from .models import Entity
from .models import Intermediary
from .models import Officer
from .models import Address
from .models import Other

In order to easily access each of the model classes programmatically, let’s create a key-value map. The key will be the model class name and the value will be the model class itself:

MODEL_ENTITIES = {
    'Entity': Entity,
    'Address': Address,
    'Intermediary': Intermediary,
    'Officer': Officer,
    'Other': Other
}

4.3.1. Filter Nodes Helper

We will create a function that receives a model class and some filter parameters: search text, country, jurisdiction and source_id. The function will return a filtered nodeset containing only the model nodes that pass our filters.

Let’s add this helper function to the utils.py, with the name filter_nodes:

def filter_nodes(node_type, search_text, country, jurisdiction, source_id):
    # Accessing .nodes gives us a fresh, unfiltered nodeset
    node_set = node_type.nodes

    # On Address nodes we want to check the search_text against the address property
    # For any other we check against the name property
    if node_type.__name__ == 'Address':
        node_set = node_set.filter(address__icontains=search_text)
    else:
        node_set = node_set.filter(name__icontains=search_text)

    # Only entities store jurisdiction info
    if node_type.__name__ == 'Entity':
        node_set = node_set.filter(jurisdiction__icontains=jurisdiction)

    # filter() returns the filtered nodeset, so we keep the returned value
    node_set = node_set.filter(countries__icontains=country)
    node_set = node_set.filter(sourceID__icontains=source_id)

    return node_set

4.3.2. Count Nodes Helper

We will create a function that returns the length of the nodeset returned by the filter_nodes helper we created before. It will receive a dictionary of filters.

Here is a representation of the required dictionary keys:

{
    'node_type': '',
    'name': '',
    'country': '',
    'jurisdiction': '',
    'sourceID': ''
}

Let’s add this helper function to the utils.py, with the name count_nodes:

def count_nodes(count_info):
    count = {}
    node_type               = count_info['node_type']
    search_word             = count_info['name']
    country                 = count_info['country']
    jurisdiction            = count_info['jurisdiction']
    data_source             = count_info['sourceID']
    node_set                = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
    count['count']          = len(node_set)

    return count
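
Because count_nodes reads every key directly, a dictionary that lacks one of them will fail with a KeyError. As a hedged sketch, a caller could check the dictionary first with a small helper like the one below. The names REQUIRED_COUNT_KEYS and missing_keys are hypothetical and are not part of the tutorial's utils.py.

```python
# Keys that count_nodes expects, taken from the dictionary shown above.
REQUIRED_COUNT_KEYS = ("node_type", "name", "country", "jurisdiction", "sourceID")


def missing_keys(info, required=REQUIRED_COUNT_KEYS):
    """Return the required keys that are absent from info, in declaration order."""
    return [key for key in required if key not in info]


incomplete = {"node_type": "Entity", "name": "Junior"}
print(missing_keys(incomplete))  # ['country', 'jurisdiction', 'sourceID']
```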

4.3.3. Fetch Nodes Helper

We will create a function that returns a subset of nodes filtered by the filter_nodes helper that we created previously. It will receive a dictionary of filters.

Here is a representation of the required dictionary keys:

{
    'node_type': '',
    'name': '',
    'country': '',
    'jurisdiction': '',
    'sourceID': '',
    'limit': 10,
    'page': 1
}

The limit and page values are needed to calculate the start and end indices that we will use to get a subset of nodes from a nodeset. Just like we learned in the Fetching nodes section, we will return the nodes in batches using Python slice syntax on the nodeset.

Let’s add this helper function to the utils.py, with the name fetch_nodes:

def fetch_nodes(fetch_info):
    node_type       = fetch_info['node_type']
    search_word     = fetch_info['name']
    country         = fetch_info['country']
    limit           = fetch_info['limit']
    start           = ((fetch_info['page'] - 1) * limit)
    end             = start + limit
    jurisdiction    = fetch_info['jurisdiction']
    data_source     = fetch_info['sourceID']
    node_set        = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
    fetched_nodes   = node_set[start:end]

    return fetched_nodes
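
The start and end arithmetic used in fetch_nodes can be checked in isolation. The sketch below repeats the same calculation in a standalone function; the name page_bounds and the input validation are additions made for this example and do not appear in the tutorial's code.

```python
def page_bounds(page, limit):
    """Return the (start, end) slice indices for a 1-based page number."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    start = (page - 1) * limit
    end = start + limit
    return start, end


print(page_bounds(1, 10))  # (0, 10)
print(page_bounds(3, 10))  # (20, 30)
```

Page 1 with a limit of 10 maps to the slice [0:10], and page 3 maps to [20:30], so consecutive pages never overlap.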

4.3.4. Fetch Node Details Helper

We will create a function that returns a single node. It will receive a dictionary with the node_type and the node_id.

Here is a representation of the required dictionary keys:

{
    'node_type': '',
    'node_id': ''
}

Let’s add this helper function to the utils.py, with the name fetch_node_details:

def fetch_node_details(node_info):
    node_type       = node_info['node_type']
    node_id         = node_info['node_id']
    node            = MODEL_ENTITIES[node_type].nodes.get(node_id=node_id)
    node_details    = node

    return node_details

4.3.5. Fetch countries, jurisdictions and data source helpers

As we are filtering the nodes by countries, jurisdictions and data source, we will need a list of valid filtering values.

First, let’s create a new Python module under our paradise_papers_search/ directory. Name the file constants.py.

We will fetch this data from the database; however, we are not going to use the models. Sometimes it is convenient to make raw Cypher queries to the database, and Neomodel allows you to do that.

In your constants.py module, import the database util ‘db’ from neomodel:

from neomodel import db

Now you can use the cypher_query method, to execute raw cypher queries and get the results.

For example, we will query the countries, jurisdictions and data sources in the constants.py:

countries = db.cypher_query(
    '''
    MATCH (n)
    WHERE NOT n.countries CONTAINS ';'
    RETURN DISTINCT n.countries AS countries
    '''
)[0]

jurisdictions = db.cypher_query(
    '''
    MATCH (n)
    RETURN DISTINCT n.jurisdiction AS jurisdiction
    '''
)[0]

data_sources = db.cypher_query(
    '''
    MATCH (n)
    RETURN DISTINCT n.sourceID AS dataSource
    '''
)[0]

With the results we will make sorted lists of COUNTRIES, JURISDICTIONS and DATASOURCE:

COUNTRIES = sorted([country[0] for country in countries])
JURISDICTIONS = sorted([jurisdiction[0] for jurisdiction in jurisdictions if isinstance(jurisdiction[0], str)])
DATASOURCE = sorted([data_source[0] for data_source in data_sources if isinstance(data_source[0], str)])
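
To see what these list comprehensions do, recall that the first element returned by cypher_query is a list of rows, and each row is itself a list of column values. The sketch below applies the same flatten, filter and sort steps to a handful of invented sample rows. The function name column_values and the sample data are assumptions for illustration and do not come from the PPGDB.

```python
def column_values(rows):
    """Return the sorted string values found in the first column of each row."""
    # Rows whose first column is not a string (for example None) are skipped.
    return sorted(row[0] for row in rows if isinstance(row[0], str))


# Invented rows shaped like a single-column query result.
sample_rows = [["Malta"], [None], ["Bermuda"], ["Cayman Islands"]]
print(column_values(sample_rows))  # ['Bermuda', 'Cayman Islands', 'Malta']
```

The isinstance check matters because sorted would fail when asked to compare a missing value with a string.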

In the utils.py, import COUNTRIES, JURISDICTIONS, DATASOURCE:

from paradise_papers_search.constants import COUNTRIES, JURISDICTIONS, DATASOURCE

Then create fetch_countries, fetch_jurisdictions and fetch_data_source helpers:

def fetch_countries():
    return COUNTRIES


def fetch_jurisdictions():
    return JURISDICTIONS


def fetch_data_source():
    return DATASOURCE

We created the constants.py module because we want to run the Cypher queries only once, not each time we call the fetch_countries, fetch_jurisdictions or fetch_data_source helpers.

Since these queries might take some time to execute, we want them ready at the start of the application. We can do that by executing the constants.py code.

Open the fetch_api/apps.py module. Add a new method named ready to the Django application class and import the constants.py module inside it. That will be enough to initialize the COUNTRIES, JURISDICTIONS and DATASOURCE constants.

Here is how fetch_api/apps.py would look:

from django.apps import AppConfig

class FetchApiConfig(AppConfig):
    name = 'fetch_api'

    def ready(self):
        from paradise_papers_search import constants

4.4. Using the search utils

To use the search utils, we just need to import them into the module where they will be used. In this case, we will need to import them into the fetch_api/views.py module. Later they will be used to create our application views.

Here is the import statement. Place this code in the fetch_api/views.py module:

from .utils import (
    count_nodes,
    fetch_nodes,
    fetch_node_details,
    fetch_countries,
    fetch_jurisdictions,
    fetch_data_source,
)

In the next section, you will build the rest of this code.

4.4.1. Testing utils in the console

Just like we did with models, we can import the utils in the project python interpreter and play around.

Make sure you are in the paradise_papers_search root directory where you created the Django project (where the manage.py module is).

Run the command:

python manage.py shell

With the python interpreter in hand, we can import our utils and start to use them as soon as we execute the following python import command:

from fetch_api.utils import (
    count_nodes,
    fetch_nodes,
    fetch_node_details,
    fetch_countries,
    fetch_jurisdictions,
    fetch_data_source,
)

Now, for example, you can count all the nodes that pass some filters:

count_nodes({
    'node_type': 'Entity',
    'name':'Junior',
    'country':'',
    'jurisdiction':'',
    'sourceID':''
})

Or you can fetch a subset of nodes that pass some filters:

fetch_nodes({
    'node_type': 'Entity',
    'name':'Junior',
    'country':'',
    'jurisdiction':'',
    'sourceID':'',
    'limit': 10,
    'page': 1,
})