==============================================
How to query the Neo4j database with Neomodel.
==============================================

At this point of the tutorial, you should have already created and set up the
paradise_papers_search Django project (:doc:`Part 1 <part01>`) and learned how to integrate the
Neomodel OGM into it (:doc:`Part 2 <part02>`). In :doc:`Part 3 <part03>` you created the fetch_api
Django app and learned how a Neo4j graph database is modeled using Neomodel. You ended up with a
group of Python class definitions that represent the nodes, properties and relationships in the
Paradise Paper Graph Database (PPGDB).

Current project structure::

    paradise_papers_search/
    ├── paradise_papers_search/
    │   └── +
    ├── fetch_api/
    │   ├── __init__.py
    │   ├── admin.py
    │   ├── apps.py
    │   ├── migrations/
    │   │   └── __init__.py
    │   ├── models/
    │   │   ├── __init__.py
    │   │   ├── address.py
    │   │   ├── entity.py
    │   │   ├── intermediary.py
    │   │   ├── officer.py
    │   │   └── other.py
    │   ├── tests.py
    │   ├── urls.py
    │   └── views.py
    └── manage.py

Now, how do you actually query a graph database inside your Django project or apps?

Using your models
=================

Using your models is pretty standard. You usually just import the ones you need and use them, for
example, in your Django views. Before we get to that, we need to learn the Neomodel Query API. This
API will allow us to express queries to the database without having to write them in plain Cypher.

We will learn to query using the Paradise Paper models we defined earlier.
To do that, we will first use an instance of the Python interpreter.

The project python interpreter:
---------------------------------------

Let's open the Python interpreter through our Django project's ``manage.py``, which imports our
project settings (remember that you set the DATABASE_URL there; it is needed to connect to the
database). It will also try to use ipython or bpython if available.

First, open a console if you do not already have one open.

Then make sure you are at the ``paradise_papers_search`` root directory
where you created the Django project(where the ``manage.py`` module is).

Run the command::

        python manage.py shell


With the python interpreter in hand, we can import our models and start to use them as soon as we
execute the following python import commands::

    from fetch_api.models import Entity
    from fetch_api.models import Intermediary
    from fetch_api.models import Officer
    from fetch_api.models import Address
    from fetch_api.models import Other

Each of the models we just imported maps to a specific structure of a node label, property keys and
relationship types in the PPGDB. Now we are ready to start exploring the
Paradise Paper Graph Database through the Neomodel Query API.

Neomodel Query API
==================
Each of your models has some properties and methods (inherited from StructuredNode or DjangoNode)
that help us express queries to the Neo4j database.

NodeSet and Nodes Neomodel objects
--------------------------------------------------------
A *NodeSet* object represents a set of nodes matching common query parameters or filters.

A *Node* object is an instance of one of our models. That means we can access all the properties
and methods defined on the model class. Each instance represents a single node in the database.

The ``<Model>.nodes`` class property of each model stores a NodeSet object. Each time we access
this ``.nodes`` property we get a brand new nodeset object, which means we get a nodeset without
any filters applied. Initially, before applying any filters, this nodeset represents all the nodes
mapped under a model (nodes labeled with the same class name). For instance, ``Entity.nodes``
contains all the nodes with the label Entity in the database.

Later we will see how we can apply filters in order to match a specific subset of nodes.

Length of a NodeSet
-------------------
If we wanted to count all the Entity nodes that are stored in the database, we just call
the ``len`` python function over the ``Entity.nodes`` nodeset.

Example::

    len(Entity.nodes)

When we call ``len(Entity.nodes)``, Neomodel generates a cypher query that counts
all the nodes with the label ``Entity``. That query is then executed in the Neo4j database and
we get back the count. The cypher query string that Neomodel generates behind the scenes is::

    MATCH (n:Entity) RETURN COUNT(n)

.. note::
    We are not retrieving all the nodes from the database and then counting them. The actual
    counting is done by the Neo4j database engine, which is faster.

Another example, to get a count of all the nodes that exist in the PPGDB database::

    len(Entity.nodes) \
    + len(Officer.nodes) \
    + len(Intermediary.nodes) \
    + len(Address.nodes) \
    + len(Other.nodes)

If a nodeset is filtered, only the nodes that pass the filters are counted.

Fetching nodes
----------------
In order to retrieve the nodes, read their properties and relationships, an actual cypher query needs
to be executed by Neomodel. This is handled completely by Neomodel and we just need to use its
query API.

A call to the NodeSet method ``.all()`` returns all the nodes of a nodeset. However, this can be
an expensive query, because Neomodel will try to retrieve all the nodes at once. It is
recommended to use ``.all()`` only when the nodeset is small. We can reduce its size by filtering
the nodeset, as we will see later.

It is better to fetch the nodes in batches from a nodeset. The NodeSet objects support the same
operators for indexing and slicing just like the normal python lists.

To get the first element of the ``Entity.nodes`` nodeset, we can reference its index::

    Entity.nodes[0]

To get a subset of nodes, we can use the python slice syntax. This is convenient for writing code
that retrieves the nodes in batches. For example to get the first 10 nodes in a list::

    Entity.nodes[0:10]

.. note::
    Neomodel generates and executes a cypher query that retrieves only the nodes we ask for,
    so we never pull all the nodes from the database at once. For example, for the slice
    ``Entity.nodes[10:20]`` the cypher query string generated by Neomodel would look like
    ``MATCH (n:Entity) RETURN n SKIP 10 LIMIT 10``
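
The batching idea described in this section can be sketched as a small generator. This is only a
minimal sketch and not part of Neomodel: ``iter_batches`` is a hypothetical helper name, and it
assumes nothing more than an object that supports slice syntax and returns an empty result for a
slice past the end. That holds for Python lists and is assumed here to hold for a nodeset as well.

```python
def iter_batches(sliceable, batch_size):
    """Yield consecutive slices of at most batch_size items until one comes back empty."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    start = 0
    while True:
        # Each slice asks only for the items between start and start + batch_size.
        batch = sliceable[start:start + batch_size]
        if not batch:
            break
        yield batch
        start += batch_size


# Demonstration with a plain list standing in for a nodeset.
batches = list(iter_batches(list(range(7)), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

With the models from this tutorial, the call would presumably look like
``for batch in iter_batches(Entity.nodes, 100): ...``, with each iteration issuing one query.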

Finding nodes
-------------
If we know exactly which node we are looking for, for instance because we have its node_id or the
exact value of its name property, we can use the ``.get()`` or ``.get_or_none()`` nodeset methods.
The difference is that, when there is no match, the first one raises a ``DoesNotExist`` exception
and the second returns ``None``.

To get the node whose node_id is ``160380`` in a given nodeset::

    Entity.nodes.get_or_none(node_id=160380)
    Entity.nodes.get(node_id=160380)

.. warning::
    These methods raise a ``MultipleNodesReturned`` exception if the property value
    used to get the node is not unique, that is, if more than one node matches.

Filtering nodes
---------------
We will often want a subset of nodes that fulfill a specific condition, for example all the
Entity nodes whose name property contains a specific word.

In order to filter nodes in a nodeset, we use the NodeSet method ``.filter()``.
The filter method borrows Django's filter format, where an operator is appended to the property
name after a double underscore.

To get the Entity nodes whose name property contains the word "financial", we use the
``contains`` operator::

    Entity.nodes.filter(name__contains='financial')

The above statement returns a filtered nodeset. To actually retrieve the data, see
the Fetching nodes section. For more operators, refer to this page:
http://neomodel.readthedocs.io/en/latest/queries.html#node-sets-and-filtering
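
To make the double underscore format concrete, here is a minimal sketch that splits a filter
keyword into its property name and operator. It is an illustration only and not how Neomodel
parses filters internally; ``split_lookup`` is a hypothetical name, and treating a keyword
without an operator as ``exact`` is an assumption based on the operator list linked above.

```python
def split_lookup(lookup):
    """Split 'name__contains' into ('name', 'contains').

    A keyword without a double underscore is treated as an exact match.
    """
    prop, separator, operator = lookup.rpartition("__")
    if not separator:
        return lookup, "exact"
    return prop, operator


print(split_lookup("name__contains"))  # ('name', 'contains')
print(split_lookup("node_id"))         # ('node_id', 'exact')
```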

Creating some utils to search the PPGDB
=======================================
The purpose of this tutorial is to show you how we can use Neomodel with Django. In order to do
that we will build an app that searches the Paradise Paper Graph Database.
What we have learned so far is enough for our purpose.

We will create some utility functions that will help us search the PPGDB. Later, we will import
and use these helper functions in our Django views to fetch data from the database.

To start coding, first let's create a new Python module under our ``fetch_api/`` directory.
Name the file ``utils.py``.

Now, as we will want to query the Neo4j database, we will import our models.
Put the import statements below at the start of ``utils.py``::

    from .models import Entity
    from .models import Intermediary
    from .models import Officer
    from .models import Address
    from .models import Other

In order to easily access each of the model classes programmatically, let's create a key-value map.
The key will be the model class name and the value will be the model class itself::

    MODEL_ENTITIES = {
        'Entity': Entity,
        'Address': Address,
        'Intermediary': Intermediary,
        'Officer': Officer,
        'Other': Other
    }
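
A lookup in such a map fails with a bare ``KeyError`` when the name is unknown. As a minimal
sketch, a small wrapper can turn that into a more readable error. ``resolve_model`` is a
hypothetical helper that is not used elsewhere in this tutorial, and the demo registry below
uses placeholder strings instead of the real model classes so that it runs on its own.

```python
def resolve_model(registry, node_type):
    """Look up a model class by name, failing with a readable message."""
    try:
        return registry[node_type]
    except KeyError:
        valid = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown node type {node_type!r}; expected one of: {valid}"
        ) from None


# Stand-in registry: strings take the place of the real model classes.
demo_registry = {"Entity": "<Entity class>", "Officer": "<Officer class>"}
print(resolve_model(demo_registry, "Entity"))  # <Entity class>
```

In ``utils.py`` the call would be ``resolve_model(MODEL_ENTITIES, node_type)``.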

Filter Nodes Helper
-------------------

We will create a function that receives a model class and some filter parameters: *search_text,
country, jurisdiction and source_id*. This function will return a filtered nodeset containing
only the model nodes that pass our filters.

Let's add this helper function to the ``utils.py``, with the name ``filter_nodes``::

    def filter_nodes(node_type, search_text, country, jurisdiction, source_id):
        # Accessing .nodes gives us a brand new, unfiltered nodeset
        node_set = node_type.nodes

        # On Address nodes we want to check the search_text against the address property
        # For any other we check against the name property
        # filter() returns the filtered nodeset, so we keep the returned value
        if node_type.__name__ == 'Address':
            node_set = node_set.filter(address__icontains=search_text)
        else:
            node_set = node_set.filter(name__icontains=search_text)

        # Only entities store jurisdiction info
        if node_type.__name__ == 'Entity':
            node_set = node_set.filter(jurisdiction__icontains=jurisdiction)

        node_set = node_set.filter(countries__icontains=country)
        node_set = node_set.filter(sourceID__icontains=source_id)

        return node_set
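
To see which filters end up applied for each node type without touching the database, the same
branching can be sketched as a function that only builds the filter keyword arguments.
``build_filter_kwargs`` is a hypothetical helper that takes the model class name as a string;
it mirrors the logic of ``filter_nodes`` and is not a replacement for it.

```python
def build_filter_kwargs(node_type_name, search_text, country, jurisdiction, source_id):
    """Return the filter keyword arguments that filter_nodes applies for a node type."""
    # Address nodes are searched by address, every other type by name.
    text_property = "address" if node_type_name == "Address" else "name"
    kwargs = {
        f"{text_property}__icontains": search_text,
        "countries__icontains": country,
        "sourceID__icontains": source_id,
    }
    # Only Entity nodes store jurisdiction information.
    if node_type_name == "Entity":
        kwargs["jurisdiction__icontains"] = jurisdiction
    return kwargs


print(build_filter_kwargs("Address", "main street", "", "", ""))
```

Assuming ``.filter()`` accepts several keyword arguments in one call, the resulting dictionary
could be applied as ``node_type.nodes.filter(**kwargs)``.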

Count Nodes Helper
------------------

We will create a function that returns the length of the nodeset returned by the ``filter_nodes``
helper we created before. It will receive a dictionary of filters.

Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'name': '',
        'country': '',
        'jurisdiction': '',
        'sourceID': ''
    }

Let's add this helper function to the ``utils.py``, with the name ``count_nodes``::

    def count_nodes(count_info):
        count = {}
        node_type               = count_info['node_type']
        search_word             = count_info['name']
        country                 = count_info['country']
        jurisdiction            = count_info['jurisdiction']
        data_source             = count_info['sourceID']
        node_set                = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
        count['count']          = len(node_set)

        return count

Fetch Nodes Helper
------------------

We will create a function that returns a subset of the nodes filtered by the ``filter_nodes``
helper that we created previously. It will receive a dictionary of filters.

Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'name': '',
        'country': '',
        'jurisdiction': '',
        'sourceID': '',
        'limit': 10,
        'page': 1
    }

The ``limit`` and ``page`` filters are necessary to calculate the ``start`` and ``end`` values that
we will use to get a subset of nodes from a nodeset.
Just like we learned in the Fetching nodes section, we will return the nodes in batches using
Python slice syntax on the nodeset.
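
The arithmetic behind ``start`` and ``end`` can be checked in isolation. The following is a
minimal sketch; ``page_bounds`` is a hypothetical name, and the validation of the inputs is an
addition that the helper in this tutorial does not perform.

```python
def page_bounds(page, limit):
    """Translate a 1-based page number and a page size into slice bounds."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must both be at least 1")
    start = (page - 1) * limit
    end = start + limit
    return start, end


print(page_bounds(1, 10))  # (0, 10)
print(page_bounds(3, 10))  # (20, 30)
```

Page 1 therefore maps to the slice ``[0:10]``, page 2 to ``[10:20]``, and so on.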

Let's add this helper function to the ``utils.py``, with the name ``fetch_nodes``::

    def fetch_nodes(fetch_info):
        node_type       = fetch_info['node_type']
        search_word     = fetch_info['name']
        country         = fetch_info['country']
        limit           = fetch_info['limit']
        start           = ((fetch_info['page'] - 1) * limit)
        end             = start + limit
        jurisdiction    = fetch_info['jurisdiction']
        data_source     = fetch_info['sourceID']
        node_set        = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
        fetched_nodes   = node_set[start:end]

        return fetched_nodes

Fetch Node Details Helper
-------------------------

We will create a function that returns a single node. It will receive a dictionary of filters with
the ``node_type`` and the ``node_id``.

Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'node_id': ''
    }

Let's add this helper function to the ``utils.py``, with the name ``fetch_node_details``::

    def fetch_node_details(node_info):
        node_type       = node_info['node_type']
        node_id         = node_info['node_id']
        node            = MODEL_ENTITIES[node_type].nodes.get(node_id=node_id)
        node_details    = node

        return node_details

Fetch countries, jurisdictions and data source helpers
------------------------------------------------------

As we are filtering the nodes by countries, jurisdictions and data source, we will need a list of
valid filtering values.

First let's create a new python module under our ``paradise_papers_search/`` directory.
Name the file as ``constants.py``.

We will fetch this data from the database. However, we are not going to use the models. Sometimes
it is convenient to make raw cypher queries to the database, and Neomodel allows you to do that.

In your ``constants.py`` module, import the database util ``db`` from ``neomodel``::

    from neomodel import db

Now you can use the ``cypher_query`` method, to execute raw cypher queries and get the results.

For example, we will query the countries, jurisdictions and data sources in the ``constants.py``::

    countries = db.cypher_query(
        '''
        MATCH (n)
        WHERE NOT n.countries CONTAINS ';'
        RETURN DISTINCT n.countries AS countries
        '''
    )[0]

    jurisdictions = db.cypher_query(
        '''
        MATCH (n)
        RETURN DISTINCT n.jurisdiction AS jurisdiction
        '''
    )[0]

    data_sources = db.cypher_query(
        '''
        MATCH (n)
        RETURN DISTINCT n.sourceID AS dataSource
        '''
    )[0]


With the results we will make sorted lists of COUNTRIES, JURISDICTIONS and DATASOURCE::

    COUNTRIES = sorted([country[0] for country in countries])
    JURISDICTIONS = sorted([jurisdiction[0] for jurisdiction in jurisdictions if isinstance(jurisdiction[0], str)])
    DATASOURCE = sorted([data_source[0] for data_source in data_sources if isinstance(data_source[0], str)])
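
The list comprehensions above rely on the shape of the query result: a list of rows in which each
row is itself a list with one value per returned column. As a minimal sketch under that
assumption, the flattening step can be tried on made-up rows. ``flatten_column`` is a
hypothetical name and the sample values are placeholders, not data from the PPGDB.

```python
def flatten_column(rows):
    """Take the first column of each row, keep only the strings, and sort them."""
    return sorted(row[0] for row in rows if isinstance(row[0], str))


# Made-up rows in the shape assumed for the first element of the query result.
sample_rows = [["B"], [None], ["A"]]
print(flatten_column(sample_rows))  # ['A', 'B']
```

The ``isinstance`` check drops the ``None`` value that appears for nodes without the property,
which would otherwise make ``sorted`` fail when comparing ``None`` with a string.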

In the ``utils.py``, import COUNTRIES, JURISDICTIONS, DATASOURCE::

    from paradise_papers_search.constants import COUNTRIES, JURISDICTIONS, DATASOURCE

Then create ``fetch_countries``, ``fetch_jurisdictions`` and ``fetch_data_source`` helpers::

    def fetch_countries():
        return COUNTRIES


    def fetch_jurisdictions():
        return JURISDICTIONS


    def fetch_data_source():
        return DATASOURCE

We created the ``constants.py`` module because we want to run the cypher queries only once, not
each time we call the ``fetch_countries``, ``fetch_jurisdictions`` or ``fetch_data_source``
helpers.

Since these queries might take some time to execute, we want their results ready at the start of
the application. We can do that by importing the ``constants.py`` module at startup, which
executes its code.

Open the ``fetch_api/apps.py`` module. Add a new method named ``ready`` to the Django application
class and import the ``constants.py`` module inside it. That will be enough to initialize the
COUNTRIES, JURISDICTIONS and DATASOURCE constants.

Here is how ``fetch_api/apps.py`` would look::

    from django.apps import AppConfig

    class FetchApiConfig(AppConfig):
        name = 'fetch_api'

        def ready(self):
            from paradise_papers_search import constants

Using the search utils
======================

To use the search utils, we just need to import them into the module where they will be used. In
this case, we will need to import them into the ``fetch_api/views.py`` module. Later they will be
used to create our application views.

Here is the import statement. Place this code in the ``fetch_api/views.py`` module::

    from .utils import (
        count_nodes,
        fetch_nodes,
        fetch_node_details,
        fetch_countries,
        fetch_jurisdictions,
        fetch_data_source,
    )

In the next section, you will build the rest of this code.

Testing utils in the console
----------------------------

Just like we did with models, we can import the utils in the project python interpreter and play around.

Make sure you are at the ``paradise_papers_search`` root directory
where you created the Django project(where the ``manage.py`` module is).

Run the command::

        python manage.py shell


With the python interpreter in hand, we can import our utils and start to use them as soon as we
execute the following python import command::

    from fetch_api.utils import (
        count_nodes,
        fetch_nodes,
        fetch_node_details,
        fetch_countries,
        fetch_jurisdictions,
        fetch_data_source,
    )

Now, for example, you can count all the nodes that pass some filters::

    count_nodes({
        'node_type': 'Entity',
        'name':'Junior',
        'country':'',
        'jurisdiction':'',
        'sourceID':''
    })

Or you can fetch a subset of nodes that pass some filters::

    fetch_nodes({
        'node_type': 'Entity',
        'name':'Junior',
        'country':'',
        'jurisdiction':'',
        'sourceID':'',
        'limit': 10,
        'page': 1,
    })
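
The two helpers work well together: the count tells us how many pages a search has, and the fetch
returns one page at a time. As a minimal sketch, the number of pages can be derived from the
dictionary returned by ``count_nodes``. ``total_pages`` is a hypothetical helper, and the count
used below is a made-up value rather than a real query result.

```python
import math


def total_pages(count_result, limit):
    """Number of pages needed to show count_result['count'] nodes, limit per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return math.ceil(count_result["count"] / limit)


# count_nodes returns a dictionary with a single 'count' key.
print(total_pages({"count": 25}, 10))  # 3
```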