==============================================
How to query the Neo4j database with Neomodel.
==============================================

At this point of the tutorial, you should have already created and set up the ``paradise_papers_search`` Django project (:doc:`Part 1 `). You learned how to integrate the Neomodel OGM into the Django project (:doc:`Part 2 `). Also, in :doc:`Part 3 ` you created the ``fetch_api`` Django app and learned how a Neo4j graph database is modeled using Neomodel. You ended up with a group of python class definitions that represent the nodes, properties and relationships in the Paradise Paper Graph Database (PPGDB).

Current project structure::

    paradise_papers_search/
    ├── paradise_papers_search/
    │   └── +
    ├── fetch_api/
    │   ├── __init__.py
    │   ├── admin.py
    │   ├── apps.py
    │   ├── migrations/
    │   │   └── __init__.py
    │   ├── models/
    │   │   ├── __init__.py
    │   │   ├── address.py
    │   │   ├── entity.py
    │   │   ├── intermediary.py
    │   │   ├── officer.py
    │   │   └── other.py
    │   ├── tests.py
    │   ├── urls.py
    │   └── views.py
    └── manage.py

Now, how do you actually query a graph database inside your Django project or apps?

Using your models
=================

Using your models is pretty standard. You usually just import the ones you need and use them, for example, in your Django views. Before we get to that, we need to learn the Neomodel Query API. This API will allow us to express queries to the database without having to write them in plain Cypher.

We will learn to query using the Paradise Paper models we defined before. To do that we will first use an instance of the python interpreter.

The project python interpreter:
---------------------------------------

Let's open the python interpreter through our Django project ``manage.py``, which will import our project settings (remember you set the DATABASE_URL in there; this is needed to connect to the db). Also, it will try to use ipython or bpython if available.

First, open a console if necessary.
Then make sure you are at the ``paradise_papers_search`` root directory where you created the Django project (where the ``manage.py`` module is). Run the command::

    python manage.py shell

With the python interpreter in hand, we can import our models and start to use them as soon as we execute the following python import commands::

    from fetch_api.models import Entity
    from fetch_api.models import Intermediary
    from fetch_api.models import Officer
    from fetch_api.models import Address
    from fetch_api.models import Other

Each of the models we just imported maps to a specific structure of a node label, property keys and relationship types in the PPGDB.

Now we are ready to start exploring the Paradise Paper Graph Database through the Neomodel Query API.

Neomodel Query API
==================

Each of your models has some properties and methods (inherited from StructuredNode or DjangoNode) that help us to express queries to the Neo4j database.

NodeSet and Nodes Neomodel objects
--------------------------------------------------------

A *NodeSet* object represents a set of nodes matching common query parameters or filters.

A *Node* object is an instance of one of our models. That means we can access all the properties and methods defined on the model class. Each instance represents a single node in the database.

The ``.nodes`` class property of each model stores a NodeSet object. Each time we access this ``.nodes`` property we get a brand new nodeset object, which means we get a nodeset without any filters applied. Initially, before applying any filters, this nodeset represents all the nodes mapped under a model (nodes labeled with the same class name). For instance, ``Entity.nodes`` contains all the nodes with the label Entity in the database. Later we will see how we can apply filters in order to match a specific subset of nodes.

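
To make the "brand new nodeset on every access" behaviour concrete, here is a toy sketch in plain python. It does not use Neomodel and is not how Neomodel is implemented; ``ToyNodeSet``, ``FreshNodeSet`` and ``ToyEntity`` are hypothetical names introduced only for this illustration::

    class ToyNodeSet:
        """Hypothetical stand-in for a nodeset: a label plus a list of filters."""

        def __init__(self, label):
            self.label = label
            self.filters = []

        def filter(self, **conditions):
            # Record the conditions on this object and return it for chaining.
            self.filters.append(conditions)
            return self


    class FreshNodeSet:
        """Descriptor that builds a new, unfiltered ToyNodeSet on every access."""

        def __get__(self, instance, owner):
            return ToyNodeSet(owner.__name__)


    class ToyEntity:
        nodes = FreshNodeSet()


    filtered = ToyEntity.nodes.filter(name__contains='financial')
    unfiltered = ToyEntity.nodes  # a different object, with no filters applied

    print(filtered.filters)    # [{'name__contains': 'financial'}]
    print(unfiltered.filters)  # []

The point of the sketch is only that filters applied to one nodeset do not leak into the next access of ``.nodes``, so every query starts from the full set of nodes for that label.
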

Length of a NodeSet
-------------------

If we want to count all the Entity nodes that are stored in the database, we just call the python ``len`` function on the ``Entity.nodes`` nodeset. Example::

    len(Entity.nodes)

When we call ``len(Entity.nodes)``, Neomodel generates a cypher query that counts all the nodes with the label ``Entity``. That query is then executed in the Neo4j database and we get back the count. The cypher query string that Neomodel generates behind the scenes is::

    MATCH (n:Entity) RETURN COUNT(n)

.. note:: We are not retrieving all the nodes from the database and then counting them. The actual counting is done by the Neo4j database engine, which is faster.

Another example, to get a count of all the nodes that exist in the PPGDB database::

    len(Entity.nodes) \
    + len(Officer.nodes) \
    + len(Intermediary.nodes) \
    + len(Address.nodes) \
    + len(Other.nodes)

If a nodeset is filtered, only the nodes that fulfill the filters are counted.

Fetching nodes
----------------

In order to retrieve the nodes and read their properties and relationships, an actual cypher query needs to be executed by Neomodel. This is handled completely by Neomodel and we just need to use its query API.

A call to the NodeSet method ``.all()`` returns all the nodes of a nodeset; nevertheless, this can result in an expensive query. The reason is that Neomodel will actually try to retrieve all the nodes at once. It is recommended to use ``.all()`` only when the nodeset is small. We can reduce the size by filtering the nodeset, as we will see later.

It is better to fetch the nodes from a nodeset in batches. NodeSet objects support the same indexing and slicing operators as normal python lists.

To get the first element of the ``Entity.nodes`` nodeset, we can reference its index::

    Entity.nodes[0]

To get a subset of nodes, we can use the python slice syntax. This is convenient for writing code that retrieves the nodes in batches.
For example, to get the first 10 nodes in a list::

    Entity.nodes[0:10]

.. note:: Neomodel will generate and execute a cypher query that retrieves only the nodes we are asking for, so we are not actually retrieving all the nodes at once from the database. An example of a cypher query string generated by Neomodel, for the next batch ``Entity.nodes[10:20]``, would be ``MATCH (n:Entity) RETURN n SKIP 10 LIMIT 10``

Finding nodes
-------------

If we know exactly what node we are looking for, for instance because we have its node_id or the exact value of its name property, we can use the ``.get()`` or ``.get_or_none()`` nodeset methods. The difference is that when there is no match, the first one raises a DoesNotExist exception and the second returns ``None``.

To get the node whose node_id is ``160380`` in a given nodeset::

    Entity.nodes.get_or_none(node_id=160380)
    Entity.nodes.get(node_id=160380)

.. warning:: These methods will raise a MultipleNodesReturned exception if the property value used to get the node is not unique.

Filtering nodes
---------------

It is very probable that we want to get a subset of nodes that fulfill a specified condition. For example, getting all the Entity nodes whose name property contains a specific word.

In order to filter nodes in a nodeset, we use the NodeSet method ``.filter()``. The filter method borrows the Django filter format, with operators prefixed by a double underscore.

To get the Entity nodes whose name property contains the word "financial", we use the ``contains`` operator::

    Entity.nodes.filter(name__contains='financial')

The above statement returns a filtered nodeset; to actually retrieve the data, see the Fetching Nodes section. For more prefixed operators refer to this page: http://neomodel.readthedocs.io/en/latest/queries.html#node-sets-and-filtering

Creating some utils to search the PPGDB
=======================================

The purpose of this tutorial is to show you how we can use Neomodel with Django.
In order to do that we will build an app that will search the Paradise Paper Graph Database. What we have learned so far is enough for our purpose.

We will create some utility functions that will help us search the PPGDB. Later, we will import and use these helper functions to fetch data from the DB in our Django views.

To start coding, first let's create a new python module under our ``fetch_api/`` directory. Name the file ``utils.py``.

Now, as we will want to query the Neo4j database, we will import our models. Put the below import statements at the start of ``utils.py``::

    from .models import Entity
    from .models import Intermediary
    from .models import Officer
    from .models import Address
    from .models import Other

In order to easily access each of the model classes programmatically, let's create a key-value map. The key will be the model class name and the value will be the model class itself::

    MODEL_ENTITIES = {
        'Entity': Entity,
        'Address': Address,
        'Intermediary': Intermediary,
        'Officer': Officer,
        'Other': Other
    }

Filter Nodes Helper
-------------------

We will create a function that receives a model class and some filter parameters like *name, country, jurisdiction and source_id*. Then this function will return a filtered nodeset containing only the model nodes that pass our filters.
Let's add this helper function to ``utils.py``, with the name ``filter_nodes``::

    def filter_nodes(node_type, search_text, country, jurisdiction, source_id):
        node_set = node_type.nodes

        # On Address nodes we want to check the search_text against the address property
        # For any other we check against the name property
        if node_type.__name__ == 'Address':
            node_set.filter(address__icontains=search_text)
        else:
            node_set.filter(name__icontains=search_text)

        # Only entities store jurisdiction info
        if node_type.__name__ == 'Entity':
            node_set.filter(jurisdiction__icontains=jurisdiction)

        node_set.filter(countries__icontains=country)
        node_set.filter(sourceID__icontains=source_id)

        return node_set

Count Nodes Helper
------------------

We will create a function that returns the length of the nodeset returned by the ``filter_nodes`` helper we created before. It will receive a dictionary of filters. Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'name': '',
        'country': '',
        'jurisdiction': '',
        'sourceID': ''
    }

Let's add this helper function to ``utils.py``, with the name ``count_nodes``::

    def count_nodes(count_info):
        count = {}
        node_type = count_info['node_type']
        search_word = count_info['name']
        country = count_info['country']
        jurisdiction = count_info['jurisdiction']
        data_source = count_info['sourceID']
        node_set = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
        count['count'] = len(node_set)

        return count

Fetch Nodes Helper
------------------

We will create a function that returns a subset of nodes filtered by the ``filter_nodes`` helper that we created previously. It will receive a dictionary of filters.
Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'name': '',
        'country': '',
        'jurisdiction': '',
        'sourceID': '',
        'limit': 10,
        'page': 1
    }

The ``limit`` and ``page`` filters are necessary to calculate the ``start`` and ``end`` values that we will use to get a subset of nodes from a nodeset. Just like we learned in the Fetching Nodes section, we will return the nodes in batches using the python slice syntax on the nodeset.

Let's add this helper function to ``utils.py``, with the name ``fetch_nodes``::

    def fetch_nodes(fetch_info):
        node_type = fetch_info['node_type']
        search_word = fetch_info['name']
        country = fetch_info['country']
        limit = fetch_info['limit']
        # Pages are 1-based: page 1 covers nodes [0:limit]
        start = ((fetch_info['page'] - 1) * limit)
        end = start + limit
        jurisdiction = fetch_info['jurisdiction']
        data_source = fetch_info['sourceID']
        node_set = filter_nodes(MODEL_ENTITIES[node_type], search_word, country, jurisdiction, data_source)
        fetched_nodes = node_set[start:end]

        return fetched_nodes

Fetch Node Details Helper
-------------------------

We will create a function that returns a single node. It will receive a dictionary of filters with the ``node_type`` and the ``node_id``. Here is a representation of the required dictionary keys::

    {
        'node_type': '',
        'node_id': ''
    }

Let's add this helper function to ``utils.py``, with the name ``fetch_node_details``::

    def fetch_node_details(node_info):
        node_type = node_info['node_type']
        node_id = node_info['node_id']
        node = MODEL_ENTITIES[node_type].nodes.get(node_id=node_id)
        node_details = node

        return node_details

Fetch countries, jurisdictions and data source helpers
------------------------------------------------------

As we are filtering the nodes by countries, jurisdictions and data source, we will need a list of valid filtering values.

First let's create a new python module under our ``paradise_papers_search/`` directory. Name the file ``constants.py``.
We will fetch this data from the database; however, we are not going to use the models. Sometimes it is convenient to make raw cypher queries to the database, and Neomodel allows you to do that.

In your ``constants.py`` module, import the database util ``db`` from ``neomodel``::

    from neomodel import db

Now you can use the ``cypher_query`` method to execute raw cypher queries and get the results. For example, we will query the countries, jurisdictions and data sources in ``constants.py``::

    countries = db.cypher_query(
        '''
        MATCH (n)
        WHERE NOT n.countries CONTAINS ';'
        RETURN DISTINCT n.countries AS countries
        '''
    )[0]

    jurisdictions = db.cypher_query(
        '''
        MATCH (n)
        RETURN DISTINCT n.jurisdiction AS jurisdiction
        '''
    )[0]

    data_sources = db.cypher_query(
        '''
        MATCH (n)
        RETURN DISTINCT n.sourceID AS dataSource
        '''
    )[0]

With the results we will make sorted lists of COUNTRIES, JURISDICTIONS and DATASOURCE::

    COUNTRIES = sorted([country[0] for country in countries])
    JURISDICTIONS = sorted([jurisdiction[0] for jurisdiction in jurisdictions if isinstance(jurisdiction[0], str)])
    DATASOURCE = sorted([data_source[0] for data_source in data_sources if isinstance(data_source[0], str)])

In ``utils.py``, import COUNTRIES, JURISDICTIONS and DATASOURCE::

    from paradise_papers_search.constants import COUNTRIES, JURISDICTIONS, DATASOURCE

Then create the ``fetch_countries``, ``fetch_jurisdictions`` and ``fetch_data_source`` helpers::

    def fetch_countries():
        return COUNTRIES


    def fetch_jurisdictions():
        return JURISDICTIONS


    def fetch_data_source():
        return DATASOURCE

We created the ``constants.py`` module because we want to run the cypher queries only once, not each time we call the ``fetch_countries``, ``fetch_jurisdictions`` or ``fetch_data_source`` helpers.

Since these queries might take some time to execute, we want the results ready at the start of the application. We can do that by executing the ``constants.py`` code when the app loads. Open the ``fetch_api/apps.py`` module.
Add a new method named ``ready`` to the Django application class and import the ``constants.py`` module inside it. That will be enough to initialize the COUNTRIES, JURISDICTIONS and DATASOURCE constants. Here is how ``fetch_api/apps.py`` would look::

    from django.apps import AppConfig


    class FetchApiConfig(AppConfig):
        name = 'fetch_api'

        def ready(self):
            # Importing the module runs its cypher queries once, at startup
            from paradise_papers_search import constants

Using the search utils
======================

To use the search utils, we just need to import them into the module where they will be used. In this case, we will need to import them into the ``fetch_api/views.py`` module. Later they will be used to create our application views.

Here is the import statement; place this code in the ``fetch_api/views.py`` module::

    from .utils import (
        count_nodes,
        fetch_nodes,
        fetch_node_details,
        fetch_countries,
        fetch_jurisdictions,
        fetch_data_source,
    )

In the next section, you will build the rest of this code.

Testing utils in the console
----------------------------

Just like we did with the models, we can import the utils in the project python interpreter and play around.

Make sure you are at the ``paradise_papers_search`` root directory where you created the Django project (where the ``manage.py`` module is). Run the command::

    python manage.py shell

With the python interpreter in hand, we can import our utils and start to use them as soon as we execute the following python import command. Note that the shell is not inside the ``fetch_api`` package, so the import uses the full module path instead of a relative one::

    from fetch_api.utils import (
        count_nodes,
        fetch_nodes,
        fetch_node_details,
        fetch_countries,
        fetch_jurisdictions,
        fetch_data_source,
    )

Now, for example, you can count all the nodes that pass some filters::

    count_nodes({
        'node_type': 'Entity',
        'name': 'Junior',
        'country': '',
        'jurisdiction': '',
        'sourceID': ''
    })

Or you can fetch a subset of nodes that pass some filters::

    fetch_nodes({
        'node_type': 'Entity',
        'name': 'Junior',
        'country': '',
        'jurisdiction': '',
        'sourceID': '',
        'limit': 10,
        'page': 1,
    })

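
The ``limit`` and ``page`` arithmetic used by ``fetch_nodes`` can also be tried without a database connection. The sketch below applies the same ``start`` and ``end`` calculation to a plain python list; ``page_bounds`` and ``sample`` are hypothetical names introduced only for this illustration and are not part of the tutorial's ``utils.py``::

    def page_bounds(page, limit):
        """Return the (start, end) slice bounds for a 1-based page number."""
        start = (page - 1) * limit
        end = start + limit
        return start, end


    # A plain list stands in for a nodeset here; both accept slice syntax.
    sample = list(range(25))

    start, end = page_bounds(page=2, limit=10)
    second_page = sample[start:end]  # items 10 to 19

    start, end = page_bounds(page=3, limit=10)
    last_page = sample[start:end]    # only 5 items remain: 20 to 24

With a list, a page that starts past the end simply yields an empty slice, which is a convenient stop condition when looping over pages.
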