Get started
===========

The NeDRex database (NeDRexDB) integrates data from various biomedical databases, including OMIM, DisGeNET, UniProt, NCBI gene info, IID, MONDO, DrugBank, Reactome, and DrugCentral. 
Behind the scenes, NeDRexDB is conceptualised as a graph database; in brief, a graph database stores entities (“things”) as nodes and the relationships between entities as edges. 
One example of a node in NeDRexDB is the `Human sonic hedgehog protein <https://www.uniprot.org/uniprot/Q15465>`_ and one example of an edge is the “gene encodes protein” relationship between the sonic hedgehog gene and the sonic hedgehog protein. 

NeDRexDB data is stored in a MongoDB database, with one collection for each node and edge type. The NeDRexAPI exposes a number of routes to access and operate on this data.

Using the API to explore the contents of NeDRexDB
=================================================

Nodes
-----

One of the first questions a user of NeDRexDB will have is “what are the types of entities (nodes) and relationships (edges) stored in NeDRexDB?” 
There are two separate API routes that can be used to obtain this information:

* Get the node types: https://api.nedrex.net/list_node_collections
* Get the edge types: https://api.nedrex.net/list_edge_collections

An example of how to achieve this in Python is shown below.

.. code-block:: python

    import requests
    node_url = "https://api.nedrex.net/list_node_collections"
    edge_url = "https://api.nedrex.net/list_edge_collections"

    nodes = requests.get(node_url).json()
    edges = requests.get(edge_url).json()

    print(nodes)
    # ['disorder', 'drug', 'gene', 'pathway', 'protein', 'signature']


In the above example, you can see that some of the nodes in NeDRexDB represent proteins. 
Next, you’ll likely want to know what information (attributes) is stored about the different types of entities in NeDRexDB. 
For this, we have the `/<type>/attributes <https://api.nedrex.net/#operation/list_attributes__t__attributes_get>`_ route. 
The example below shows this route being used to get the attributes of items in the protein collection. 
The result shows that proteins have a :code:`geneName` attribute, amongst others, which contains the name of the gene that encodes the protein. 

.. code-block:: python

    import requests

    protein_attributes_url = "https://api.nedrex.net/protein/attributes"
    protein_attributes = requests.get(protein_attributes_url).json()
    print(protein_attributes)
    # ["comments","taxid","geneName","type","displayName","domainIds",...]

Note that all items in the node collections have the :code:`domainIds` and :code:`primaryDomainId` attributes. 
A domain ID is a combination of a source database and the accession used to refer to the entity in that database -- for example, proteins have a domain ID based on their UniProt accession (the sonic hedgehog protein has the :code:`primaryDomainId` of :code:`uniprot.Q15465`). 
The :code:`primaryDomainId` of an entity is used to refer to the entity in relationships. 
For example, :code:`uniprot.Q15465` is used to refer to the sonic hedgehog protein in the :code:`protein_encoded_by` relationship collection.

Edges
-----

Edges refer to two entities, and the attributes used to store the entities participating in the relationship differ depending on whether the relationship type is directed or not. 
For example, one collection in NeDRexDB is :code:`protein_interacts_with_protein`, which stores protein-protein interactions. 
A protein-protein interaction is undirected: if protein A interacts with protein B, then protein B interacts with protein A. 
Undirected collections use :code:`memberOne` and :code:`memberTwo` as attributes to store the :code:`primaryDomainId` values of the entities involved in the relationship. 
Other collections in NeDRexDB contain directed relationships that only make sense in one direction. 
For example, relationships in the :code:`protein_encoded_by` collection are directed: a protein is encoded by a gene, not the other way around. 
Directed collections use :code:`sourceDomainId` and :code:`targetDomainId` as attributes so that the relationship can be read as :code:`[source node]-(edge type)->[target node]`.
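
The naming convention described above can be handled in client code with a small helper. The following is a minimal sketch and not part of the NeDRexAPI: the function name :code:`edge_endpoints` is hypothetical, and the undirected record is made up for illustration (the pairing of its two members is not taken from the database).

.. code-block:: python

    def edge_endpoints(edge):
        """Return (first_id, second_id, is_directed) for an edge record.

        Hypothetical helper, not part of the NeDRexAPI.
        """
        if "sourceDomainId" in edge and "targetDomainId" in edge:
            # Directed: read as [source node]-(edge type)->[target node]
            return edge["sourceDomainId"], edge["targetDomainId"], True
        if "memberOne" in edge and "memberTwo" in edge:
            # Undirected: the order of the two members carries no meaning
            return edge["memberOne"], edge["memberTwo"], False
        raise ValueError("edge has neither source/target nor memberOne/memberTwo")


    # Directed record, in the shape used by the protein_encoded_by collection
    directed = {"sourceDomainId": "uniprot.P31946", "targetDomainId": "entrez.7529"}

    # Undirected record; the pairing below is illustrative only, not real data
    undirected = {"memberOne": "uniprot.Q15465", "memberTwo": "uniprot.P31946"}

    print(edge_endpoints(directed))
    # ('uniprot.P31946', 'entrez.7529', True)
    print(edge_endpoints(undirected))
    # ('uniprot.Q15465', 'uniprot.P31946', False)

Returning a flag alongside the two IDs lets calling code decide whether the order of the endpoints matters.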


Obtaining nodes using the API
-----------------------------
Nodes can be obtained via the `/get_by_id/<type> <https://api.nedrex.net/#operation/get_by_id_get_by_id__t__get>`_ route.
To use this API route, you need to know (1) the type of the node you want to get and (2) at least one of its domain IDs.
Continuing the example of the sonic hedgehog protein, details of this protein can be obtained as follows:

.. code-block:: python

    import requests
    route = "https://api.nedrex.net/get_by_id/"
    node_type = "protein"
    to_get = ["uniprot.Q15465"]

    response = requests.get(route + node_type, params={"q": to_get})
    print(response.json())

    # [{'comments': 'FUNCTION: [Sonic hedgehog protein]: The C-terminal part...
    #   'displayName': 'SHH_HUMAN',
    #   'domainIds': ['uniprot.Q15465'],
    #   'geneName': 'SHH',
    #   'primaryDomainId': 'uniprot.Q15465',
    #   'sequence': 'MLLLARCLLLVLVSSLLVCSGLACGPGRGFG...',
    #   'synonyms': ['Sonic hedgehog protein',
    #                'SHH',
    #                'HHG-1',
    #                'Shh unprocessed N-terminal signaling and C-terminal '
    #                'autoprocessing domains {ECO:0000303|PubMed:24522195}',
    #                'ShhNC {ECO:0000303|PubMed:24522195}'],
    #   'taxid': 9606,
    #   'type': 'Protein'}]


In this example, the type is specified in the URL route as being :code:`protein`.
Note that this type is the name of the collection the nodes are stored in (as returned by the `/list_node_collections <https://api.nedrex.net/list_node_collections>`_ route).
This route is designed so that multiple nodes can be collected at once, by including additional domain IDs in the array passed to the query parameter :code:`q`.
These domain IDs do *not* need to be the :code:`primaryDomainId`, but can be any domain ID.
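
To make the multi-ID behaviour concrete, the sketch below builds the URL that such a request resolves to, using only the Python standard library. It assumes the IDs are sent as a repeated :code:`q` parameter, which is how :code:`requests` encodes a list passed via :code:`params`. The helper name :code:`build_get_by_id_url` is hypothetical, and the second accession is chosen only for illustration.

.. code-block:: python

    from urllib.parse import urlencode

    BASE = "https://api.nedrex.net/get_by_id/"


    def build_get_by_id_url(node_type, domain_ids):
        """Return the /get_by_id URL for several domain IDs (hypothetical helper)."""
        # doseq=True repeats the key once per ID: q=...&q=...
        query = urlencode({"q": list(domain_ids)}, doseq=True)
        return f"{BASE}{node_type}?{query}"


    url = build_get_by_id_url("protein", ["uniprot.Q15465", "uniprot.P31946"])
    print(url)
    # https://api.nedrex.net/get_by_id/protein?q=uniprot.Q15465&q=uniprot.P31946

Passing the same list as :code:`params={"q": [...]}` to :code:`requests.get` should produce an equivalent request.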


Obtaining edges using the API
-----------------------------
At present, there is no way to request individual edges from an edge collection. 
The NeDRexAPI does, however, provide the `/<type>/all <https://api.nedrex.net/#operation/list_all_collection_items__t__all_get>`_ route, which allows all items in a collection to be returned.
For example, all :code:`protein_encoded_by` edges can be obtained in Python as follows:


.. code-block:: python

    import requests
    url = "https://api.nedrex.net/protein_encoded_by/all"
    response = requests.get(url)
    print(response.json())
    # [{"sourceDomainId":"uniprot.P31946","targetDomainId":"entrez.7529","type":"ProteinEncodedBy"},
    #  {"sourceDomainId":"uniprot.P62258","targetDomainId":"entrez.7531","type":"ProteinEncodedBy"},
    #  ...]

Note that some of the edge collections in NeDRexDB are very large -- for example, at the time of writing, the :code:`protein_similarity_protein` collection has over 1M edges.
Make sure your code can handle responses of this size, for example by retaining only the fields you need rather than every full record.
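
One way to limit what is kept in memory is to reduce each edge to the fields you need as you iterate over the collection. The sketch below is one possible approach rather than a recommendation from the NeDRexAPI; :code:`index_by_target` is a hypothetical helper, and the two sample records are the ones shown in the :code:`protein_encoded_by` output above.

.. code-block:: python

    from collections import defaultdict


    def index_by_target(edges):
        """Map each targetDomainId to the sourceDomainIds that point at it."""
        index = defaultdict(list)
        for edge in edges:  # accepts any iterable, including a generator
            index[edge["targetDomainId"]].append(edge["sourceDomainId"])
        return dict(index)


    sample = [
        {"sourceDomainId": "uniprot.P31946", "targetDomainId": "entrez.7529", "type": "ProteinEncodedBy"},
        {"sourceDomainId": "uniprot.P62258", "targetDomainId": "entrez.7531", "type": "ProteinEncodedBy"},
    ]

    gene_to_proteins = index_by_target(sample)
    print(gene_to_proteins["entrez.7529"])
    # ['uniprot.P31946']

Note that :code:`response.json()` still parses the whole response in memory; this sketch only reduces what is retained afterwards.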