Get started

The NeDRex database (NeDRexDB) integrates data from various biomedical databases, including OMIM, DisGeNET, UniProt, NCBI gene info, IID, MONDO, DrugBank, Reactome, and DrugCentral. Behind the scenes, NeDRexDB is conceptualised as a graph database. In brief, a graph database stores entities (“things”) as nodes and the relationships between entities as edges. One example of a node in NeDRexDB is the human sonic hedgehog protein. One example of an edge is the “gene encodes protein” relationship between the sonic hedgehog gene and the sonic hedgehog protein.

NeDRexDB data is stored in a MongoDB database, with one collection for each node and edge type. The NeDRexAPI exposes a number of routes to access and operate on this data.
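The layout described above can be mirrored with plain Python dictionaries: one list of documents per node collection and per edge collection. The following is a hand-written miniature for illustration only and is not data returned by the API. The gene document's shape, including its `type` value of `Gene`, is an assumption. The gene's domain ID `entrez.6469` is the NCBI Gene ID for SHH.

```python
# Illustrative, hand-written miniature of the NeDRexDB layout:
# one collection (a list of documents) per node type and per edge type.
collections = {
    "protein": [
        {"primaryDomainId": "uniprot.Q15465", "domainIds": ["uniprot.Q15465"],
         "displayName": "SHH_HUMAN", "type": "Protein"},
    ],
    "gene": [
        # Assumed document shape; entrez.6469 is the NCBI Gene ID for SHH.
        {"primaryDomainId": "entrez.6469", "domainIds": ["entrez.6469"],
         "type": "Gene"},
    ],
    "protein_encoded_by": [
        # A directed edge: [protein]-(encoded by)->[gene]
        {"sourceDomainId": "uniprot.Q15465", "targetDomainId": "entrez.6469",
         "type": "ProteinEncodedBy"},
    ],
}


def find_node(collection_name, domain_id):
    """Return the first node in a collection that lists domain_id, or None."""
    for node in collections[collection_name]:
        if domain_id in node["domainIds"]:
            return node
    return None


edge = collections["protein_encoded_by"][0]
protein = find_node("protein", edge["sourceDomainId"])
gene = find_node("gene", edge["targetDomainId"])
print(protein["displayName"], "is encoded by", gene["primaryDomainId"])
# SHH_HUMAN is encoded by entrez.6469
```

In the real database, each of these lists corresponds to a MongoDB collection, and the edge links the two nodes through their primaryDomainId values.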

Using the API to explore the contents of NeDRexDB

Nodes

One of the first questions a user of NeDRexDB will have is “What types of entities (nodes) and relationships (edges) are stored in NeDRexDB?” There are two separate API routes that can be used to obtain this information:

Get the node types: https://api.nedrex.net/list_node_collections
Get the edge types: https://api.nedrex.net/list_edge_collections

An example of how to achieve this in Python is shown below.

import requests
node_url = "https://api.nedrex.net/list_node_collections"
edge_url = "https://api.nedrex.net/list_edge_collections"

nodes = requests.get(node_url).json()
edges = requests.get(edge_url).json()

print(nodes)
# ['disorder', 'drug', 'gene', 'pathway', 'protein', 'signature']

In the above example, you can see that some of the nodes in NeDRexDB represent proteins. Next, you’ll likely want to know what information (attributes) is stored about the different types of entities in NeDRexDB. For this, there is the /<type>/attributes route. The example below uses this route to get the attributes of items in the protein collection. The result shows that proteins have a geneName attribute, amongst others, which contains the name of the gene that encodes the protein.

import requests

protein_attributes_url = "https://api.nedrex.net/protein/attributes"
protein_attributes = requests.get(protein_attributes_url).json()
print(protein_attributes)
# ["comments","taxid","geneName","type","displayName","domainIds",...]
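The two routes can be combined to get the attributes of every node type in one pass. The following is a minimal sketch, assuming that every name returned by /list_node_collections is also valid in the /<type>/attributes route. The helper names `fetch_attributes_by_type` and `shared_attributes` are hypothetical and are not part of the NeDRexAPI.

```python
import requests

BASE_URL = "https://api.nedrex.net"


def fetch_attributes_by_type(base_url=BASE_URL):
    """Map each node collection name to the attribute names the API reports."""
    response = requests.get(f"{base_url}/list_node_collections", timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    result = {}
    for node_type in response.json():
        attrs = requests.get(f"{base_url}/{node_type}/attributes", timeout=30)
        attrs.raise_for_status()
        result[node_type] = attrs.json()
    return result


def shared_attributes(attributes_by_type):
    """Return the attribute names reported by every node collection."""
    attribute_sets = [set(attrs) for attrs in attributes_by_type.values()]
    return set.intersection(*attribute_sets) if attribute_sets else set()
```

Every node collection is documented to carry domainIds and primaryDomainId, so calling `shared_attributes(fetch_attributes_by_type())` against the live API should return a set containing at least those two names.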

Note that all items in the node collections have the domainIds and primaryDomainId attributes. A domain ID combines a source database with the accession used to refer to the entity in that database. For example, proteins have a domain ID based on their UniProt accession, so the sonic hedgehog protein has the primaryDomainId uniprot.Q15465. The primaryDomainId of an entity is used to refer to the entity in relationships.
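Because a domain ID joins a source database name and an accession with a dot, it can be split back into those two parts. The helper below is a hypothetical convenience and is not part of the NeDRexAPI. It assumes the source database name never contains a dot, so only the first dot is treated as the separator.

```python
def split_domain_id(domain_id):
    """Split a domain ID such as 'uniprot.Q15465' into (source, accession).

    Assumes the source database name contains no dot, so only the first
    dot is treated as the separator and any later dots stay in the accession.
    """
    source, sep, accession = domain_id.partition(".")
    if not sep or not source or not accession:
        raise ValueError(f"not a domain ID: {domain_id!r}")
    return source, accession


print(split_domain_id("uniprot.Q15465"))
# ('uniprot', 'Q15465')
```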

Edges

In edge collections, each node is referred to by its primaryDomainId. For example, the sonic hedgehog protein appears as uniprot.Q15465 in the protein_encoded_by relationship collection.

An edge connects two entities. The attributes used to store those entities depend on whether the relationship type is directed. For example, the protein_interacts_with_protein collection stores protein-protein interactions, which are undirected: if protein A interacts with protein B, then protein B interacts with protein A. Undirected collections use the memberOne and memberTwo attributes to store the primaryDomainIds of the two entities involved in the relationship.

Other collections in NeDRexDB contain directed relationships that only make sense in one direction. For example, relationships in the protein_encoded_by collection are directed: a protein is encoded by a gene, not the other way around. Directed collections use the sourceDomainId and targetDomainId attributes, so that the relationship can be read as [source node]-(edge type)->[target node].
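The difference between directed and undirected edges matters when building an adjacency structure from edge documents. The following is a minimal sketch that picks the endpoints based on which attribute pair is present. Undirected edges are added in both directions; directed edges are added from source to target only. The function names are hypothetical. The protein-protein edge uses made-up IDs and is not a real NeDRexDB interaction.

```python
from collections import defaultdict


def endpoints(edge):
    """Return (first, second, directed) for a NeDRexDB edge document."""
    if "sourceDomainId" in edge:  # directed: source -> target
        return edge["sourceDomainId"], edge["targetDomainId"], True
    return edge["memberOne"], edge["memberTwo"], False  # undirected


def build_adjacency(edges):
    """Map each primaryDomainId to the set of nodes reachable in one step."""
    neighbours = defaultdict(set)
    for edge in edges:
        first, second, directed = endpoints(edge)
        neighbours[first].add(second)
        if not directed:
            neighbours[second].add(first)  # undirected edges work both ways
    return dict(neighbours)


edges = [
    {"sourceDomainId": "uniprot.P31946", "targetDomainId": "entrez.7529"},
    # Made-up interaction between hypothetical proteins, for illustration only.
    {"memberOne": "uniprot.PROTEIN_A", "memberTwo": "uniprot.PROTEIN_B"},
]
adjacency = build_adjacency(edges)
print(adjacency["uniprot.PROTEIN_B"])  # {'uniprot.PROTEIN_A'}
print("entrez.7529" in adjacency)      # False: only the directed edge's target
```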

Obtaining nodes using the API

Nodes can be obtained via the /get_by_id/<type> route. To use this API route, you need to know (1) the type of the node you want to get and (2) at least one of its domain IDs. Continuing the example of the sonic hedgehog protein, details of this protein can be obtained as follows:

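import requests
from pprint import pprint

route = "https://api.nedrex.net/get_by_id/"
node_type = "protein"
to_get = ["uniprot.Q15465"]

response = requests.get(route + node_type, params={"q": to_get})
response.raise_for_status()  # stop on HTTP errors
pprint(response.json())      # printing `response` itself only shows <Response [200]>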

# [{'comments': 'FUNCTION: [Sonic hedgehog protein]: The C-terminal part...
#   'displayName': 'SHH_HUMAN',
#   'domainIds': ['uniprot.Q15465'],
#   'geneName': 'SHH',
#   'primaryDomainId': 'uniprot.Q15465',
#   'sequence': 'MLLLARCLLLVLVSSLLVCSGLACGPGRGFG...',
#   'synonyms': ['Sonic hedgehog protein',
#                'SHH',
#                'HHG-1',
#                'Shh unprocessed N-terminal signaling and C-terminal '
#                'autoprocessing domains {ECO:0000303|PubMed:24522195}',
#                'ShhNC {ECO:0000303|PubMed:24522195}'],
#   'taxid': 9606,
#   'type': 'Protein'}]

In this example, the type is specified in the URL route as protein. This type is the name of the collection in which nodes of that type are stored, as returned by the /list_node_collections route. The route is designed to retrieve multiple nodes at once: include additional domain IDs in the list passed to the query parameter q. These domain IDs do not need to be primaryDomainIds; any of a node's domain IDs can be used.
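When several domain IDs are requested at once, it is convenient to index the returned nodes by every domain ID they carry. This page does not say whether the response follows the order of q, so an index avoids relying on that order. The following is a minimal sketch with hypothetical helper names. uniprot.P31946 is used only as a second protein ID that appears elsewhere on this page.

```python
import requests


def get_nodes(node_type, domain_ids, base_url="https://api.nedrex.net"):
    """Fetch several nodes of one type with a single /get_by_id request."""
    response = requests.get(
        f"{base_url}/get_by_id/{node_type}",
        params={"q": list(domain_ids)},  # requests sends this as q=...&q=...
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def index_by_domain_id(nodes):
    """Map every domain ID of every returned node to that node."""
    index = {}
    for node in nodes:
        for domain_id in node["domainIds"]:
            index[domain_id] = node
    return index
```

For example, `index_by_domain_id(get_nodes("protein", ["uniprot.Q15465", "uniprot.P31946"]))` would let you look up each returned protein by whichever domain ID you used in the query.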

Obtaining edges using the API

At present, there is no way to request particular edges from an edge collection. The NeDRexAPI does, however, provide the /<type>/all route, which returns all items in a collection. For example, all protein_encoded_by edges can be obtained in Python as follows:

import requests
url = "https://api.nedrex.net/protein_encoded_by/all"
response = requests.get(url)
print(response.json())
# [{"sourceDomainId":"uniprot.P31946","targetDomainId":"entrez.7529","type":"ProteinEncodedBy"},
#  {"sourceDomainId":"uniprot.P62258","targetDomainId":"entrez.7531","type":"ProteinEncodedBy"},
#  ...]
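A downloaded edge collection is often more useful as a lookup table than as a flat list. The following sketch turns protein_encoded_by edges into a mapping from each protein to the genes that encode it. It runs on the two edges shown in the output above rather than on a live response. A set is used because this page does not say that each protein is encoded by exactly one gene. The function name is hypothetical.

```python
from collections import defaultdict


def genes_by_protein(encoded_by_edges):
    """Map each protein's primaryDomainId to the set of encoding genes."""
    mapping = defaultdict(set)
    for edge in encoded_by_edges:
        # Directed edge: [protein]-(encoded by)->[gene]
        mapping[edge["sourceDomainId"]].add(edge["targetDomainId"])
    return dict(mapping)


edges = [
    {"sourceDomainId": "uniprot.P31946", "targetDomainId": "entrez.7529", "type": "ProteinEncodedBy"},
    {"sourceDomainId": "uniprot.P62258", "targetDomainId": "entrez.7531", "type": "ProteinEncodedBy"},
]
print(genes_by_protein(edges)["uniprot.P62258"])
# {'entrez.7531'}
```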

Note that some of the edge collections in NeDRexDB are very large. For example, at the time of writing, the protein_similarity_protein collection contains over 1 million edges. Code that retrieves a whole collection with the /<type>/all route should therefore be designed to cope with large responses, both in download time and in memory use.
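One way to keep a large download manageable is to stream the response body to disk in chunks, instead of holding the raw body in memory, and parse it later. The following is a minimal sketch using the `stream=True` option and the `iter_content` method from requests. The route follows the /<type>/all pattern shown above, and the function names and file path are placeholders. Note that `json.load` still builds the whole collection in memory when parsing, so the very largest collections may need an incremental JSON parser instead.

```python
import json

import requests


def download_collection(collection, path, base_url="https://api.nedrex.net"):
    """Stream all items of a collection to a local JSON file."""
    url = f"{base_url}/{collection}/all"
    with requests.get(url, stream=True, timeout=300) as response:
        response.raise_for_status()
        with open(path, "wb") as handle:
            # Write the body in 1 MiB chunks rather than buffering it whole.
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                handle.write(chunk)
    return path


def load_collection(path):
    """Parse a previously downloaded collection file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_collection(download_collection("protein_similarity_protein", "psp.json"))` would save the collection to disk once, so later runs can call `load_collection("psp.json")` without downloading it again.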