.. _Methods:

NeDRex: Algorithms
===================

This section explains the theory behind the network-based algorithms and the statistical validation methods that are implemented or integrated in the NeDRex platform.

.. _MuST methods:

MuST
------

The `Steiner tree problem <https://link.springer.com/article/10.1007%2FBF00288961>`_ is an optimization problem with the
objective of finding a tree of minimum cost connecting a set of seeds (terminals). For NeDRex, we established a multi-Steiner
trees (MuST) method that combines several approximate Steiner trees into a single subnetwork. By selecting genes associated
with a disease of interest as seeds, MuST runs on the gene-gene layer of the integrated network in the backend and
extracts a connected subnetwork that potentially contains the genes involved in the disease pathways and mechanisms.
These genes could be targets of putative drug repurposing candidates.

To penalize hub nodes and consequently extract mechanisms more specific to the disease of interest, users can
run the MuST algorithm with the hub penalty parameter. This parameter incorporates the degree of neighboring nodes into the edge
weights used in the optimization. For more detailed information about the hub penalty, see the Supplementary Information document of
our `CoVex <https://www.nature.com/articles/s41467-020-17189-2>`_ tool.
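
The idea of merging several approximate Steiner trees under a hub penalty can be sketched as follows. This is an illustration
only, not the NeDRex implementation: the function names are hypothetical, the edge weighting (a blend of the mean degree and the
mean degree of the two endpoints) is an assumed form of the hub penalty, and a simple shortest-path heuristic started from
different seeds stands in for the Steiner tree approximation.

.. code-block:: python

    import heapq
    from collections import defaultdict

    def build_adj(edges):
        """Undirected adjacency sets from an edge list."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        return dict(adj)

    def hub_penalized_weights(adj, hub_penalty):
        """Assumed weighting: blend the mean degree with the endpoints' mean degree."""
        deg = {u: len(vs) for u, vs in adj.items()}
        mean_deg = sum(deg.values()) / len(deg)
        return {(u, v): (1 - hub_penalty) * mean_deg + hub_penalty * (deg[u] + deg[v]) / 2
                for u in adj for v in adj[u]}

    def approx_steiner_tree(adj, weights, seeds, start):
        """Shortest-path heuristic: repeatedly attach the nearest unconnected seed."""
        tree_nodes, tree_edges = {start}, set()
        remaining = set(seeds) - {start}
        while remaining:
            # Multi-source Dijkstra from all nodes already in the tree.
            dist = {n: 0.0 for n in tree_nodes}
            prev = {}
            heap = [(0.0, n) for n in tree_nodes]
            heapq.heapify(heap)
            target = None
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist[u]:
                    continue  # stale heap entry
                if u in remaining:
                    target = u
                    break
                for v in adj[u]:
                    nd = d + weights[(u, v)]
                    if nd < dist.get(v, float("inf")):
                        dist[v], prev[v] = nd, u
                        heapq.heappush(heap, (nd, v))
            if target is None:
                break  # the remaining seeds are unreachable
            node = target
            while node not in tree_nodes:  # walk the path back to the tree
                tree_edges.add(frozenset((node, prev[node])))
                tree_nodes.add(node)
                node = prev[node]
            remaining -= tree_nodes
        return tree_edges

    def must(adj, seeds, hub_penalty=0.0, n_trees=3):
        """Union of up to n_trees approximate trees, each grown from a different seed."""
        weights = hub_penalized_weights(adj, hub_penalty)
        edges = set()
        for start in sorted(seeds)[:n_trees]:
            edges |= approx_steiner_tree(adj, weights, seeds, start)
        return edges

    # Toy network: hub H links the seeds A and B directly; A-X-Y-B is a longer detour.
    adj = build_adj([("A", "H"), ("H", "B"), ("A", "X"), ("X", "Y"), ("Y", "B")]
                    + [("H", f"c{i}") for i in range(4)])
    nodes_plain = {n for e in must(adj, {"A", "B"}, hub_penalty=0.0) for n in e}
    nodes_penalized = {n for e in must(adj, {"A", "B"}, hub_penalty=1.0) for n in e}

In this toy network the unpenalized run connects the seeds through the hub ``H``, whereas the fully penalized run prefers the
longer path through the low-degree nodes ``X`` and ``Y``.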


.. _DIAMOnD methods:

DIAMOnD
---------

`DIAMOnD <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004120>`_ identifies a candidate
disease module around a set of known disease genes (seeds) in the gene-gene layer of the integrated network by greedily adding nodes with a high connectivity significance
to the module. In the iterative algorithm of DIAMOnD, the connectivity significance of all direct neighbors of seeds is computed.
Then, the most significantly connected node is integrated into the module, leading to expansion of the module by one node
per iteration. Subsequently, the connectivity significance is recomputed with respect to the updated module and the process
iterates until the desired module size has been reached.
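
The iteration described above can be sketched in a few lines. The sketch assumes that connectivity significance is the
hypergeometric tail probability of a node having at least the observed number of links into the module; the function names
and the toy network are hypothetical, and refinements of the published algorithm are left out.

.. code-block:: python

    from math import comb

    def connectivity_pvalue(n_nodes, n_module, degree, links_to_module):
        """P(X >= links_to_module) for X ~ Hypergeometric(n_nodes, n_module, degree)."""
        upper = min(degree, n_module)
        hits = sum(comb(n_module, i) * comb(n_nodes - n_module, degree - i)
                   for i in range(links_to_module, upper + 1))
        return hits / comb(n_nodes, degree)

    def diamond(adj, seeds, n_add):
        """Greedily add the most significantly connected neighbor, n_add times."""
        module, added = set(seeds), []
        for _ in range(n_add):
            candidates = {v for u in module for v in adj[u]} - module
            if not candidates:
                break
            # sorted() makes tie-breaking deterministic
            best = min(sorted(candidates),
                       key=lambda v: connectivity_pvalue(len(adj), len(module),
                                                         len(adj[v]), len(adj[v] & module)))
            module.add(best)
            added.append(best)
        return added

    # Toy network: "a" links to both seeds; "h" is a hub with a single seed link.
    edges = [("s1", "a"), ("s2", "a"), ("s1", "h")] + [("h", f"h{i}") for i in range(1, 6)]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    added = diamond(adj, {"s1", "s2"}, n_add=2)  # "a" is added before "h"

Node ``a`` is added first because both of its two links point into the module, whereas only one of the hub's six links does.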

The derived disease modules could incorporate targets of potential drug repurposing candidates.

.. _BiCoN methods:

BiCoN
-------

`BiCoN <https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa1076/6050718>`_
is a network-constrained biclustering method for the integrative analysis of gene expression data and
protein-protein interaction (PPI) networks. BiCoN simultaneously clusters patients and genes such that the genes also form a
connected subnetwork in the PPI network. The derived gene clusters, which take the form of connected subnetworks, could contain
targets of potential drug repurposing candidates.

.. _Closeness centrality methods:

Closeness centrality
-----------------------

Closeness centrality is a node centrality measure that prioritizes the nodes in a network based on the lengths of
their shortest paths to all other nodes in the network. For NeDRex, we implemented a modified version running on the heterogeneous
network of protein-protein and protein-drug associations, where closeness of drugs is calculated with respect to only the selected
protein seeds. The rationale behind this modification is to rank drugs higher when they are close to the
nodes in the disease module and could hence be good candidates for repurposing.
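
A minimal sketch of such a seed-restricted closeness is given below. It assumes an unweighted, undirected network, scores each
drug as the number of seeds divided by the sum of its shortest-path distances to the seeds, and assigns a score of zero when a
seed is unreachable. The exact normalization used in NeDRex may differ, and all names are hypothetical.

.. code-block:: python

    from collections import deque

    def bfs_distances(adj, source):
        """Hop distances from source to every reachable node."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def seed_closeness(adj, drugs, seeds):
        """Rank drugs by closeness to the (non-empty) seed set only."""
        scores = {}
        for drug in drugs:
            dist = bfs_distances(adj, drug)
            if all(s in dist for s in seeds):
                scores[drug] = len(seeds) / sum(dist[s] for s in seeds)
            else:
                scores[drug] = 0.0  # assumption: an unreachable seed zeroes the score
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    # Toy network: protein chain P1-P2-P3; D1 targets P1 and D2 targets P3.
    adj = {"P1": {"P2", "D1"}, "P2": {"P1", "P3"}, "P3": {"P2", "D2"},
           "D1": {"P1"}, "D2": {"P3"}}
    ranking = seed_closeness(adj, ["D1", "D2"], {"P1", "P2"})

With the seeds ``P1`` and ``P2``, drug ``D1`` has distances 1 and 2 (score 2/3) and ``D2`` has distances 3 and 2 (score 2/5),
so ``D1`` is ranked first.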


.. _TrustRank methods:

TrustRank
-----------

`TrustRank <https://www.vldb.org/conf/2004/RS15P3.PDF>`_
is a modification of Google’s PageRank algorithm, where the initial “trust” score is iteratively propagated
from seed nodes to neighbor nodes using the network topology. It prioritizes nodes in a network based on how well they
are connected to a (trusted) set of seed nodes. In NeDRex, TrustRank is executed on the heterogeneous network of protein-protein and
protein-drug associations to obtain a ranked list of drugs that could be putative drug repurposing candidates.

The rate of trust propagation across the network is controlled by the damping factor parameter (0.0-1.0). A higher damping factor
lets trust spread farther from the seeds, which makes the results more explorative.
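
The propagation can be illustrated with a short power iteration. The sketch assumes an undirected network, uniform initial
trust on the seeds, and a fixed number of iterations; the function name and the toy network are hypothetical and do not
reflect the NeDRex implementation.

.. code-block:: python

    def trustrank(adj, seeds, damping=0.85, n_iter=100):
        """Seed-biased PageRank: trust flows along edges and restarts at the seeds."""
        trust0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
        score = dict(trust0)
        for _ in range(n_iter):
            nxt = {n: (1.0 - damping) * trust0[n] for n in adj}
            for u, neighbors in adj.items():
                if neighbors:
                    share = damping * score[u] / len(neighbors)
                    for v in neighbors:
                        nxt[v] += share
                else:
                    # Isolated node: hand its trust back to the seeds.
                    for s in seeds:
                        nxt[s] += damping * score[u] / len(seeds)
            score = nxt
        return score

    # Toy network: protein chain P1-P2-P3; D1 targets P1 and D2 targets P3.
    adj = {"P1": {"P2", "D1"}, "P2": {"P1", "P3"}, "P3": {"P2", "D2"},
           "D1": {"P1"}, "D2": {"P3"}}
    scores = trustrank(adj, {"P1"}, damping=0.85)
    ranked_drugs = sorted(["D1", "D2"], key=lambda d: -scores[d])

With a damping factor of 0.0 no trust leaves the seeds, so every drug scores zero; raising the damping factor pushes more trust
toward distant nodes such as ``D2``.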

.. _Statistical validation methods:

Statistical Validations
-----------------------
To evaluate the results returned by NeDRex, a list of drugs needs to be compiled as the true reference list. This reference list contains
the drugs indicated for the treatment of the disease, which can be obtained directly from NeDRexDB using the :ref:`Get drugs indicated in disease function
<get disease drugs function>` or from other resources. Since drug indication data from DrugBank is not available under a non-commercial license and hence is
not integrated into NeDRexDB, the list of indicated drugs can be complemented by browsing DrugBank directly. When only a few indicated drugs
(ten or fewer) can be retrieved, the reference list can be extended with drugs from clinical trials or therapeutic drugs supported by literature
evidence from the CTD database.

.. _drug-validation algorithm:

Drug list validation
^^^^^^^^^^^^^^^^^^^^^^^
First, N lists of randomly selected drugs, each matching the size of the drug list predicted by NeDRex, are generated. The significance of
the predicted drugs is then estimated with an empirical P-value, obtained by counting the number of random lists that have a larger overlap with
the reference list of drugs than the NeDRex result list has. A variation of this method is also implemented in which the ranks
of the reference drugs in the output are considered as well. We define the discounted cumulative gain (DCG) for a list of ranked drugs as follows:

.. math:: DCG = \sum_{i=1}^n\frac{d_i}{\log_2(i+1)}

where `n` is the length of the ranked list of drugs, d\ :sub:`i`\=1 if the i\ :sup:`th`\  drug from the sorted list of drugs is indicated for the disease of interest and d\ :sub:`i`\=0 otherwise.

The DCG metric captures whether the reference drugs are retrieved early or late in the ranked list. The DCG-based
empirical P-value is computed by counting the number of random drug lists with DCG values higher than that of the NeDRex result list.
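
Both variants can be sketched as follows. The sketch assumes that the random lists are drawn without replacement from a pool of
all candidate drugs and that the P-value is the count of more extreme random lists divided by N; the function names, the
``all_drugs`` pool, and the fixed random seed are illustrative choices rather than part of NeDRex.

.. code-block:: python

    import math
    import random

    def dcg(ranked_drugs, reference):
        """Discounted cumulative gain: a hit at rank i contributes 1 / log2(i + 1)."""
        return sum(1.0 / math.log2(i + 1)
                   for i, drug in enumerate(ranked_drugs, start=1)
                   if drug in reference)

    def empirical_pvalues(result, reference, all_drugs, n_random=1000, seed=0):
        """Return (overlap-based P-value, DCG-based P-value) for a ranked result list."""
        rng = random.Random(seed)
        reference = set(reference)
        observed_overlap = len(reference & set(result))
        observed_dcg = dcg(result, reference)
        more_overlap = higher_dcg = 0
        for _ in range(n_random):
            sample = rng.sample(all_drugs, len(result))  # random list of matching size
            more_overlap += len(reference & set(sample)) > observed_overlap
            higher_dcg += dcg(sample, reference) > observed_dcg
        return more_overlap / n_random, higher_dcg / n_random

    # Toy example: 100 candidate drugs, 5 reference drugs, and a perfect prediction.
    all_drugs = [f"drug{i}" for i in range(100)]
    reference = all_drugs[:5]
    p_overlap, p_dcg = empirical_pvalues(all_drugs[:5], reference, all_drugs)

For the ranked list ``["a", "b", "c"]`` with reference drugs ``a`` and ``c``, the DCG is 1/log2(2) + 1/log2(4) = 1.5. In the toy
example no random list can beat a perfect prediction, so both P-values are 0.0.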


.. _module-validation algorithm:

Disease module validation
^^^^^^^^^^^^^^^^^^^^^^^^^
This method takes into account the role of the disease module identification step in the NeDRex drug repurposing pipeline. We generate 1000 mock modules
(mechanisms) that match the size and the number of connected components of the disease module returned by NeDRex. The latter constraint
keeps the topology of the random modules comparable to that of the result disease module.
For the disease module computed by NeDRex as well as for each mock module, we define its precision as the number of reference drugs targeting the module
divided by the overall number of drugs targeting the module. We then compute an empirical P-value by counting the number of mock modules with higher
precision values than the disease module computed by NeDRex. We have also implemented a simplified approach where we do not normalize by the overall
number of targeting drugs, i.e., we compare intersection sizes with the reference drugs instead of precision values as defined above.
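
The precision-based comparison can be sketched as follows. The mock modules are taken as given, since generating them with a
matching size and number of connected components is outside the scope of this sketch. The function names, the
``drug_targets`` mapping, the precision of zero for untargeted modules, and the division by the number of mock modules are
assumptions made for illustration.

.. code-block:: python

    def module_precision(module, drug_targets, reference):
        """Fraction of the drugs targeting the module that are reference drugs."""
        targeting = {drug for drug, targets in drug_targets.items() if targets & module}
        if not targeting:
            return 0.0  # assumption: a module without targeting drugs scores zero
        return len(targeting & reference) / len(targeting)

    def module_pvalue(module, mock_modules, drug_targets, reference):
        """Share of mock modules with a higher precision than the result module."""
        observed = module_precision(module, drug_targets, reference)
        higher = sum(module_precision(mock, drug_targets, reference) > observed
                     for mock in mock_modules)
        return higher / len(mock_modules)

    # Toy example: three drugs with one target each; P4 is not targeted by any drug.
    drug_targets = {"D1": {"P1"}, "D2": {"P2"}, "D3": {"P3"}}
    reference = {"D1"}
    module = {"P1", "P2"}
    mock_modules = [{"P2", "P3"}, {"P1", "P3"}, {"P1", "P4"}, {"P3", "P4"}]

    precision = module_precision(module, drug_targets, reference)
    p_value = module_pvalue(module, mock_modules, drug_targets, reference)

Here the result module has a precision of 0.5, and only the mock module ``{"P1", "P4"}`` exceeds it with a precision of 1.0,
giving a P-value of 0.25.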


.. _joint-validation algorithm:

Joint validation of disease module and drug list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method takes into account both steps of the drug repurposing pipeline, i.e., disease module identification and drug ranking, as a whole in the final
validation of results. Computationally, this approach is similar to the validation method for disease modules described previously. The only
difference is that we now compute the precision for the NeDRex result as the number of reference drugs contained in the drug list computed by NeDRex
divided by the overall number of drugs in the computed list. Analogously, we use the drug list returned by NeDRex to compute the intersection size for
the disease module computed by NeDRex. Precision values and intersection sizes for the mock modules are computed as before.