# NeDRex: Algorithms

You can find more theoretical explanations of the network-based algorithms as well as statistical validation methods implemented or integrated in the NeDRex platform in this section.

## MuST

The Steiner tree problem is an optimization problem with the objective of finding a tree of minimum cost connecting the set of seeds (terminals). For NeDRex we established a multi-Steiner trees (MuST) method that combines several approximates of Steiner trees into a single subnetwork. By selecting genes associated with a disease of interest as seeds, MuST runs on the gene-gene layer of the integrated network in the backend and extracts a connected subnetwork which potentially incorporates the genes involved in the disease pathways and mechanism. These genes could be targets of putative drug repurposing candidates.

In order to penalize the hubs nodes and consequently extract mechanisms more specific to the disease of interest, users can conduct the MuST algorithm with the hub penalty parameter. This parameter incorporates the degree of neighboring nodes as edge weights in the optimization. For more detailed information about the hub penalty see the Supplementary Information document of our CoVex tool.

## DIAMOnD

DIAMOnD identifies a candidate disease module around a set of known disease genes (seeds) in the gene-gene layer of the integrated network by greedily adding nodes with a high connectivity significance to the module. In the iterative algorithm of DIAMOnD, the connectivity significance of all direct neighbors of seeds is computed. Then, the most significantly connected node is integrated into the module, leading to expansion of the module by one node per iteration. Subsequently, the connectivity significance is recomputed w.r.t. the updated module and the process iterates until the desired module size has been reached.

The derived disease modules could incorporate targets of potential drug repurposing candidates.

## BiCoN

BiCoN is a network-constrained biclustering method that is used for integrative analysis of gene expression and PPI networks. BiCoN simultaneously clusters patients and genes such that genes also form a connected subnetwork in the PPI network. The derived genes clusters in the form of a connected subnetwork could incorporate targets of potential drug repurposing candidates.

## Closeness centrality

Closeness centrality is a node centrality measure that prioritizes the nodes in a network based on the lengths of their shortest paths to all other nodes in the network. For NeDRex, we implemented a modified version running on the heterogeneous network of protein-protein and protein-drug associations, where closeness of drugs is calculated with respect to only the selected protein seeds. The rational behind this modification is to favourably give higher ranks to drugs that are at a close distance to the nodes in the disease module and could be hence good candidates as repurposable drugs.

## TrustRank

TrustRank is a modification of Google’s PageRank algorithm, where the initial “trust” score is iteratively propagated from seed nodes to neighbor nodes using the network topology. It prioritizes nodes in a network based on how well they are connected to a (trusted) set of seed nodes. In NeDRex, TrustRank is executed on the heterogeneous network of protein-protein and protein-drug associations to obtain a ranked list of drugs that could be putative drug repurposing candidates.

The rate of trust propagation across the network is controlled by damping factor parameter (0.0-1.0). A higher damping factor returns results in a more explorative fashion.

## Statistical Validations

For evaluation of results returned by NeDRex, a list of drugs as the true reference list needs to be compiled. This reference list contains indicated drugs for the treatment of the disease, which can be obtained directly from NeDRexDB using Get drugs indicated in disease function or from other resources. Since drug indication data from DrugBank is not available via non-commercial license and hence is not integrated into NeDRexDB, the list of indicated drugs can be complemented by browsing DrugBank directly. For the cases where user can only retrieve a few indicated drugs (ten or fewer), the reference list can be extended by drugs from clinical trials or therapeutic drugs supported by literature evidence from CTD database.

### Drug list validation

First, N lists of randomly selected drugs, matching the size of the drug list predicted by NeDRex, are generated. The significance of the result drugs is estimated by calculating an empirical P-value by counting the number of random lists having larger overlap with the reference list of drugs than that of the NeDRex result list. A variation of this method is also implemented where the ranks of the reference drugs in the output are also considered. We define discounted cumulative gain (DCG) for a list of ranked drugs as follows:

$DCG = \sum_{i=1}^n\frac{d_i}{\log_2(i+1)}$

where n is the length of the ranked list of drugs, di=1 if the ith drug from the sorted list of drugs is indicated for the disease of interest and di=0 otherwise.

The DCG metric captures whether the true list of drugs (reference drugs) are retrieved early or late in the ranked list. The DCG-based empirical P-value is computed by counting the number of random drug lists with DCG values higher than of the NeDRex result list.

### Disease module validation

This method takes into account the role of disease module identification step in the NeDRex drug repurposing pipeline. We generate 1000 mock modules (mechanisms) that match the size and the number of connected components of a disease module returned by NeDRex. The latter constraint is set to keep the topology of random modules comparable to the result disease module. For the disease module computed by NeDRex as well as each mock module, we define its precision as the number of reference drugs targeting the module divided by the overall number of drugs targeting the module. We then compute an empirical P-value by counting the number of mock modules with higher precision values than the disease module computed by NeDRex. We have also implemented a simplified approach where we do not normalize by the overall number of targeting drugs, i.e., compare intersection sizes with the reference drugs instead of precision values as defined above

### Joint validation of disease module and drug list

This method takes into account, both steps of the drug repurposing pipeline, i.e. disease module identification and drug ranking, as a whole in the final validation of results. Computationally, this approach is similar to the validation method for disease modules described previously. The only difference is that we now compute the precision for the NeDRex result as the number of reference drugs contained in the drug list computed by NeDRex divided by the overall number of drugs in the computed list. Analogously, we use the drug lists returned by NeDRex to compute the intersection size for the disease module computed by NeDRex. Precision values and intersection sizes for the mock modules are computed as before.