Cardiovascular drug-target information was downloaded from the DrugBank database [14], the Therapeutic Target Database (TTD) [15] and the FDA Orange Book [16] as of November 2012. Each entry in the resulting list of drug targets was manually checked against the literature to assure data quality. We classified drugs by therapeutic area and target proteins by functional family. The reproducible set of interactions, together with the pharmacological activities of the drugs and the functional annotations of the targets, is provided in the supplementary information as a resource for researchers interested in cardiovascular pharmacology (Additional file 1). Curation of the drug-target data set identified 254 approved cardiovascular drugs with 206 successful cardiovascular protein targets. This data set was used to build the drug-target network.
The most complete and best-curated list of known phenotype-gene associations is maintained in the Morbid Map (MM) of the Online Mendelian Inheritance in Man (OMIM) [17]. Each MM entry is composed of four fields: the name of the disorder, the associated gene symbols, the corresponding OMIM ID, and the chromosomal location. We analyzed the complete data set and curated it manually, following the procedure of Goh et al. [18]. We downloaded the MM file in January 2013. Out of 6,252 MM entries, we selected the 4,811 entries with the “(3)” tag, for which there is strong evidence that at least one mutation in the particular gene is causative of the phenotype. We then merged these 4,811 phenotype terms into 1,775 distinct phenotypes by combining the subtypes of a single phenotype, based on their given names and the corresponding Medical Subject Headings (MeSH) [19] vocabulary as of January 2013. The merging was first done automatically, and each entry was then verified manually. Each disease was then assigned a unique disease ID.
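The tag-based selection step can be illustrated with a minimal sketch. It assumes that the tag is a single digit in parentheses at the end of the disorder-name field; the function names and the toy entries are hypothetical and are not taken from the MM file.

```python
import re

# Assumption: the evidence tag is a single digit in parentheses that
# closes the disorder-name field, e.g. "... (3)".
TAG_PATTERN = re.compile(r"\((\d)\)\s*$")


def has_tag(disorder_field, tag="3"):
    """Return True if the disorder-name field ends with the given tag."""
    match = TAG_PATTERN.search(disorder_field)
    return bool(match) and match.group(1) == tag


def select_tagged(entries, tag="3"):
    """Keep entries whose first field (disorder name) carries the tag."""
    return [entry for entry in entries if has_tag(entry[0], tag)]


# Hypothetical entries with the four fields described in the text:
# (disorder name, gene symbols, OMIM ID, chromosomal location).
toy_entries = [
    ("Example cardiomyopathy, type 1 (3)", "GENE1", "000001", "1p36"),
    ("Example susceptibility trait (2)", "GENE2", "000002", "2q11"),
    ("Example arrhythmia syndrome (3)", "GENE3, GENE4", "000003", "3q21"),
]
selected = select_tagged(toy_entries)
```

The subsequent merging of phenotype subtypes relied on names, MeSH vocabulary and manual verification, so it is not reduced to code here.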
The curated data set contained 1,775 phenotypes and 3,039 associated genes (Additional file 1), of which 98 are cardiovascular disorders associated with 268 genes (Additional file 1). In addition, 111 disease genes encode cardiovascular target proteins, of which 35 overlap the cardiovascular genes associated with 26 cardiovascular disorders (Additional file 1).
We constructed a disease gene-gene network (DGG network; Additional file 1) and a gene disease-disease network (GDD network; Additional file 1), both derived from the gene-disease associations (Additional file 1). In the DGG network, two genes are connected if they are associated with a common disease in the global gene-disease associations. Similarly, in the GDD network, two disorders are connected if they are associated with the same gene in the gene-disorder associations.
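The two projections can be sketched as follows. This is a minimal illustration that assumes the associations are available as (gene, disease) pairs; the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_associations(pairs):
    """Return (gene-gene edges, disease-disease edges) from (gene, disease) pairs."""
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for gene, disease in pairs:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)

    # DGG network: two genes are linked if they share at least one disease.
    dgg = set()
    for genes in genes_by_disease.values():
        dgg.update(frozenset(p) for p in combinations(sorted(genes), 2))

    # GDD network: two diseases are linked if they share at least one gene.
    gdd = set()
    for diseases in diseases_by_gene.values():
        gdd.update(frozenset(p) for p in combinations(sorted(diseases), 2))
    return dgg, gdd


# Toy associations (hypothetical, not taken from the curated data set).
toy_pairs = [("G1", "D1"), ("G2", "D1"), ("G2", "D2"), ("G3", "D2")]
dgg_edges, gdd_edges = project_associations(toy_pairs)
```

In the toy input, G1 and G2 are linked through D1, G2 and G3 through D2, and D1 and D2 through the shared gene G2. Edges are stored as unordered pairs so that each link is counted once.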
A network was generated by determining the first-order interactions of the cardiovascular gene products associated with a given phenotypic subgroup in the PPI network. These interactions were integrated into a network by always including direct interactions between cardiovascular gene products, and by including interactions with other proteins only when those proteins scored above a network score threshold. The network score of a protein is the fraction of its interaction partners that are cardiovascular gene products; this down-weights highly connected non-cardiovascular proteins and reduces the noise they introduce. The median score over all non-cardiovascular proteins, 0.25, was used as the threshold score [20]. Detailed views of the networks can be seen in Additional file 1.
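A minimal sketch of the scoring and filtering step is given here. It assumes that the PPI set is a list of protein pairs and that "above the threshold" means strictly greater; the function names and the toy network are hypothetical.

```python
def network_scores(ppi_edges, cv_proteins):
    """Score each non-cardiovascular protein by the fraction of its
    interaction partners that are cardiovascular gene products."""
    partners = {}
    for a, b in ppi_edges:
        if a == b:  # ignore self-interactions
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {
        protein: len(nbrs & cv_proteins) / len(nbrs)
        for protein, nbrs in partners.items()
        if protein not in cv_proteins
    }


def build_subnetwork(ppi_edges, cv_proteins, threshold):
    """Keep every edge between two cardiovascular gene products, plus
    first-order edges to other proteins scoring above the threshold
    (strict inequality is an assumption of this sketch)."""
    scores = network_scores(ppi_edges, cv_proteins)
    kept = set()
    for a, b in ppi_edges:
        if a == b:
            continue
        a_cv, b_cv = a in cv_proteins, b in cv_proteins
        if a_cv and b_cv:
            kept.add(frozenset((a, b)))
        elif a_cv and scores[b] > threshold:
            kept.add(frozenset((a, b)))
        elif b_cv and scores[a] > threshold:
            kept.add(frozenset((a, b)))
    return kept


# Toy network (hypothetical): C1 and C2 are cardiovascular gene products.
toy_cv = {"C1", "C2"}
toy_ppi = [("C1", "C2"), ("C1", "X"), ("X", "C2"),
           ("C1", "Y"), ("Y", "Z"), ("Y", "W"), ("Y", "V")]
toy_scores = network_scores(toy_ppi, toy_cv)
toy_network = build_subnetwork(toy_ppi, toy_cv, threshold=0.25)
```

In this toy case, X interacts only with cardiovascular gene products and is retained, whereas the more promiscuous protein Y has a score of exactly 0.25 and is excluded under the strict inequality.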
A human protein-protein interaction (PPI) set was assembled from HINT (High-quality protein interactomes) [21], updated June 3, 2013. HINT is a database of high-quality PPIs integrated from various sources and filtered to remove low-quality or erroneous interactions. The resulting set of PPIs contained 28,629 non-self-interacting, non-redundant interactions between 8,495 proteins, of which 132 were cardiovascular targets and 191 were cardiovascular gene products, mapped by gene name. The list of PPIs used is available at the online database CVDSP (http://sm.nwsuaf.edu.cn/lsp/cvdsp.php).
To quantify the cellular network-level relationship between pairs of phenotypes, we assessed the molecular association of each pair of phenotype modules by their shared protein-protein interactions in the disease modular network. The number of shared protein-protein interactions is the number of protein-protein interactions that link genes between the two modules. The significance of the shared protein-protein interactions was measured by randomization tests on the resulting network. For two phenotype modules, we first randomly generated two modules with the same numbers of disease genes. We then calculated the number of shared protein-protein interactions between the two random modules. This procedure was repeated 10,000 times to obtain the test statistics and P values for the two disorders. All pairs of disorders with shared protein-protein interactions and their P values are listed in Additional file 1.
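The randomization test can be sketched as below. Several details are assumptions of this sketch rather than statements of the original procedure: random modules are drawn uniformly and independently from all proteins in the PPI set, the P value is the one-sided empirical fraction with an add-one correction, and all names and the toy network are hypothetical.

```python
import random


def shared_ppi_count(module_a, module_b, ppi_edges):
    """Number of PPIs linking a gene in module_a to a gene in module_b."""
    count = 0
    for u, v in ppi_edges:
        if (u in module_a and v in module_b) or (u in module_b and v in module_a):
            count += 1
    return count


def empirical_p_value(module_a, module_b, ppi_edges, n_iter=10_000, seed=0):
    """Compare the observed count with counts for random module pairs of
    the same sizes; return (observed count, empirical P value)."""
    rng = random.Random(seed)
    proteins = sorted({p for edge in ppi_edges for p in edge})
    observed = shared_ppi_count(module_a, module_b, ppi_edges)
    hits = 0
    for _ in range(n_iter):
        rand_a = set(rng.sample(proteins, len(module_a)))
        rand_b = set(rng.sample(proteins, len(module_b)))
        if shared_ppi_count(rand_a, rand_b, ppi_edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy PPI set and modules (hypothetical).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"),
             ("F", "G"), ("H", "I"), ("E", "F"), ("G", "H")]
observed_count, p_value = empirical_p_value({"A"}, {"B", "C"}, toy_edges, n_iter=1000)
```

Fixing the seed makes the sketch repeatable; with 10,000 iterations the smallest P value obtainable under this estimator is roughly 1e-4.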
The degree of a node is the number of edges connecting to the node. The shortest path between two nodes is the path with the smallest number of links between the selected nodes. The betweenness (centrality) denotes the proportion of all shortest paths between node pairs in a network passing through the measured node, indicating the relative importance of the particular node in network global connectivity. Closeness (centrality) is defined as the inverse sum of shortest distances to all other nodes from a focal node, indicating the expected time from a focal node to reach others. The clustering coefficient is defined as $C_i = 2n / [k_i (k_i - 1)]$, where $n$ is the number of direct links connecting the $k_i$ nearest neighbors of node $i$. The average of $C_i$ over all nodes of a network assesses network modularity.
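Some of these definitions can be illustrated on a small graph with a short sketch. The function names and the toy edge list are hypothetical, nodes with fewer than two neighbors are given a clustering coefficient of zero by assumption, and betweenness is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """C_i = 2n / (k_i (k_i - 1)); set to 0 when k_i < 2 (assumption)."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    # n = number of links among the neighbors of the node
    n = sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])
    return 2 * n / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Inverse of the summed shortest distances to all reachable nodes."""
    total = sum(shortest_path_lengths(adj, node).values())
    return 1.0 / total if total else 0.0


# Toy graph (hypothetical): a triangle A-B-C with a pendant node D on C.
toy_adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
degree_c = len(toy_adj["C"])
average_clustering = sum(clustering_coefficient(toy_adj, v) for v in toy_adj) / len(toy_adj)
```

In the toy graph, node A has both of its neighbors linked to each other, whereas node C has three neighbors with only one link among them.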
To validate the close relationship between cardiovascular targets and genes suggested by the network properties, we calculated the GO-based semantic similarity between cardiovascular targets and genes. We first downloaded the Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) branches of the Gene Ontology (GO) from the GO database [22]. GO-based semantic similarity scores (GSS) between cardiovascular targets and genes were calculated according to Resnik [23], using the csbl.go R package [24] with the option to use all three ontologies. We calculated the average GSS over all pairs of cardiovascular target and gene. Random controls were obtained by randomly selecting the same number of genes 10,000 times as a control for the cardiovascular genes. All statistics are shown in Additional file 1.
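For reference, the term-level Resnik measure is commonly written as below. The symbols $\mathrm{IC}$, $p(t)$ and $\mathcal{A}(t_1, t_2)$ are notation assumed here, and the way term-level scores are combined into gene-level GSS by csbl.go is not specified in this section.

```latex
\mathrm{sim}_{\mathrm{Resnik}}(t_1, t_2) = \max_{t \in \mathcal{A}(t_1, t_2)} \mathrm{IC}(t),
\qquad
\mathrm{IC}(t) = -\log p(t)
```

Here $\mathcal{A}(t_1, t_2)$ is the set of GO terms that are ancestors of both $t_1$ and $t_2$, and $p(t)$ is the probability that a gene in the annotation corpus is annotated with $t$ or one of its descendants.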
To accompany the findings of this study, an online database, CVDSP (http://sm.nwsuaf.edu.cn/lsp/cvdsp.php), was developed to allow researchers to access the underlying information in a user-friendly manner. We have included all of our data sets in this database. The drug-target interactions, gene-phenotype associations, drug-indication associations and target-gene relationships, as well as their derived networks such as drug-drug and gene-gene networks, can be explored interactively. We will regularly update our data sets and the website to keep up with the growth of the databases used.
All t-tests and z-tests were done in Mathematica (Wolfram Research) using the HypothesisTests package. Kolmogorov-Smirnov and Wilcoxon rank sum tests were done in MATLAB (MathWorks) using the “kstest2” and “ranksum” commands, respectively. All error terms in the text and the figures are standard errors.









