DAMLE T.1*, KSHIRSAGAR M.2
1Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur-441110, MS, India.
2Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur-441110, MS, India.
* Corresponding Author : damle.tejashree@gmail.com
Received : 15-03-2012 Accepted : 12-04-2012 Published : 16-04-2012
Volume : 3 Issue : 2 Pages : 85 - 87
J Signal Image Process 3.2 (2012):85-87
Information theory is useful for quantifying the information content of data, which is referred to as entropy. A microarray chip provides gene expression data. To find co-expression between genes measured on a microarray, we apply mutual information theory, computing the entropy of each gene's expression profile. The mutual information is converted into adjacency and dissimilarity matrices. Co-expressed networks are formed by applying a cut-tree algorithm to the dissimilarity matrix. Our work finds the pairwise relatedness of microarray genes, and this relatedness identifies the different gene networks. We used Diabetes Mellitus Type II as the disease model. This paper describes our approach to finding co-expressed gene networks.
Entropy, tree cut, co-expressed gene network.
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables. The most common unit of measurement of mutual information is the bit, when logarithms to the base 2 are used. A microarray chip produces thousands of gene expression measurements simultaneously. Mutual information gives the pairwise relatedness of gene sets, from which the matrices are derived. This information is useful for analyzing the dependency between gene sets.
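The relation between entropy and mutual information can be illustrated with a minimal sketch. It assumes equal-width binning of the expression values and uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y) with base-2 logarithms. The function names, the default of four bins, and the three expression profiles are hypothetical choices made for illustration and are not taken from the study.

```python
import math
from collections import Counter


def bin_values(values, n_bins):
    """Map each value to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=4):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from binned samples."""
    bx, by = bin_values(x, n_bins), bin_values(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


# Hypothetical expression profiles of three genes across eight samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [2.1, 4.0, 6.2, 8.1, 9.9, 12.0, 14.1, 16.0]  # rises with gene_a
gene_c = [5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0]     # alternates independently

mi_ab = mutual_information(gene_a, gene_b)  # high: the profiles share bins
mi_ac = mutual_information(gene_a, gene_c)  # near zero: no shared structure
```

Applying mutual_information to every pair of genes would fill a symmetric matrix of pairwise relatedness. The number of bins affects the estimate, so in practice it would need to be chosen with the sample size in mind.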
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. Clustering is a main task of explorative data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Average linkage clustering is a method of calculating the distance between clusters in hierarchical cluster analysis. The linkage function specifying the distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster.
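The average linkage function described above can be sketched as follows. This is a plain illustration of the technique, not the R routine used in this work. The function names and the five-object dissimilarity matrix are hypothetical. Stopping the merging once a requested number of clusters remains corresponds to cutting the dendrogram into that many groups.

```python
from itertools import combinations


def average_linkage(cluster_a, cluster_b, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(dist, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], dist),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical symmetric dissimilarity matrix for five objects.
dist = [
    [0.0, 1.0, 9.0, 8.0, 9.5],
    [1.0, 0.0, 8.5, 9.0, 9.0],
    [9.0, 8.5, 0.0, 1.5, 2.0],
    [8.0, 9.0, 1.5, 0.0, 2.5],
    [9.5, 9.0, 2.0, 2.5, 0.0],
]

groups = agglomerate(dist, n_clusters=2)
```

With this matrix, objects 0 and 1 form one group and objects 2, 3 and 4 form the other, because the distances within each group are much smaller than the distances between them.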
Our approach uses the average linkage method to find clusters of correlated and similar genes. Co-expressed networks are formed from these clusters. To find co-expressed networks, we used the gene set obtained after applying the Significance Analysis of Microarrays (SAM) algorithm. This set contains a total of 238 genes.
We take the data from Mootha V.K. et al. (2003), "PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes", Nature Genetics, 34(3), 267-273 [7]. The disease model is Diabetes Mellitus (Type II). The study involved 34 males, 17 with normal glucose tolerance (NGT) and 17 with Diabetes Mellitus (DM Type II).
[Fig-1] shows the flowchart of our approach for finding the co-expressed gene network. We used different R packages to find the gene network. The co-expression networks are formed separately for the diabetes and normal microarray data. The steps below explain the process of finding the co-expressed gene network.
Step 1- Mutual information is estimated using the binning approach. The distance between entropies of each gene set is calculated. The output of this step is the mutual information matrix. [Table-1] shows this matrix.
Step 2- Hierarchical clustering is performed on the mutual information matrix using the hclust() method in R with average linkage. [Fig-2] shows the cluster dendrogram. [Fig-3] shows the dendrogram after applying the cut tree method.
Step 3- ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) is a method for reconstructing biological networks from microarray data. [9,10]
Step 4- The adjacency matrix formed is given in [Table-2].
Step 5- The igraph network is plotted using the graph.adjacency() method and is shown in [Fig-4].
Step 6- [Table-3] shows the edge list obtained with the edge.getList() method. We obtained a total of 937 edges. [Fig-5] shows the co-expression network of the normal data with cluster frequencies 10, 11 and 12.
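Steps 3, 4 and 6 above can be illustrated with a minimal pure-Python sketch. It is a stand-in for the R packages used in this work, not their implementation. The pruning rule follows the data processing inequality as ARACNE is commonly described: in each gene triplet, an edge weaker than both other edges is treated as indirect and removed. The function names, the tolerance parameter eps, the gene labels and the matrix values are hypothetical.

```python
def aracne_prune(mi, eps=0.0):
    """Zero every edge that is the weakest edge of some gene triplet."""
    n = len(mi)
    pruned = [row[:] for row in mi]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j):
                    continue
                # Edge i-j counts as indirect when it is weaker than both
                # edges through a third gene k. The comparison always uses
                # the original matrix, not the partly pruned copy.
                if mi[i][j] < min(mi[i][k], mi[j][k]) - eps:
                    pruned[i][j] = pruned[j][i] = 0.0
                    break
    return pruned


def to_adjacency(weights):
    """Binary adjacency matrix: 1 where an off-diagonal weight survives."""
    n = len(weights)
    return [
        [1 if i != j and weights[i][j] > 0 else 0 for j in range(n)]
        for i in range(n)
    ]


def edge_list(adjacency, names):
    """Undirected edges as (name, name) pairs, each reported once."""
    n = len(adjacency)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if adjacency[i][j]
    ]


# Hypothetical gene labels and mutual information matrix (bits).
names = ["g1", "g2", "g3", "g4"]
mi = [
    [0.0, 0.9, 0.3, 0.0],
    [0.9, 0.0, 0.8, 0.1],
    [0.3, 0.8, 0.0, 0.7],
    [0.0, 0.1, 0.7, 0.0],
]

adjacency = to_adjacency(aracne_prune(mi))
edges = edge_list(adjacency, names)
```

In this toy matrix the weak g1-g3 and g2-g4 links are removed because stronger paths exist through g2 and g3 respectively, leaving the chain g1-g2-g3-g4. The length of the edge list is the edge count of the network.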
In our work, a microarray gene set is used to find the co-expressed gene network. This set consists of 238 genes. We found hubs of genes that are related to each other. In further analysis, we will explore this data through regression analysis against different biochemical parameters.
[1] Tusher V.G., Tibshirani R. and Chu G. (2001) PNAS, 98 (9).
[2] Chu G., Li J., Balasubramanian Narasimhan, Tibshirani R. and Tusher V. (2002) Department of Biochemistry.
[3] Selvaraj S. and Natarajan J. (2011) Bioinformation, 6 (3), 95-99.
[4] Sivriver J., Habib N. and Friedman N. (2011) An integrative clustering and modeling algorithm for dynamical gene expression data.
[5] Langfelder P., Zhang B. and Horvath S. (2008) Data Mining of Microarray Databases for the Analysis of Environmental Factors on Plants Using Cluster Analysis and Predictive Regression, 24 (5), 719-720.
[6] Hovatta I., Saharinen J., Kimppa K., Laine M.M. and Antti Lehmussola (2005) DNA microarray data analysis.
[7] Mootha V.K., Lindgren C.M., Eriksson K.F. and Aravind Subramanian (2003) Nature Genetics, 34 (3), 267-273.
[8] http://spotfire.tibco.com.
[9] Meyer P.E., Lafitte F. and Bontempi G. (2008) BMC Bioinformatics.
[10] Margolin A.A., Nemenman I., Basso K, Wiggins C., Stolovitzky G., Favera R.D., Califano A. (2006) BMC Bioinformatics.