INFORMATION THEORY TO FIND CO-EXPRESSED GENE NETWORK FOR MICROARRAY GENE EXPRESSION

DAMLE T.1*, KSHIRSAGAR M.2
1Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur- 441110, MS, India.
2Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur- 441110, MS, India.
* Corresponding Author : damle.tejashree@gmail.com

Received : 15-03-2012     Accepted : 12-04-2012     Published : 16-04-2012
Volume : 3     Issue : 2       Pages : 85 - 87
J Signal Image Process 3.2 (2012):85-87

Cite - MLA : DAMLE T. and KSHIRSAGAR M. "INFORMATION THEORY TO FIND CO-EXPRESSED GENE NETWORK FOR MICROARRAY GENE EXPRESSION." Journal of Signal and Image Processing 3.2 (2012):85-87.

Copyright : © 2012, DAMLE T. and KSHIRSAGAR M., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

Information theory quantifies the information content of data, a quantity referred to as entropy. A microarray chip provides gene expression data. To find co-expression between the genes measured on a microarray, we applied mutual information, computed from the entropy of each gene's expression. The mutual information is converted into an adjacency matrix and a dissimilarity matrix. Co-expressed networks are formed by applying a cut-tree algorithm to the dissimilarity matrix. Our work finds the pairwise relatedness of microarray genes, and this relatedness identifies the different gene networks. We used Diabetes Mellitus Type II as the disease model. This paper describes our approach to finding co-expressed gene networks.

Keywords

Entropy, tree cut, co-expressed gene network.

Introduction

In probability theory and information theory, the mutual information of two random variables is a quantity that measures their mutual dependence. The most common unit of mutual information is the bit, which applies when logarithms to the base 2 are used. A microarray chip produces expression values for thousands of genes simultaneously. Mutual information is used to calculate the pairwise relatedness of the genes in a gene set, and the results are collected into matrices. This information is useful for analyzing the dependency between genes in the set.
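As a rough illustration of this quantity, mutual information can be written in terms of entropies as I(X;Y) = H(X) + H(Y) - H(X,Y). The Python sketch below estimates it from expression values, assuming equal-width binning with three bins and a simple plug-in entropy estimate. The function names, the bin count and the gene profiles are hypothetical and are not taken from the paper, whose analysis was carried out in R.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant profile: everything falls in one bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(symbols):
    """Empirical Shannon entropy, in bits, of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    bx = discretize(x, n_bins)
    by = discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


# Hypothetical expression profiles over 8 samples (illustrative values only)
gene_a = [1.0, 1.2, 3.1, 3.0, 5.2, 5.0, 1.1, 3.2]
gene_b = [2.0, 2.1, 6.0, 6.2, 9.9, 10.0, 2.2, 6.1]  # rises and falls with gene_a
gene_c = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]   # constant profile

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

In this toy data gene_b falls into the same bins as gene_a, so the estimate for that pair equals the entropy of gene_a, while the constant gene_c shares no information with gene_a. Real data would need a more careful choice of bin count and estimator.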
Cluster analysis, or clustering, is the task of assigning a set of objects to groups (called clusters) so that objects in the same cluster are more similar to each other, in some chosen sense, than to those in other clusters. Clustering is a main task of exploratory data mining and a common technique for statistical data analysis in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Average linkage clustering is a method of calculating the distance between clusters in hierarchical cluster analysis: the linkage function specifying the distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster.
Our approach uses the average linkage method to find clusters of correlated and similar genes. Co-expressed networks are formed from these clusters. To find the co-expressed networks we used the gene set obtained after applying the Significance Analysis of Microarrays (SAM) algorithm. This set contains 238 genes in total.
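The average linkage rule described above can be sketched as a naive agglomerative procedure. The example below is written in plain Python for illustration only and is not the R routine used in this work. The distance matrix, the function names and the choice to stop at a fixed number of clusters are all assumptions made for the example.

```python
from itertools import combinations


def average_linkage(dist, cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(dist, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical symmetric distance matrix for 5 objects (illustrative values)
dist = [
    [0.0, 0.1, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.8, 0.9, 0.8],
    [0.9, 0.8, 0.0, 0.2, 0.3],
    [0.8, 0.9, 0.2, 0.0, 0.2],
    [0.9, 0.8, 0.3, 0.2, 0.0],
]

groups = agglomerate(dist, n_clusters=2)
```

With these values, objects 0 and 1 end up in one group and objects 2, 3 and 4 in the other. Stopping at a fixed number of clusters plays the role of cutting the dendrogram at a chosen level.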

Data Source

The data are taken from Mootha V.K. et al. (2003), "PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes", Nature Genetics, 34(3), 267-273 [7]. The disease model is Diabetes Mellitus Type II. The study involved 34 males: 17 with normal glucose tolerance (NGT) and 17 with Diabetes Mellitus Type II (DM Type II).

Process of finding Co-expressed Gene Network

[Fig-1] shows the flowchart of our approach for finding the co-expressed gene network. We used several R packages to build the gene network. The co-expression networks are formed separately for the diabetes and normal microarray data. The steps below explain the process of finding the co-expressed gene network.
Step 1- Mutual information is estimated using the binning approach. The mutual information between each pair of genes is calculated from the entropies of their expression values. The output of this step is the mutual information matrix, shown in [Table-1].
Step 2- Hierarchical clustering is performed on the mutual information matrix using the hclust() function in R with average linkage. [Fig-2] shows the cluster dendrogram. [Fig-3] shows the dendrogram obtained using the cut tree method.
Step 3- ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks), a method for reconstructing biological networks from microarray data, is applied. [9,10]
Step 4- The adjacency matrix formed is given in [Table-2].
Step 5- The graph is plotted with igraph using the graph.adjacency() method and is shown in [Fig-4].
Step 6- [Table-3] shows the edge list obtained with the edge.getList() method. We obtained 937 edges in total. [Fig-5] shows the co-expression network of the normal data with cluster frequencies 10, 11 and 12.
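Steps 3 to 6 can be illustrated with a small Python sketch. It assumes the core of ARACNE is the data processing inequality: in every gene triplet, the weakest edge is treated as indirect and removed. It omits any mutual information thresholding. The matrix values, the gene names and the function names are hypothetical, and the code is a stand-in for, not a copy of, the R packages used in this work.

```python
from itertools import combinations


def aracne_prune(mim, eps=0.0):
    """Apply the data processing inequality to a symmetric MI matrix.

    In each gene triplet the weakest of the three edges is marked for
    removal when it lies below the next-weakest by more than eps. Marks
    are computed on the original matrix and applied together at the end.
    """
    remove = set()
    for i, j, k in combinations(range(len(mim)), 3):
        edges = sorted(
            [(mim[i][j], (i, j)), (mim[i][k], (i, k)), (mim[j][k], (j, k))]
        )
        weakest, second = edges[0], edges[1]
        if weakest[0] < second[0] - eps:
            remove.add(weakest[1])
    pruned = [row[:] for row in mim]
    for a, b in remove:
        pruned[a][b] = pruned[b][a] = 0.0
    return pruned


def to_adjacency(weights):
    """Binary adjacency matrix: 1 where a nonzero off-diagonal weight remains."""
    n = len(weights)
    return [[1 if i != j and weights[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]


def edge_list(adjacency, names):
    """List each undirected edge once, as a pair of gene names."""
    n = len(adjacency)
    return [(names[i], names[j])
            for i in range(n) for j in range(i + 1, n) if adjacency[i][j]]


# Hypothetical mutual information matrix for four genes (illustrative values)
genes = ["g1", "g2", "g3", "g4"]
mim = [
    [0.0, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.8, 0.2],
    [0.3, 0.8, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]

adjacency = to_adjacency(aracne_prune(mim))
edges = edge_list(adjacency, genes)
```

For this toy matrix only the chain g1-g2, g2-g3, g3-g4 survives pruning. The weaker links g1-g3, g1-g4 and g2-g4 are each the weakest edge in at least one triplet and are dropped.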

Conclusion

In our work, a microarray gene set of 238 genes was used to find the co-expressed gene network. We found a hub of genes that are related to each other. In further analysis we will explore this data through regression analysis against different biochemical parameters.

References

[1] Tusher V.G., Tibshirani R. and Chu G. (2001) PNAS, 98 (9).  

[2] Chu G., Li J., Balasubramanian Narasimhan, Tibshirani R. and Tusher V. (2002) Department of Biochemistry.  

[3] Selvaraj S. and Natarajan J. (2011) Bioinformation, 6 (3), 95-99.  

[4] Sivriver J., Habib N. and Friedman N. (2011) An integrative clustering and modeling algorithm for dynamical gene expression data.  

[5] Langfelder P., Zhang B. and Horvath S. (2008) Data Mining of Microarray Databases for the Analysis of Environmental Factors on Plants Using Cluster Analysis and Predictive Regression, 24 (5), 719-720.  

[6] Hovatta I., Saharinen J., Kimppa K., Laine M.M. and Antti Lehmussola (2005) DNA microarray data analysis.  

[7] Mootha V.K., Lindgren C.M., Eriksson K.F. and Aravind Subramanian (2003) Nature Genetics, 34 (3), 267-273.

[8] http://spotfire.tibco.com.  

[9] Meyer P.E., Lafitte F. and Bontempi G. (2008) BMC Bioinformatics.  

[10] Margolin A.A., Nemenman I., Basso K, Wiggins C., Stolovitzky G., Favera R.D., Califano A. (2006) BMC Bioinformatics.  

Images
Fig. 1- Flowchart for finding Co-expressed Gene Network
Fig. 2- Cluster Dendrogram for Normal Data
Fig. 3- Hierarchical clustering for adjacency matrix
Fig. 4- Plot for Edge list
Fig. 5- Co-expression network for frequency 10,11,12
Table 1- Mutual Information Matrix
Table 2- Adjacency matrix for Normal Data
Table 3- Edge list