A NEW CLASSIFICATION AND PREDICTION MODEL WITH TWO STAGE GENE SELECTION METHOD USING MINIMAL SUBSETS OF GENE EXPRESSION DATA

Mallika R.¹, Saravanan V.²
¹Department of Computer Science, Sri Ramakrishna College of Arts & Science for Women, Coimbatore, India.
²Department of Computer Applications, Karunya University, Coimbatore, India.

Received : -     Accepted : -     Published : 21-12-2009
Volume : 1     Issue : 2       Pages : 14 - 25
Int J Mach Intell 1.2 (2009):14-25
DOI : http://dx.doi.org/10.9735/0975-2927.1.2.14-25

Keywords : Support Vector Machines, Linear Discriminant Analysis, K-Nearest Neighbour, Microarray data, Gene selection, Classification
Conflict of Interest : None declared

Cite - MLA : Mallika R. and Saravanan V. "A NEW CLASSIFICATION AND PREDICTION MODEL WITH TWO STAGE GENE SELECTION METHOD USING MINIMAL SUBSETS OF GENE EXPRESSION DATA." International Journal of Machine Intelligence 1.2 (2009):14-25. http://dx.doi.org/10.9735/0975-2927.1.2.14-25

Cite - APA : Mallika R., Saravanan V. (2009). A NEW CLASSIFICATION AND PREDICTION MODEL WITH TWO STAGE GENE SELECTION METHOD USING MINIMAL SUBSETS OF GENE EXPRESSION DATA. International Journal of Machine Intelligence, 1 (2), 14-25. http://dx.doi.org/10.9735/0975-2927.1.2.14-25

Cite - Chicago : Mallika R. and Saravanan V. "A NEW CLASSIFICATION AND PREDICTION MODEL WITH TWO STAGE GENE SELECTION METHOD USING MINIMAL SUBSETS OF GENE EXPRESSION DATA." International Journal of Machine Intelligence 1, no. 2 (2009):14-25. http://dx.doi.org/10.9735/0975-2927.1.2.14-25

Copyright : © 2009, Mallika R. and Saravanan V., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

Data mining models are widely used in disease diagnosis, and gene expression data are a key factor in their success. Because such data contain thousands of genes, selecting informative genes before classification is a major challenge. The proposed method performs gene selection in two stages. In the first stage, pairwise gene selection was performed using a popular statistical technique. In the second stage, the gene pairs formed from the first-stage genes that achieved 100% cross-validation (CV) accuracy were used for classification. The test results were compared with those of a single-stage method, and the proposed two-stage method was also shown to give the greatest reduction in computational burden. The paper also compares the performance of three classifiers, Support Vector Machines (SVM), K-Nearest Neighbour (KNN) and Linear Discriminant Analysis (LDA), and promising results were achieved.
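The abstract names neither the statistic used in the first stage nor the classifier used to measure CV accuracy in the second stage, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes stage 1 ranks genes by a Welch two-sample t-statistic and keeps the top k, and stage 2 scores every pair of retained genes with leave-one-out CV of a 1-nearest-neighbour classifier, keeping only pairs that reach 100% accuracy. The function names (`stage1_rank_genes`, `stage2_perfect_pairs`, `two_stage_select`) and the toy data are hypothetical.

```python
import itertools
import math
from statistics import mean, variance

# ASSUMPTION: stage 1 uses a Welch two-sample t-statistic; the abstract
# only says "a popular statistical technique".
def t_statistic(a, b):
    """|Welch t| for one gene; each class needs at least two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else abs(mean(a) - mean(b)) / se


def stage1_rank_genes(X, y, top_k):
    """Stage 1: return the indices of the top_k genes ranked by |t|."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(t_statistic(a, b))
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:top_k]


# ASSUMPTION: CV accuracy is measured with leave-one-out 1-nearest-neighbour.
def loocv_1nn_accuracy(X, y, genes):
    """Leave-one-out accuracy of 1-NN using only the given gene indices."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = math.inf, None
        for j in range(len(X)):
            if j == i:
                continue  # hold out sample i
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def stage2_perfect_pairs(X, y, candidates):
    """Stage 2: keep pairs of candidate genes with 100% LOOCV accuracy."""
    return [pair for pair in itertools.combinations(candidates, 2)
            if loocv_1nn_accuracy(X, y, pair) == 1.0]


def two_stage_select(X, y, top_k):
    """Run stage 1, then score only pairs of the genes it retained."""
    return stage2_perfect_pairs(X, y, stage1_rank_genes(X, y, top_k))


# Hypothetical toy data: 6 samples x 4 genes, labels 0/1.
# Genes 0 and 1 separate the classes; genes 2 and 3 are noise.
X = [[5.0, 1.0, 0.3, 2.0], [5.2, 1.1, 0.9, 1.0], [4.9, 0.9, 0.1, 3.0],
     [1.0, 4.0, 0.5, 2.5], [1.2, 4.2, 0.2, 1.5], [0.8, 3.9, 0.8, 2.2]]
y = [0, 0, 0, 1, 1, 1]

print(two_stage_select(X, y, top_k=2))  # → [(1, 0)]
```

In this sketch the computational saving comes from stage 1: scoring every pair of n genes requires n(n-1)/2 cross-validation runs, whereas restricting stage 2 to the top k genes requires only k(k-1)/2 (45 pairs for k = 10). Any of the three classifiers compared in the paper (SVM, KNN, LDA) could stand in for the 1-NN scorer.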
