Feature Selection in Microarray using Proposed Hybrid Minimum Redundancy-Maximum Relevance (MRMR) and Modified Genetic Algorithm (MGA)

  • P. Nancy Vincentina Mary Department of MCA, Fatima College, Madurai, Tamil Nadu, India & Department of Computer and Information Science, Faculty of Science, Annamalai University, Chidambaram, Tamil Nadu, India https://orcid.org/0000-0003-0677-5483
  • R. Nagarajan Department of Computer and Information Science, Faculty of Science, Annamalai University, Chidambaram, Tamil Nadu, India https://orcid.org/0000-0001-7733-5085
Keywords: Classification, Feature Selection, Gene Selection, Gene Expression Classification, Microarray Classification, Modified Genetic Algorithm

Abstract

Gene expression microarray data typically contain an enormous number of genes but only a small number of samples, and many of these genes are irrelevant, insignificant or redundant for classification analysis. Identifying the informative genes, those that contribute most to classification and diagnosis, is therefore of essential practical importance for classification problems such as cancer versus non-cancer classification and the discrimination of different tumor types. This paper proposes a two-step hybrid feature selection algorithm that integrates Minimum Redundancy Maximum Relevance (MRMR) with a Modified Genetic Algorithm (MGA) to reduce the size of the feature sets selected from microarray data. In the first step, MRMR filters redundant genes out of the high-dimensional microarray data; in the second step, the MGA eliminates irrelevant genes. The proposed MRMR-MGA algorithm is compared with the traditional combination of MRMR and a standard Genetic Algorithm (GA), and the implementation results show that the proposed method achieves good selection and classification performance.
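The two-step pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the GA is modified, so the second stage here is a plain generational GA (tournament selection, one-point crossover, bit-flip mutation, elitism). The MID form of the MRMR score, the assumption that gene values are already discretised, the fitness function (leave-one-out 1-nearest-neighbour accuracy minus a small per-gene penalty) and every function name (`mutual_info`, `mrmr`, `loo_1nn_accuracy`, `ga_select`) are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter


def mutual_info(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr(X, y, k):
    """Step 1 (sketch): greedy MRMR in its MID form.

    Each step adds the gene maximising I(gene; y) minus the mean I(gene; s)
    over the genes s already selected. Assumes X is a list of samples whose
    gene values were discretised beforehand (hypothetical preprocessing).
    """
    n_genes = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_genes)]
    relevance = [mutual_info(c, y) for c in cols]
    selected = []
    while len(selected) < min(k, n_genes):
        def score(j):
            if not selected:
                return relevance[j]  # first pick: relevance only
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        remaining = [j for j in range(n_genes) if j not in selected]
        selected.append(max(remaining, key=score))
    return selected


def loo_1nn_accuracy(X, y, genes):
    """Leave-one-out accuracy of 1-nearest-neighbour (L1 distance) on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: sum(abs(row[g] - X[j][g]) for g in genes))
        correct += y[nearest] == y[i]
    return correct / len(X)


def ga_select(X, y, candidates, pop_size=20, generations=30,
              p_mut=0.1, penalty=0.01, seed=0):
    """Step 2 (sketch): a plain GA over bit masks of the MRMR candidates.

    This is NOT the authors' MGA, whose modifications are not described here.
    Fitness (an assumption): LOO 1-NN accuracy minus `penalty` per kept gene.
    """
    rng = random.Random(seed)
    m = len(candidates)
    cache = {}

    def decode(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            genes = decode(mask)
            cache[key] = loo_1nn_accuracy(X, y, genes) - penalty * len(genes)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [list(mask) for mask in ranked[:2]]  # elitism: keep the 2 best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (3 contestants each).
            a, b = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            cut = rng.randint(1, m - 1) if m > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return decode(max(pop, key=fitness))
```

A call such as `ga_select(X, y, mrmr(X, y, 50))` would first shortlist 50 genes with MRMR and then let the GA search subsets of that shortlist. The per-gene penalty is what pushes the GA toward smaller gene sets, in the spirit of the abstract's aim of reducing the selected feature set. For real microarray data the pairwise mutual-information values in `mrmr` would normally be cached rather than recomputed at every step.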

Published: 2024-05-30
How to Cite
Mary, P. N., & Nagarajan, R. (2024). Feature Selection in Microarray using Proposed Hybrid Minimum Redundancy-Maximum Relevance (MRMR) and Modified Genetic Algorithm (MGA). International Journal of Experimental Research and Review, 39(Spl Volume), 82-91. https://doi.org/10.52756/ijerr.2024.v39spl.006