Feature Selection in Microarray using Proposed Hybrid Minimum Redundancy-Maximum Relevance (MRMR) and Modified Genetic Algorithm (MGA)
DOI:
https://doi.org/10.52756/ijerr.2024.v39spl.006
Keywords:
Classification, Feature Selection, Gene Selection, Gene Expression Classification, Microarray Classification, Modified Genetic Algorithm
Abstract
Gene expression microarray data typically contain an enormous number of genes but only a small number of samples. Many of these genes are irrelevant, insignificant or redundant for classification analysis. Identifying the informative genes, which contribute most to classification and diagnosis, is therefore of essential and practical importance for classification problems such as cancer versus non-cancer classification and the classification of different tumor types. This paper proposes a two-step feature selection algorithm that integrates the Minimum Redundancy-Maximum Relevance (MRMR) method with a Modified Genetic Algorithm (MGA) to reduce the size of the feature sets selected from microarray data. In the first step, MRMR is used to filter redundant genes from the high-dimensional microarray data. In the second step, MGA is used to eliminate irrelevant genes. The proposed MRMR-MGA algorithm is compared with the traditional MRMR combined with a GA. The implementation results show that the proposed method performs well in both gene selection and classification.
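To make the two-step idea concrete, the following is a minimal sketch of a filter-then-wrapper pipeline, not the authors' implementation. Several parts are assumptions for illustration: absolute Pearson correlation stands in for the mutual information normally used by MRMR, the second step is a plain genetic algorithm (the abstract does not specify the modifications that define MGA), fitness is leave-one-out accuracy of a nearest-centroid classifier with a small per-gene penalty, and the data set is synthetic. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean

def pearson_abs(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def mrmr_filter(X, y, k):
    """Step 1 (filter): greedily pick k genes, relevance minus mean redundancy.
    Assumption: correlation is used as a simple proxy for mutual information."""
    cols = [list(c) for c in zip(*X)]
    relevance = [pearson_abs(c, y) for c in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(cols)):
        def score(j):
            redundancy = mean(pearson_abs(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            members = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            if not members:
                continue
            dist = sum((row[g] - mean(m[g] for m in members)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        hits += best_label == y[i]
    return hits / len(X)

def ga_wrapper(X, y, candidates, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Step 2 (wrapper): plain GA over bit masks of the filtered genes.
    Assumption: this is a generic GA, not the paper's Modified GA."""
    rng = random.Random(seed)
    n = len(candidates)
    def fitness(mask):
        genes = [g for g, bit in zip(candidates, mask) if bit]
        return loo_accuracy(X, y, genes) - 0.01 * len(genes)  # favour small subsets
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection, keeps elites
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0   # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]

# Synthetic toy data (assumption): 12 samples, 30 genes. Gene 0 separates the
# classes, gene 1 is a near copy of gene 0, and the remaining genes are noise.
data_rng = random.Random(42)
y = [0] * 6 + [1] * 6
X = []
for label in y:
    g0 = 3.0 * label + data_rng.gauss(0, 0.3)
    X.append([g0, g0 + data_rng.gauss(0, 0.05)] + [data_rng.gauss(0, 1) for _ in range(28)])

candidates = mrmr_filter(X, y, k=8)     # step 1: shrink 30 genes to 8
chosen = ga_wrapper(X, y, candidates)   # step 2: search subsets of those 8
print("after filter:", candidates)
print("after GA:", chosen)
```

The filter step keeps the wrapper's search space small: the GA explores subsets of 8 candidate genes rather than all 30, which is the main reason for running a cheap filter before an expensive wrapper on high-dimensional microarray data.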