GLSTM: A novel approach for prediction of real & synthetic PID diabetes data using GANs and LSTM classification model
DOI:
https://doi.org/10.52756/ijerr.2023.v30.004
Keywords:
GLSTM, GAN, ML, PID, TVAE
Abstract
Generative adversarial networks (GANs) are a major advance in modern artificial intelligence. Deep learning-based GANs can generate realistic synthetic tabular data, which can enlarge a relatively small training dataset while preserving the confidentiality of the original data. In this context, we implemented a GAN framework for generating diabetes data to support health care professionals in clinical applications. The GAN is validated on the Pima Indian Diabetes (PID) dataset. Preprocessing techniques, including the handling of missing values, outliers and class imbalance, are applied to improve data quality. Exploratory data analysis with heat maps, bar graphs and histograms is used for data visualisation. We employed hypothesis testing to examine the resemblance between real data and GAN-generated synthetic data. We propose a GAN-Long Short-Term Memory (GLSTM) system, in which a GAN is used for data augmentation and an LSTM is used for diabetes classification. Several generative models, namely CTGAN, Vanilla GAN, Copula GAN, Gaussian Copula GAN, and TVAE GAN, are used to generate the synthetic dataset. Experiments were conducted on real data, on synthetic data, and on a combination of real and synthetic data. The model trained on both real and synthetic data obtained an accuracy of 97%, compared to 92% when only real data was used. We also observed that synthetic data could be used in place of real data, as the mean correlation between synthetic and real data is 0.93. Our results outperform those of state-of-the-art methods.
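The abstract does not say which hypothesis test was used to compare real and synthetic data. As one possible illustration, the sketch below runs a two-sided permutation test for a difference in means on a single feature. The function name `permutation_test`, its parameters, and the toy samples are assumptions made for this example; they are not the authors' method and not values from the PID dataset.

```python
import random
import statistics


def permutation_test(real, synthetic, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(real) - statistics.fmean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, then split them into two pseudo-groups.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:n_real]) - statistics.fmean(pooled[n_real:])
        )
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy stand-ins for one numeric feature (hypothetical, NOT PID data).
data_rng = random.Random(42)
real_feature = [data_rng.gauss(120, 30) for _ in range(200)]
similar_synthetic = [data_rng.gauss(120, 30) for _ in range(200)]
shifted_synthetic = [data_rng.gauss(160, 30) for _ in range(200)]

p_similar = permutation_test(real_feature, similar_synthetic)
p_shifted = permutation_test(real_feature, shifted_synthetic)
print(f"similar generator: p = {p_similar:.4f}")
print(f"shifted generator: p = {p_shifted:.4f}")
```

A small p-value suggests that the synthetic feature's mean differs from the real one, while a large p-value means the test found no evidence of a difference, which is weaker than showing the two distributions are the same. In practice such a test would be repeated per feature and complemented by distribution-level and correlation-level checks.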