GLSTM: A novel approach for prediction of real & synthetic PID diabetes data using GANs and LSTM classification model

  • Sushma Jaiswal Department of Computer Science & Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, India https://orcid.org/0000-0002-6253-7327
  • Priyanka Gupta Department of Computer Science & Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, India https://orcid.org/0000-0001-8643-6857
Keywords: GLSTM, GAN, ML, PID, TVAE

Abstract

Generative Adversarial Networks (GANs) are a major advance in modern artificial intelligence systems. Deep learning-based GANs can generate realistic synthetic tabular data, which can be used to enlarge a relatively small training dataset while preserving the confidentiality of the original data. In this context, we implemented a GAN framework for generating diabetes data to support health care professionals in clinical applications. The framework is evaluated on the Pima Indian Diabetes (PID) dataset. Preprocessing steps, including the handling of missing values, outliers and class imbalance, are applied to improve data quality. Exploratory data analysis, including heat maps, bar graphs and histograms, is used for data visualisation. We employed hypothesis testing to examine the resemblance between real data and GAN-generated synthetic data. We propose a GAN-Long Short-Term Memory (GLSTM) system, in which a GAN is used for data augmentation and an LSTM is used for diabetes classification. Several generative models, namely CTGAN, Vanilla GAN, Copula GAN, Gaussian Copula GAN, and TVAE GAN, are used to generate the synthetic dataset. Experiments were conducted on real data, on synthetic data, and on a combination of real and synthetic data. The model trained on both real and synthetic data achieved an accuracy of 97%, compared with 92% when only real data was used. We also observed that synthetic data could be used in place of real data, as the mean correlation between synthetic and real data is 0.93. The proposed approach outperformed the state-of-the-art methods it was compared against.
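
The resemblance check between real and synthetic data mentioned in the abstract could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which hypothesis test was used, so a two-sample Kolmogorov-Smirnov test is assumed here, and the function names (`ks_statistic`, `ks_critical`) and the toy "glucose-like" values are hypothetical and not taken from the paper.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        # Advance both pointers past every value <= x so ties are handled together.
        while i < n and a_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Toy stand-ins for one real and one synthetic feature column (assumed values,
# not the PID dataset and not the output of any GAN).
rng = random.Random(0)
real = [rng.gauss(120, 30) for _ in range(200)]
synthetic = [rng.gauss(120, 30) for _ in range(200)]

d = ks_statistic(real, synthetic)
threshold = ks_critical(len(real), len(synthetic))
# If d stays below the threshold, the test does not reject the hypothesis
# that both columns come from the same distribution.
print(f"KS statistic = {d:.3f}, critical value = {threshold:.3f}")
```

In practice such a test would be run per feature, and a statistic below the critical value only means the test found no evidence of a difference, not that the two distributions are identical.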

Published
2023-04-30
How to Cite
Jaiswal, S., & Gupta, P. (2023). GLSTM: A novel approach for prediction of real & synthetic PID diabetes data using GANs and LSTM classification model. International Journal of Experimental Research and Review, 30, 32-45. https://doi.org/10.52756/ijerr.2023.v30.004