Analysis and Visualize the Predictive Model Performance: Manual Vs Automated Machine Learning (AutoML) Algorithms for Heart Failure Prediction

Authors

Rajeev, C., & Natarajan, K.

DOI:

https://doi.org/10.52756/ijerr.2024.v46.003

Keywords:

AutoML, AutoGluon, Heart Failure, Machine Learning, SHapley Additive exPlanations

Abstract

Heart failure (HF) is a common complication of cardiovascular disease. This study assesses the effectiveness of models for predicting HF built with Traditional Machine Learning (TML) methods and with Automated Machine Learning (AutoML). TML models need extensive manual tuning and expert knowledge for algorithm selection and optimization, which makes the process slow and susceptible to human error. To address this, the work proposes an AutoML approach based on the AutoGluon framework, with the main goal of automating the selection of the most effective model. The study compares twenty (20) individually trained ML models: fourteen (14) from AutoML and six (6) from TML. Among the TML models, Logistic Regression (LR) achieved the highest accuracy (87.50%) and ROC-AUC (88.83%), ahead of Support Vector Machine (SVM), Decision Tree (DT), Gaussian Naïve Bayes (GNB), Random Forest (RF) and K-Nearest Neighbors (KNN). Among the AutoML models, CatBoost outperformed the other thirteen algorithms with the highest accuracy (99.39%) and ROC-AUC (99.89%), making it the most accurate of all 20 models. SHapley Additive exPlanations (SHAP) was used to interpret the top-performing model, increasing its transparency and usability.
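
The abstract ranks models by accuracy and ROC-AUC. As a minimal sketch of how such a comparison can be computed, the Python below implements both metrics with the standard library only and sorts candidate models by them. It is not the authors' code: the function names, the 0.5 decision threshold, the tie-breaking order and the toy scores for `model_a` and `model_b` are all assumptions made for illustration, and none of the numbers come from the study.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(
    y_true: Sequence[int],
    scores_by_model: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off for turning scores into labels
) -> List[Tuple[str, float, float]]:
    """Return (name, accuracy, roc_auc) rows, best first.

    Sorting by ROC-AUC and then accuracy is an illustrative choice.
    """
    rows = []
    for name, scores in scores_by_model.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        rows.append((name, accuracy(y_true, preds), roc_auc(y_true, scores)))
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)


# Hypothetical toy data: 1 = heart failure, 0 = no heart failure.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores_by_model = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3],
    "model_b": [0.6, 0.7, 0.4, 0.8, 0.3, 0.5, 0.9, 0.2],
}

if __name__ == "__main__":
    for name, acc, auc in rank_models(y_true, scores_by_model):
        print(f"{name}: accuracy={acc:.4f}, roc_auc={auc:.4f}")
```

The pairwise ROC-AUC above is quadratic in the number of cases, which is fine for a small clinical dataset; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice at larger scale.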

Published

2024-12-30

How to Cite

Rajeev, C., & Natarajan, K. (2024). Analysis and Visualize the Predictive Model Performance: Manual Vs Automated Machine Learning (AutoML) Algorithms for Heart Failure Prediction. International Journal of Experimental Research and Review, 46, 31–44. https://doi.org/10.52756/ijerr.2024.v46.003

Issue

Vol. 46 (2024)

Section

Articles