User Interface Bug Classification Model Using ML and NLP Techniques: A Comparative Performance Analysis of ML Models

Sara Khan; Saurabh Pal

doi:10.52756/ijerr.2024.v45spl.005

Authors

Sara Khan, Department of Computer Applications, Veer Bahadur Singh Purvanchal University, Jaunpur-222003, India. https://orcid.org/0000-0002-2614-3929
Saurabh Pal, Department of Computer Applications, Veer Bahadur Singh Purvanchal University, Jaunpur-222003, India. https://orcid.org/0000-0001-9545-7481


Keywords:

Bug classification, feature extraction, hyperparameter tuning, imbalance classification, UI

Abstract

Analyzing user interface (UI) bugs is an important step that testers and developers take to assess the usability of a software product. UI bug classification helps in understanding the nature and cause of software failures, but manually classifying thousands of bugs is inefficient and tedious for both testers and developers. The objective of this research is to develop a classification model for UI-related bugs using supervised Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques, and to assess the effect of different sampling and feature vectorization techniques on the performance of ML algorithms. Classification is based on the ‘Summary’ feature of the bug report and uses six classifiers: Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes (MNB), Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF) and Gradient Boosting (GB). The dataset is vectorized using two NLP vectorization techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). The ML models are trained after vectorization and data balancing. Hyperparameter tuning (HT) of the models has also been done using the grid search approach to improve their efficacy. This work provides a comparative performance analysis of ML techniques using Accuracy, Precision, Recall and F1 Score. The results showed that a UI bug classification model can be built by training a tuned SVM classifier using TF-IDF and SMOTE (Synthetic Minority Oversampling Technique). The SVM classifier achieved the highest performance, with Accuracy: 0.88, Precision: 0.86, Recall: 0.85 and F1: 0.85. The results also indicated that ML algorithms perform better with TF-IDF than with BoW in most cases. This work classifies only bugs that are related to the user interface. The effect of two feature extraction techniques and of sampling techniques on the algorithms was also analyzed, adding novelty to the research work.
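
The vectorization and balancing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the bug summaries, the two class labels, the helper names (`tokenize`, `build_vocab`, `bow_vector`, `tfidf_vectors`, `smote_like`) and the smoothed IDF formula are all assumptions made for illustration, and `smote_like` is a simplified version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not a full reproduction of the technique.

```python
import math
import random
from collections import Counter

# Hypothetical bug-report summaries and labels, invented for illustration only.
summaries = [
    "button overlaps text on login page",
    "dropdown menu not aligned with toolbar",
    "tooltip text truncated on hover",
    "dialog button misaligned in dark theme",
    "back link opens wrong page",
    "menu shortcut opens wrong dialog",
]
labels = ["layout", "layout", "layout", "layout", "navigation", "navigation"]


def tokenize(text):
    """Lowercase whitespace tokenizer (a real pipeline would do more cleaning)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(doc, vocab):
    """Bag of Words: raw term counts in vocabulary order."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF with one common smoothed IDF variant, L2-normalized per document."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a random minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


vocab = build_vocab(summaries)
bow = [bow_vector(doc, vocab) for doc in summaries]
tfidf = tfidf_vectors(summaries, vocab)

# Oversample the minority class until both classes have four samples.
minority = [vec for vec, y in zip(tfidf, labels) if y == "navigation"]
synthetic = smote_like(minority, n_new=2)
balanced_X = tfidf + synthetic
balanced_y = labels + ["navigation"] * len(synthetic)
print(Counter(balanced_y))
```

In this sketch the balanced vectors would then be passed to a classifier such as an SVM, with its hyperparameters chosen by grid search. In a realistic experiment, oversampling should be applied to the training split only, so that synthetic samples do not leak into the evaluation data.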
