A Novel Data Handling Technique for Wine Quality Analysis using ML Techniques
DOI:
https://doi.org/10.52756/ijerr.2024.v45spl.003
Keywords:
Decision tree, multi-layer perceptron, multiclass, random forest, Yeo-Johnson transformation
Abstract
Wine is a widely consumed beverage, and the industry is seeing increased sales due to product quality certification. This research aims to identify the key wine characteristics that drive quality outcomes and to develop a multiclass machine learning (ML) classification model that accurately assesses the quality of balanced white and red wine datasets. We apply three classifiers, Random Forest (RF), Decision Tree (DT) and Multi-Layer Perceptron (MLP), to the white and red wine datasets sourced from the UCI Machine Learning repository. Each dataset is balanced by random oversampling to avoid the bias toward the majority class that ML techniques exhibit on an imbalanced multiclass dataset (IMD). Furthermore, we apply a Yeo-Johnson transformation (YJT) to the datasets to reduce skewness. We validated the results of the ML algorithms using 10-fold cross-validation and found that RF yielded the highest overall accuracy, 93.14%, with accuracies across the classifiers ranging from about 75% to 94%. For the balanced white wine dataset, the proposed approach achieves an accuracy of 93.14% using RF, 90.83% using DT, and 75.49% using MLP. Similarly, for the balanced red wine dataset, accuracy is 89.36% using RF, 85.36% using DT, and 78.00% using MLP. For the white wine dataset, the proposed approach improves accuracy by 23% for RF, 30% for DT, and 21% for MLP. For the red wine dataset, RF accuracy remained the same, while DT improved by 10% and MLP by 22%. Additionally, under the proposed approach, RF, DT, and MLP yield mean squared error (MSE) values of 0.080, 0.151, and 0.443 for the white wine dataset and 0.143, 0.221, and 0.396 for the red wine dataset. RF accuracy under the proposed technique is the highest among the three classifiers for both the white and red wine datasets.
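The two data-handling steps named in the abstract, random oversampling and the Yeo-Johnson transformation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `yeo_johnson` and `random_oversample`, the toy feature values, the class labels, and the fixed `lam = 0.0` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value for a fixed lambda."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)                      # lambda == 0 branch
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)                        # lambda == 2 branch
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: one right-skewed feature, three quality classes.
rows = [[0.2], [0.5], [0.9], [1.4], [2.0], [6.5], [9.0]]
labels = [5, 5, 5, 5, 6, 6, 7]

balanced_rows, balanced_labels = random_oversample(rows, labels)

# lam = 0.0 is an arbitrary illustrative choice, not a fitted value.
transformed = [[yeo_johnson(v, 0.0) for v in row] for row in balanced_rows]
class_counts = Counter(balanced_labels)
```

In practice the parameter lambda is usually estimated per feature by maximum likelihood rather than fixed by hand; scikit-learn's `PowerTransformer(method="yeo-johnson")` does this.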
License
Copyright (c) 2024 International Academic Publishing House (IAPH)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.