Analyzing Resampling Techniques for Addressing the Class Imbalance in NIDS using SVM with Random Forest Feature Selection

K. Swarnalatha; Nirmalajyothi Narisetty; Gangadhara Rao Kancherla; Basaveswararao Bobba

doi:10.52756/ijerr.2024.v43spl.004

Authors

K. Swarnalatha Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh-522510, India https://orcid.org/0009-0005-4325-6254
Nirmalajyothi Narisetty Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Bowrampet, Hyderabad, Telangana-500090, India https://orcid.org/0000-0002-4810-9676
Gangadhara Rao Kancherla Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh-522510, India https://orcid.org/0000-0002-6106-8477
Basaveswararao Bobba Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh-522510, India https://orcid.org/0000-0003-4287-0891

Keywords:

Cloud Computing, Cumulative Feature Importance, Intrusion Detection System, Machine Learning, Resampling Methods

Abstract

The purpose of Network Intrusion Detection Systems (NIDS) is to protect computer networks from malicious activity. A major concern in NIDS development is the class imbalance problem: normal traffic far outnumbers intrusion attempts in the communication data. This imbalance can degrade the effectiveness of detection algorithms, particularly for less frequent but still highly dangerous intrusions. This paper applies resampling techniques to address class imbalance in NIDS using a Support Vector Machine (SVM) classifier, with features selected by Random Forest to improve the feature subset selection process. The analysis highlights the effectiveness of each sampling method, offering insights into its efficiency and practicality for real-world applications. The resampling techniques analyzed are the Synthetic Minority Over-sampling Technique (SMOTE), Random Under-sampling (RUS), Random Over-sampling (ROS), and two hybrid combinations, RUS+SMOTE and RUS+ROS. Feature selection was performed using Random Forest, improved by Bayesian methods, and feature subsets were created from the feature rankings determined by the Cumulative Feature Importance Score (CFIS). The CICIDS-2017 dataset is used for performance evaluation, and the metrics include accuracy, precision, recall, F-measure and CPU time. Across the CFIS feature subsets, SMOTE performs best overall, and the best result is obtained with the 25-feature subset selected at the 90% CFIS level. This subset achieves a relative accuracy improvement of 0.08% over the other approaches. The RUS+ROS technique also performs well but is somewhat slower than SMOTE. RUS+SMOTE, on the other hand, shows relatively poor results, about 50% of the performance of the other methods, although it requires less computation time than the other methods.
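The resampling techniques compared above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`random_undersample`, `random_oversample`, `smote`), the `target` parameter, the toy data, and the reading of the hybrid RUS+ROS scheme as "under-sample every class down to a target size, then over-sample every class up to it" are all assumptions. Real experiments would more typically rely on a library such as imbalanced-learn.

```python
import random
from collections import Counter


def _group(X, y):
    """Group feature rows by class label (labels keep first-seen order)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, target=None, seed=0):
    """RUS: randomly keep at most `target` rows per class (default: smallest class size)."""
    rng = random.Random(seed)
    groups = _group(X, y)
    if target is None:
        target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        kept = rng.sample(rows, min(target, len(rows)))
        X_out += kept
        y_out += [label] * len(kept)
    return X_out, y_out


def random_oversample(X, y, target=None, seed=0):
    """ROS: duplicate random rows until each class has at least `target` rows (default: largest class size)."""
    rng = random.Random(seed)
    groups = _group(X, y)
    if target is None:
        target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(max(0, target - len(rows)))]
        X_out += rows + extra
        y_out += [label] * (len(rows) + len(extra))
    return X_out, y_out


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote(X, y, k=5, target=None, seed=0):
    """SMOTE: add synthetic rows on the segment between a sample and one of its
    k nearest same-class neighbours until each class reaches `target` rows."""
    rng = random.Random(seed)
    groups = _group(X, y)
    if target is None:
        target = max(len(rows) for rows in groups.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, rows in groups.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two samples of the class
        for _ in range(max(0, target - len(rows))):
            i = rng.randrange(len(rows))
            base = rows[i]
            others = rows[:i] + rows[i + 1:]
            neighbours = sorted(others, key=lambda r: _sq_dist(base, r))[:k]
            partner = rng.choice(neighbours)
            gap = rng.random()  # random position along the segment, in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: 5 "normal" rows and 2 "attack" rows (illustrative only).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [5.0, 5.0], [5.2, 5.1]]
y = ["normal"] * 5 + ["attack"] * 2

print(Counter(random_undersample(X, y)[1]))  # RUS
print(Counter(random_oversample(X, y)[1]))   # ROS
print(Counter(smote(X, y, k=1)[1]))          # SMOTE

# Hypothetical RUS+ROS hybrid: cap every class at 3 rows, then top every class up to 3.
X_h, y_h = random_undersample(X, y, target=3)
X_h, y_h = random_oversample(X_h, y_h, target=3)
print(Counter(y_h))
```

RUS discards majority rows, so it is cheap but loses information; ROS and SMOTE keep every original row and add minority rows, with SMOTE creating new points between neighbours rather than exact duplicates.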
The novelty of this paper lies in adapting the RUS method as a standalone test for screening new and potentially contaminated datasets. Standalone RUS is computationally more efficient: its best result, 98.13% accuracy, was obtained with the 34-feature subset at the 85% CFIS level, with a computation time of 137.812 s. Overall, SMOTE with the 90% CFIS feature subset proved the most proficient of the resampling techniques evaluated for handling class imbalance in NIDS. Future research could apply these techniques to other datasets and to other machine learning and deep learning methods, together with ROC curve analysis, to give NIDS designers useful guidance on selecting the right data mining tools and strategies for their projects.
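The CFIS levels mentioned above (for example, 34 features at 85% and 25 features at 90%) can be read as cumulative-importance thresholds. The sketch below assumes that reading: features are ranked by Random Forest importance, and the subset at level p is the smallest top-ranked set whose share of the total importance reaches p. The function name `cfis_subset` and the importance values are hypothetical; in practice the scores might come from scikit-learn's `RandomForestClassifier.feature_importances_`.

```python
def cfis_subset(importances, level):
    """Return the smallest list of top-ranked features whose cumulative share of
    total importance reaches `level` (a fraction such as 0.85 or 0.90)."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= level - 1e-9:  # tolerance for floating-point rounding
            break
    return selected


# Hypothetical importance scores (they need not sum to 1; they are normalized above).
scores = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.10}

for level in (0.50, 0.85, 0.95):
    subset = cfis_subset(scores, level)
    print(f"CFIS {level:.0%}: {len(subset)} features -> {subset}")
```

Because every level uses the same ranking, raising the level can only append features, so the subsets at successive CFIS levels are nested.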
