A Computation of Frequent Itemset using Matrix Based Apriori Algorithm
Abstract
The Apriori algorithm is a classical method for finding frequent itemsets in large datasets; association rules can then be generated from those frequent itemsets. Apriori has two bottlenecks: it generates a large number of candidate sets, and it scans the database repeatedly. As a result, it is slow to execute and takes up a lot of space. We propose a strategy called the Matrix Based Apriori algorithm to overcome these problems. It is easy to implement yet effective in addressing the weaknesses of Apriori. The database does not need to be scanned repeatedly, because all operations are applied to a matrix first, after which the database is converted back into its original form. In addition, we reduce the number of candidate itemsets by using several pruning techniques. The Matrix Based Apriori algorithm outperforms the standard Apriori algorithm in execution time, with an average time reduction of 71.5% in the first experiment and 86% in the second. We also compared Matrix Based Apriori with an efficient alternative known as Improved Apriori and found that our method outperforms it by 20%.
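The general idea of mining on a matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the matrix layout or the pruning techniques, so the sketch assumes a 0/1 transaction-by-item matrix built in a single database scan, support counting by a column-wise AND, and two standard pruning steps (dropping infrequent single items and rejecting candidates that have an infrequent subset). The function names `build_matrix`, `support`, and `matrix_apriori`, as well as the toy transactions, are hypothetical.

```python
from itertools import combinations


def build_matrix(transactions):
    """Scan the database once; return (items, columns).

    columns[item] is a 0/1 list with one entry per transaction.
    """
    items = sorted({item for t in transactions for item in t})
    columns = {
        item: [1 if item in t else 0 for t in transactions] for item in items
    }
    return items, columns


def support(itemset, columns):
    """Count rows where every item of the itemset is present (column-wise AND)."""
    cols = [columns[item] for item in itemset]
    return sum(all(bits) for bits in zip(*cols))


def matrix_apriori(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    transactions = [set(t) for t in transactions]
    items, columns = build_matrix(transactions)  # the only database scan

    frequent = {}
    # Assumed pruning step 1: keep only columns of frequent single items.
    level = []
    for item in items:
        count = sum(columns[item])
        if count >= min_support:
            frequent[(item,)] = count
            level.append((item,))

    while level:
        previous = set(level)
        next_level = []
        # Join step: merge sorted itemsets that share all but the last item.
        for a, b in combinations(level, 2):
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Assumed pruning step 2: every (k-1)-subset must be frequent.
            if any(sub not in previous
                   for sub in combinations(candidate, len(candidate) - 1)):
                continue
            count = support(candidate, columns)  # uses the matrix, not the database
            if count >= min_support:
                frequent[candidate] = count
                next_level.append(candidate)
        level = sorted(next_level)
    return frequent


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, count in sorted(matrix_apriori(db, min_support=3).items()):
        print(itemset, count)
```

In this sketch the original transactions are read only inside `build_matrix`; every later support count is computed from the stored columns, which is the property the abstract attributes to the matrix-based approach.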
Copyright (c) 2023 International Academic Publishing House (IAPH)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.