A Computation of Frequent Itemset using Matrix Based Apriori Algorithm

  • Samin Jayaram Vivekanandan Faculty of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamilnadu 600119, India; Department of Computer Science and Engineering, Dhanalakshmi College of Engineering, Chennai, Tamilnadu 601301, India https://orcid.org/0000-0001-7581-4728
  • Gurusamy Gunasekaran Department of Computer Science and Engineering, Dr. M. G. R. Educational and Research Institute, Chennai, Tamilnadu 600095, India https://orcid.org/0000-0003-2331-8014
Keywords: Transaction Matrix, Matrix Based Apriori (MB_Apriori), Frequent Itemsets

Abstract

The Apriori algorithm is a traditional method for finding frequent itemsets in large datasets; association rules can then be generated from those itemsets. Apriori has two bottlenecks: it generates a large number of candidate sets, and it scans the database repeatedly. As a result, it takes a long time to execute and consumes a lot of space. We propose a novel strategy, the Matrix-Based Apriori (MB_Apriori) algorithm, to overcome these problems. It is easy to implement yet effective at addressing Apriori's issues. The database does not need to be scanned repeatedly because all operations are first applied to a transaction matrix, after which the database is converted back into its original form. In addition, we reduce the number of candidate itemsets by using several pruning techniques. Matrix-Based Apriori outperforms the standard Apriori algorithm in execution time, with an average time reduction of 71.5% in the first experiment and 86% in the second. We also compared Matrix-Based Apriori with an efficient alternative known as Improved Apriori and found that our method outperforms it by 20%.
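The abstract does not spell out the matrix layout or the pruning rules, so the following is only a minimal sketch of the general idea: encode the transactions once as a 0/1 matrix, then count support from the matrix instead of rescanning the database. The function names (`build_matrix`, `support`, `frequent_itemsets`), the absolute `min_support` threshold, and the subset-based pruning step are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def build_matrix(transactions):
    """Encode transactions as a 0/1 matrix: one row per transaction, one column per item."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def support(matrix, cols):
    """Count the rows in which every column listed in `cols` holds a 1."""
    return sum(1 for row in matrix if all(row[c] for c in cols))


def frequent_itemsets(transactions, min_support):
    """Level-wise search that reads only the matrix after a single pass over the data.

    Assumption: `min_support` is an absolute transaction count, not a ratio.
    """
    items, matrix = build_matrix(transactions)

    # Level 1: keep the columns whose support meets the threshold.
    current = {}
    for c in range(len(items)):
        count = support(matrix, (c,))
        if count >= min_support:
            current[(c,)] = count

    result = {}
    k = 1
    while current:
        for cols, count in current.items():
            result[frozenset(items[c] for c in cols)] = count

        k += 1
        surviving = sorted({c for cols in current for c in cols})
        next_level = {}
        for cand in combinations(surviving, k):
            # Apriori property: every (k-1)-subset of a candidate must be frequent.
            if not all(sub in current for sub in combinations(cand, k - 1)):
                continue
            count = support(matrix, cand)
            if count >= min_support:
                next_level[cand] = count
        current = next_level

    return result
```

Called as `frequent_itemsets(transactions, 2)` on a list of item sets, this returns a dictionary that maps each frequent itemset (as a `frozenset`) to its support count. Candidates with an infrequent subset are discarded before any row of the matrix is read, which is the kind of saving the pruning step aims for.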

Published
2023-04-30
How to Cite
Vivekanandan, S., & Gunasekaran, G. (2023). A Computation of Frequent Itemset using Matrix Based Apriori Algorithm. International Journal of Experimental Research and Review, 30, 247-256. https://doi.org/10.52756/ijerr.2023.v30.022