Securing the Data Using an Efficient Machine Learning Technique

Keywords: Privacy, Differential privacy, Decision Forest, Decision Tree, Privacy-Preserving Data Mining, Noisy Data

Abstract

The growing availability of data and the rise of advanced data analysis have led to complex models being used for decision-making across many fields. Protecting people's privacy in these settings is nevertheless vital. Decision trees are often used for medical prediction because of their simplicity, but they can also be a source of privacy violations. To address this, we apply differential privacy, a mathematical framework that adds calibrated random noise to computations on the data so that individual records remain confidential while accuracy is largely preserved. Our novel method, Dual Noise Integrated Privacy Preservation (DNIPP), builds decision forests that preserve privacy. DNIPP provides stronger protection against breaches in the deeper sections of each tree, where nodes contain fewer records, thereby reducing the noise in the final predictions. Multiple trees are combined into one forest using a method that takes each tree's accuracy into account, and the procedure is accelerated with an iterative approach. Experiments on real datasets show that DNIPP outperforms other approaches, indicating that it is a promising way to reconcile accuracy and privacy in sensitive tasks. In DNIPP, the strategic allocation of privacy budgets provides a favorable trade-off between privacy and utility: by prioritizing privacy at the lower, more vulnerable nodes, DNIPP produces decision forests that are both accurate and private. In addition, the selective aggregation technique preserves the privacy of the forest when multiple data points are combined. DNIPP thus provides a robust framework for decision-making in sensitive situations, maintaining the model's effectiveness while safeguarding personal privacy.
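The abstract combines three generic ingredients: differential-privacy noise, a privacy budget split across tree depth, and accuracy-aware combination of trees. The Python sketch below illustrates those standard ideas only; it is not the DNIPP algorithm, whose details are not given here. It assumes the Laplace mechanism on class counts (sensitivity 1 under add/remove-one neighbouring datasets), a hypothetical per-depth weighting passed to `split_budget`, and a simple accuracy-weighted vote. All function names are illustrative. Nodes at the same depth see disjoint records, so their queries compose in parallel, while depth levels compose sequentially; if every tree is trained on the same records, the per-tree budgets also add up across the forest.

```python
import random
from collections import Counter, defaultdict


def split_budget(total_epsilon, weights):
    """Split one tree's privacy budget across depth levels.

    Nodes at the same depth hold disjoint records, so their queries compose in
    parallel; levels compose sequentially, so per-level budgets sum to
    total_epsilon. The weights are a hypothetical knob, not DNIPP's allocation.
    """
    total_weight = sum(weights)
    return [total_epsilon * w / total_weight for w in weights]


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_class_counts(labels, classes, epsilon, rng):
    """Laplace mechanism on a class histogram.

    Adding or removing one record changes one count by 1, so the L1
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    counts = Counter(labels)
    return {c: counts.get(c, 0) + laplace(1.0 / epsilon, rng) for c in classes}


def noisy_leaf_label(labels, classes, epsilon, rng):
    # Predict the class with the largest noisy count at this leaf.
    noisy = noisy_class_counts(labels, classes, epsilon, rng)
    return max(noisy, key=noisy.get)


def weighted_vote(predictions, accuracies):
    """Combine per-tree predictions, weighting each vote by that tree's accuracy.

    Assumes the accuracies were measured without spending the private budget
    (e.g. on separate public validation data).
    """
    scores = defaultdict(float)
    for label, accuracy in zip(predictions, accuracies):
        scores[label] += accuracy
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    eps_per_level = split_budget(1.0, [1, 2, 3, 4])  # hypothetical weights, depths 0..3
    leaf_labels = ["sick"] * 40 + ["healthy"] * 5    # toy records reaching one leaf
    print(noisy_leaf_label(leaf_labels, ["sick", "healthy"], eps_per_level[-1], rng))
    print(weighted_vote(["sick", "healthy", "sick"], [0.9, 0.8, 0.6]))  # prints sick
```

Giving a depth level a larger share of the budget lowers the noise scale there, which matters most at deep nodes where counts are small; giving it a smaller share strengthens protection at the cost of accuracy. Because the exact DNIPP allocation is not stated above, the weights are left as a parameter in this sketch.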

Published
2024-06-30
How to Cite
Jain, P., & Thada, V. (2024). Securing the Data Using an Efficient Machine Learning Technique. International Journal of Experimental Research and Review, 40(Spl Volume), 217-226. https://doi.org/10.52756/ijerr.2024.v40spl.018