MCIP: Mining Crop Image Data On pysparkdataframe Using Feature Selection and Cluster Based Techniques

  • Yashi Chaudhary Gurukul Kangri (Deemed to be University), Haridwar, Uttarakhand, India https://orcid.org/0000-0002-3959-4008
  • Heman Pathak Gurukul Kangri (Deemed to be University), Haridwar, Uttarakhand, India
Keywords: Agriculture, clustering, data mining, k-means, PCA, PySpark

Abstract

Crop-related problems such as pests and diseases lead to yearly losses exceeding $500 billion in India, and leaf blight is identified as the principal cause of these losses. Farmers cultivating forage and grain sorghum are the hardest hit. The disease also has a significant impact on other crops, including maize, rice, tomato, potato, millet, and onion. Timely detection and evaluation of plant disease can help mitigate the associated losses. However, the task is difficult because of variations in crop species, varieties of crop diseases, and environmental factors. Current methodologies lack generalizability in classifying and predicting diseases. All of the techniques employed in this study are applied to a dataset with predetermined input values and corresponding output values. Current methodologies preprocess the images and segment them to extract the appropriate characteristics. Segmentation requires pre-processing techniques such as dilation and edge detection. As a consequence, crucial information is lost, which leads to inaccurate classification. Furthermore, the methodologies employed thus far have not been designed to evaluate algorithm performance on specialised or specific datasets, and deep learning methodologies are susceptible to overfitting. This paper proposes an approach for mining crop image data on a PySpark data frame (MCIP). The MCIP framework employs Principal Component Analysis (PCA) to select pertinent features. The resulting PCA features are then used to identify homogeneous subgroups with the K-means algorithm.
The categorised predictive output facilitates the identification and detection of diseases in potato leaves. MCIP is not limited to potatoes; it can also identify diseases in the foliage of other agricultural crops. To validate this claim, we ran the MCIP algorithm on a rice disease dataset. To assess the robustness of MCIP, we evaluated its accuracy, silhouette score, speed, and F1 score. The MCIP model demonstrated high performance in both speed and accuracy compared to existing approaches. The level of accuracy is close to 100 percent.
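
As a rough illustration of the pipeline summarised in the abstract (PCA features, then K-means clustering, then a silhouette score), the following plain-Python sketch may help. It is not the authors' implementation and does not use PySpark: the toy feature vectors, the function names, the choice of a single principal component, and k = 2 are all assumptions made for illustration only.

```python
import math
import random

def pca_first_component(rows, iters=200):
    """Project mean-centred rows onto their leading principal axis.

    Uses power iteration on the sample covariance matrix; keeping only one
    component is a simplification chosen for this sketch.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [[sum(c[j] * v[j] for j in range(d))] for c in centred]

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: assign points to the nearest centre, then update centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette: (b - a) / max(a, b) averaged over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # convention for singleton or single-cluster cases
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical stand-ins for flattened image feature vectors (two obvious groups).
features = [
    [0.10, 0.20, 0.10], [0.20, 0.10, 0.20], [0.15, 0.15, 0.10],
    [0.90, 0.80, 0.90], [0.80, 0.90, 0.85], [0.85, 0.95, 0.90],
]
reduced = pca_first_component(features)
labels = kmeans(reduced, k=2)
score = silhouette(reduced, labels)
print(labels, round(score, 3))
```

In a PySpark setting, the same three steps would typically map onto `pyspark.ml.feature.PCA`, `pyspark.ml.clustering.KMeans`, and `pyspark.ml.evaluation.ClusteringEvaluator` (whose default metric is the silhouette); whether MCIP uses exactly these classes is not stated in the abstract.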

 

Published
2023-10-30
How to Cite
Chaudhary, Y., & Pathak, H. (2023). MCIP: Mining Crop Image Data On pysparkdataframe Using Feature Selection and Cluster Based Techniques. International Journal of Experimental Research and Review, 34(Special Vo), 106-119. https://doi.org/10.52756/ijerr.2023.v34spl.011