Performance Evaluation of YOLOv5-based Custom Object Detection Model for Campus-Specific Scenario

Dontabhaktuni Jayakumar; Samineni Peddakrishna

doi:10.52756/ijerr.2024.v38.005

Authors

Dontabhaktuni Jayakumar, School of Electronics Engineering, VIT-AP University, Amaravati-522241, Andhra Pradesh, India. ORCID: https://orcid.org/0000-0003-3779-9904
Samineni Peddakrishna, School of Electronics Engineering, VIT-AP University, Amaravati-522241, Andhra Pradesh, India. ORCID: https://orcid.org/0000-0002-5925-8124

Keywords:

Autonomous electric vehicles, computer vision, custom data, object detection, YOLO

Abstract

This study evaluates the performance of a custom object detection model based on the YOLOv5 architecture and tailored for autonomous electric vehicles. The dataset was pre-processed with the Roboflow computer vision platform, which offers a wide range of tools for data pre-processing and model training. The experiments were conducted on a diverse dataset comprising objects encountered in campus-specific driving scenarios, such as pedestrians, vehicles, buildings, and obstacles. The model was assessed using standard metrics: precision, recall, mean average precision (mAP), and intersection-over-union (IoU) at different thresholds. Training was conducted in a controlled environment and yielded a precision of 0.851, a recall of 0.831, and a mAP of 0.843. These metrics were analyzed to evaluate the model's ability to detect and categorize objects accurately, its precision in predicting bounding boxes, and its capability to handle various object categories. We also examined the effects of hyperparameters and data augmentation techniques on the model's performance, varying the learning rate, batch size, and optimizer algorithm to determine their impact on accuracy and convergence. This analysis identified the model's strengths and weaknesses and highlighted areas for improvement and optimization. The findings can inform the development and deployment of object detection systems that enhance the safety and reliability of autonomous electric vehicles.
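For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how IoU, precision, and recall are conventionally computed. It is not the authors' evaluation code. The corner-based box format, the function names, and the example counts are assumptions made for illustration only.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical boxes and counts, not taken from the study.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=8, fp=2, fn=2))
```

In a typical detection evaluation, a prediction counts as a true positive when its IoU with a ground-truth box of the same class meets a chosen threshold. mAP then averages, over classes, the area under each class's precision-recall curve.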
