Performance Evaluation of YOLOv5-based Custom Object Detection Model for Campus-Specific Scenario
DOI:
https://doi.org/10.52756/ijerr.2024.v38.005
Keywords:
Autonomous electric vehicles, computer vision, custom data, object detection, YOLO
Abstract
This study evaluates the performance of a custom object detection model based on the YOLOv5 architecture and tailored for autonomous electric vehicles. Data pre-processing was carried out on the Roboflow computer vision platform, which provides tools for data pre-processing and model training. The experiments were conducted on a diverse dataset of objects encountered in campus-specific driving scenarios, such as pedestrians, vehicles, buildings, and obstacles. Model performance was assessed using standard metrics: precision, recall, mean average precision (mAP), and intersection-over-union (IoU) at different thresholds. Training was conducted in a controlled environment and yielded a precision of 0.851, a recall of 0.831, and an mAP of 0.843. These metrics were analyzed to evaluate the model's ability to detect and categorize objects accurately, the accuracy of its predicted bounding boxes, and its capability to handle various object categories. We also examined how hyperparameters and data augmentation techniques affect performance, varying the learning rate, batch size, and optimizer algorithm to determine their impact on accuracy and convergence. This analysis identified the model's strengths and weaknesses and highlighted areas for improvement and optimization. The findings support the development and deployment of object detection systems that enhance the safety and reliability of autonomous electric vehicles.
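As an illustration of the metrics named in the abstract, the following is a minimal sketch of how IoU, precision, and recall are conventionally computed for bounding-box detectors. It is not the authors' evaluation code: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the example counts are assumptions made for illustration only.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical boxes and counts, not taken from the study.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=3, fp=1, fn=1))
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box of the same class meets a chosen threshold (0.5 is a common choice). Average precision summarizes the precision-recall curve for one class, and mAP is the mean of those values across classes.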