Gesture-Controlled Virtual Mouse with Voice Assistant using YOLOv8 and Convolutional Neural Networks
DOI:
https://doi.org/10.48001/978-81-980647-6-9-3

Keywords:
Gesture Recognition, Voice Assistant, Convolutional Neural Networks, YOLOv8, AI, MediaPipe, Natural Language Processing, Transformer Models

Abstract
The emergence of advanced computer vision and artificial intelligence (AI) technologies has catalyzed the development of innovative human-computer interaction (HCI) systems. This paper presents a Gesture-Controlled Virtual Mouse with Voice Assistant (GCVA) system that integrates gesture recognition and voice command capabilities to replace traditional input devices. The gesture recognition module, built on YOLOv8 and Convolutional Neural Networks (CNNs), detects hand gestures for actions such as cursor movement, clicks, and volume control. The voice assistant module employs transformer-based models such as BERT and Whisper to recognize and execute verbal commands. The system achieves a gesture-recognition accuracy of 99% and voice response times of 0.5 to 30 seconds, demonstrating its potential to enhance accessibility, practicality, and efficiency in digital environments.
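As an illustration only (not the paper's implementation), the gesture-to-cursor mapping described above can be sketched as two small functions: one that converts a normalized fingertip landmark (as produced by hand trackers such as MediaPipe, coordinates in the range 0..1) into smoothed screen-pixel coordinates, and one that treats a small thumb-index distance as a pinch "click". The screen resolution, smoothing factor, and pinch threshold below are assumed values, not figures from the paper:

```python
import math

SCREEN_W, SCREEN_H = 1920, 1080  # assumed display resolution


def fingertip_to_cursor(x_norm, y_norm, prev=None, alpha=0.3):
    """Map a normalized fingertip landmark (0..1 range) to screen pixels,
    with exponential smoothing against the previous position to reduce
    jitter from frame-to-frame tracking noise."""
    x, y = x_norm * SCREEN_W, y_norm * SCREEN_H
    if prev is not None:
        px, py = prev
        x = alpha * x + (1 - alpha) * px
        y = alpha * y + (1 - alpha) * py
    return (x, y)


def is_pinch(thumb_tip, index_tip, threshold=0.05):
    """Interpret a thumb-index distance below the threshold (in normalized
    landmark units) as a click gesture."""
    return math.dist(thumb_tip, index_tip) < threshold
```

In a live system, the resulting coordinates would be fed each frame to an OS-level cursor API, with the smoothing factor tuned to trade responsiveness against jitter.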
The GCVA system bridges the gap between traditional input devices and innovative HCI solutions, focusing on enhancing user experience through real-time processing and robust design. This study further explores the algorithms, architecture, and future improvements for a scalable, user-adaptable system.
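To illustrate the voice-command side, a minimal dispatcher can route a transcribed utterance (e.g., text returned by a speech-to-text model such as Whisper) to an action. This is a hypothetical sketch, not the paper's command set; the phrases and action names below are assumptions:

```python
# Hypothetical phrase-to-action table; a real system would cover the
# full command vocabulary and may use an NLU model instead of matching.
COMMANDS = {
    "open browser": "launch_browser",
    "volume up": "increase_volume",
    "volume down": "decrease_volume",
    "scroll down": "scroll_down",
}


def dispatch(transcript):
    """Return the action for the first command phrase found in the
    transcribed utterance, or None when nothing matches."""
    text = transcript.lower().strip()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None
```

Substring matching keeps the example self-contained; a transformer-based intent classifier, as the abstract describes, would replace this lookup in practice.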