Gesture-Controlled Virtual Mouse with Voice Assistant using YOLOv8 and Convolutional Neural Networks

Authors

D. Kishore, N. Revathy, B. Keerthana, J. Gayathri, & V. Latha Sivasakari

DOI:

https://doi.org/10.48001/978-81-980647-6-9-3

Keywords:

Gesture Recognition, Voice Assistant, Convolutional Neural Networks, YOLOv8, AI, MediaPipe, Natural Language Processing, Transformer Models

Abstract

The emergence of advanced computer vision and artificial intelligence (AI) technologies has catalyzed the development of innovative human-computer interaction (HCI) systems. This paper presents a Gesture-Controlled Virtual Mouse with Voice Assistant (GCVA) system that integrates gesture recognition and voice command capabilities to replace traditional input devices. The gesture recognition module, built on YOLOv8 and Convolutional Neural Networks (CNNs), detects hand gestures for actions such as cursor movement, clicks, and volume control. The voice assistant module employs transformer-based models such as BERT and Whisper to recognize and execute spoken commands. The system achieves a gesture recognition accuracy of 99% and voice response times of 0.5 to 30 seconds, demonstrating its potential to enhance accessibility, practicality, and efficiency in digital environments.
The GCVA system bridges the gap between traditional input devices and innovative HCI solutions, focusing on enhancing user experience through real-time processing and robust design. This study further explores the algorithms, architecture, and future improvements for a scalable, user-adaptable system.
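To make the gesture-to-cursor mapping concrete, the short sketch below uses MediaPipe Hands (listed in the keywords) together with OpenCV and PyAutoGUI: the index fingertip drives the cursor and a thumb-index pinch triggers a click. This is an illustrative assumption rather than the authors' YOLOv8/CNN pipeline; the pinch threshold, smoothing factor, and window name are placeholder values chosen for the example.

# Minimal gesture-to-cursor sketch using MediaPipe Hands and PyAutoGUI.
# Illustrates the landmark-to-action mapping only; it is not the YOLOv8/CNN
# pipeline evaluated in the paper. Threshold and smoothing values are assumed.
import cv2
import math
import mediapipe as mp
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()
PINCH_THRESHOLD = 0.05   # normalized thumb-index distance treated as a click (assumed)
SMOOTHING = 0.3          # exponential smoothing factor for cursor motion (assumed)

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)
cursor_x, cursor_y = SCREEN_W / 2, SCREEN_H / 2
was_pinched = False

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                       # mirror the image for natural pointing
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        index_tip, thumb_tip = lm[8], lm[4]          # MediaPipe fingertip landmark indices

        # Map the normalized index-fingertip position to screen coordinates and smooth it.
        cursor_x += (index_tip.x * SCREEN_W - cursor_x) * SMOOTHING
        cursor_y += (index_tip.y * SCREEN_H - cursor_y) * SMOOTHING
        pyautogui.moveTo(cursor_x, cursor_y)

        # Treat a thumb-index pinch as a single left click (debounced with a flag).
        pinched = math.hypot(index_tip.x - thumb_tip.x, index_tip.y - thumb_tip.y) < PINCH_THRESHOLD
        if pinched and not was_pinched:
            pyautogui.click()
        was_pinched = pinched

    cv2.imshow("GCVA sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):            # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()

In a full GCVA-style system, the same capture loop would instead feed frames to a YOLOv8 hand detector and a CNN gesture classifier, with additional gestures mapped to scrolling and volume control as described in the abstract.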

Downloads

References

Cardoso, T., Delgado, J., & Barata, J. (2015). Hand gesture recognition towards enhancing accessibility. Procedia Computer Science, 67, 419–429. https://doi.org/10.1016/j.procs.2015.09.287

K, K. (2024). Hand Glide: Gesture-controlled virtual mouse with voice assistant. International Journal for Research in Applied Science and Engineering Technology, 12(4), 5470–5476. https://doi.org/10.22214/ijraset.2024.61178

Kumar, A., Pachauri, R. K., Mishra, R., & Kuchhal, P. (Eds.). (2025). Intelligent communication, control and devices: Proceedings of ICICCD 2024 (Vol. 1164). Springer.

Patil, N. (2024). Gesture voice: Revolutionizing human-computer interaction with an AI-driven virtual mouse system. Turkish Online Journal of Qualitative Inquiry, 15(3), 12–19. https://doi.org/10.53555/tojqi.v15i3.10282

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 91–99.

Shibly, K. H., Kumar Dey, S., Islam, M. A., & Iftekhar Showrav, S. (2019). Design and development of hand gesture-based virtual mouse. 1st International Conference on Advances in Science, Engineering and Robotics Technology 2019 (ICASERT 2019). https://doi.org/10.1109/ICASERT.2019.8934612

Osman, S. (2024). Virtual keyboard-mouse in real-time using hand gesture and voice assistant. Journal of the ACS Advances in Computer Science, 15(1), 13–24. https://doi.org/10.21608/asc.2024.376718

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5999–6009.

Published

2025-03-17

How to Cite

D. Kishore, N. Revathy, B. Keerthana, J. Gayathri, & V. Latha Sivasakari. (2025). Gesture-Controlled Virtual Mouse with Voice Assistant using YOLOv8 and Convolutional Neural Networks. QTanalytics Publication (Books), 24–32. https://doi.org/10.48001/978-81-980647-6-9-3