Scan-GPT: Bridging Visual Input and Textual Output for Question Solution

Sumit Raj Jha; Mukesh Kumar Bhardwaj; Shubham Bhugra; Gourav Kumar

doi:10.48001/joitc.2024.1223-28

Authors

Sumit Raj Jha, Department of Computer Science and Engineering, Dronacharya Group of Institution, Greater Noida, Uttar Pradesh, India
Mukesh Kumar Bhardwaj, Department of Computer Science and Engineering, Dronacharya Group of Institution, Greater Noida, Uttar Pradesh, India
Shubham Bhugra, Department of Computer Science and Engineering, Dronacharya Group of Institution, Greater Noida, Uttar Pradesh, India
Gourav Kumar, Department of Computer Science and Engineering, Dronacharya Group of Institution, Greater Noida, Uttar Pradesh, India


Keywords:

Cutting-edge technologies, Efficiency, JavaScript, OCR (Optical Character Recognition), Web application

Abstract

ScanGPT combines an advanced language model, OpenAI's GPT-3.5, with optical character recognition (OCR) to give users a single platform for obtaining information from both textual and non-textual content. Traditional search methods often fall short when valuable information is embedded in non-textual content such as images. ScanGPT addresses this limitation by combining OCR with the language model to deliver answers based on both text and image inputs. This paper presents the architecture, functionality, and methodology of ScanGPT. The proposed architecture integrates text and image processing by building on existing technologies, namely ChatGPT, Microsoft Azure OCR, and the OpenAI API. Its modular design, together with security and privacy measures, is intended to provide scalability, flexibility, and user confidentiality. The paper also explores the role of HTML, CSS, and JavaScript in the user interface, emphasizing intuitive interfaces and dynamic capabilities, and reviews existing solutions and challenges in conversational AI.
Future scope and potential advancements in conversational AI are also discussed, including opportunities for integrating additional sensory inputs, personalization, and scalability. Through ongoing improvements in ethical AI considerations and linguistic capabilities, ScanGPT aims to remain a trustworthy and globally accessible technology that fosters wider adoption and cultural inclusivity in AI-driven interactions. Overall, ScanGPT is a step toward using advanced language models and OCR together to provide users with accurate, contextually relevant information from diverse sources.
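
The abstract describes a flow in which text and image inputs are combined before a language model produces an answer. The following is a minimal JavaScript sketch of how such a flow might be organized; it is not taken from the ScanGPT source. The function names (mergeInputs, buildChatRequest, answerQuestion), the injected extractText and complete helpers, the system prompt, and the "gpt-3.5-turbo" model identifier are all assumptions made for illustration. The OCR and model clients are passed in as parameters so that no particular service API is implied.

```javascript
// Illustrative sketch only: all names here are hypothetical, not ScanGPT's code.

// Merge typed text and OCR-extracted text into a single question string.
function mergeInputs(typedText, ocrText) {
  return [typedText, ocrText]
    .map((part) => (part || "").trim())
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// Build a chat-style request body. The model name and prompt are assumptions.
function buildChatRequest(question) {
  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Answer the user's question clearly and concisely." },
      { role: "user", content: question },
    ],
  };
}

// Orchestrate the flow: optional image -> OCR -> merged question -> language model.
// `extractText` and `complete` are injected, so any OCR or model client can be used.
async function answerQuestion({ typedText, image }, { extractText, complete }) {
  const ocrText = image ? await extractText(image) : "";
  const question = mergeInputs(typedText, ocrText);
  if (question.length === 0) {
    throw new Error("No text was provided or recognized in the image.");
  }
  return complete(buildChatRequest(question));
}

// Usage with stand-in clients (no network calls are made here).
const stubClients = {
  extractText: async () => "What is the capital of France?",
  complete: async (request) => `Model received ${request.messages.length} messages`,
};
answerQuestion({ typedText: "", image: "scan.png" }, stubClients).then(console.log);
```

Injecting the OCR and model clients keeps the orchestration logic independent of any one provider, which is one way a modular design of the kind the abstract mentions could be approached.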

