A Hybrid Approach for Complex Layout Detection of Newspapers in Gurumukhi Script Using Deep Learning

Keywords: Deep Learning, CNN, Gurumukhi, Layout, Newspapers

Abstract

Layout analysis is a crucial stage in any newspaper recognition system, since better layout analysis yields better recognition results. The complexity of newspaper layouts poses a formidable challenge for digitization: the intricate arrangement of text, images, and various sections demands sophisticated algorithms for accurate layout detection. This paper reviews a diverse set of methodologies from the existing literature, highlighting the evolution of techniques for newspaper layout analysis, and presents a novel hybrid method to detect the complex layout of newspapers in the Gurumukhi script. The method consists of two parts. In the first part, we propose an algorithm that removes pictures and graphics from Punjabi newspaper images using image preprocessing steps based on binarization, contour finding, and erosion. The algorithm handles complex non-Manhattan layouts and achieved an accuracy of 96.22% when tested on 100 newspaper images. In the second part, we created a dataset of 500 newspaper images labeled with five classes and trained a deep-learning model based on a convolutional neural network (CNN) to detect the columns of text in newspapers. We used four different CNN architectures and compared their performance using metrics such as precision, recall, and F1 score. Tested on a number of newspapers in the Gurumukhi script, this approach achieved an accuracy of 95.53%.
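
As an illustration of the evaluation metrics named above, the following is a minimal sketch of how per-class precision, recall, and F1 score can be computed from true and predicted labels. It is not the authors' evaluation code: the function name `per_class_metrics` and the class labels `"text"` and `"image"` are hypothetical, chosen only for the example, since the abstract does not name the five classes.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    y_true and y_pred are equal-length sequences of class labels.
    The labels used in the example below are hypothetical.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


if __name__ == "__main__":
    # Hypothetical labels for four detected regions.
    y_true = ["text", "text", "image", "text"]
    y_pred = ["text", "image", "image", "text"]
    for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting the three metrics per class, rather than a single accuracy figure, makes it easier to see whether an architecture confuses one particular class with another.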

Published
2023-11-30
How to Cite
Kumar, A., & Lehal, G. (2023). A Hybrid Approach for Complex Layout Detection of Newspapers in Gurumukhi Script Using Deep Learning. International Journal of Experimental Research and Review, 35, 34-42. https://doi.org/10.52756/ijerr.2023.v35spl.004