Scene Text Detection and Recognition Using Maximally Stable Extremal Region

Authors

  • Golda Jeyasheeli P Department of Computer Science Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India.
  • Athinarayanan B Department of Computer Science Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India.
  • Manish T Department of Computer Science Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India.
  • Mohamad Umar M Department of Computer Science Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India.

DOI:

https://doi.org/10.37385/jaets.v6i1.5958

Keywords:

MSER, SWT, Text Detection, Text Recognition, Deep Learning, CRNN

Abstract

In recent years, scene text detection and recognition have become important research areas in computer vision and machine learning. Traditional text detection and recognition methods can struggle with images that have low resolution, complex backgrounds, and varying font sizes. The proposed methodology addresses these challenges by combining multiple algorithms with deep learning techniques. In this paper, we propose a method for scene text detection based on Maximally Stable Extremal Regions (MSER) combined with the Stroke Width Transform (SWT), followed by recognition using a Convolutional Recurrent Neural Network (CRNN). The method consists of two stages: text detection and text recognition. For detection, MSER and SWT extract candidate text regions from the input image, and non-text regions are then eliminated using image-to-image translation. For recognition, the CRNN reads the text present in the detected regions. Its convolutional and recurrent layers capture, respectively, the spatial features of the text and its sequential (temporal) structure. The methodology is evaluated on various benchmark datasets and achieves an accuracy of 96%, which compares favorably with existing methods.
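The detection stage described in the abstract starts from MSER candidates. The following is a minimal, illustrative sketch of the stability criterion behind MSER, not the authors' implementation. For one seed pixel, it grows the dark 4-connected region at every 8-bit threshold t, measures the relative area change q(t) = (a(t+delta) - a(t-delta)) / a(t), and keeps thresholds where q(t) is a local minimum below a cutoff. The function names, the seed-based formulation, and the parameter values (delta, max_variation, max_area_fraction) are assumptions chosen for illustration. Practical detectors extract all extremal regions in a single union-find pass rather than one flood fill per threshold, and the SWT filtering, image-to-image translation, and CRNN stages are not shown.

```python
from collections import deque

INF = float("inf")


def component_area(img, seed, t):
    """Area of the 4-connected region of pixels with value <= t containing seed.

    Returns 0 when the seed pixel itself is brighter than t.
    """
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return 0
    seen = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:  # breadth-first flood fill over pixels no brighter than t
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and img[nr][nc] <= t):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def stable_regions(img, seed, delta=2, max_variation=0.25, max_area_fraction=0.5):
    """Return (threshold, area) pairs for the maximally stable regions grown from seed.

    Hypothetical helper: stability is q(t) = (a(t+delta) - a(t-delta)) / a(t),
    and a threshold qualifies when q(t) is a local minimum below max_variation.
    """
    max_area = max_area_fraction * len(img) * len(img[0])
    # Area of the seed's dark region at every 8-bit threshold.
    areas = [component_area(img, seed, t) for t in range(256)]
    q = {}
    for t in range(delta, 256 - delta):
        if areas[t] > 0:
            q[t] = (areas[t + delta] - areas[t - delta]) / areas[t]
    stable = {}
    for t, v in q.items():
        is_local_min = v <= q.get(t - 1, INF) and v <= q.get(t + 1, INF)
        if is_local_min and v < max_variation and areas[t] <= max_area:
            # Regions grown from one seed are nested, so equal area means the
            # same region; keep only the first threshold at which it is stable.
            stable.setdefault(areas[t], t)
    return sorted((t, a) for a, t in stable.items())
```

For dark text on a light background, seeding inside a stroke returns that stroke's region, while the near-whole-image background is discarded by max_area_fraction. For light text on a dark background, the same function can be applied to the inverted image (255 minus each pixel).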




Published

2024-12-15

How to Cite

P, G. J., B, A., T, M., & M, M. U. (2024). Scene Text Detection and Recognition Using Maximally Stable Extremal Region. Journal of Applied Engineering and Technological Science (JAETS), 6(1), 103–114. https://doi.org/10.37385/jaets.v6i1.5958