A Transformer-Based Approach with Contextual Position Encoding for Robust Persian Text Recognition in the Wild

Zobeir Raisi; Vali Mohammad Nazarzehi

Venue: Journal of Artificial Intelligence and Data Mining, Volume 12, Issue 3

Year of publication: 1403 (Solar Hijri calendar)

Document type: Journal article

Language: English

The full text of this article is available as an 11-page PDF.

Permanent link to this article:

https://civilica.com/doc/2145266

National scientific document ID:

JR_JADM-12-3_010

Indexing date: 11 Dey 1403 (Solar Hijri), i.e. 31 December 2024

Abstract:

Persian's distinctive script makes scene text recognition in this language unusually difficult. AI has advanced considerably, yet recognizing text in non-Latin scripts such as Persian remains hard. In this paper, we extend the vanilla transformer architecture so that it can recognize Persian text instances of arbitrary shape. We add Contextual Position Encoding (CPE) to the baseline transformer to improve recognition of Persian script in images captured in the wild, especially oriented text and text with spaced characters. CPE uses position information to generate contrastive data pairs, and these pairs help the model capture Persian characters written in different directions. We also evaluate several state-of-the-art deep-learning models on a challenging Persian scene text recognition dataset that we prepared. On that basis we develop a transformer-based architecture that improves recognition accuracy. On a real-world Persian text dataset, the proposed architecture achieves higher word recognition accuracy than existing methods.
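The abstract names Contextual Position Encoding but does not describe how it works. As a rough illustration, the sketch below implements the general CoPE formulation of Golovneva et al. in that paper's causal, single-head setting.

Each query computes a sigmoid gate for every earlier key. The contextual position p_ij is the sum of the gates from j up to i. Because p_ij is fractional, a position-embedding table is linearly interpolated at p_ij, and the result is added to the key before the attention logit is taken.

Please read this as an assumption-laden sketch, not the authors' implementation:

- The function names (`cope_positions`, `cope_attention`), the plain-list vectors, the omission of the 1/sqrt(d) logit scaling and the clipping at `p_max` are illustrative choices.
- The sketch does not reproduce how the authors integrate CPE into their recognizer.
- The sketch does not generate the contrastive pairs mentioned in the abstract.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cope_positions(q: List[Vector], k: List[Vector], p_max: int) -> List[Vector]:
    """Contextual positions p[i][j] = sum_{t=j..i} sigmoid(q_i . k_t), causal (j <= i).

    Hypothetical helper: positions are clipped to p_max so they index a finite table.
    """
    positions = []
    for i in range(len(q)):
        gates = [sigmoid(dot(q[i], k[j])) for j in range(i + 1)]
        row = [0.0] * (i + 1)
        running = 0.0
        # Suffix sums of the gates: walk from the query position back to the start.
        for j in range(i, -1, -1):
            running += gates[j]
            row[j] = min(running, float(p_max))
        positions.append(row)
    return positions


def interpolate_embedding(p: float, table: List[Vector]) -> Vector:
    """Linearly interpolate the position-embedding table at a fractional position p."""
    lo = math.floor(p)
    hi = min(lo + 1, len(table) - 1)
    frac = p - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(table[lo], table[hi])]


def cope_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                   pos_table: List[Vector]) -> List[Vector]:
    """Causal single-head attention with logits q_i . (k_j + e[p_ij]); no 1/sqrt(d) scaling."""
    p_max = len(pos_table) - 1
    positions = cope_positions(q, k, p_max)
    outputs = []
    for i in range(len(q)):
        logits = []
        for j in range(i + 1):
            e = interpolate_embedding(positions[i][j], pos_table)
            logits.append(dot(q[i], k[j]) + dot(q[i], e))
        weights = softmax(logits)
        dim = len(v[0])
        outputs.append([sum(w * v[j][d] for j, w in enumerate(weights)) for d in range(dim)])
    return outputs
```

Because the gates depend on the content of the query and the keys, a position counts "relevant" tokens rather than raw token offsets. This is the property that CoPE is designed to exploit.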

Keywords:

Scene Text Recognition, Persian Scripts, Contextual Position Encoding, Transformers, Deep Learning

Authors

Zobeir Raisi

Electrical Engineering Department, Chabahar Maritime University, Chabahar, Iran.

Vali Mohammad Nazarzehi

Electrical Engineering Department, Chabahar Maritime University, Chabahar, Iran.
