A New Dataset of Persian Handwritten Documents and its Segmentation

Alireza Alaei; P. Nagabhushan; Umapada Pal

Published in: 7th Iranian Conference on Machine Vision and Image Processing

Year of publication: 1390 (Solar Hijri)

Document type: Conference paper

Language: English

The full paper is 5 pages long and is available in PDF format.


Permanent link to this paper:

https://civilica.com/doc/159172

National scientific document ID:

ICMVIP07_138

Indexing date: 28 Mordad 1391 (Solar Hijri)

Abstract:

In document image analysis, and especially in handwritten document image recognition, standard datasets play a vital role in evaluating the performance of algorithms and comparing results obtained by different groups of researchers. In this paper, an unconstrained Persian handwritten text dataset (PHTD) is introduced. The PHTD contains 140 handwritten documents of three different categories written by 40 individuals. The total numbers of text-lines and words/subwords in the dataset are 1787 and 27073, respectively. In most of the PHTD documents, either overlapping or touching text-lines are present. The average number of text-lines per document in the PHTD is 13. Two types of ground truth, based on pixel information and content information, are generated for the dataset. With these two types of ground truth, the PHTD can be utilized in many areas of document image processing, such as sentence recognition/understanding, text-line segmentation, word segmentation, word recognition, and character segmentation. To provide a framework for other researchers, recent text-line segmentation results on this dataset are also reported.

Keywords:

Handwritten document, Persian handwritten recognition, Persian handwritten dataset, Ground truth

Authors

Alireza Alaei

Department of Studies in Computer Science, University of Mysore, Mysore 570006, India

P. Nagabhushan

Department of Studies in Computer Science, University of Mysore, Mysore 570006, India

Umapada Pal

Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata–108, India