A New Dataset of Persian Handwritten Documents and its Segmentation
Published in: 7th Iranian Conference on Machine Vision and Image Processing
Publication year: 1390 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper is available as a 5-page PDF file.
National scientific document ID:
ICMVIP07_138
Indexing date: 28 Mordad 1391 (Solar Hijri)
Abstract:
In document image analysis, and especially in handwritten document image recognition, standard datasets play a vital role in evaluating the performance of algorithms and comparing results obtained by different groups of researchers. In this paper, an unconstrained Persian handwritten text dataset (PHTD) is introduced. The PHTD contains 140 handwritten documents in three different categories, written by 40 individuals. The total numbers of text-lines and words/subwords in the dataset are 1787 and 27073, respectively. In most of the PHTD documents, either overlapping or touching text-lines are present. The average number of text-lines per document in the PHTD is 13. Two types of ground truth, based on pixel information and content information, are generated for the dataset. With these two types of ground truth, the PHTD can be utilized in many areas of document image processing, such as sentence recognition/understanding, text-line segmentation, word segmentation, word recognition, and character segmentation. To provide a framework for other researchers, recent text-line segmentation results on this dataset are also reported.
Authors
Alireza Alaei
Department of Studies in Computer Science, University of Mysore, Mysore 570006, India
P. Nagabhushan
Department of Studies in Computer Science, University of Mysore, Mysore 570006, India
Umapada Pal
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata-108, India