Persian Texts Part of Speech Tagging Using Artificial Neural Networks

  • Publication year: 1395 (Solar Hijri calendar)
  • Published in: Journal of Computing and Security, Volume 3, Issue 4
  • Unique COI code: JR_JCSE-3-4_004
  • Article language: English
  • Views: 172

Authors

Zahra Hosseini Pozveh

Science and Research Branch, Islamic Azad University, Tehran

Amirhassan Monadjemi

University of Isfahan

Ali Ahmadi

Khajeh Nasir Toosi University of Technology

Abstract

Part-of-speech (POS) tagging is a basic task in natural language processing applications such as morphological parsing, information retrieval, machine translation and question answering. It assigns each word its part of speech (e.g. noun or verb). It is followed by a number of challenging steps, in particular disambiguation, named entity recognition and compound verb detection. Most tagging approaches for Persian rely on hidden Markov models (HMMs) and rule-based models. Because Persian is a free-word-order language, those models cannot cope with the full complexity of the language in POS tagging, named entity recognition, word sense disambiguation and other related tasks. In this paper, artificial neural networks (ANNs) are used for POS tagging because of their ability to learn complex patterns. In the first experiment, a single ANN is fed the raw data; in the second, the data are clustered and a separate ANN is trained for each cluster. The two setups achieved accuracy rates of 95.7% and 96.17%, respectively. Comparing these results with those of other approaches indicates that neural networks can perform POS tagging and named entity recognition more precisely than other methods.
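The two setups described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the features or the clustering method, so everything here is an assumption made for illustration: a single-layer softmax network stands in for the ANN, k-means stands in for the clustering step, the features are one-hot encodings of the current and previous word, and the tiny English corpus and all function names (`train_net`, `train_clustered`, and so on) are hypothetical.

```python
import math
import random

def softmax_scores(W, x):
    """Forward pass of a single-layer softmax network (last weight is the bias)."""
    scores = [sum(w * v for w, v in zip(row, x)) + row[-1] for row in W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_net(X, y, n_tags, epochs=200, lr=0.5):
    """Fit the network with per-sample gradient descent on cross-entropy loss."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(n_tags)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            probs = softmax_scores(W, x)
            for c in range(n_tags):
                grad = probs[c] - (1.0 if c == t else 0.0)
                for j, v in enumerate(x):
                    W[c][j] -= lr * grad * v
                W[c][-1] -= lr * grad
    return W

def predict(W, x):
    probs = softmax_scores(W, x)
    return probs.index(max(probs))

def nearest(centers, x):
    """Index of the cluster center closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], x)))

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; assumed here, the abstract does not name a method."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centers, x)].append(x)
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def train_clustered(X, y, n_tags, k=2):
    """Second setup: cluster the data, then train one network per cluster."""
    centers = kmeans(X, k)
    models = []
    for i in range(k):
        idx = [n for n, x in enumerate(X) if nearest(centers, x) == i]
        if not idx:
            idx = list(range(len(X)))  # empty cluster: fall back to all data
        models.append(train_net([X[n] for n in idx], [y[n] for n in idx], n_tags))
    return centers, models

def predict_clustered(centers, models, x):
    """Route x to its nearest cluster and use that cluster's network."""
    return predict(models[nearest(centers, x)], x)

# Hypothetical toy corpus (English, for readability; not the paper's data).
TAGS = ["DET", "NOUN", "VERB"]
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
VOCAB = sorted({w for sent in CORPUS for w, _ in sent}) + ["<s>"]

def encode(word, prev):
    """One-hot current word concatenated with one-hot previous word."""
    return ([1.0 if word == v else 0.0 for v in VOCAB]
            + [1.0 if prev == v else 0.0 for v in VOCAB])

X, y = [], []
for sent in CORPUS:
    prev = "<s>"
    for word, tag in sent:
        X.append(encode(word, prev))
        y.append(TAGS.index(tag))
        prev = word

def accuracy(predict_fn):
    return sum(predict_fn(x) == t for x, t in zip(X, y)) / len(y)

single = train_net(X, y, len(TAGS))                 # first setup: one network
centers, models = train_clustered(X, y, len(TAGS))  # second setup: per-cluster networks
acc_single = accuracy(lambda x: predict(single, x))
acc_clustered = accuracy(lambda x: predict_clustered(centers, models, x))
print(acc_single, acc_clustered)
```

The accuracies printed here are measured on the toy training data only and say nothing about the figures reported in the abstract. The sketch is meant to show the routing idea: at prediction time each token is sent to the network trained on its nearest cluster.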

Keywords

POS Tagging, Neural Networks, Persian
