Persian Texts Part of Speech Tagging Using Artificial Neural Networks
- Publication year: 1395 (Solar Hijri calendar)
- Published in: Journal of Computing and Security, Volume 3, Issue 4
- Dedicated COI code: JR_JCSE-3-4_004
- Article language: English
- View count: 172
Authors
Science and Research Branch, Islamic Azad University, Tehran
University of Isfahan
Khajeh Nasir Toosi University of Technology
Abstract
Part-of-speech (POS) tagging is a basic task in natural language processing applications such as morphological parsing, information retrieval, machine translation, and question answering. POS tagging is the task of assigning each word its part of speech (e.g., noun or verb). It is followed by several challenging steps, in particular disambiguation, named entity recognition, and compound verb detection. Most tagging approaches for the Persian language focus on hidden Markov models (HMMs) and rule-based models. Since Persian is a free-word-order language, those models cannot cope with the full complexity of the language in POS tagging, named entity recognition, word sense disambiguation, and other related tasks. In this paper, artificial neural networks (ANNs) are used for POS tagging because of their ability to learn complex patterns. In the first phase, an ANN is fed with raw data; in the second phase, the data are clustered and a separate ANN is trained for each cluster. Accuracy rates of 95.7% and 96.17% were obtained, respectively. A comparison with other approaches indicates that neural networks can perform POS tagging and named entity recognition more precisely than those methods.
Keywords
POS Tagging, Neural Networks, Persian
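The second phase described in the abstract (cluster the data, then train a separate network for each cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the features, the clustering algorithm, or the network architecture. Here k-means stands in for the clustering step, a single-layer softmax network stands in for each ANN, and the `ClusteredTagger` class, the centering of features on the cluster centroid, the toy 2-D feature vectors, and the tags `N` and `V` are all hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def scores(W, x):
    """Linear scores per tag; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in W]

def softmax(s):
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]

def train_softmax(X, y, n_tags, epochs=200, lr=0.5):
    """Single-layer softmax network trained by full-batch gradient descent."""
    dim = len(X[0]) + 1
    W = [[0.0] * dim for _ in range(n_tags)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_tags)]
        for x, t in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax(scores(W, x))
            for c in range(n_tags):
                err = probs[c] - (1.0 if c == t else 0.0)
                for j in range(dim):
                    grad[c][j] += err * xb[j]
        for c in range(n_tags):
            for j in range(dim):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

class ClusteredTagger:
    """Hypothetical wrapper: one small network per cluster of feature vectors."""

    def __init__(self, k, tags):
        self.k, self.tags = k, tags

    def fit(self, X, y):
        centroids = kmeans(X, self.k)
        assign = [nearest(x, centroids) for x in X]
        self.centroids, self.models = [], []
        for i, c in enumerate(centroids):
            members = [n for n, a in enumerate(assign) if a == i]
            if not members:
                continue  # drop clusters that received no training data
            # Center features on the centroid (a design choice of this sketch).
            local = [[v - m for v, m in zip(X[n], c)] for n in members]
            labels = [self.tags.index(y[n]) for n in members]
            self.centroids.append(c)
            self.models.append(train_softmax(local, labels, len(self.tags)))
        return self

    def predict(self, x):
        # Route the token to its nearest cluster, then ask that cluster's network.
        i = nearest(x, self.centroids)
        local = [v - m for v, m in zip(x, self.centroids[i])]
        s = scores(self.models[i], local)
        return self.tags[max(range(len(s)), key=lambda c: s[c])]

# Made-up feature vectors and tags, for illustration only.
X = [[0.0, 0.0], [0.2, 0.0], [0.0, 1.0], [0.2, 1.0],
     [5.0, 5.0], [5.2, 5.0], [5.0, 6.0], [5.2, 6.0]]
y = ["N", "N", "V", "V", "V", "V", "N", "N"]
tagger = ClusteredTagger(k=2, tags=["N", "V"]).fit(X, y)
predictions = [tagger.predict(x) for x in X]
print(predictions)
```

In this toy data the second feature indicates opposite tags in the two clusters, so no single linear model could fit both groups, whereas routing each token to a cluster-specific model can. Whether a comparable effect explains the accuracy difference reported in the abstract is not something this sketch can show.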