A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian-Specific Features

Publication year: 1399 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 10-page PDF.

Indexing date: 1 Mordad 1399 (Solar Hijri)

Abstract:

Named Entity Recognition (NER) is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrid methods. Machine-learning-based methods perform well for Persian when they are trained with good features. To achieve good performance in Conditional Random Field (CRF)-based Persian NER, several syntactic features based on dependency grammar, along with some morphological and language-independent features, have been designed to supply suitable features for the learning phase. In this implementation, the designed features have been applied to a CRF to build our model. To evaluate our system, we used the Persian syntactic dependency Treebank, which contains about 30,000 sentences and was prepared at the NOOR Islamic Science Computer Research Center. This Treebank includes named-entity tags such as Person, Organization, and Location. The results show that our approach achieved 86.86% precision, 80.29% recall, and 83.44% F-measure, which are relatively higher than the values reported for other Persian NER methods.
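As an illustration only, the sketch below shows how per-token feature dictionaries for a CRF tagger might combine the three feature families the abstract names: language-independent features (word form, affixes, digits, context window), a morphological feature (the POS tag), and dependency-grammar features (relation label, head word and head POS, number of dependents). The `Token` fields, the tag and relation labels, and every feature name are hypothetical assumptions and are not taken from the paper or from the treebank's actual annotation scheme. The output (a list of string-keyed dicts per sentence) is the input shape that CRF toolkits such as sklearn-crfsuite accept. The `f_measure` helper checks the reported scores: the harmonic mean of 86.86% precision and 80.29% recall comes to about 83.45%, which agrees with the reported 83.44% up to rounding.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token of a dependency-parsed sentence (hypothetical CoNLL-style fields)."""
    form: str    # surface word
    pos: str     # part-of-speech tag (assumed tagset)
    head: int    # 1-based index of the syntactic head; 0 means root
    deprel: str  # dependency relation to the head (assumed label set)


def token_features(sent: list[Token], i: int) -> dict[str, object]:
    """Build a feature dict for token i; all feature names are illustrative."""
    tok = sent[i]
    feats: dict[str, object] = {
        "bias": 1.0,
        # Language-independent features
        "word": tok.form,
        "prefix2": tok.form[:2],
        "suffix2": tok.form[-2:],
        "suffix3": tok.form[-3:],
        "is_digit": tok.form.isdigit(),  # also True for Persian digits such as ۱۳۹۹
        "length": float(len(tok.form)),
        # Morphological feature
        "pos": tok.pos,
        # Syntactic (dependency) features
        "deprel": tok.deprel,
        "n_dependents": float(sum(1 for t in sent if t.head == i + 1)),
    }
    if tok.head > 0:
        head = sent[tok.head - 1]
        feats["head_word"] = head.form
        feats["head_pos"] = head.pos
    else:
        feats["is_root"] = True
    # Context window of one token on each side
    if i > 0:
        feats["prev_word"] = sent[i - 1].form
        feats["prev_pos"] = sent[i - 1].pos
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next_word"] = sent[i + 1].form
        feats["next_pos"] = sent[i + 1].pos
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sent: list[Token]) -> list[dict[str, object]]:
    """One feature dict per token, the per-sentence input a CRF trainer expects."""
    return [token_features(sent, i) for i in range(len(sent))]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative parse of "علی در تهران کار کرد" ("Ali worked in Tehran");
# the heads and labels are made up for this example.
sent = [
    Token("علی", "N", 5, "SBJ"),
    Token("در", "P", 5, "ADV"),
    Token("تهران", "N", 2, "POSDEP"),
    Token("کار", "N", 5, "NVE"),
    Token("کرد", "V", 0, "ROOT"),
]
X = sentence_features(sent)
y = ["B-PER", "O", "B-LOC", "O", "O"]  # assumed BIO encoding of the entity tags
print(round(f_measure(86.86, 80.29), 2))
```

Encoding the entity labels as BIO tags is an assumption made here for illustration; the abstract only states that the treebank carries tags such as Person, Organization, and Location.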


L. Jafar Tafreshi

Computer Research Center of Islamic Sciences (CRCIS), Tehran, Iran.

F. Soltanzadeh

General Linguistics Department, Allameh Tabatabaei University, Tehran, Iran.