Persian Sentiment Analysis: Feature Engineering, Datasets, and Challenges

  • Year of publication: 1400 (Solar Hijri calendar)
  • Published in: Journal of Applied Intelligent Systems and Information Sciences, Volume 2, Issue 2
  • COI code: JR_JAISIS-2-2_001
  • Article language: English

Authors

Razieh Asgarnezhad

Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Technical and Vocational University, Kashan, Iran

S. Amirhassan Monadjemi

Senior Lecturer, School of Continuing and Lifelong Education, National University of Singapore, 119077, Singapore

Abstract

With the pervasive growth of web-based businesses, sentiment analysis of online reviews has attracted increasing interest among text mining experts. The problem becomes harder when the reviews are written in Persian, since most existing work focuses on English and leaves other languages to multilingual models with limited resources. To address this gap, we give an overview of the different stages of Persian Sentiment Analysis. This study presents a taxonomy of Persian Sentiment Analysis works organized by the most common techniques. Four steps are considered: pre-processing, feature engineering, lexicon generation, and classification. We find that newer works focus on deep learning methods. We also suggest that other methods, such as heuristic and hybrid approaches, are worth applying to improve classification performance in Persian Sentiment Analysis. Finally, we summarize the most important open issues in this domain, including the lack of datasets, lexicons, and tools.
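
As a rough illustration of how the four steps named in the abstract fit together, the following minimal Python sketch runs a review through pre-processing, feature extraction, a lexicon lookup, and a rule-based classifier. It is not the method of any surveyed work: the three-word lexicon, the function names, and the sign-based decision rule are all assumptions made for this example, and the hand-written dictionary only stands in for a real lexicon generation step.

```python
import re

# Hypothetical toy lexicon (illustrative only): word -> polarity weight.
# A real system would generate or load a much larger Persian lexicon.
LEXICON = {"خوب": 1, "بد": -1, "افتضاح": -2}

# Map Arabic Yeh and Kaf to their Persian counterparts.
CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})


def preprocess(text):
    """Normalize characters, strip diacritics, and tokenize."""
    text = text.translate(CHAR_MAP)
    text = re.sub(r"[\u064b-\u0652]", "", text)  # remove Arabic diacritics
    # Keep runs of letters from the Arabic Unicode block; anything else
    # (spaces, punctuation, the zero-width non-joiner) acts as a separator.
    return re.findall(r"[\u0600-\u06ff]+", text)


def extract_features(tokens):
    """Turn tokens into simple lexicon-based counts."""
    weights = [LEXICON.get(token, 0) for token in tokens]
    return {
        "pos": sum(w for w in weights if w > 0),
        "neg": sum(-w for w in weights if w < 0),
        "length": len(tokens),
    }


def classify(features):
    """Assumed decision rule: the sign of the net lexicon score."""
    score = features["pos"] - features["neg"]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(text):
    return classify(extract_features(preprocess(text)))
```

Calling `analyze` on a review string returns one of `"positive"`, `"negative"`, or `"neutral"`. In practice the rule-based `classify` would be replaced by a trained model, which is where the deep learning, heuristic, and hybrid approaches mentioned in the abstract would come in.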

Keywords

Data mining, Text Mining, Sentiment analysis, Feature selection, Persian language
