Improving Persian Sentiment Analysis Using Opposing Polarity Phrases

  • Publication year: 1397 (Solar Hijri)
  • Venue: 4th International Conference on Web Research
  • COI code: IRANWEB04_016
  • Paper language: English

Authors

Batoul Botshekanan Dehkordi

Master Student of Artificial Intelligence, Safahan Institute of Higher Education, Isfahan, Iran

Mohammad Ehsan Basiri

Assistant Professor of Computer Engineering, Shahrekord University, Shahrekord, Iran

Abstract

The rapid growth of the Web has given people the ability to easily express their opinions and learn the opinions of others. Opinion mining, or sentiment analysis, is considered a subfield of text mining, and its main goal is to find a writer’s opinion about a topic. Meeting this goal is not simple, since the emotion of a sentence or phrase is usually recognized by combining the emotions of its words. In this paper, we concentrate on bipolar terms, which are phrases containing at least one positive and one negative word. To account for bipolar terms, phrases with opposing polarity are first extracted from the PerSent dataset; then, based on the words of these phrases and their polarity in the sentence, the final score is computed. The score of each sentence is then calculated using the CNRC lexicon and the maximum of absolute values, difference, and average methods, with and without considering bipolar terms. The results of implementing the proposed method show that employing bipolar terms improves the lexicon-based approach for both the polarity detection and score prediction problems.
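The lexicon-based scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the English placeholder words, the `BIPOLAR_PHRASES` table, and every function name are hypothetical. The abstract does not define the "difference" method or say how a bipolar phrase changes the score, so the sketch assumes that "difference" means total positive score minus total negative magnitude, and that a known bipolar bigram replaces its word-level scores with a single phrase-level score.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> polarity score (NOT taken from the CNRC lexicon).
LEXICON = {"good": 0.8, "great": 0.9, "bad": -0.7, "terribly": -0.6, "slow": -0.4}

# Hypothetical phrase-level scores for opposing polarity (bipolar) phrases.
BIPOLAR_PHRASES = {("terribly", "good"): 0.9}


def is_bipolar(phrase, lexicon=LEXICON):
    """Return True if the phrase has at least one positive and one negative word."""
    scores = [lexicon.get(word, 0.0) for word in phrase]
    return any(s > 0 for s in scores) and any(s < 0 for s in scores)


def word_scores(tokens, use_bipolar=False):
    """Collect non-zero scores; optionally score a known bipolar bigram as one unit."""
    scores, i = [], 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if use_bipolar and bigram in BIPOLAR_PHRASES:
            scores.append(BIPOLAR_PHRASES[bigram])  # assumed override rule
            i += 2
            continue
        score = LEXICON.get(tokens[i], 0.0)
        if score != 0.0:
            scores.append(score)
        i += 1
    return scores


def max_abs(scores):
    """Signed score with the largest absolute value."""
    return max(scores, key=abs) if scores else 0.0


def difference(scores):
    """Assumed definition: total positive score minus total negative magnitude."""
    positive = sum(s for s in scores if s > 0)
    negative = sum(-s for s in scores if s < 0)
    return positive - negative


def average(scores):
    """Mean of the collected scores."""
    return mean(scores) if scores else 0.0


tokens = ["terribly", "good", "but", "slow"]
for use_bipolar in (False, True):
    s = word_scores(tokens, use_bipolar)
    print(use_bipolar, max_abs(s), difference(s), average(s))
```

With these made-up numbers, the difference and average scores for the example sentence are negative when words are scored independently and positive once the bipolar bigram is scored as a unit, which is the kind of effect the abstract attributes to bipolar terms.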

Keywords

Text Mining, Opinion Mining, Lexicon-based Method, Bipolar Terms, Persian Language
