The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

  • Year of publication: 1397 (Solar Hijri)
  • Published in: Quarterly Journal of Information Technology Management, Volume 10, Issue 2
  • Dedicated COI code: JR_JITM-10-2_003
  • Article language: English

Authors

رضا سمیع زاده

Assistant Prof. of Industrial Engineering, Alzahra University, Tehran, Iran

الناز محمودی سعید آباد

MSc. Student of Industrial Engineering, Alzahra University, Tehran, Iran

Abstract

Classifying the online texts and comments of social media users into positive and negative sentiment categories is an important problem in text-mining research. In this research, we applied supervised classification methods to classify Persian texts in cyberspace by sentiment. The result is a system that can decide whether a comment published in cyberspace, for example on a social network, is positive or negative. The data set consists of comments published on Persian movie and movie-review websites from 1392 to 1395 (Solar Hijri). Part of these data was used for training and the rest for testing. Before the algorithms were applied, the texts were pre-processed by tokenization, stop-word removal, and n-gram extraction. Naïve Bayes, neural networks (NN), and support vector machines (SVM) were used for text classification. Out-of-sample tests showed no evidence that the accuracy of the SVM approach is statistically higher than that of Naïve Bayes, or that the accuracy of Naïve Bayes is statistically higher than that of the NN approach. However, the accuracy of classification with the SVM approach is statistically higher than that of the NN approach at the 5% significance level.
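The pipeline described above (tokenization, stop-word removal, n-gram features, a supervised classifier trained on one part of the data and scored on the held-out rest) can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the tiny Persian stop-word list, the tokenizer regex, the unigram-plus-bigram limit, and the names `preprocess`, `NaiveBayes`, `accuracy`, and `one_sided_z_test` are hypothetical, and only the Naïve Bayes classifier is shown. The abstract does not say which statistical test was used to compare accuracies; the one-sided two-proportion z-test at the end is an assumed stand-in.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny Persian stop-word list (and, in, to, from,
# that, this, "ra"); a real system would use a curated list.
STOP_WORDS = {"و", "در", "به", "از", "که", "این", "را"}

# \w matches Unicode letters and digits (Persian script included); \u200c keeps
# zero-width non-joiners inside words such as "می‌خواهم".
TOKEN_RE = re.compile(r"[\w\u200c]+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def preprocess(text, n=2):
    """Tokenize, drop stop words, then emit all n-grams of length 1..n."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    features = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            features.append(" ".join(tokens[i:i + size]))
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = preprocess(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, text):
        feats = preprocess(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over features of log P(feature | class)
            score = math.log(count / n_docs)
            for f in feats:
                score += math.log(
                    (self.feature_counts[label][f] + 1) / (self.totals[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of held-out texts whose predicted label matches the true one."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


def one_sided_z_test(acc_a, acc_b, n_a, n_b):
    """Two-proportion z-test of H1: accuracy A > accuracy B.

    Returns (z, p_value); reject H0 at the 5% level when p_value < 0.05.
    Assumes independent test samples and 0 < pooled accuracy < 1.
    """
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / se
    # Upper-tail probability of the standard normal: 1 - Phi(z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

For example, after `NaiveBayes().fit(train_texts, train_labels)`, calling `predict` on a held-out comment returns the label with the highest smoothed log-posterior. The z-test treats the two accuracies as independent proportions; when both classifiers are scored on the same test comments, a paired test such as McNemar's test is usually the more appropriate choice.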

Keywords

Naïve Bayes, Neural network, Sentiment analysis, Support vector machine, Text mining
