The Effect of Data Augmentation Techniques on Persian Stance Detection

  • Publication year: 1401 (Solar Hijri calendar)
  • Published in: International Journal of Information and Communication Technology Research, Volume 15, Issue 1
  • Dedicated COI code: JR_ITRC-15-1_007
  • Article language: English

Authors

Mojgan Farhoodi

Department of Information Technology Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

Abbas Toloie Eshlaghi

Department of Information Technology Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

Mohamadreza Motadel

Central Tehran Branch, Islamic Azad University, Tehran, Iran

Abstract

The purpose of stance detection is to identify an author's stance toward a particular topic or claim. Stance detection has become a key component of applications such as fake news detection, claim validation, argument search, and author profiling. Although significant progress has been made in stance detection for languages such as English, little attention has been paid to some other languages, including Persian. One of the main obstacles to research on Persian stance detection is the shortage of suitable datasets. To address this problem, this article considers data augmentation, the artificial creation of training data, as a way to overcome the shortage of datasets. We studied several data augmentation methods, including EDA, back-translation, and merging the source dataset with a similar English-language dataset. The experimental results indicate that combining the primary dataset with a translation of another dataset with similar content in another language (for example, English) results in a significant improvement in model performance.
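The abstract names EDA and cross-lingual dataset merging without describing them. Below is a minimal Python sketch, not the authors' implementation, of two of the four standard EDA operations (random swap and random deletion; synonym replacement and random insertion are omitted because they need a Persian synonym resource) and of merging a primary dataset with translated examples. The `(text, label)` example format, the parameter values, and the `translate` callable are hypothetical assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

# A labeled example: (text, stance label). This format is an assumption.
Example = Tuple[str, str]


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA-style random swap: exchange the positions of two random words, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA-style random deletion: drop each word with probability p, keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda_augment(dataset: List[Example], n_aug: int = 1, seed: int = 0) -> List[Example]:
    """Return the original examples followed by n_aug perturbed copies of each, same label."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        words = text.split()
        for _ in range(n_aug):
            # Hypothetical settings: one swap, then delete each word with probability 0.1.
            new_words = random_deletion(random_swap(words, 1, rng), 0.1, rng)
            augmented.append((" ".join(new_words), label))
    return augmented


def merge_translated(primary: List[Example], foreign: List[Example],
                     translate: Callable[[str], str]) -> List[Example]:
    """Append machine-translated foreign-language examples to the primary dataset.

    `translate` is a hypothetical placeholder for any MT system; labels are kept unchanged.
    """
    return primary + [(translate(text), label) for text, label in foreign]
```

The merge step assumes the two datasets share a compatible stance label set; if they do not, the foreign labels would first need to be mapped onto the primary scheme.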

Keywords

stance detection, data augmentation, fake news, dataset
