TFDF, not TF-IDF in Financial Analysis

  • Publication year: 1402 (Solar Hijri calendar)
  • Published in: Journal of Computing and Security, Volume 10, Issue 2
  • COI code: JR_JCSE-10-2_003
  • Article language: English

Authors

Maxam Haseme

Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.

Mehran Rezaei

Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.

Marjan Kaedi

Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.

Abstract

Textual analysis in business depends on text-processing techniques borrowed mainly from information retrieval. Yet these techniques are not viable in text-based financial forecasting. In this paper, we suggest developing techniques for processing textual data that are home-grown for finance, specifically for scoring words, where standard techniques are not appropriate in financial analysis. To that end, we pursue two issues. First, we examine major information retrieval heuristics and find TF-IDF too simplistic, both in predicting trends and in generating accurate results (in terms of errors) on large numbers in text-based financial analysis. Second, we develop a new heuristic that addresses financial concerns by considering the relationship between the publication rate of information and its importance. The proposed heuristic gives superior results in both trend prediction and precision measures. In an additional analysis, we optimize our scheme with a genetic algorithm and obtain greater precision. Compared with TF-IDF, our proposed heuristic yields a 38.5 percent lower error in closeness measures, which is reduced by a further 16.46 percent with the help of the genetic algorithm. Our findings suggest that researchers in financial textual analysis should not rely on standard information retrieval heuristics.
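To make the contrast in the abstract concrete, the following is a minimal sketch of standard TF-IDF word scoring next to a weighting that rewards, rather than penalizes, terms appearing in many documents. The abstract does not give the proposed heuristic's formula, so `tf_df` below is only an assumption suggested by the title and by the stated link between publication rate and importance; it should not be read as the authors' method. The function names and the toy headlines are likewise hypothetical.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tf_idf(docs):
    """Standard TF-IDF: raw term count times log(N / df)."""
    n = len(docs)
    df = document_frequency(docs)
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]


def tf_df(docs):
    """ASSUMPTION, not the paper's formula: raw term count times df / N,
    so terms published in many documents score higher, not lower."""
    n = len(docs)
    df = document_frequency(docs)
    return [
        {term: count * df[term] / n
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]


# Hypothetical toy corpus of tokenized headlines.
docs = [
    "profit warning issued".split(),
    "profit rises on strong sales".split(),
    "profit outlook cut".split(),
]

idf_scores = tf_idf(docs)
df_scores = tf_df(docs)
```

In this toy corpus "profit" occurs in every headline, so TF-IDF assigns it a weight of zero, while the assumed document-frequency weighting gives it the highest score in each headline. That is the kind of divergence the abstract points to when it argues that widely published information may matter more, not less, in financial text.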

Keywords

Financial textual analysis, Term weighting, Genetic Algorithm, Stock market
