Experimental Comparison of Financial Distress Prediction Models Using Imbalanced data sets

Seyed Behrooz Razavi Ghomi; Alireza Mehrazin; Abolghasem Massihabadi; Mohammad Reza Shourvarzi

Article title: Experimental Comparison of Financial Distress Prediction Models Using Imbalanced data sets
National article ID: JR_AMFA-7-3_016
Published in 1401 (Solar Hijri calendar)

Author details:

Seyed Behrooz Razavi Ghomi - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
Alireza Mehrazin - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
Abolghasem Massihabadi - Department of Accounting, Sabzevar Branch, Islamic Azad University, Sabzevar, Iran
Mohammad Reza Shourvarzi - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran

Abstract:

From a machine learning perspective, predicting financial distress is challenging because the class distribution is extremely imbalanced. The goal of this study was to compare the performance of financial distress prediction models on imbalanced data sets with different class proportions. The data were taken from the year preceding financial distress and cover 760 company-years over the period 2007-2017. The models compared were traditional classifiers such as logistic regression, linear discriminant analysis, and an artificial neural network, along with the least squares support vector machine with four kernel functions, random forest, and the k-nearest neighbors (KNN) algorithm. Performance was measured by the area under the curve (AUC), and the Friedman and Nemenyi tests were used to determine the models' average ranks and the significance of the differences in their AUC. The models' optimal parameters were selected by combining grid search optimization with cross-validation. The results of this experimental study showed that random forest performed best on the balanced data set and on the data sets with lower imbalance proportions. On the more imbalanced data sets, the least squares support vector machine with sigmoid, radial, and linear kernel functions performed best; the performance of the KNN algorithm did not differ significantly from that of the other models, and the performance of the artificial neural network was average or adequate. The linear models, logistic regression and linear discriminant analysis, performed worse than the nonlinear models.
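
The evaluation procedure described in the abstract (AUC per model, then average ranks and a Friedman test across data sets) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `auc`, `average_ranks`, and `friedman_statistic` are hypothetical, AUC is computed as the pairwise win rate of positive over negative scores, and the Friedman statistic is the standard chi-square form without a tie correction. The Nemenyi post-hoc test is not included.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case gets the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def average_ranks(auc_table):
    """auc_table maps a model name to a list of AUC values, one per data set.
    Rank 1 is the best AUC on a data set; tied models share the mean rank."""
    models = list(auc_table)
    n_sets = len(next(iter(auc_table.values())))
    totals = {m: 0.0 for m in models}
    for j in range(n_sets):
        ordered = sorted(models, key=lambda m: -auc_table[m][j])
        i = 0
        while i < len(ordered):
            k = i
            # Extend k over the run of models tied with the model at position i.
            while (k + 1 < len(ordered)
                   and auc_table[ordered[k + 1]][j] == auc_table[ordered[i]][j]):
                k += 1
            mean_rank = (i + k) / 2 + 1  # positions i..k hold ranks i+1..k+1
            for m in ordered[i:k + 1]:
                totals[m] += mean_rank
            i = k + 1
    return {m: totals[m] / n_sets for m in models}


def friedman_statistic(avg_ranks, n_sets):
    """Friedman chi-square: 12N / (k(k+1)) * (sum of R_j^2 - k(k+1)^2 / 4),
    with k models, N data sets, and R_j the average rank of model j."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks.values())
    return 12 * n_sets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)
```

In use, `auc` would be applied to each model's predicted scores on each data set, the resulting table passed to `average_ranks`, and the average ranks passed to `friedman_statistic` together with the number of data sets. The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom.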

Keywords:

Imbalanced data sets, Financial distress prediction models, Grid search optimization, Tuning parameters, Financial ratios

Article page and full-text download: https://civilica.com/doc/1461767/