Comparison Between XGBoost and Random Forest in Credit Risk Assessment Using the German Credit Dataset

Navid Goodarzi

Published in: First International Conference on Management, Computer Science and Artificial Intelligence

Year of publication: 1404 (Solar Hijri calendar)

Document type: Conference paper

Language: Persian

The full paper is 10 pages long and is available in PDF and WORD formats.

Permanent link to this paper:

https://civilica.com/doc/2600113

National scientific document ID:

ICMCAI01_126

Indexing date: 22 Ordibehesht 1405

Abstract:

Accurate, robust, and transparent credit risk models are a cornerstone of modern financial stability. This study conducts a rigorous comparative evaluation of two leading ensemble learning algorithms, XGBoost and Random Forest, in the context of credit risk assessment. Using the benchmark German Credit Data dataset, we implement a comprehensive methodological framework comprising careful preprocessing, class imbalance mitigation via the Synthetic Minority Over-sampling Technique (SMOTE), and systematic hyperparameter optimization using a 5-fold cross-validated grid search (GridSearchCV). The predictive performance of the models is benchmarked through stratified 10-fold cross-validated F1-scores and the Area Under the Receiver Operating Characteristic Curve (ROC-AUC). Critically, this research addresses a significant gap in the literature by moving beyond point-estimate comparisons to formally test for statistical significance using the Wilcoxon Signed-Rank Test, and by examining model behavior through SHAP (SHapley Additive exPlanations) values for enhanced interpretability. Our findings reveal a nuanced performance landscape: while Random Forest exhibited a marginally higher mean F1-score, the Wilcoxon test yielded a p-value of 0.064, indicating no statistically significant difference in the models' predictive capabilities at the conventional 0.05 level. Both models achieved high discriminative power, with ROC-AUC scores exceeding 0.90. SHAP analysis confirmed the primacy of features such as 'Duration' and 'Credit Amount' while also uncovering subtle distinctions in feature interaction between the models. This study concludes that XGBoost and Random Forest demonstrate functional equivalence in this application, suggesting that model selection for practitioners could be guided by secondary criteria such as computational overhead, scalability, and the specific demands for model transparency.
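
The paired significance test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each model yields one F1-score per cross-validation fold and that the two lists are paired fold by fold. The function name `wilcoxon_signed_rank` and the fold scores below are hypothetical placeholders, not the authors' code or results; the sketch enumerates the exact null distribution in pure Python, whereas in practice a library routine such as `scipy.stats.wilcoxon` would likely be used.

```python
from itertools import product


def wilcoxon_signed_rank(scores_a, scores_b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (W, p), where W = min(W+, W-).
    """
    # Round to suppress floating-point noise before checking zeros and ties.
    diffs = [round(a - b, 10) for a, b in zip(scores_a, scores_b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis every rank is positive or negative with
    # probability 1/2, so enumerate all 2**n sign assignments.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Hypothetical per-fold F1-scores (illustrative only, not from the paper).
rf_f1 = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.83]
xgb_f1 = [0.80, 0.77, 0.80, 0.81, 0.78, 0.75, 0.76, 0.78, 0.72, 0.74]

w, p = wilcoxon_signed_rank(rf_f1, xgb_f1)
print(f"W = {w}, exact two-sided p = {p:.4f}")
```

A p-value above the chosen significance level (0.05 is conventional) means the test fails to reject the hypothesis that the paired fold scores come from the same distribution. Exact enumeration is feasible here because 10 folds give only 1024 sign assignments.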

Keywords:

Credit Risk, XGBoost, Random Forest, SHAP, Wilcoxon Test, SMOTE, Machine Learning, Financial Risk Modeling

Authors

Navid Goodarzi

University of Tehran