Mitigating data imbalance for enhanced third-party insurance claim prediction using machine learning

Publication year: 1404 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 14-page PDF.

National scientific document ID:

JR_JMMF-5-1_011

Indexing date: 30 Tir 1404

Abstract:

Accurate prediction of third-party insurance claims is critical for pricing policies and managing risk. However, insurance data are highly imbalanced: non-claim cases vastly outnumber claim cases, which poses significant challenges to standard predictive models. This study explores the use of machine learning algorithms to enhance claim prediction by directly addressing this imbalance. We use real data from the Insurance Research Center of Iran, incorporating variables such as driver characteristics, vehicle features, location, and claims history. Five models are evaluated: logistic regression, decision tree, bagging, random forest, and boosting. To handle the imbalance, we apply three resampling techniques: random undersampling, random oversampling, and SMOTE. Model performance is assessed using accuracy, sensitivity, specificity, precision, and F-score. Results indicate that when data imbalance is properly treated, the tree-based models (decision tree, bagging, and random forest) significantly outperform logistic regression and boosting, especially in detecting actual claim cases. The study underscores the importance of using appropriate resampling techniques and evaluation metrics in imbalanced settings. These findings can help insurers develop more reliable models for pricing and risk classification.
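To make the resampling techniques and evaluation metrics named in the abstract concrete, the following is a minimal Python sketch using only the standard library. It is not the authors' implementation. The function names, the toy data set (4 claims against 16 non-claims), the choice of k = 3 neighbours, the use of F1 as the F-score, and the convention that an undefined ratio is reported as 0 are all assumptions made for illustration. The SMOTE variant shown is the basic form, which interpolates between a minority sample and one of its nearest minority neighbours and assumes numeric features.

```python
import random
from collections import Counter


def confusion_metrics(y_true, y_pred):
    """Binary metrics with label 1 = claim (positive), 0 = no claim."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is 0.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def split_by_class(X, y):
    """Return (minority rows, majority rows, minority label)."""
    rows = {0: [], 1: []}
    for features, label in zip(X, y):
        rows[label].append(features)
    minority = min(rows, key=lambda label: len(rows[label]))
    return rows[minority], rows[1 - minority], minority


def combine(minority_rows, majority_rows, minority):
    """Rebuild a feature list and a matching label list."""
    X = minority_rows + majority_rows
    y = [minority] * len(minority_rows) + [1 - minority] * len(majority_rows)
    return X, y


def random_undersample(X, y, rng):
    """Keep a random subset of the majority class, equal in size to the minority."""
    small, big, minority = split_by_class(X, y)
    return combine(small, rng.sample(big, len(small)), minority)


def random_oversample(X, y, rng):
    """Duplicate random minority rows until both classes have equal size."""
    small, big, minority = split_by_class(X, y)
    extra = rng.choices(small, k=len(big) - len(small))
    return combine(small + extra, big, minority)


def smote(X, y, rng, k=3):
    """Basic SMOTE: interpolate between a minority row and a near minority neighbour."""
    small, big, minority = split_by_class(X, y)
    synthetic = []
    for _ in range(len(big) - len(small)):
        i = rng.randrange(len(small))
        base = small[i]
        others = [row for j, row in enumerate(small) if j != i]
        # Sort the remaining minority rows by squared Euclidean distance to base.
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return combine(small + synthetic, big, minority)


# Hypothetical toy data: 4 claims (label 1) and 16 non-claims (label 0).
rng = random.Random(0)
X = [(float(i), float(i % 5)) for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

under_counts = Counter(random_undersample(X, y, rng)[1])
over_counts = Counter(random_oversample(X, y, rng)[1])
X_smote, y_smote = smote(X, y, rng)
smote_counts = Counter(y_smote)

# A model that always predicts "no claim" reaches accuracy 0.8 here
# while detecting no claims at all (sensitivity 0.0).
naive = confusion_metrics(y, [0] * len(y))
```

On this toy data, the always-"no claim" predictor illustrates why accuracy alone can mislead in imbalanced settings and why sensitivity, precision, and F-score are reported alongside it. In a real evaluation, resampling would normally be applied to the training split only, so that the test set keeps the original class ratio.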

Authors

Maryam Esna-Ashari

Insurance Research Center, Tehran, Iran

Hamideh Badi

Department of Statistics, University of Birjand, Birjand, Iran

Majid Chahkandi

Department of Statistics, University of Birjand, Birjand, Iran

Hamid Saadatfar

Department of Computer Engineering, University of Birjand, Birjand, Iran
