A hybrid data mining approach for predicting breast cancer survivability on imbalanced SEER data set

Year of publication: 1395 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is 6 pages long and is available in PDF format.

National scientific document ID:

CBCONF01_0296

Indexing date: 16 Shahrivar 1395

Abstract:

With advances in the diagnosis and treatment of breast cancer, the number of patients who survive now exceeds the number who die, so breast cancer data sets have become imbalanced. Class imbalance is a challenging problem for data mining. In this study, we propose a hybrid approach to build a more accurate prediction model for the 5-year survivability of breast cancer patients in the presence of outliers and class imbalance. After preprocessing the data and dividing the data set into two classes, outliers in the minority class are first eliminated, and the boundary of the minority class is then strengthened using Borderline-SMOTE. Three data mining techniques, Bayes nets, a decision tree (C4.5), and 1-nearest neighbor search, are then applied to the resulting improved data set. Accuracy, sensitivity, specificity, and G-mean were used to evaluate the performance of the proposed hybrid approach. The results showed that, among all combinations, the proposed approach with C4.5 performed best, with an accuracy of 98.962%, a sensitivity of 0.926, a specificity of 0.989, and a G-mean of 0.956.
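The four assessment metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `evaluate`, the choice of the minority class as the positive class, and all confusion-matrix counts are hypothetical and serve only to show why G-mean is informative on imbalanced data where accuracy alone is not.

```python
import math


def evaluate(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and G-mean from
    confusion-matrix counts (hypothetical helper, for illustration only)."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of positive cases that were correctly identified.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of negative cases that were correctly identified.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # G-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Assumed, made-up counts: 100 minority (positive) and 900 majority cases.
# A classifier that always predicts the majority class:
baseline = evaluate(tp=0, fn=100, tn=900, fp=0)

# A classifier that also recognises most minority cases:
balanced = evaluate(tp=90, fn=10, tn=880, fp=20)

for name, scores in (("baseline", baseline), ("balanced", balanced)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these assumed counts, the majority-only classifier still reaches 90% accuracy while its G-mean is zero, which is why the abstract reports sensitivity, specificity, and G-mean alongside accuracy.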

Authors

Samaneh Miri Rostami

Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran

Marzieh Ahmadzadeh

Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran

Raouf Khayami

Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran