A hybrid data mining approach for predicting breast cancer survivability on imbalanced SEER data set
Publication year: 1395 (Solar Hijri)
Document type: Conference paper
Language: English
The paper is 6 pages long and is available in PDF format.
National scientific document ID:
CBCONF01_0296
Indexing date: 16 Shahrivar 1395
Abstract:
With advances in the diagnosis and treatment of breast cancer, the number of patients who survive now exceeds the number who die, so breast cancer data sets have become imbalanced. Class imbalance is a challenging issue for data mining. In this study, we propose a hybrid approach to build a more accurate prediction model for the 5-year survivability of breast cancer patients in the presence of outliers and class imbalance. To achieve this goal, after preprocessing the data and dividing the data set into two classes, outliers in the minority class are first eliminated and the boundary of the minority class is strengthened using Borderline-SMOTE. Then three data mining techniques, Bayes Nets, decision tree (C4.5), and 1-nearest neighbor search, are applied to the final improved data set. Accuracy, sensitivity, specificity, and G-mean are used to evaluate the performance of the proposed hybrid approach. The results show that, among all combinations, the proposed approach with C4.5 performs best, with an accuracy of 98.962%, a sensitivity of 0.926, a specificity of 0.989, and a G-mean of 0.956.
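The four assessment metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code. The function name `evaluate`, the choice of which class counts as "positive", and the example counts are illustrative assumptions. The G-mean is taken here in its usual form, the geometric mean of sensitivity and specificity.

```python
import math


def evaluate(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and G-mean.

    tp/fn: positive-class samples predicted correctly / incorrectly.
    tn/fp: negative-class samples predicted correctly / incorrectly.
    Which class is treated as positive is an assumption of this sketch.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts for an imbalanced test set (not taken from the paper).
metrics = evaluate(tp=90, fn=10, tn=880, fp=20)

# G-mean recomputed from the rounded sensitivity and specificity
# reported in the abstract for the C4.5 combination.
reported_g_mean = math.sqrt(0.926 * 0.989)

print(metrics)
print(round(reported_g_mean, 3))
```

Recomputing the G-mean from the rounded sensitivity (0.926) and specificity (0.989) gives about 0.957, which agrees with the reported 0.956 up to rounding of the inputs. Unlike accuracy, the G-mean drops sharply when either class is predicted poorly, which is why it is a common choice for imbalanced data.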
Authors
Samaneh Miri Rostami
Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran
Marzieh Ahmadzadeh
Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran
Raouf Khayami
Faculty of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran