A novel algorithm applied to classify imbalanced data in Breast Cancer Dataset

  • Publication year: 1393 (Solar Hijri)
  • Venue: First National Conference on Metaheuristic Algorithms and Their Applications in Science and Engineering
  • COI (CIVILICA Object Identifier) code: MHAA01_045
  • Paper language: English

Authors

Aref Tahmasb

Graduate student, Shahid Bahonar University of Kerman

Ali Akbar Niknafs

Assistant Professor, Shahid Bahonar University of Kerman

Hamid Mirvaziri

Assistant Professor, Shahid Bahonar University of Kerman

Abstract

Classifying imbalanced data is an important problem in many applications. In an imbalanced dataset, the class that matters most to the application (the minority class) contains far fewer instances than the class that matters less (the majority class). Several methods have been proposed for classifying such data; a common strategy is to increase the number of minority-class instances relative to the majority class. In this paper, we propose a new and effective algorithm for classifying 5-year data of cancer patients, a dataset that is imbalanced. The proposed algorithm combines the Synthetic Minority Over-sampling Technique (SMOTE), the Imperialist Competitive Algorithm (ICA), and several well-known classifiers. Its performance is assessed using G-mean, accuracy, specificity, and sensitivity. The results show that the SMOTE+ICA+C5 combination gives the best results among the combinations tested, which suggests that it is an effective approach to imbalanced data classification.
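The two generic building blocks named in the abstract, SMOTE-style oversampling and the four evaluation measures, can be illustrated with a minimal sketch. This is not the authors' implementation: the ICA step and the classifiers are omitted, and the function names `smote_sample` and `evaluate`, the parameters `k`, `n_new`, and `seed`, and the toy data are all assumptions made for illustration. The sketch assumes at least two minority points and that both classes appear in the true labels.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours.
    (Hypothetical helper; a SMOTE-style sketch, not a full implementation.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def evaluate(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, accuracy, and G-mean from labels.
    `positive` marks the minority class (an assumption of this sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "gmean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Toy minority points in two dimensions (made-up data)
    minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_sample(minority_points))
    # Toy labels: 1 = minority class, 0 = majority class
    print(evaluate([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

G-mean is the geometric mean of sensitivity and specificity, so it stays low whenever either class is classified poorly. Accuracy alone can look high on imbalanced data even if the minority class is mostly missed, which is presumably why the abstract reports all four measures.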

Keywords

Breast cancer, Classification, ICA, Synthetic Minority Over-sampling Technique
