Providing an efficient method based on machine learning for classifying imbalanced datasets
Year of publication: 1397 (Solar Hijri calendar)
Document type: Conference paper
Language: English
The full text of this paper is available as an 8-page PDF.
National scientific document ID:
KTCONG01_002
Indexing date: 21 Khordad 1398
Abstract:
Classifying imbalanced datasets is one of the most important issues in data mining. In many supervised learning applications, there is a significant difference between the prior probabilities of the classes, that is, between the probabilities with which an example belongs to each class of the classification problem. This situation is known as the class imbalance problem (Chawla et al., 2004). Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem concerns the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews (Garcia et al., 2009). The term imbalanced dataset generally refers to a dataset in which the number of instances differs greatly between classes (Wang and Yao, 2009). Traditional classification methods perform poorly on imbalanced data because they aim to minimize the overall error and generally assume that the class distribution is balanced. This makes imbalanced classification an important and challenging problem. In this work, the data are classified with the Bagging algorithm, which uses the C4.5 Cost-Sensitive Random Tree as its base classifier. The imperialist competitive algorithm is used to determine the misclassification cost of each class in order to construct the cost-sensitive tree.
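To illustrate why minimizing overall error is misleading on imbalanced data, the following minimal sketch compares plain accuracy with the G-Mean criterion listed in the keywords, taken here in its usual binary form as the square root of sensitivity times specificity. The labels, the 95/5 class split and both sets of predictions are hypothetical values chosen for illustration; they are not data or results from the paper, and the helper names are assumptions of this sketch.

```python
from math import sqrt


def recall_for(y_true, y_pred, label):
    """Fraction of examples whose true class is `label` that were predicted as `label`."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total if total else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly (1 - overall error)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred, positive=1, negative=0):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    sensitivity = recall_for(y_true, y_pred, positive)
    specificity = recall_for(y_true, y_pred, negative)
    return sqrt(sensitivity * specificity)


# Hypothetical imbalanced test set: 95 majority examples (0), 5 minority examples (1).
y_true = [0] * 95 + [1] * 5

# Classifier A (hypothetical): always predicts the majority class.
always_majority = [0] * 100

# Classifier B (hypothetical): misclassifies 10 majority examples,
# but recognizes 4 of the 5 minority examples.
minority_aware = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

for name, y_pred in [("always_majority", always_majority),
                     ("minority_aware", minority_aware)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f}, "
          f"g_mean={g_mean(y_true, y_pred):.3f}")
```

In this toy setting the majority-only classifier has the higher accuracy but a G-Mean of zero, because it never detects a minority example. A cost-sensitive learner of the kind the abstract describes is intended to trade some majority-class accuracy for minority-class recall, which is what G-Mean rewards.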
Keywords:
Imbalanced dataset, bagging algorithm, C4.5 cost-sensitive random tree, imperialist competitive algorithm, G-Mean criterion
Authors
Mostafa Boroumandzadeh
Department of Computer Engineering and Information Technology, Payame Noor University, IR. Iran