Colon Cancer Detection on Imbalanced Dataset based on using SMOTE

Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

National scientific document ID: IBIS12_160

Indexing date: 12 Aban 1403

Abstract:

Colon cancer is the third most common cancer and the second deadliest cancer in the world, killing many people every year. In 2020, colon cancer claimed 576,858 lives globally, which motivates this machine learning study of colon cancer detection [1,2]. Because diagnosing cancer with laboratory methods is expensive, early detection is the main practical way to increase patients' probability of survival. In this article, machine learning techniques are used to diagnose the disease from colon cancer gene expression data [3]. Working with colon cancer gene expression data poses several challenges: a large number of features, a small number of samples, and an imbalance between the classes. To balance the data, we used the Synthetic Minority Over-sampling Technique (SMOTE) [4]. Applying SMOTE increased the number of samples in the minority class, which balanced the classes. We compared accuracy before and after balancing across different classifiers and found that balancing the classes yields higher accuracy, precision, and recall. Another contribution of our study is the application of feature selection techniques such as PCA and RFE (recursive feature elimination). After classifying the resulting data with Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), we achieved 100% accuracy (F1-score = 1). This was a significant turning point that greatly contributed to the success of our study. The study emphasizes the importance of balancing techniques for handling imbalanced data sets, the influential role of feature selection in improving algorithmic performance, and the value of machine learning for diagnosing colon cancer with high accuracy.
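
The balancing step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made only for illustration, and real gene expression data would have far more features.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority samples, 3 minority samples, 2 features.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[2.0, 2.0], [2.2, 1.9], [1.9, 2.3]]

# Generate just enough synthetic samples to match the majority class size.
new_samples = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_samples
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because the synthetic points are interpolated rather than duplicated, the minority class grows without exact copies of existing samples. In an evaluation like the one the abstract describes, oversampling would normally be applied to the training split only, so that synthetic points derived from test samples do not inflate the reported scores.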

Authors

Seyedeh Zahra Ahmadi

Electrical and Computer Engineering Department, University of Science and Technology of Mazandaran, Behshahr, Iran

Zahra Farokhi

Electrical and Computer Engineering Department, University of Science and Technology of Mazandaran, Behshahr, Iran

Reza Javanmard Alitappeh

Electrical and Computer Engineering Department, University of Science and Technology of Mazandaran, Behshahr, Iran