Identifying and Evaluating the Impact of Key Predictive Features in Breast Cancer Prediction: A Data-Driven Study

Year of publication: 1404
Document type: Conference paper
Language: English

The full text of this paper has not been published; only the abstract or extended abstract is available in the database.


National scientific document ID: AIMS02_025

Indexing date: 29 Tir 1404

Abstract:

Background and Aims: Breast cancer is the most common cancer among women worldwide. It reduces quality of life and places a heavy burden on healthcare systems. Identifying and evaluating its key predictive features is essential for earlier detection and more accurate prediction. This study aims to identify these features and assess their impact in order to improve diagnostic and treatment strategies.

Methods: We used the Wisconsin Breast Cancer Diagnosis dataset, which consists of 569 samples and 31 attributes. The attributes were extracted from digitized images of fine needle aspirates (FNAs) of breast masses. They cover ten characteristics: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Each characteristic is summarized by three statistics: the mean, the standard error, and the worst value. We conducted a correlation analysis to assess the relationship between each attribute and the target variable (diagnosis). We then performed feature selection using several methods: Random Forest (RF), Logistic Regression (LR), XGBoost, the Gini Index, and Information Gain. All analyses were performed in Python 3.12.7.

Results: The correlation analysis showed strong correlations between the diagnosis and the following features: concave points (0.78), perimeter (0.74), radius (0.73), area (0.71), and concavity (0.70). In the RF model, the feature importance scores were: concave points (0.302), perimeter (0.144), concavity (0.141), radius (0.126), and area (0.108). In the XGBoost model, concave points (0.702) was the most important feature, followed by area (0.079), texture (0.057), perimeter (0.035), and concavity (0.028). Under the Gini Index, concave points (0.711) ranked highest, followed by texture (0.080) and area (0.059). Under Information Gain (entropy), concave points (0.612) was again the most important feature, followed by area (0.149) and texture (0.119).
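The Methods paragraph describes two steps: correlating each feature with the diagnosis, then ranking features by model-based importance. Both can be sketched in Python, the language the authors report using. This is an illustrative sketch, not the authors' code, and it rests on several assumptions. It uses scikit-learn's bundled copy of the Wisconsin Diagnostic data (`load_breast_cancer`) and restricts the analysis to the ten "mean" measures. It stands in single decision trees for the Gini Index and Information Gain rankings. The function name `rank_features`, its `top_k` and `seed` parameters, and the 500-tree forest are hypothetical choices. XGBoost is left out so the sketch needs only scikit-learn and pandas.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def rank_features(top_k: int = 5, seed: int = 42) -> dict[str, pd.Series]:
    """Hypothetical helper: rank the ten 'mean' FNA features by several criteria."""
    data = load_breast_cancer(as_frame=True)
    # Assumption: keep only the ten "mean" measures, whose names match the
    # unsuffixed feature names (radius, perimeter, ...) quoted in the abstract.
    X = data.data[[c for c in data.data.columns if c.startswith("mean ")]]
    # scikit-learn encodes 0 = malignant, 1 = benign; flip so malignant = 1,
    # so "larger value -> more likely malignant" yields a positive correlation.
    y = 1 - data.target

    scores: dict[str, pd.Series] = {}

    # 1) Pearson correlation between each feature and the binary diagnosis.
    scores["correlation"] = X.corrwith(y)

    # 2) Random forest: impurity-based (Gini) importances averaged over trees.
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X, y)
    scores["random_forest"] = pd.Series(rf.feature_importances_, index=X.columns)

    # 3) Logistic regression: absolute coefficients on standardized features.
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
    scores["logistic_regression"] = pd.Series(np.abs(lr[-1].coef_[0]), index=X.columns)

    # 4-5) Single decision trees split by Gini index vs. entropy (information
    # gain); feature_importances_ is the normalized total impurity decrease.
    for criterion in ("gini", "entropy"):
        tree = DecisionTreeClassifier(criterion=criterion, random_state=seed).fit(X, y)
        scores[criterion] = pd.Series(tree.feature_importances_, index=X.columns)

    # Keep the top_k features of each ranking, highest score first.
    return {name: s.sort_values(ascending=False).head(top_k) for name, s in scores.items()}


if __name__ == "__main__":
    for name, ranking in rank_features().items():
        print(f"\n{name}\n{ranking.round(3).to_string()}")
```

If the `xgboost` package is installed, `xgboost.XGBClassifier` exposes the same `feature_importances_` attribute and could be added as a further ranking. Importance values depend on hyperparameters, random seeds, and whether all 30 features are used, so this sketch is not expected to reproduce the exact scores reported above.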
Conclusion: This study underscores the importance of specific predictive features in breast cancer diagnosis. Strong correlations were identified, particularly with

Authors

Parviz Marouzi

Department of Health Information Technology, School of Paramedical and Rehabilitation Sciences, Mashhad University of Medical Sciences, Mashhad, Iran

Hanieh Mohebbi

Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran

Fatemeh Karimzadeh

Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran

Alireza Aliabadi

Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran

Rozhin Habibi Gheshlagh

Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran

Alireza Banaye Yazdipour

Department of Health Information Technology, School of Paramedical and Rehabilitation Sciences, Mashhad University of Medical Sciences, Mashhad, Iran