Accurate and Interpretable Breast Cancer Diagnosis Using Logistic Regression: An Evaluation on the Wisconsin Diagnostic Dataset
Published in: InfoScience Trends, Volume 2, Issue 9
Year of publication: 1404 (Solar Hijri calendar)
Document type: Journal article
Language: English
The full text of this article is available as a 9-page PDF.
National scientific document ID:
JR_ISJTREND-2-9_004
Indexing date: 9 Azar 1404
Abstract:
Breast cancer remains one of the most prevalent and life-threatening diseases globally, necessitating accurate, stable, and interpretable diagnostic tools. This study evaluates the performance and stability of the classic Logistic Regression model on the standard Wisconsin Diagnostic Breast Cancer (WDBC) dataset, with emphasis on clinically important metrics such as sensitivity and specificity. The research was conducted on the public WDBC dataset, which contains 569 samples with 30 cytomorphological features. The Logistic Regression model was implemented within a robust evaluation framework comprising 50 independent runs, each with a different random split of the data into training and testing sets. Data preprocessing included feature standardization (zero mean and unit variance); no feature selection or dimensionality reduction was applied, in order to preserve model interpretability. Model performance was assessed using accuracy, sensitivity (also known as recall), specificity, and the Area Under the Receiver Operating Characteristic Curve (AUROC). The Logistic Regression model achieved a mean accuracy of 93.8%. More importantly, the model demonstrated a sensitivity of 97.2% in identifying malignant cases and a specificity of 91.2% in identifying benign cases. The ROC curve analysis yielded an AUROC of 0.9452, indicating excellent discriminative power. Confusion matrix analysis revealed that the model made only 6 errors out of 212 malignant cases (false negatives) and 29 errors out of 357 benign cases (false positives). This error profile indicates the model's conservative behavior, prioritizing high sensitivity. Furthermore, the results showed remarkable stability across all 50 runs (standard deviation less than 1%). The findings demonstrate that Logistic Regression, when implemented with a rigorous evaluation framework and a focus on clinical thresholds, can serve as a sensitive, stable, and interpretable baseline model for breast cancer diagnosis.
The strong and consistent performance of this classical model makes it a suitable candidate for clinical decision support, particularly in screening environments where minimizing false negatives is paramount.
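As a reading aid, the clinical metrics named in the abstract can be recomputed from the confusion-matrix counts it reports. The sketch below is only an illustration: the `ConfusionCounts` class and its field names are hypothetical and are not taken from the paper, and malignant is assumed to be the positive class. With the stated counts, sensitivity comes to about 0.972 and accuracy to about 0.938, matching the reported figures, while specificity comes to about 0.919, slightly above the reported 91.2%. The abstract does not say how the counts were aggregated across the 50 runs, so the two specificity figures need not coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, malignant = positive)."""

    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign (missed cancers)
    tn: int  # benign cases predicted benign
    fp: int  # benign cases predicted malignant (false alarms)

    @property
    def sensitivity(self) -> float:
        # Share of malignant cases that were correctly flagged (recall).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of benign cases that were correctly cleared.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Counts stated in the abstract: 6 false negatives among 212 malignant cases
# and 29 false positives among 357 benign cases.
counts = ConfusionCounts(tp=212 - 6, fn=6, tn=357 - 29, fp=29)

print(f"sensitivity = {counts.sensitivity:.3f}")
print(f"specificity = {counts.specificity:.3f}")
print(f"accuracy    = {counts.accuracy:.3f}")
```

The asymmetry between the two error types (6 missed malignant cases versus 29 false alarms) is what the abstract describes as conservative behavior that prioritizes sensitivity.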
Authors
Mohammadmahdi Eftekharian
Department of Radiology, Hamadan University of Medical Sciences, Hamadan, Iran.
Mohammadhassan Hosseiny
Student Research Committee, Babol University of Medical Sciences, Babol, Iran.
Nastaran Motallebi
Student Research Committee, Babol University of Medical Sciences, Babol, Iran.
Saeid Norouzkhani Esterabadi
Student Research Committee, Babol University of Medical Sciences, Babol, Iran.