Breast Cancer Diagnosis by Machine Learning Techniques: A Risk Prediction Model Study

Year of publication: 1402 (Solar Hijri calendar)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

AIMS01_193

Indexing date: 1 Mordad 1402

Abstract:

Background and aims: Breast cancer (BC) is the most common cancer globally. Various studies have been conducted to examine the risk factors for BC. The present study aimed to develop a breast cancer risk prediction model using different machine learning techniques, select the best method to identify the factors that account for the incidence of breast cancer, and identify the relationships between those factors.

Method: A total population sample of 810 healthy subjects and patients with BC was investigated using thirty-two factors (e.g., pathology [T, N, M], genes [connexin 37 rs1764391]). The accuracy, precision, and reproducibility of various machine learning algorithms were measured. Machine learning classification algorithms, namely Naïve Bayes, K-Nearest Neighbors, Decision Tree, and Random Forest (RF), were used in this study.

Results: Among the classification methods used, the random forest algorithm had the greatest accuracy, precision, reproducibility, and AUC, at 99.3%, 99.1%, 95.7%, and 99.4%, respectively. The results of assessing the impact and relationship of variables using the RF method based on PCA indicated that pathology, biochemistry, gene, and demographic factors, with coefficients of 0.35, 0.23, 0.15, 0.13, 0.08, and 0.06, respectively, affected the risk of BC (r² = 0.54). Pathological features, genetic factors, and ER, Ki67, CEA, CA153, stage, the rs1764391 gene, and P53, respectively, were found to be the most important factors for BC risk. Furthermore, stage, T, and rs1764391, with coefficients of 0.13, 0.12, and 0.09, had the highest coefficients (r² = 0.77).

Conclusion: Considering the interaction and importance of these factors, we found that the Random Forest technique may be useful as an approach for developing a risk prediction model for BC in comparison with the other methods investigated.
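
The abstract reports accuracy, precision, reproducibility, and AUC for each classifier but does not say how they were computed. The sketch below shows one common way to derive such figures from a binary classifier's output, using only the Python standard library. It rests on several assumptions: "reproducibility" is read here as recall (sensitivity), which the abstract does not confirm; the labels and scores are made-up toy values, not study data; and all function names are illustrative rather than taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = BC, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were detected (assumed reading of 'reproducibility')."""
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, NOT from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]   # predicted probability of BC
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # threshold of 0.5 is an assumption

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("accuracy :", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("recall   :", recall(tp, fn))
print("AUC      :", auc(y_true, scores))
```

Computing the same four numbers for each candidate classifier on a held-out test set would allow a comparison of the kind the abstract describes; the pairwise AUC used here is quadratic in the sample size, which is acceptable for a cohort of a few hundred subjects.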

Authors

Arian Karimi Rouzbahani

Student Research Committee, Lorestan University of Medical Sciences, Khorramabad, Iran; USERN Office, Lorestan University of Medical Sciences, Khorramabad, Iran

Elham Nazari

Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran

Samaneh Tehmasebi Ghorabi

Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Research Expert, Clinical Research Development Unit, Emam Khomeini Hospital, Ilam University of Medical Sciences, Ilam, Iran

Amir Avan

Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran