Development of a Machine Learning-Based Screening Method for Thyroid Nodules Classification by Solving the Imbalance Challenge in Thyroid Nodules Data
Published in: Journal of Research in Health Sciences, Volume 22, Issue 3
Year of publication: 1401 (Solar Hijri calendar)
Document type: Journal article
Language: English
The file of this article is available as an 8-page PDF.
National scientific document ID:
JR_JRHSU-22-3_001
Indexing date: 28 Tir 1402
Abstract:
Background: This study aims to show how imbalanced data and typical evaluation methods affect the
development of machine learning-based models for preoperative thyroid nodule screening and can lead to
misleading assessments of their performance.
Study design: A retrospective study.
Methods: The ultrasonography features of 431 thyroid nodule cases were extracted from the medical records
of 313 patients in Babol, Iran. Since thyroid nodules are commonly benign, the relevant data are usually
unbalanced across classes. This imbalance can bias learning models toward the majority class. To solve it, a
hybrid resampling method called the Smote- was used to create balanced data. Following that, the support
vector classification (SVC) algorithm was trained on balanced and unbalanced datasets as Models 2 and 3,
respectively, in the Python programming language. Their performance was then compared with that of a
traditionally fitted logistic regression model (Model 1).
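The oversampling idea behind SMOTE-style resampling can be sketched as follows. This is a minimal illustration of plain SMOTE interpolation only, not the hybrid method used in the study (whose full name is not given above), and it assumes numeric features. The function name `smote_oversample`, the toy feature values, and the class sizes are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours (plain SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per nodule (not from the study).
malignant = [[1.0, 0.9], [0.8, 1.0], [1.2, 1.1], [0.9, 0.7]]
benign_count = 10  # assumed size of the majority class in this toy example

# Add synthetic malignant points until both classes have the same size.
new_points = smote_oversample(malignant, benign_count - len(malignant), k=2)
balanced_minority = malignant + new_points
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplication. Hybrid variants typically follow this step with a cleaning step that removes ambiguous samples; that part is not shown here.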
Results: The prevalence of malignant nodules was 14% (n = 61). In addition, 87% of the
patients in this study were women. However, there was no difference in the prevalence of malignancy
between genders. Furthermore, the accuracy, area under the curve, and geometric mean values were estimated at
92.1%, 93.2%, and 76.8% for Model 1; 91.3%, 93%, and 77.6% for Model 2; and 91%, 92.6%,
and 84.2% for Model 3, respectively. The results also identified microcalcification, taller-than-wide
shape, and the lack of iso- and hyperechogenicity as the variables most indicative of malignancy.
Conclusion: Paying attention to data challenges, such as class imbalance, and using appropriate evaluation
measures can improve the performance of machine learning models for preoperative thyroid nodule
screening.
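A short sketch may help show why accuracy alone can mislead on data like these. It uses the class counts reported in the abstract (431 nodules, 61 malignant) and assumes the usual definition of the geometric mean as the square root of sensitivity times specificity; the abstract does not spell out its formula. The function `screening_metrics` and the "always benign" classifier are hypothetical illustrations, not the study's models.

```python
import math


def screening_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and geometric mean
    computed from the four cells of a binary confusion matrix."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # share of malignant nodules detected
    specificity = tn / (tn + fp)  # share of benign nodules correctly cleared
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definition: G-mean = sqrt(sensitivity * specificity).
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Class counts reported in the abstract: 431 nodules, 61 malignant.
n_total, n_malignant = 431, 61
n_benign = n_total - n_malignant  # 370

# Hypothetical classifier that labels every nodule as benign.
always_benign = screening_metrics(tp=0, fn=n_malignant, tn=n_benign, fp=0)
print(always_benign)
```

Under these assumptions the trivial classifier reaches an accuracy of about 0.86 while its geometric mean is 0, because it detects no malignant nodule. A measure that combines sensitivity and specificity exposes this failure, whereas accuracy hides it.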
Keywords:
Authors
Sajad Khodabandelu
MSc, Student Research Committee, School of Medicine, Faculty of Health, Babol University of Medical Sciences, Babol, Iran
Naser Chaemian
PhD, Department of Radiology, Babol University of Medical Sciences, Babol, Iran
Soraya Khafri
PhD, Research Center for Social Determinants of Health, Health Research Institute, Department of Biostatistics and Epidemiology, Faculty of Health, Babol University of Medical Sciences, Babol, Iran
Mehdi Ezoji
PhD, Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran