Applying various strategies in machine learning models to predict Type II diabetes by using the risk factors

Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
National scientific document ID: IBIS12_194

Indexing date: 12 Aban 1403

Abstract:

Diabetes is one of the most common chronic diseases in the world, including in Iran. The prevalence of type II diabetes in Iran in the 25 to 64 age range is 7.7 percent. Despite the increasing prevalence of the disease, its treatment and effective control remain suboptimal, and there appear to be significant gaps in its detection and treatment. Early detection can lead to lifestyle modification and more effective treatment, which highlights the importance of developing tools that predict diabetes from common health risk factors.

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey that collects health-related information from Americans about behaviors that pose health risks. We used a dataset of 253,680 survey responses to the CDC's BRFSS 2015. The dataset has 21 feature variables and is imbalanced. The features include high blood pressure, high cholesterol, and Body Mass Index, among others. The target feature, Diabetes, has two classes: diabetes and no diabetes. We aim to answer the following research question: do survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?

Because the dataset is imbalanced, we employed data balancing techniques such as ADASYN and SMOTE. We then built multiple machine learning models to predict type II diabetes from potential health risk factors.

Of all the predictive models, XGBoost achieves the highest accuracy (81%) but has only 61% recall for diabetics. Random Forest, on the other hand, has an accuracy of 73% and a recall of 72% for diabetics. Consequently, the Random Forest model is preferred for initial screening for type II diabetes, because it has the highest sensitivity and therefore the highest detection rate. This finding can support the early detection of individuals with diabetes based on their general health information.
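The two steps described in the abstract, oversampling the minority (diabetic) class and then comparing models on overall accuracy versus recall for diabetics, can be illustrated with a small pure-Python sketch. This is not the authors' code. `smote_oversample` and `accuracy_and_recall` are hypothetical helper names. The oversampler implements only the basic SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours) and not ADASYN's adaptive weighting. The labels in the example are made-up toy values, not BRFSS data.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority-class points.

    Basic SMOTE idea only (no ADASYN weighting): each synthetic point lies on
    the line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of `base`
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Hypothetical helper: return (overall accuracy, recall for `positive`)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    return correct / len(pairs), true_pos / actual_pos


# Toy example (made-up labels): 4 diabetic (1) and 16 non-diabetic (0) cases.
y_true = [1] * 4 + [0] * 16
# Model A flags few people: high accuracy, low recall for diabetics.
pred_a = [1, 0, 0, 0] + [0] * 16
# Model B flags more people: lower accuracy, but catches more diabetics.
pred_b = [1, 1, 1, 0] + [1] * 4 + [0] * 12

acc_a, rec_a = accuracy_and_recall(y_true, pred_a)
acc_b, rec_b = accuracy_and_recall(y_true, pred_b)
print(f"A: accuracy={acc_a:.2f}, recall={rec_a:.2f}")  # A: accuracy=0.85, recall=0.25
print(f"B: accuracy={acc_b:.2f}, recall={rec_b:.2f}")  # B: accuracy=0.75, recall=0.75

# Oversampling a tiny 2-D minority set with SMOTE-style interpolation.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(len(minority + smote_oversample(minority, n_new=3, k=2)))  # 6
```

Toy model A follows the pattern reported for XGBoost (higher accuracy, lower recall), and model B follows the pattern reported for Random Forest. For screening, recall on the diabetic class measures how many true cases are caught. In practice, library implementations such as imbalanced-learn's `SMOTE` and `ADASYN` classes would normally be used. Oversampling is usually applied only to the training split, so that evaluation reflects the real class ratio.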

Authors

Fatemeh Mansoori

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran

Nima Chelongar

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran

Ali Zarrinkhat

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran