Applying various strategies in machine learning models to predict Type II diabetes by using the risk factors

  • Publication year: 1402 (Solar Hijri)
  • Venue: 12th National and 3rd International Conference on Bioinformatics
  • Dedicated COI code: IBIS12_194
  • Paper language: English

Authors

Fatemeh Mansoori

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran

Nima Chelongar

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran

Ali Zarrinkhat

Department of Applied Mathematics and Statistics, University of Isfahan, Isfahan, Iran

Abstract

Diabetes is one of the most common chronic diseases in the world, including in Iran. The prevalence of type II diabetes in Iran among people aged 25 to 64 is 7.7 percent. Despite the increasing prevalence of the disease, its treatment and effective control remain suboptimal, and there appear to be significant gaps in disease detection and treatment. Early detection can lead to lifestyle modification and more effective treatment, which highlights the significance of developing tools for predicting diabetes from common health risk factors.

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey that gathers health information from Americans about behaviors that pose health risks. We used a dataset of 253,680 survey responses to the CDC's BRFSS 2015. The dataset has 21 feature variables and is imbalanced. The features include high blood pressure, high cholesterol, Body Mass Index, and so on. The target feature, Diabetes, has 2 classes: diabetes and no diabetes. We address the following research question: can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?

Because the dataset is imbalanced, we applied data balancing techniques such as ADASYN and SMOTE. We then built multiple machine learning models to predict type II diabetes using potential health risk factors.

Of all the predictive models, XGBoost achieves the highest accuracy (81%) but only a 61% recall for diabetics. Random Forest, by contrast, has an accuracy of 73% and a 72% recall for diabetics. Consequently, the Random Forest model is preferred for initial screening for type II diabetes, because it has the highest sensitivity and therefore the highest detection rate. This finding can be used for the early detection of individuals with diabetes based on their general health information.
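The trade-off between accuracy and recall described in the abstract can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical, chosen only to resemble the reported pattern; they are not the paper's data, and the names `Confusion`, `model_a`, `model_b`, and `pick_screening_model` are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary diabetes classifier."""
    tp: int  # diabetics correctly flagged
    fn: int  # diabetics missed
    fp: int  # non-diabetics incorrectly flagged
    tn: int  # non-diabetics correctly cleared

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        # Sensitivity: share of actual diabetics that are detected.
        return self.tp / (self.tp + self.fn)


# Hypothetical test set of 1,000 people, 150 of them diabetic
# (illustrative numbers only, not taken from the paper).
model_a = Confusion(tp=90, fn=60, fp=120, tn=730)   # higher accuracy
model_b = Confusion(tp=110, fn=40, fp=230, tn=620)  # higher recall


def pick_screening_model(candidates: dict) -> str:
    """Return the name of the candidate with the highest recall."""
    return max(candidates, key=lambda name: candidates[name].recall())


if __name__ == "__main__":
    models = {"model_a": model_a, "model_b": model_b}
    for name, cm in models.items():
        print(f"{name}: accuracy={cm.accuracy():.2f}, recall={cm.recall():.2f}")
    print("preferred for screening:", pick_screening_model(models))
```

On an imbalanced test set, a model can score higher accuracy simply by clearing more non-diabetics while missing more diabetics. Ranking by recall, as in this sketch, reflects the screening goal of missing as few true cases as possible.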

Keywords

Type 2 Diabetes; Machine Learning; Healthcare
