Prediction of diabetes using machine learning and data mining algorithms

Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

AIMS01_150

Indexing date: 1 Mordad 1402

Abstract:

Introduction: Diabetes has become a major health problem and one of the most important concerns of the medical profession because of its prevalence in both children and adults. Machine learning, meanwhile, has been a developing, reliable, and supportive technology in the field of health, and data mining is one of the techniques of interest for analyzing interventions, diseases, and conditions of the health system. Data mining is the process of selecting, exploring, and modeling large amounts of data. The aim of this study is therefore to create a simple and reliable prediction model based on risk factors related to diabetes using decision tree algorithms.

Methods: Data from a diabetes screening program in Tehran were used in this study. A total of 3376 participants over 30 years of age took part in this screening program at 16 comprehensive health service centers. The prediction model was created using decision tree algorithms, including C5.0, CART, CHAID, Quest, and Random Forest, along with the Boosting hybrid learning method to increase the accuracy of the model. The data were split at random: 70% (2352 records) were used to train the model and 30% (1024 records) were used to evaluate its performance. Risk factors included gender, age, blood pressure, smoking, body mass index, and waist-to-hip ratio. The models were compared on accuracy and the best model was selected. Sensitivity, specificity, accuracy, and AUC were used to evaluate the prediction model.

Findings: The prevalence of diabetes in the studied population was 21%. The best prediction model was obtained with the Quest algorithm, with an accuracy of 80.07% and an AUC of 72.4% on the test data. The most important risk factors predicting diabetes were age, blood pressure, waist-to-hip ratio, and body mass index. The results also showed that 88% of people younger than 50 years and 81% of people over 50 years of age whose blood pressure and waist-to-hip ratio were normal were healthy with respect to diabetes.

Conclusion: In this study, a prediction model was created using a decision tree algorithm to identify the most important risk factors related to diabetes. Age, blood pressure status, and waist-to-hip ratio were the most important risk factors for diabetes. This model can be used in planning for diabetes management.
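
The random train/test partition and the evaluation indexes named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`split_records`, `classification_metrics`, `auc`) are hypothetical, only the record counts (3376, 2352, 1024) come from the abstract, and the toy labels and scores at the end are made up for illustration and are not study data.

```python
import random


def split_records(records, n_train, seed=0):
    """Randomly partition records into n_train training items and the rest for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = diabetic, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a sketch but slow for large data sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Split sizes reported in the abstract: 2352 training and 1024 test records.
train_ids, test_ids = split_records(range(3376), n_train=2352)

# Toy labels, predictions, and scores (illustrative only, not study data).
toy_true = [1, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
toy_metrics = classification_metrics(toy_true, toy_pred)
toy_auc = auc(toy_true, toy_scores)
```

Here `toy_metrics` holds the three threshold-based indexes and `toy_auc` the ranking-based one; in a study like this they would be computed on the held-out test records rather than on toy values.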

Authors

Hassan Shojaee Mend

Hassan Shojaee-Mend (PhD); Infectious Diseases Research Center, Gonabad University of Medical Sciences, Gonabad, Iran

Farnia Velayati

Farnia Velayati (PhD); Telemedicine Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ebrahim Babee

Ph.D, Research Assistant, Preventive Medicine and Public Health Research Center, Psychosocial Health Research Institute, Department of Community and Family Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran