Introduction: Diabetes has become a major health problem and one of the most important challenges facing the medical profession because of its prevalence in children and adults. Meanwhile,
machine learning has become a developing, reliable and supportive technology in the field of health, and data mining is one of the techniques of interest for analyzing interventions, diseases and conditions in the health system. In fact,
data mining is the process of selecting, exploring and modeling large amounts of data. The aim of this study is therefore to create a simple and reliable prediction model based on diabetes risk factors using decision tree algorithms.

Methods: Data from a diabetes screening program in Tehran were used in this study. A total of 3376 participants over 30 years of age took part in this screening program at 16 comprehensive health service centers. The prediction model was created using
decision tree algorithms including C5.0, CART, CHAID, QUEST and Random Forest, along with the boosting ensemble learning method to increase the accuracy of the model. The data were split randomly: 70% (2352 records) were used to train the model and 30% (1024 records) were used to evaluate its performance. Risk factors included gender, age, blood pressure, smoking, body mass index, and waist-to-hip ratio. The models were compared on accuracy and the best model was selected. Sensitivity, specificity, accuracy and AUC were used to evaluate the prediction model.

Findings: The prevalence of diabetes in the studied population was 21%. The best prediction model was obtained using the QUEST algorithm, with an accuracy of 80.07% and an AUC of 72.4% on the test data. The most important risk factors predicting diabetes were age, blood pressure, waist-to-hip ratio, and body mass index. The results also showed that 88% of people under 50 years of age and 81% of people over 50 years of age whose blood pressure and waist-to-hip ratio were normal were free of diabetes.

Conclusion: In this study, a prediction model was created using
decision tree algorithms to identify the most important risk factors related to diabetes. Age, blood pressure status and waist-to-hip ratio were the most important risk factors for diabetes. This model can be used in planning for diabetes management.
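
The four evaluation indexes named in the Methods (sensitivity, specificity, accuracy and AUC) can be illustrated with a short plain-Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and chosen only for illustration. Diabetes is assumed to be coded as 1 and healthy as 0, and AUC is computed here as the fraction of (diabetic, healthy) pairs in which the diabetic case receives the higher score, with ties counted as half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = diabetic, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute the four indexes from true labels and predicted risk scores.

    The threshold is an illustrative assumption, not a value from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # diabetic cases correctly flagged
        "specificity": tn / (tn + fp),  # healthy cases correctly cleared
        "accuracy": (tp + tn) / len(y_true),
        "auc": auc(y_true, scores),
    }


# Toy data (made up, NOT from the screening program): 3 diabetic, 7 healthy.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

for name, value in evaluate(toy_labels, toy_scores).items():
    print(f"{name}: {value:.3f}")
```

On this toy data the accuracy is 0.8 while the sensitivity is only about 0.667, which shows why the four indexes are reported together rather than accuracy alone.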