Comparing the performance of xgboost, Gradient Boosting and GBLUP models under different genomic prediction scenarios

Publication year: 1403 (Solar Hijri calendar)
Document type: Journal article
Language: English

The full text of this article is available as a 7-page PDF file.

National Scientific Document ID: JR_KLST-12-1_004

Indexing date: 27 Khordad 1403 (16 June 2024)

Abstract:

The aim of this study was to evaluate the xgboost algorithm in the genomic evaluation of complex traits as an alternative to the Gradient Boosting (GBM) algorithm. To this end, genotype matrices containing 5,000 (S1), 10,000 (S2) and 50,000 (S3) single nucleotide polymorphisms (SNPs) for 1,000 individuals were simulated. Besides xgboost and GBM, GBLUP, which is known to be efficient in terms of accuracy, computing time and memory requirements, was also used to predict genomic breeding values. xgboost, GBM and GBLUP were run in R using the xgboost, gbm and synbreed packages, respectively. All analyses were run on a machine with a Core i7-6800K CPU (6 physical cores) and 32 GB of memory. Pearson's correlation between predicted and true breeding values (r_p,t) and the mean squared error (MSE) of prediction were computed to compare the predictive performance of the methods. GBLUP used memory most efficiently, whereas GBM required a considerably larger amount of memory. As the data size increased from S1 to S3, GBM dropped out of the comparison, mainly because of its high memory demand. Parallel computing with xgboost reduced running time by 99% compared with GBM. The speedup ratios (GBM runtime divided by the runtime of parallel xgboost) were 444 and 554 for S1 and S2, respectively. The parallelization efficiencies (speedup ratio divided by the number of cores) were 74 and 92 for S1 and S2, respectively, indicating that the efficiency of parallel computing increased with data size. xgboost was considerably faster than GBLUP in all scenarios studied. The accuracy of genomic breeding values predicted by xgboost was similar to that of GBM.
While prediction accuracy in terms of r_p,t was higher for GBLUP, the MSE of prediction was lower for xgboost, especially for larger datasets. Our results show that xgboost could be an efficient alternative to GBM: it achieved the same prediction accuracy, was extremely fast and required substantially less memory to predict genomic breeding values.
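Gradient boosting, the idea behind both gbm and xgboost, fits a sequence of small regression trees. Each tree is fitted to the residuals of the current prediction and added to the model after scaling by a learning rate. The Python sketch below shows this idea for squared-error loss using single-SNP stumps that split on genotype ≤ 0 or ≤ 1. It is a simplified illustration under stated assumptions, not the gbm or xgboost algorithm. Those libraries grow deeper trees and offer options such as row subsampling, and xgboost additionally uses second-order gradient information and regularisation. The helper names and the toy data, including the 5 hypothetical causal SNPs, are illustrative. The loop over SNPs in `fit_stump` is the kind of split search that xgboost parallelises across CPU threads.

```python
import random

def fit_stump(geno, resid):
    """Least-squares stump on one SNP: individuals with genotype <= t (t in {0, 1}) go left."""
    n, total, best = len(resid), sum(resid), None
    for j in range(len(geno[0])):           # candidate SNPs (the parallelisable part)
        for t in (0, 1):
            left = [r for row, r in zip(geno, resid) if row[j] <= t]
            nl = len(left)
            if nl == 0 or nl == n:           # both sides of the split must be non-empty
                continue
            sl = sum(left)
            sr, nr = total - sl, n - nl
            # maximising sl^2/nl + sr^2/nr minimises the post-split squared error
            score = sl * sl / nl + sr * sr / nr
            if best is None or score > best[0]:
                best = (score, j, t, sl / nl, sr / nr)
    if best is None:
        raise ValueError("no polymorphic SNP to split on")
    return best[1:]  # (snp index, threshold, left value, right value)

def boost(geno, y, n_trees=50, learning_rate=0.1):
    """L2 gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5*(y - f)^2
        stump = fit_stump(geno, resid)
        stumps.append(stump)
        j, t, lv, rv = stump
        pred = [p + learning_rate * (lv if row[j] <= t else rv) for p, row in zip(pred, geno)]
    return base, learning_rate, stumps

def predict(model, geno):
    """Sum the scaled stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return [base + lr * sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
            for row in geno]

# Hypothetical toy data: 80 individuals, 100 SNPs drawn uniformly from {0, 1, 2},
# phenotype = allele count at the first 5 SNPs plus Gaussian noise
rng = random.Random(7)
geno = [[rng.choice((0, 1, 2)) for _ in range(100)] for _ in range(80)]
y = [sum(row[:5]) + rng.gauss(0.0, 1.0) for row in geno]
model = boost(geno, y)
fitted = predict(model, geno)
mse_base = sum((yi - model[0]) ** 2 for yi in y) / len(y)
mse_fit = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)
print(f"training MSE: {mse_base:.3f} (mean only) -> {mse_fit:.3f} (boosted)")
```

With squared-error loss and a learning rate between 0 and 2, each least-squares stump can only lower the training error. Held-out data, as in the paper, is therefore needed to judge predictive accuracy.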
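The abstract compares methods using two criteria: Pearson's correlation between predicted and true breeding values (r_p,t) and the MSE of prediction. The minimal Python sketch below computes both. The function names and toy values are illustrative assumptions, not taken from the paper, whose analyses were done in R.

```python
from math import sqrt

def pearson_r(pred, true):
    """Pearson's correlation between predicted and true breeding values (r_p,t)."""
    n = len(pred)
    if n != len(true) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    st = sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)

def mse(pred, true):
    """Mean squared error of prediction."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

# Hypothetical toy values: every prediction is exactly twice the true value
true_bv = [1.0, 2.0, 3.0, 4.0]
pred_bv = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(pred_bv, true_bv))  # 1 up to floating-point rounding
print(mse(pred_bv, true_bv))        # 7.5
```

With these toy numbers, the predictions are perfectly correlated with the true values, yet the MSE is 7.5 because every prediction is mis-scaled. Correlation ignores bias and scale while MSE does not, which is why the two criteria can rank methods differently, as they did for GBLUP and xgboost.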
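The runtime summaries in the abstract are simple ratios. The Python sketch below recomputes the reported parallelization efficiencies from the reported speedup ratios and the 6 physical cores. It also computes the runtime reduction that each speedup implies. The function names are hypothetical. The speedup is measured against GBM rather than against a serial run of xgboost, so it can exceed the number of cores. These figures therefore combine the effect of parallelism with the difference between the two implementations.

```python
PHYSICAL_CORES = 6                        # Core i7-6800K, as stated in the abstract
REPORTED_SPEEDUP = {"S1": 444, "S2": 554}  # speedup ratios reported in the abstract

def speedup_ratio(gbm_runtime, xgb_parallel_runtime):
    """GBM runtime divided by the runtime of parallel xgboost."""
    return gbm_runtime / xgb_parallel_runtime

def parallel_efficiency(speedup, n_cores):
    """Speedup ratio divided by the number of cores."""
    return speedup / n_cores

def runtime_reduction_percent(speedup):
    """Percentage reduction in runtime implied by a given speedup ratio."""
    return 100.0 * (1.0 - 1.0 / speedup)

for scenario, s in REPORTED_SPEEDUP.items():
    print(scenario, round(parallel_efficiency(s, PHYSICAL_CORES), 1),
          round(runtime_reduction_percent(s), 1))
# S1 74.0 99.8
# S2 92.3 99.8
```

The printed efficiencies (74.0 and 92.3) match the reported values of 74 and 92. Both speedups imply a runtime reduction above 99%, consistent with the abstract.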

Author

Farhad Ghafouri-Kesbi

Department of Animal Science, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran
