CIVILICA We Respect the Science
(Specialized publisher of national conference proceedings / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Imputation of ungenotyped individuals based on genotyped relatives using Machine Learning Methodology

Article title: Imputation of ungenotyped individuals based on genotyped relatives using Machine Learning Methodology
National article ID: JR_JEPUSB-2-2_002
Published in the year 1400 (Solar Hijri calendar)
Authors:

Naeem Rastin Bojnord - Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mehdi Aminafshar - Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mahmood Honarvar - Department of Animal Science, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
Nasser Emam Jomeh Kashan - Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran

Abstract:
Machine learning methods have been used in genetic studies to build models that predict missing genotypes in both human and animal populations. Genotype imputation, the prediction of unknown genotypes, is an important step in such studies. The objective of this study was to evaluate machine learning as an imputation tool, to compare it across family-based designs, and to identify ways of improving imputation performance in different scenarios. The accuracies of two methods, support vector machine (SVM) and random forest (RF), were also compared. The final population was simulated with different family structures: 100 families, each consisting of one sire with a different number of genotyped progeny (2, 3, 4, 5 or 7). The number of markers was set to 5000 for the whole genome. The sire-family scenarios and further scenarios (BothParents, sire/dam and one progeny, sire and maternal grandsire) were defined to investigate the ability of the machine learning algorithms to impute genotypes. Imputation accuracy ranged from 0.78 to 0.99 across scenarios. With both methods, the lowest imputation accuracy was obtained in the sire and maternal grandsire scenario. Increasing the number of progeny from 2 to 3 considerably increased imputation accuracy for both SVM and RF. Imputation of non-genotyped individuals from parent-offspring trios and pairs of close relatives is possible. However, when the genotyped relatives were one progeny and one parent, both parents, or the sire and maternal grandsire, average imputation accuracy did not exceed 85%. Genotyped progeny were the best source of information for predicting the genotypes of ungenotyped individuals, and when the number of progeny was more than 4, imputation accuracy exceeded 95%. These results confirm that machine learning methods achieve good accuracy and computational speed in family trios and can be used in the estimation of breeding values.
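
The simulation design described in the abstract (sire families with 2 to 7 genotyped progeny, with accuracy measured as the share of correctly imputed genotypes) can be illustrated with a small sketch. It is not the authors' pipeline and makes several assumptions: the ungenotyped individual is taken to be the sire, markers are unlinked biallelic loci with allele frequency 0.5, each progeny has its own ungenotyped dam, and 200 markers are used instead of 5000 to keep the run short. A frequency-table classifier from the Python standard library stands in for SVM and RF. All function names (simulate_family, features, run_scenario) are hypothetical, and the accuracies it prints are not expected to match the reported 0.78 to 0.99.

```python
import random
from collections import Counter


def simulate_family(n_progeny, n_markers, rng, freq=0.5):
    """Return (sire genotypes, list of progeny genotypes), coded 0/1/2 per marker."""
    def random_parent():
        # Two independent alleles (0 or 1) per marker; markers are unlinked here.
        return [(int(rng.random() < freq), int(rng.random() < freq))
                for _ in range(n_markers)]

    sire = random_parent()
    progeny = []
    for _ in range(n_progeny):
        dam = random_parent()  # assumption: a separate ungenotyped dam per progeny
        # Each parent transmits one of its two alleles at random.
        progeny.append([rng.choice(s) + rng.choice(d) for s, d in zip(sire, dam)])
    return [a + b for a, b in sire], progeny


def features(progeny, marker):
    """Count how many progeny carry genotype 0, 1 and 2 at one marker."""
    counts = Counter(child[marker] for child in progeny)
    return (counts[0], counts[1], counts[2])


def run_scenario(n_progeny, n_families=100, n_markers=200, n_train=80, seed=0):
    """Train on n_train families, impute sires of the rest, return accuracy."""
    rng = random.Random(seed)
    families = [simulate_family(n_progeny, n_markers, rng)
                for _ in range(n_families)]

    # Training: tabulate the sire genotypes observed for each progeny pattern.
    table = {}
    for sire, progeny in families[:n_train]:
        for m in range(n_markers):
            table.setdefault(features(progeny, m), Counter())[sire[m]] += 1

    # Testing: impute each held-out sire genotype as the most frequent label
    # seen in training for the same progeny pattern.
    correct = total = 0
    for sire, progeny in families[n_train:]:
        for m in range(n_markers):
            seen = table.get(features(progeny, m))
            guess = seen.most_common(1)[0][0] if seen else 1  # unseen pattern: heterozygote
            correct += int(guess == sire[m])
            total += 1
    return correct / total


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 7):
        print(n, "progeny -> accuracy", round(run_scenario(n), 3))
```

Because the sketch ignores linkage between neighbouring markers, each marker is imputed only from the progeny genotypes at that marker. A model that also uses flanking markers as features, as SVM or RF could, would have more information to work with.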

Keywords:
Genomic, Imputation Accuracy, Machine Learning, Random Forest, Support Vector Machine, Ungenotyped

Article page and full-text download: https://civilica.com/doc/1477581/