Similarity detection between modern human genome and their ancestors DNA sequences by Deep Learning

Publication year: 1400 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS10_043

Indexing date: 5 Tir 1401 (Solar Hijri)

Abstract:

Neanderthals were a species of human that lived in Europe and in parts of western Asia, Central Asia, and northern China (Altai). The earliest signs of Neanderthals date back to about 350,000 years ago in Europe. There is ample genetic evidence that modern humans interbred with Neanderthals, Denisovans, and other archaic relatives.

In this study, we used deep learning to identify regions of Neanderthal introgression in the modern human genome. Recent methods for detecting Neanderthal ancestry in the genome, such as the hidden Markov model (HMM), are memoryless and do not account for dependencies between distant nucleotides along a DNA sequence. We therefore used deep learning to process raw genomic sequences and to capture long-range nucleotide dependencies with a long short-term memory (LSTM) network. This model performs better than linear models such as support vector machines (SVMs) or naive Bayes classifiers, so we recommend the LSTM approach for analyzing ancient biological data.

We first converted DNA sequences into space-delimited k-mers. We then used a bag-of-words model to compare k-mer frequencies between sequences inherited from Neanderthals and sequences with weak archaic ancestry. Finally, for sequence classification, we learned word embeddings with a Keras Sequential model using the Keras Embedding layer. The model achieved an accuracy of 87.6% on the data set when classifying Neanderthal-introgressed sequences against depleted ones.

In the near future, we aim to use the LSTM model to find similarities between modern humans and their ancestors in genomic data from patients with skin diseases.
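The two preprocessing steps named in the abstract (turning DNA into k-mer "words" and comparing their frequencies with a bag-of-words count) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the choice of k, and the toy sequences are all assumptions made for the example, and the embedding and LSTM classification stage is not shown.

```python
from collections import Counter


def to_kmer_sentence(seq: str, k: int = 5) -> str:
    """Slide a window of width k over seq and join the k-mers with spaces."""
    seq = seq.upper()
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def bag_of_words(sentences):
    """Count how often each k-mer 'word' occurs across an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def relative_frequencies(counts):
    """Normalise raw counts so that two classes of different size are comparable."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Toy sequences invented for illustration only (hypothetical, not real genomic data).
introgressed = ["ACGTACGTAC", "ACGTTTACGT"]
depleted = ["GGGCCCGGGC", "GGGCCCAAAT"]

k = 3  # assumed value; the abstract does not state which k was used
freq_a = relative_frequencies(bag_of_words(to_kmer_sentence(s, k) for s in introgressed))
freq_b = relative_frequencies(bag_of_words(to_kmer_sentence(s, k) for s in depleted))

# Rank k-mers by the absolute difference in relative frequency between the classes.
all_kmers = set(freq_a) | set(freq_b)
ranked = sorted(
    all_kmers,
    key=lambda m: (-abs(freq_a.get(m, 0.0) - freq_b.get(m, 0.0)), m),
)
for kmer in ranked[:5]:
    print(kmer, round(freq_a.get(kmer, 0.0), 3), round(freq_b.get(kmer, 0.0), 3))
```

The space-delimited sentences produced by `to_kmer_sentence` are in the form a text tokenizer expects, so in a pipeline like the one described they could be mapped to integer indices and passed to an embedding layer followed by an LSTM.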

Authors

Keivan Naseri

Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran

Mahboobeh Golchinpour

Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran