Metagenome Assembly: Explanation, Challenges and Future Trends
Published in: 7th Iranian Conference on Bioinformatics
Year of publication: 1396 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS07_154
Indexing date: 29 Farvardin 1397
Abstract:
For many years, retrieving the genomic sequence of an undiscovered species was a complicated task, mainly because sequencing the DNA as a whole was not possible. Therefore, the original sequence of the genome had to be assembled de novo from a huge number of small overlapping reads taken from different copies of the genome. When working on a microbiome sample, many of the constituent species cannot be cultured in the laboratory, so genomic fragments of all species must be read together and assembled later without knowing the origin of each fragment. The set of all these reads is called a metagenome. These circumstances make metagenome assembly even harder than genome assembly. The initial assembly of both genomic and metagenomic data is based on graph algorithms, especially those using de Bruijn graphs. In this paper, we introduce the stages of metagenome assembly, the algorithms and time complexity of each stage, and the influence of each technique on the final assembly. Various challenges are encountered in this process, such as detecting and correcting sequencing errors, grouping the reads of each genome, finding shared reads and repeated regions, resolving differences between strains, and the time and memory complexity of the algorithms, which determines the feasibility of running them on big data. We list significant metagenome assembly tools and, as an example, briefly introduce metaSPAdes [1] (an extension of the SPAdes assembler [2] for metagenomic data). Finally, we mention new trends and promising approaches in sequencing and assembly of both genomes and metagenomes that can alleviate current difficulties and bring revolutionary improvements in the length and accuracy of assembled sequences.
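As a rough illustration of the de Bruijn graph idea mentioned in the abstract, the following minimal Python sketch builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in a set of reads. The function names (`build_de_bruijn`, `edge_multiplicities`), the toy reads, and the choice of k are assumptions made for illustration only. This is not the construction used by metaSPAdes or SPAdes, which additionally handle sequencing errors, reverse complements, and uneven coverage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def edge_multiplicities(graph):
    """Count how often each edge (k-mer) was observed across all reads."""
    counts = defaultdict(int)
    for prefix, suffixes in graph.items():
        for suffix in suffixes:
            counts[(prefix, suffix)] += 1
    return dict(counts)


# Hypothetical toy reads, not real sequencing data.
reads = ["ATGGC", "TGGCG", "GGCGT"]
graph = build_de_bruijn(reads, k=3)
counts = edge_multiplicities(graph)
```

In this sketch, edges seen in several overlapping reads get a higher multiplicity, which hints at why low-multiplicity edges are often treated as candidate sequencing errors and why coverage differences between species complicate the metagenomic case.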
Authors
S Momken
Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran
K Kavousi
Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran
A Banaei-Moghaddam
Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran
D Moazzami
Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran