Metagenome Assembly: Explanation, Challenges and Future Trends

Publication year: 1396 (Solar Hijri)
Published in: 7th Iranian Conference on Bioinformatics
COI code: IBIS07_154
Paper language: English


Authors

S Momken

Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran

K Kavousi

Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran

A Banaei-Moghaddam

Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran

D Moazzami

Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran

Abstract

For many years, retrieving the genomic sequence of an undiscovered species was a complicated task, mainly because the DNA could not be sequenced as a whole. The original genome sequence therefore had to be assembled de novo from a huge number of short, overlapping reads taken from different copies of the genome. In a microbiome sample, many of the constituent species cannot be cultured in the laboratory, so genomic fragments of all species must be read together and assembled later without knowing the origin of each fragment. The set of all these reads is called a metagenome. These circumstances make metagenome assembly even harder than single-genome assembly. The initial assembly of both genomic and metagenomic data is based on graph algorithms, especially those using De Bruijn graphs. In this paper, we introduce the stages of metagenome assembly, the algorithms and time complexity of each stage, and the influence of each technique on the final assembly. The process faces various challenges: detecting and correcting sequencing errors, grouping the reads of each genome, finding shared reads and repeated regions, resolving differences between strains, and keeping the time and memory complexity of the algorithms low enough to run on big data. We list significant metagenome assembly tools and, as an example, briefly introduce metaSPAdes [1] (an extension of the SPAdes assembler [2] for metagenomic data). Finally, we mention new trends and promising approaches in the sequencing and assembly of both genomes and metagenomes that can alleviate current difficulties and bring revolutionary improvements in the length and accuracy of assembled sequences.
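To illustrate the De Bruijn graph idea mentioned in the abstract, the following is a minimal Python sketch, not the method used by metaSPAdes or any other assembler. It assumes error-free reads from a single strand of one repeat-free toy genome, and it ignores k-mer multiplicity, reverse complements and coverage. The function names (`build_de_bruijn`, `eulerian_path`, `assemble`) and the example reads are hypothetical. Each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and the sequence is spelled by walking an Eulerian path.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a De Bruijn graph: (k-1)-mer -> list of successor (k-1)-mers."""
    # Collect distinct k-mers; multiplicity is deliberately ignored here.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return graph


def eulerian_path(graph):
    """Walk every edge exactly once (Hierholzer's algorithm)."""
    out_edges = {node: list(succs) for node, succs in graph.items()}
    in_degree = defaultdict(int)
    for succs in out_edges.values():
        for succ in succs:
            in_degree[succ] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in out_edges if len(out_edges[n]) - in_degree[n] == 1),
        next(iter(out_edges)),
    )
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the De Bruijn graph."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy reads overlapping along the made-up sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(assemble(reads, 4))  # prints ATGGCGTGCA
```

In a real metagenome the graph mixes k-mers from many species and strains, sequencing errors add spurious branches, and repeats make the Eulerian path ambiguous, which is where the challenges listed in the abstract arise.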

Keywords

Metagenome; Assembly; Graph; Machine Learning; Algorithms
