Efficient classification of single cell ATAC sequence data using machine learning

Year of publication: 1402 (Solar Hijri calendar)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS12_041

Indexing date: 12 Aban 1403

Abstract:

Based on recent advances in next-generation sequencing technology and microfluidics, we are now able to study the DNA structure of each cell type. The transposase-accessible chromatin sequencing (ATAC-seq) method [1], which uses Tn5 transposase to determine chromatin accessibility across the genome, provides more detailed information about chromatin packaging and about the molecular mechanisms behind gene expression regulation in different tissues and phenotypes. In addition, single-cell omics technologies make it possible to analyze thousands of cells simultaneously and to distinguish between different cell types. One of the challenges in single-cell analysis, including the analysis of ATAC-seq data, is cell type annotation. One approach to this step is to use classification algorithms [2]. In this paper, we investigated the performance of 6 well-established machine learning methods for the classification of scATAC-seq data. The main performance criteria in this research were computational complexity and computational resource consumption. Accordingly, a new classification method based on Extremely Randomized Trees (ERT) [3] was proposed, which ran faster while maintaining classification accuracy. Four public scATAC-seq datasets, drawn from different studies and covering various organisms and tissues, were used to evaluate the methods. The results showed that these methods performed well for some specific cell types in particular scATAC-seq datasets. In both intra-dataset and inter-dataset tests, support vector machine (SVM) and nearest mean classifier (NMC) showed better overall performance than the other methods on all 4 datasets, while the ERT method performed the classification in significantly less time with almost the same accuracy.
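As a rough illustration of the evaluation described in the abstract (classification accuracy alongside running time), the following is a minimal sketch of a nearest mean classifier, one of the baseline methods named above. It is not the authors' pipeline: the toy binarized cell-by-peak matrix, the cell-type labels, and the function names `fit_nearest_mean`, `predict_nearest_mean`, and `accuracy` are all assumptions made for this example.

```python
import time
from collections import defaultdict


def fit_nearest_mean(X, y):
    """Return one mean accessibility profile (centroid) per cell-type label."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_nearest_mean(centroids, X):
    """Assign each cell to the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda label: sq_dist(row, centroids[label])) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: rows are cells, columns are peaks, 1 = accessible.
X_train = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
y_train = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
X_test = [[1, 1, 0, 1], [0, 0, 1, 0]]
y_test = ["T cell", "B cell"]

# Time training plus prediction together, since running time is a criterion of interest.
start = time.perf_counter()
centroids = fit_nearest_mean(X_train, y_train)
y_pred = predict_nearest_mean(centroids, X_test)
elapsed = time.perf_counter() - start

print("accuracy:", accuracy(y_test, y_pred))
print(f"fit + predict time: {elapsed:.6f} s")
```

In a comparison of the kind the abstract describes, the same timing and accuracy measurements would presumably be repeated for each classifier and dataset; the sketch covers only a single method on invented data.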

Authors

H Haririmonfared

Department of Computer Engineering, Khatam University, Tehran, Iran

N Elmi

Institute of Biochemistry and Biophysics (IBB), Department of Bioinformatics, Laboratory of Complex Biological Systems and Bioinformatics (CBB), University of Tehran, Tehran, Iran

K Kavousi

Institute of Biochemistry and Biophysics (IBB), Department of Bioinformatics, Laboratory of Complex Biological Systems and Bioinformatics (CBB), University of Tehran, Tehran, Iran

B Majidi

Department of Computer Engineering, Khatam University, Tehran, Iran