Efficient classification of single cell ATAC sequence data using machine learning
Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
National scientific document ID: IBIS12_041
Indexing date: 12 Aban 1403
Abstract:
Recent advances in next-generation sequencing technology and microfluidics now allow us to study DNA structure in each cell type. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) [1] uses the Tn5 transposase to determine chromatin accessibility across the genome. It provides detailed information about chromatin packaging and about the molecular mechanisms behind gene expression regulation in different tissues and phenotypes. In addition, single-cell omics technologies have made it possible to analyze thousands of cells simultaneously and to distinguish between different cell types. One of the challenges in single-cell analysis, including the analysis of ATAC-seq data, is cell type annotation, and one approach to this step is to use classification algorithms [2]. In this paper, we investigated the performance of six well-established machine learning methods for classifying single-cell ATAC-seq (scATAC-seq) data. The main performance criteria in this research were computational complexity and computational resource consumption. Accordingly, we proposed a new classification method based on Extremely Randomized Trees (ERT) [3] that runs faster while maintaining classification accuracy. The methods were evaluated on four public scATAC-seq datasets drawn from different studies, organisms, and tissues. The results showed that these methods performed well on some specific cell types within a particular scATAC-seq dataset. In both intra-dataset and inter-dataset tests, the support vector machine (SVM) and the nearest mean classifier (NMC) showed better overall performance than the other methods across all four datasets, while the ERT method performed the classification in significantly less time with almost the same accuracy.
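The comparison described in the abstract weighs classification accuracy against runtime. The following is a minimal sketch of that kind of benchmark and is not the authors' pipeline. It assumes scikit-learn and NumPy are installed and uses scikit-learn's standard `ExtraTreesClassifier` as a stand-in for ERT, not the modified ERT method the paper proposes. `LinearSVC` stands in for the SVM, and `NearestCentroid` serves as the nearest mean classifier. The data generator `simulate_cells`, the helper `compare`, and every parameter value (peak counts, accessibility probabilities, `n_estimators=100`) are hypothetical. The input is a toy binary cell-by-peak matrix in which each cell type has its own block of more frequently accessible peaks.

```python
import time

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid
from sklearn.svm import LinearSVC


def simulate_cells(n_cells=600, n_peaks=600, n_types=3, seed=0):
    """Hypothetical toy data: a binary cell-by-peak accessibility matrix.

    Each cell type gets its own block of peaks that are accessible with
    probability 0.30; all other peaks are accessible with probability 0.02.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_types, size=n_cells)
    probs = np.full((n_cells, n_peaks), 0.02)
    block = n_peaks // n_types
    for t in range(n_types):
        # Raise accessibility of this type's peak block for cells of that type.
        probs[labels == t, t * block:(t + 1) * block] = 0.30
    X = (rng.random((n_cells, n_peaks)) < probs).astype(np.float32)
    return X, labels


def compare(X, y, seed=0):
    """Fit each classifier on a stratified split and record
    (test accuracy, fit + predict wall-clock seconds)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "ERT": ExtraTreesClassifier(n_estimators=100, random_state=seed),
        "SVM": LinearSVC(dual=True, max_iter=5000),
        "NMC": NearestCentroid(),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        accuracy = float((model.predict(X_te) == y_te).mean())
        results[name] = (accuracy, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    X, y = simulate_cells()
    for name, (acc, secs) in compare(X, y).items():
        print(f"{name}: accuracy={acc:.3f}, fit+predict time={secs:.3f}s")
```

Timings from a toy matrix like this one say little about real scATAC-seq datasets, which typically have far more peaks and are very sparse. The sketch only illustrates how accuracy and runtime can be measured side by side for each method.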
Authors
H Haririmonfared
Department of Computer Engineering, Khatam University, Tehran, Iran
N Elmi
Institute of Biochemistry and Biophysics (IBB), Department of Bioinformatics, Laboratory of Complex Biological Systems and Bioinformatics (CBB), University of Tehran, Tehran, Iran
K Kavousi
Institute of Biochemistry and Biophysics (IBB), Department of Bioinformatics, Laboratory of Complex Biological Systems and Bioinformatics (CBB), University of Tehran, Tehran, Iran
B Majidi
Department of Computer Engineering, Khatam University, Tehran, Iran