A Novel Mechanistic Simulation Model for Single-Cell DNA Sequencing
Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID: IBIS12_148
Indexing date: 12 Aban 1403 (Solar Hijri)
Abstract:
Single-cell DNA sequencing technology amplifies tiny quantities of DNA using one of two main approaches, PCR-based and polymerase-based amplification. The latter exhibits lower error rates, which is particularly advantageous for detecting single-nucleotide variants. The polymerase-based category includes Multiple Displacement Amplification (MDA) [1] and Primary Template-Directed Amplification (PTA) [2]. MDA produces longer amplicons than PTA, resulting in greater amplification imbalance. Simulating the amplification process is crucial for evaluating variant callers, yet mechanistic simulation tools for MDA and PTA are notably lacking. This study introduces a new simulation tool tailored to both techniques.
In the first step, single-cell genomes are simulated, including SNPs, SNVs, and CNVs. Each simulated single-cell genome comprises maternal and paternal strands. Each amplicon is defined by its start position, end position, direction, and release status. The algorithm generates new amplicons through hexamer attachment and extension, introducing amplification errors along the way. Additional parameters include the hexamer attachment rate, the maximum amplicon length, and the probability of an amplification error. The simulation incorporates allelic imbalance through biased selection of maternal or paternal regions as cycles progress. The resulting amplicons are written to a FASTA file, which is then used to generate single-cell reads as a FASTQ file. Real PTA and MDA single-cell datasets were analyzed to evaluate the simulation; germline and somatic variant distributions were determined using HaplotypeCaller and Mutect, respectively. We provide a computationally efficient Python implementation of this simulation model.
The dataset, consisting of 10 PTA and 10 MDA single-cell DNA-seq records, was obtained from the SRA database under accession code SRP178894. The analysis reveals distinct distribution patterns for PTA and MDA.
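The amplification loop described in the abstract can be illustrated with a toy Monte Carlo sketch. It is not the authors' implementation: the `Amplicon` fields mirror the properties the abstract names (start, end, direction, release status), but the function names, the parameter names `hexamer_rate`, `max_length`, `error_prob` and `bias`, the uniform choice of hexamer attachment sites, and the way allelic imbalance is modelled (a constant per-cycle preference for maternal templates that compounds over cycles) are all hypothetical assumptions.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class Amplicon:
    """One fragment in the amplification pool (field names are hypothetical)."""
    haplotype: str   # "maternal" or "paternal"
    start: int       # 0-based start on the haplotype (inclusive)
    end: int         # 0-based end on the haplotype (exclusive)
    direction: int   # +1 or -1; flips each time a strand is copied
    released: bool   # True once the fragment is displaced from its template
    seq: str         # stored in forward (haplotype) orientation for simplicity


def mutate(base: str, rng: random.Random) -> str:
    """Return a different base, modelling a polymerase substitution error."""
    return rng.choice([b for b in BASES if b != base])


def extend(template: Amplicon, rng: random.Random,
           max_length: int, error_prob: float) -> Amplicon:
    """Attach a hexamer at a random offset and copy up to max_length bases."""
    offset = rng.randrange(len(template.seq))  # hexamer attachment site
    copy_len = min(rng.randint(1, max_length), len(template.seq) - offset)
    copied = "".join(
        mutate(b, rng) if rng.random() < error_prob else b  # amplification error
        for b in template.seq[offset:offset + copy_len]
    )
    start = template.start + offset
    # Simplification: every new strand is treated as released immediately.
    return Amplicon(template.haplotype, start, start + copy_len,
                    -template.direction, True, copied)


def amplify(haplotypes: dict[str, str], cycles: int, hexamer_rate: float,
            max_length: int, error_prob: float, bias: float,
            seed: int = 0) -> list[Amplicon]:
    """Run a fixed number of priming/extension cycles over the whole pool."""
    rng = random.Random(seed)
    pool = [Amplicon(name, 0, len(seq), +1, False, seq)
            for name, seq in haplotypes.items()]
    for _ in range(cycles):
        new = []
        for amp in pool:
            # Hypothetical allelic-imbalance model: maternal templates are primed
            # slightly more often, so the skew compounds over cycles.
            weight = 1 + bias if amp.haplotype == "maternal" else 1 - bias
            if rng.random() < min(1.0, hexamer_rate * weight):
                new.append(extend(amp, rng, max_length, error_prob))
        pool.extend(new)
    return pool


def to_fasta(amplicons: list[Amplicon]) -> str:
    """Serialise amplicons as FASTA text, the input to a read simulator."""
    lines = []
    for i, a in enumerate(amplicons):
        lines.append(f">amp{i} {a.haplotype}:{a.start}-{a.end} dir={a.direction:+d}")
        lines.append(a.seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rng = random.Random(42)
    maternal = "".join(rng.choice(BASES) for _ in range(200))
    # Paternal copy differs by one heterozygous SNP at position 100.
    paternal = maternal[:100] + mutate(maternal[100], rng) + maternal[101:]
    amps = amplify({"maternal": maternal, "paternal": paternal}, cycles=5,
                   hexamer_rate=0.5, max_length=80, error_prob=1e-3, bias=0.2)
    print(len(amps), "amplicons")
    print(to_fasta(amps)[:200])
```

Because every fragment already in the pool, including earlier copies, can be primed again in later cycles, early random choices are propagated and amplified. Real reaction kinetics are not modelled here; in this toy, `max_length` is the only knob standing in for the difference in amplicon length between MDA and PTA.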
The simulator's effectiveness was evaluated on the real dataset, which showed that it reproduces the non-uniform coverage and the variant allele frequency (VAF) distribution observed in actual single-cell datasets. The simulator enables comparative assessment of SNV callers such as ProSolo [3] and SCAN-SNV [4].
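The two evaluation summaries mentioned above can be made concrete with a short sketch. The VAF at a site is the fraction of reads carrying the alternative allele, VAF = alt / (ref + alt); a heterozygous germline site in a perfectly balanced library would sit at 0.5, and amplification imbalance spreads it towards 0 and 1. Here coverage non-uniformity is summarised by the coefficient of variation of per-base depth; that choice of metric, the function names, and the toy numbers are illustrative assumptions, not taken from the study.

```python
from statistics import mean, pstdev


def vaf(ref_count: int, alt_count: int) -> float:
    """Variant allele frequency: share of reads supporting the alternative allele."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("VAF is undefined at a site with zero depth")
    return alt_count / depth


def depth_profile(intervals: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Per-base depth from half-open read intervals [start, end) via a difference array."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depths, running = [], 0
    for d in diff[:genome_length]:
        running += d
        depths.append(running)
    return depths


def coverage_cv(depths: list[int]) -> float:
    """Coefficient of variation of depth (0 means perfectly uniform coverage)."""
    m = mean(depths)
    return pstdev(depths) / m if m else 0.0


if __name__ == "__main__":
    # Toy heterozygous sites as (ref reads, alt reads).
    sites = [(10, 10), (18, 2), (3, 15)]
    print([round(vaf(r, a), 2) for r, a in sites])  # [0.5, 0.1, 0.83]
    depths = depth_profile([(0, 10), (0, 10), (5, 10)], 10)
    print(depths, coverage_cv(depths))
```

Comparing the histogram of such VAF values, and the depth CV, between simulated and real cells is one simple way to check whether a simulator reproduces the skew seen in real MDA or PTA data.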
Authors
Sajedeh Bahonar
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Laura Tomas
Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
Joao Miguel Alves
Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
David Posada
Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran