A Novel Mechanistic Simulation Model for Single-Cell DNA Sequencing

Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS12_148

Indexing date: 12 Aban 1403

Abstract:

Single-cell DNA sequencing technology amplifies tiny DNA quantities through two primary methods: PCR-based and polymerase-based, with the latter exhibiting lower error rates, which is particularly advantageous for detecting single-nucleotide variations. The second category includes Multiple Displacement Amplification (MDA) [1] and Primary Template-Directed Amplification (PTA) [2]. MDA produces longer amplicons than PTA, resulting in a higher amplification imbalance. Simulating the amplification process is crucial for evaluating variant callers, yet there is a noticeable absence of mechanistic simulation tools for MDA and PTA. This study introduces a new simulation tool tailored for both techniques.

In the initial step, single-cell genomes are simulated, encompassing SNPs, SNVs, and CNVs. The simulated single-cell genome comprises maternal and paternal strands. Each amplicon is defined by parameters such as start position, end position, direction, and release status. The algorithm generates new amplicons through hexamer attachment and extension, introducing amplification errors. Additional parameters include the hexamer attachment rate, the maximum amplicon length, and the probability of amplification error. The simulation incorporates allelic imbalance, with biased selection of maternal or paternal regions as cycles progress. Subsequently, the resulting amplicon FASTA file is used to generate the single-cell reads as a FASTQ file. Real single-cell datasets of PTA and MDA were analyzed for simulation evaluation. Germline and somatic variant distributions were determined using HaplotypeCaller and Mutect, respectively. We offer a computationally efficient implementation of this simulation model in Python.

The dataset, consisting of 10 PTA and 10 MDA single-cell DNA-seq records, was obtained from the SRA database under the accession code SRP178894. Analysis reveals distinct distribution patterns between PTA and MDA. The simulator's effectiveness was evaluated using the real dataset, revealing nonuniform coverage and a variant allele frequency (VAF) distribution consistent with those of actual single-cell datasets. It facilitates the comparative assessment of various SNV callers, such as ProSolo [3] and SCAN-SNV [4].
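
The amplicon representation and the hexamer attachment and extension step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is not available here. The class name `Amplicon`, the functions `copy_with_errors` and `amplify_once`, the uniform choice of extension length and direction, and the restriction to a single amplification round on one strand are all assumptions made for illustration; allelic imbalance and re-priming on existing amplicons are omitted.

```python
import random
from dataclasses import dataclass

HEXAMER_LEN = 6
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Amplicon:
    start: int       # 0-based start on the template
    end: int         # exclusive end on the template
    direction: int   # +1 forward copy, -1 reverse-complement copy
    released: bool   # hypothetical flag: detached from its template
    seq: str         # copied sequence, including amplification errors


def copy_with_errors(template, error_prob, rng):
    """Copy a template, substituting each base with probability error_prob."""
    out = []
    for base in template:
        if rng.random() < error_prob:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def amplify_once(genome, attach_rate, max_len, error_prob, rng):
    """One simplified round: a hexamer attaches at each position with
    probability attach_rate and is extended to a random length of at most
    max_len bases (assumed uniform; max_len must be >= HEXAMER_LEN)."""
    amplicons = []
    for pos in range(len(genome) - HEXAMER_LEN + 1):
        if rng.random() >= attach_rate:
            continue
        length = rng.randint(HEXAMER_LEN, max_len)
        end = min(pos + length, len(genome))
        direction = rng.choice([1, -1])
        template = genome[pos:end]
        if direction == -1:
            template = revcomp(template)
        seq = copy_with_errors(template, error_prob, rng)
        amplicons.append(Amplicon(pos, end, direction, True, seq))
    return amplicons


if __name__ == "__main__":
    rng = random.Random(0)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(500))
    products = amplify_once(toy_genome, attach_rate=0.05, max_len=60,
                            error_prob=0.001, rng=rng)
    print(len(products), "amplicons generated")
```

In this sketch a smaller `max_len` would loosely correspond to the shorter amplicons the abstract attributes to PTA, and a larger one to MDA; the actual parameterization used by the authors is not stated in the abstract.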

Authors

Sajedeh Bahonar

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Laura Tomas

Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain

Joao Miguel Alves

Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain

David Posada

Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain

Hesam Montazeri

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran