Enhancing Precision for Background Mutation Rate Estimation in Cancer Genomes
Year of publication: 1402 (Solar Hijri calendar)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS12_196
Indexing date: 12 Aban 1403
Abstract:
Accurate estimation of the background mutation rate (BMR) is crucial for identifying cancer drivers and significantly affects the precision of prioritizing both coding and non-coding cancer drivers. This study investigates the influence of neighboring regions on BMR prediction by applying a novel transformer-based neural network using self-attention mechanisms. Additionally, it systematically compares various machine learning algorithms, investigates the impact of the number of genomic elements in the training set, and explores different bin generation methods for BMR prediction.

The analysis was performed on 2,253 whole cancer genomes of 33 cancer types from non-melanoma/lymphoma donors. Somatic mutations were taken from the PCAWG project, with the TCGA portion of the data obtained from the database of Genotypes and Phenotypes (dbGaP; project #32607). A total of 971,290 variable-size intergenic genomic coordinates were derived by removing functional genomic elements based on the PCAWG definition and retaining callable genomic regions. A set of 1,372 genomic context and epigenomic features affecting mutation rate was extracted for each genomic region. This study compared a transformer-based model to XGBoost, random forest, a classic neural network, and a binomial generalized linear model. Model performance was compared using Spearman correlation and mean squared error.

We comprehensively analyzed the performance of various models for BMR estimation, examining the effects of binning strategies, sample sizes, and model types on the accuracy of predictions. The results of our analysis indicate that the transformer model outperformed traditional deep neural networks and the binomial generalized linear model. The performance of the transformer model was similar to that of XGBoost.
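
The abstract names Spearman correlation and mean squared error as the comparison metrics but gives no implementation details. The following is a minimal, standard-library sketch of how those two metrics could be computed for one model's predictions. The function names and the toy mutation-rate values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_squared_error(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical observed and predicted mutation rates for five genomic regions.
observed = [0.8, 1.5, 0.3, 2.1, 1.1]
predicted = [0.9, 1.2, 0.5, 1.8, 1.4]

print("Spearman correlation:", spearman(observed, predicted))
print("Mean squared error:", mean_squared_error(observed, predicted))
```

Spearman correlation rewards a model for ordering regions correctly even when the predicted scale is off, while mean squared error penalizes the absolute deviation, so the two metrics can disagree and are usefully reported together.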
Authors
Farideh Bahari
Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran, Iran
Reza Ahangari Cohan
Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran, Iran
Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Iran