An Element-Specific Mutational Burden Test for Prioritizing Cancer Genomic Elements
Published in: First International Cancer Genomics Congress
Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
CGC01_029
Indexing date: 29 Aban 1402
Abstract:
Background: Mutational burden tests are among the primary methods for identifying cancer drivers. These tests compare observed mutation rates to background mutation rates (BMR) to identify positively selected genomic elements. However, predicting BMR can be challenging because mutation rates are heterogeneous. This study investigates the impact of accounting for genomic element types, namely protein-coding sequence (CDS), splice site, 5'-UTR, 3'-UTR, enhancer, and core promoter, on BMR estimates. Specifically, we estimated the BMR of each element of a given type using other elements of the same type. We used an element-specific Random Forest (RF) and a Generalized Linear Mixed Model (GLMM) for BMR modeling and compared the results with intergenic BMR modeling using a binomial generalized linear model (GLM) and a gradient boosting machine (GBM).
Materials and Methods: We used the mutation annotation files of 2,253 non-hypermutated donors across 33 cancer types from the PCAWG project, together with 1,373 features from the DriverPower method. These features range from nucleotide-level to megabase-scale and include nucleotide content, conservation scores, RNA expression, histone modifications, DNA accessibility, replication timing, and chromatin conformation. We employed the element-specific RF and GLMM models, with the genomic element type as a random effect in the GLMM, to identify cancer driver elements. In the inference step, we used the negative binomial distribution to account for the overdispersion observed in the mutation rates. To compare the performance of our approach with the GLM and GBM models that use intergenic regions for BMR estimation, we defined a gold standard set. All protein-coding genes reported in OncoKB were considered true hits. Because hand-curated databases of non-coding driver elements are scarce, if a gene was reported in OncoKB, we also considered its cis-regulatory elements true hits. For enhancer elements, we obtained gene-enhancer maps from the PCAWG project and included in the gold standard set all enhancers whose target genes were reported in OncoKB.
Results: We applied these machine learning models to the PCAWG data and compared their performance in terms of AUC and AUPR. The element-specific RF model outperformed the intergenic GBM, element-specific GLMM, and intergenic RF models for CDSs. In non-coding regions, both element-specific models, RF and GLMM, provided better mutation rate estimates than the intergenic models, GBM and RF. By considering element types during training, we obtained more accurate BMR estimates, which helped to better prioritize driver elements, particularly non-coding ones.
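The inference step described in the Materials and Methods can be illustrated with a minimal sketch. It assumes that a BMR model has already produced an expected mutation count for an element, and it tests whether the observed count is unusually high under a negative binomial distribution. The function names, the mean/dispersion parameterization, and the example numbers are hypothetical; the abstract does not specify how the authors parameterized or fitted the distribution.

```python
import math


def nb_log_pmf(k: int, mu: float, dispersion: float) -> float:
    """Log-probability of k mutations under a negative binomial with
    mean mu and variance mu + dispersion * mu**2 (assumed parameterization)."""
    r = 1.0 / dispersion      # negative binomial "size" parameter
    p = r / (r + mu)          # success probability implied by mean and size
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_burden_pvalue(observed: int, mu: float, dispersion: float) -> float:
    """Upper-tail p-value P(X >= observed) for one genomic element."""
    prob_below = sum(
        math.exp(nb_log_pmf(k, mu, dispersion)) for k in range(observed)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - prob_below)


# Hypothetical elements: (observed mutations, expected count from a BMR model).
elements = {
    "CDS_example": (12, 2.0),
    "enhancer_example": (3, 2.5),
}
for name, (observed, expected) in elements.items():
    print(name, nb_burden_pvalue(observed, expected, dispersion=0.5))
```

A smaller p-value means the element carries more mutations than the background model predicts. In an element-specific setting, the expected count for a CDS would come from a model trained on other CDSs, and likewise for each other element type. A real analysis would also correct for multiple testing, which this sketch omits.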
Keywords:
Authors
Farideh BAHARI
Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran, Iran; Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Iran
Reza Ahangari Cohan
Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran, Iran