Functional Annotation of Missense Mutations Based on Protein Features

  • Year of publication: 1400 (Solar Hijri calendar)
  • Venue: First International and Tenth National Iranian Bioinformatics Conference
  • COI (CIVILICA Object Identifier) code: IBIS10_037
  • Paper language: English

Authors

Motahareh Hakiminejad

Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Hesam Montazeri

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Bahram Goliaei

Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Abstract

Background: Among all genetic alterations in cancer, single nucleotide variants (SNVs) are the most common type of mutation. Still, identifying cancer-driving SNVs (driver mutations) among the many non-driving ones (passenger mutations) remains a challenge, owing to the inherent imbalance between the two populations and the fact that most driver mutations are rare. We present a random forest tool that annotates missense mutations as driver or passenger based on protein information and helps find novel driver mutations.

Materials and Methods: Pan-cancer mutation data from the TCGA database (n = 600k) were fetched and labeled as passenger or driver based on their prevalence. From these mutations, a feature set with five categories was built: 1) physicochemical changes of the substituted amino acid, 2) changes in the pseudo-amino acid composition of the 21-mer sequence around the mutation site, 3) disorder of the mutated region, 4) regions/functions reported in UniProt for the mutation site, and 5) whether the gene is reported to be an oncogene, a tumor suppressor, or neither. A random forest model was trained on the feature set with the ranger package in R.

Results: The accuracy of the method on test data is 99% (sensitivity = 99%, specificity = 54%). The method was evaluated against other cancer missense annotation methods, namely CHASMplus, CHASM, Mutation Assessor, PolyPhen-2, and VEST, on experimentally labeled cancer missense mutations. The areas under the receiver operating characteristic curve (auROC) of these methods were 88%, 67%, 67%, 59%, and 72%, respectively, and the auROC of our method was 83%. The method was also tested against a cancer SNV gold standard based on extensive literature and database review, on which its accuracy was 72% (sensitivity = 74%, specificity = 71%).

Conclusion: We developed a random forest method that discriminates driver from passenger missense mutations. As the method is based solely on protein descriptors, it can give insight into a mutation's mode of action.
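The test-set figures in the abstract (accuracy 99%, sensitivity 99%, specificity 54%) can only coexist when the class counted as positive heavily outnumbers the negative class, because accuracy is the prevalence-weighted mean of sensitivity and specificity. The Python sketch below illustrates that arithmetic. The confusion-matrix counts are hypothetical, chosen only to reproduce the reported percentages; they are not taken from the study, and the code is not the authors' R/ranger pipeline.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion matrix; which class is 'positive' is left open here."""
    tp: int  # positives predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative
    fp: int  # negatives predicted positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives recovered (recall of the positive class)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives recovered (recall of the negative class)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all examples classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    """Unweighted mean of the two per-class recalls."""
    return (sensitivity(c) + specificity(c)) / 2


# HYPOTHETICAL counts (assumption, not study data): 10,000 positives and
# 100 negatives, i.e. a 100:1 class imbalance.
counts = ConfusionCounts(tp=9900, fn=100, tn=54, fp=46)

if __name__ == "__main__":
    print(f"sensitivity       = {sensitivity(counts):.3f}")
    print(f"specificity       = {specificity(counts):.3f}")
    print(f"accuracy          = {accuracy(counts):.3f}")
    print(f"balanced accuracy = {balanced_accuracy(counts):.3f}")
```

With counts this imbalanced, accuracy tracks the majority-class recall almost exactly. Balanced accuracy and auROC are less sensitive to class prevalence, which is one reason the auROC comparison and the gold-standard evaluation in the abstract are informative alongside the raw test accuracy.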

Keywords

Protein structure/function, cancer-type-specific driver, missense mutation, rare drivers
