Learning state machines on protein sequences

Publication year: 1401 (Solar Hijri calendar)
Published in: 11th National and 2nd International Iranian Conference on Bioinformatics
COI code: IBIS11_048
Language of the paper: English
Views: 145

Authors

University of Tehran, Kish International Campus

University of Tehran

Abstract

Defining signatures of known families of biologically related protein sequences (at the functional or structural level) is of great significance: such signatures may help identify regions conserved across a protein family, revealing the importance of those regions for the function or structural properties of the proteins. Some scholars have argued for viewing the sequences as sentences derived from a formal grammar. This view not only overcomes the position-specific characterization of the sequences but also benefits from the explicit modeling that a grammar provides. Modeling sequences with a grammar yields both a prediction of whether a sequence belongs to a particular family and an explanation of why it does. Automata can be learned successfully on protein sequences. In this study, a sequence-driven approach, inspired by grammatical inference and multiple alignment techniques, is used to learn automata on protein sequences. The study focuses on fragment similarity to identify locally conserved regions and then improves the characterization by identifying informative positions. Further work is required to raise prediction accuracy by developing distances that take into account the weights of the amino acids at each position with respect to the training sequences. The study also attempts to identify differences and synergies between the various approaches (such as learning syntactical models) from pattern discovery, multiple alignment, and grammatical inference that converge on learning explicit models on proteins.
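To make the idea of an automaton over protein sequences concrete, the following is a minimal sketch of a prefix tree acceptor, a common starting point in grammatical inference. It is not the method of the paper: the fragment-similarity and informative-position steps described above are not shown, and the class name, function name, and toy fragments are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


class PrefixTreeAcceptor:
    """Deterministic automaton that accepts exactly the sequences added to it."""

    def __init__(self) -> None:
        # (state, residue) -> next state; state 0 is the initial state
        self.transitions: Dict[Tuple[int, str], int] = {}
        self.accepting: Set[int] = set()
        self.n_states = 1

    def add(self, sequence: str) -> None:
        """Follow existing transitions and create new states where needed."""
        state = 0
        for residue in sequence:
            key = (state, residue)
            if key not in self.transitions:
                self.transitions[key] = self.n_states
                self.n_states += 1
            state = self.transitions[key]
        self.accepting.add(state)

    def accepts(self, sequence: str) -> bool:
        """Return True only if the sequence ends in an accepting state."""
        state = 0
        for residue in sequence:
            nxt = self.transitions.get((state, residue))
            if nxt is None:
                return False
            state = nxt
        return state in self.accepting


def build_pta(sequences: Iterable[str]) -> PrefixTreeAcceptor:
    """Build a prefix tree acceptor from training sequences."""
    pta = PrefixTreeAcceptor()
    for seq in sequences:
        pta.add(seq)
    return pta


# Hypothetical toy fragments (one-letter amino acid codes), not real family data.
family = ["CAHC", "CAKC", "CGHC"]
pta = build_pta(family)
print(pta.accepts("CAKC"))  # a training fragment
print(pta.accepts("CGKC"))  # an unseen combination of observed residues
```

A prefix tree acceptor memorizes its training set and does not generalize, so the unseen fragment is rejected. Learning in the sense of the abstract would require a generalization step, for example merging states that correspond to similar fragments, which this sketch leaves out.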

Keywords

finite-state machine (FSM), finite-state automaton (FSA), protein sequences, learning syntactical models
