Improving Persian Dependency-Based Parser Using Deep Learning

Publication year: 1401 (Solar Hijri)
Published in: Journal of Computer and Knowledge Engineering, Volume 5, Issue 1
COI code: JR_CKE-5-1_002
Article language: English

Authors

Soghra Lazemi

Department of Computer Eng., Faculty of Electrical & Computer Eng., The University of Kashan, Kashan, Iran.

Hossein Ebrahimpour-Komleh

Department of Computer Eng., Faculty of Electrical & Computer Eng., University of Kashan, Kashan, Iran.

Nasser Noroozi

Department of Computer Eng., Faculty of Electrical & Computer Eng., University of Kashan, Kashan, Iran.

Abstract

Grammar, and with it syntactic structure and structural parsing, is one of the most important problems in computational linguistics. A structural parser analyzes the relationships between words and extracts the syntactic structure of a sentence. Dependency-based parsing is well suited to free-word-order, morphologically rich languages such as Persian. A data-driven dependency parser makes its classification decisions from a wide range of features; this leads to problems such as sparsity and the curse of dimensionality, and it also requires careful feature selection and parameter tuning. The aim of this study is to obtain high performance with minimal feature engineering for dependency parsing of Persian sentences. To this end, the features required by the Maximum Spanning Tree Parser (MSTParser) are extracted with a Bidirectional Long Short-Term Memory (Bi-LSTM) network, and the network's output is used to score the edges of the dependency graph. Experiments are conducted on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the newly defined features improve the performance of the dependency parser for Persian. The unlabeled attachment scores achieved on PerDT and UPDT are 90.53% and 87.02%, respectively.
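For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an unlabeled attachment score (UAS) is commonly computed for one sentence: the fraction of tokens whose predicted head matches the gold head. The function name, the head-index encoding (1-based token indices, 0 for the root), and the toy sentence are illustrative assumptions; they are not taken from the paper, whose exact evaluation protocol (for example, punctuation handling) is not stated here.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               pred_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Heads are assumed to be 1-based token indices, with 0 marking the root.
    Dependency labels are ignored, which is what makes the score "unlabeled".
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical 4-token sentence (not from PerDT or UPDT).
gold = [2, 0, 2, 3]   # token 4 attaches to token 3 in the gold tree
pred = [2, 0, 2, 2]   # the parser attached token 4 to token 2 instead
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2%}")  # prints "UAS = 75.00%"
```

Corpus-level UAS is usually obtained by pooling correct and total token counts over all test sentences rather than averaging per-sentence scores; whether the paper does so is an assumption.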

Keywords

Dependency Parser, MSTParser, Phrase-structure Tree, Deep Learning, Persian
