Corpus-Based Analysis for Multi-Token Units in Persian

Because Persian script joins letters and varies orthographically, the morphological and syntactic annotation of multi-token units raises a number of problems. Starting from an analysis of the Perso-Arabic script and its difficulties, this paper describes the collocation types that tokens can form: compositional, non-compositional, and the new semi-compositional constructions. To illustrate these constructions, static and dynamic multi-token units are presented for the generative and non-generative structures of the main categories: verbs, infinitives, prepositions, conjunctions, adverbs, adjectives, and nouns. One of the main results of this research is a set of multi-token unit templates for these categories. The findings can serve as input to the segmentation module of the Persian Treebank generator system, and can also inform the design and implementation of morphological analyzers and syntactic parsers.

Keywords:

Persian script, orthographic variation, morphological and syntactic annotations, Persian Treebank generator system, syntactical parsers, morphological analyzers

Authors

Masoud Sharifi Atashgah

Department of Literature and Human Science University, Tehran University, Tehran, Iran

Mahmoud Bijankhan

Department of Literature and Human Science University, Tehran University, Tehran, Iran

Permanent link to this article:

https://civilica.com/doc/1426635

National scientific document ID:

JR_ITRC-1-3_004

Indexing date: 23 Farvardin 1401

How to cite this article:

To reference this article in your own research, you can use the following entry in the references section:

Sharifi Atashgah, Masoud and Bijankhan, Mahmoud, 1388, Corpus-Based Analysis for Multi-Token Units in Persian, https://civilica.com/doc/1426635

Within the text, wherever a statement or finding from this article is mentioned, the following details are given in parentheses after it.
First citation: (1388, Sharifi Atashgah, Masoud; Mahmoud Bijankhan)
Second and later citations: (1388, Sharifi Atashgah; Bijankhan)

Scientometrics and article ranking

Details of the center that produced this article:

Scientific rank of University of Tehran

Center type: public university

Number of articles: 120,729
