Fine-Tuning BERT for Persian Poet Identification

Publication year: 1400 (Solar Hijri; 2021–2022)
Document type: Conference paper
Language: English

Full text: 7 pages, PDF format

National scientific document ID: EMAA20_015

Indexing date: 16 Esfand 1400 (Solar Hijri)

Abstract:

In this experiment, we attempted poet identification for four prominent Persian poets (Hafez, Omar Khayyam, Rumi, and Saadi Shirazi) by fine-tuning the BERT language representation model on a dataset of hemistichs. One challenge was the imbalanced distribution of hemistichs across poets: one class contained more than 52,000 hemistichs while another contained fewer than 1,300, a ratio exceeding 40:1. Moreover, the short length of a hemistich made the task harder than poet identification from a whole poem or even a full verse. The poets' diction was also shown to overlap to some degree, which added further difficulty. The model attained a Matthews correlation coefficient (MCC) of 0.646 on the test set, demonstrating the effectiveness of transfer learning for processing literary works even with limited data, an imbalanced dataset, and similar diction.
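The abstract reports performance as a Matthews correlation coefficient, a metric that stays informative under the kind of class imbalance described above. The sketch below is not the authors' evaluation code; it is a minimal pure-Python implementation of the standard multiclass generalization of MCC (Gorodkin's R_K, the same formulation scikit-learn's `matthews_corrcoef` uses), assuming true and predicted labels arrive as two equal-length sequences. The function name `multiclass_mcc` and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                        # total number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correctly classified samples
    t_k = Counter(y_true)                                  # true occurrences per class
    p_k = Counter(y_pred)                                  # predicted occurrences per class
    labels = set(t_k) | set(p_k)                           # Counter returns 0 for absent keys

    cov_true_pred = c * s - sum(p_k[k] * t_k[k] for k in labels)
    cov_pred_pred = s * s - sum(p_k[k] ** 2 for k in labels)
    cov_true_true = s * s - sum(t_k[k] ** 2 for k in labels)

    denom = sqrt(cov_pred_pred * cov_true_true)
    # Undefined when all predictions (or all true labels) fall in one class; report 0.0.
    return 0.0 if denom == 0 else cov_true_pred / denom


if __name__ == "__main__":
    # Hypothetical toy labels with a skewed class distribution.
    y_true = ["Saadi"] * 8 + ["Hafez", "Khayyam"]
    y_pred = ["Saadi"] * 10                      # always predict the majority class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, multiclass_mcc(y_true, y_pred))  # 0.8 0.0
```

The demo shows why accuracy alone would mislead on an imbalanced dataset: always predicting the majority class scores 0.8 accuracy on the toy labels but an MCC of 0.0. Returning 0.0 when the denominator is zero follows the scikit-learn convention.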

Authors

Soroosh Akef

Languages and Linguistics Center, Sharif University of Technology

Mohammad Bahrani

Department of Computer, Allameh Tabataba'i University