Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP

Mahsa Khajeh Zadeh; Meisam Zaifar

Published in: Second National Conference on Digital Transformation and Intelligent Systems

Year of publication: 1403 (Solar Hijri)

Document type: Conference paper

Language: English

The full text of this paper is 9 pages long and is available in PDF format.

Permanent link to this paper:

https://civilica.com/doc/2040043

National scientific document ID:

DTIS02_039

Indexing date: 14 Mordad 1403

Abstract:

Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools mostly focus on lexical matching and miss the semantic and syntactic aspects of plagiarism. A challenging area of plagiarism detection is the semantic one, which combines lexical and syntactic transformations. NLP can be used to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven to be more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare plagiarism levels across documents. The results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria, including accuracy, precision, recall and F-measure. Moreover, the document plagiarism score reflects well the amount of plagiarism detected in the documents. Our tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
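
The abstract names four evaluation criteria without defining them. As a minimal sketch, assuming the task is framed as binary classification of sentence pairs (1 = similar/paraphrase, 0 = not), the standard definitions can be computed as below. The function name `binary_metrics` and the example labels are hypothetical illustrations, not the authors' code or data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F-measure (F1) for binary labels.

    Hypothetical helper for illustration only; labels are 1 (similar) or 0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is the harmonic mean of precision and recall (F1).
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up labels purely for demonstration.
    gold = [1, 1, 1, 0, 0, 1, 0, 1]
    predicted = [1, 0, 1, 0, 1, 1, 0, 1]
    print(binary_metrics(gold, predicted))
```

Reporting all four together matters because accuracy alone can look high on an imbalanced dataset even when most plagiarised pairs are missed; recall and precision expose that case.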

Authors

Mahsa Khajeh Zadeh

Meisam Zaifar