Detecting Machine-Generated Text in Academic Writing: Stylometric Fingerprinting of Humans and Large Language Models for Authorship Verification

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as a 21-page PDF.

National scientific document ID: ICAICS01_039

Indexing date: 19 Khordad 1405

Abstract:

The proliferation of Large Language Models (LLMs) such as GPT-4 and ChatGPT has introduced significant challenges to maintaining academic integrity, necessitating robust methods for distinguishing human-written texts from machine-generated content. This paper provides a comprehensive review of contemporary techniques for detecting machine-generated text in academic writing, with a particular focus on stylometric fingerprinting and machine learning approaches. We systematically analyze key methodologies, including stylometric analysis (lexical diversity, syntactic complexity, and punctuation patterns), psycholinguistic mapping, and the trigram-cosine delta metric. Furthermore, we examine advanced machine learning models such as supervised classification, ensemble learning, and Graph Neural Networks (GNNs) integrated with pre-trained language models. The review also explores multilingual and cross-domain detection strategies, benchmark datasets, and performance evaluation metrics. Despite high accuracy rates reported in recent studies (up to 98%), significant challenges remain regarding generalizability across different LLMs and domains, computational efficiency, and ethical considerations related to privacy. The paper concludes that integrating stylometric analysis with advanced machine learning offers a promising pathway for safeguarding academic integrity, while emphasizing the need for continued research to address existing limitations.
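
The stylometric features named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define its features or the trigram-cosine delta metric, so the function names, the regular expressions, the use of mean sentence length as a stand-in for syntactic complexity, and the choice of character trigrams with plain cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric features for one text (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punctuation = re.findall(r"[,;:.!?]", text)
    return {
        # Lexical diversity as type-token ratio: distinct words / total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Crude proxy for syntactic complexity: mean words per sentence.
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Punctuation pattern reduced to a single rate: marks per word.
        "punctuation_per_word": len(punctuation) / len(words) if words else 0.0,
    }


def char_trigrams(text: str) -> Counter:
    """Count overlapping character trigrams after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def trigram_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the character-trigram count vectors of two texts.

    Assumption: this is only one plausible building block of a
    trigram-based metric, not the metric reviewed in the paper.
    """
    ca, cb = char_trigrams(a), char_trigrams(b)
    dot = sum(count * cb[gram] for gram, count in ca.items())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In an authorship-verification setting, one might compute `stylometric_features` for a questioned text and for known writing samples, then pass those values together with `trigram_cosine_similarity` scores to a supervised classifier. Which features and thresholds actually discriminate well is an empirical question that this sketch does not address.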

Authors

Abdollah Givechi

PhD Student in Islamic Philosophy and Theology, Imam Sadiq University, Tehran, Iran