Exploring the Role of Artificial Intelligence in Language Assessment: Assessing ChatGPT's Reliability in Grading IELTS Writing Task 2

Publication year: 1404
Document type: Conference paper
Language: English

The full text of this paper is available as an 11-page PDF.

National scientific document ID: LLCSCONF22_055

Indexing date: 17 Mehr 1404

Abstract:

The integration of Artificial Intelligence (AI) in language assessment has become increasingly prevalent, especially with the emergence of large language models like ChatGPT. However, the reliability and accuracy of AI-driven grading systems in high-stakes language proficiency tests, such as the International English Language Testing System (IELTS), remain largely unexamined. This study investigates the alignment between scores assigned by ChatGPT-4 and those given by human IELTS raters for Writing Task 2. A dataset of 55 authentic IELTS writing samples was used for a comparative analysis. Several statistical tests, including the Wilcoxon test, the Intraclass Correlation Coefficient (ICC), and rater agreement tests, were employed to assess the consistency, agreement, and accuracy of the AI model’s grading relative to human evaluations. Results indicate a high degree of alignment, with an ICC of 0.814 and a weighted kappa of 0.811, suggesting that ChatGPT-4’s grading closely mirrors that of human raters in most cases. However, discrepancies were found in individual cases, highlighting areas where AI scoring may still require refinement. The findings suggest that while ChatGPT has potential as a supplementary grading tool, further research is needed to address its limitations and ensure fairness in automated language assessment systems.
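
As a rough illustration of one of the agreement statistics named in the abstract, the Python sketch below computes Cohen's weighted kappa for two raters. The abstract does not say which weighting scheme the authors used, so the quadratic weights here are an assumption, and the band scores are made-up placeholders, not the study's data. The function name `weighted_kappa` and its parameters are hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, power=2):
    """Cohen's weighted kappa for two raters on an ordered scale.

    power=1 gives linear weights, power=2 quadratic weights.
    The choice of quadratic weights is an assumption, not the paper's method.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    if len(categories) < 2:
        raise ValueError("need at least two ordered categories")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Joint and marginal counts of the category positions.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            w = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += w * marg_a[i] * marg_b[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Hypothetical IELTS band scores in half-band steps (placeholder data).
    bands = [x / 2 for x in range(8, 19)]  # 4.0, 4.5, ..., 9.0
    human = [6.0, 6.5, 7.0, 5.5, 6.0, 7.5, 6.5, 5.0]
    model = [6.0, 6.5, 6.5, 5.5, 6.5, 7.5, 6.5, 5.5]
    print(f"quadratic weighted kappa: {weighted_kappa(human, model, bands):.3f}")
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Near misses such as 6.0 versus 6.5 are penalized less than distant ones, which suits an ordered band scale.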

Authors

Ebrahim Fakhri Alamdari

Department of Foreign Languages, QaS.C., Islamic Azad University, Qaemshahr, Iran

Shideh Nahavandi

Department of Foreign Languages, QaS.C., Islamic Azad University, Qaemshahr, Iran