Exploring the Role of Artificial Intelligence in Language Assessment: Assessing ChatGPT's Reliability in Grading IELTS Writing Task 2
Year of publication: 1404
Document type: Conference paper
Language: English
The full text of this paper is available as an 11-page PDF file.
National scientific document ID:
LLCSCONF22_055
Indexing date: 17 Mehr 1404
Abstract:
The integration of Artificial Intelligence (AI) in language assessment has become increasingly prevalent, especially with the emergence of large language models like ChatGPT. However, the reliability and accuracy of AI-driven grading systems in high-stakes language proficiency tests, such as the International English Language Testing System (IELTS), remain largely unexamined. This study aims to investigate the alignment between scores assigned by ChatGPT-4 and those given by human IELTS raters for Writing Task 2. A dataset of 55 authentic IELTS writing samples was used to conduct a comparative analysis. Several statistical tests, including Wilcoxon, Intraclass Correlation Coefficient (ICC), and Rater Agreement tests, were employed to assess the consistency, agreement, and accuracy of the AI model’s grading in comparison to human evaluations. Results indicate a high degree of alignment, with an ICC value of 0.814 and a weighted kappa of 0.811, suggesting that ChatGPT-4’s grading closely mirrors human raters in most cases. However, discrepancies were found in certain individual cases, highlighting areas where AI scoring may still require refinement. The findings suggest that while ChatGPT has potential as a supplementary grading tool, further research is needed to address its limitations and ensure fairness in automated language assessment systems.
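To make the reported agreement statistic concrete, the following is a minimal sketch of how a weighted kappa between two raters could be computed. It is not the authors' procedure: the abstract does not say which weighting scheme or software was used, so quadratic weights are an assumption here, and the function name `weighted_kappa`, the `BANDS` scale, and the six score pairs are hypothetical illustrations, not data from the study.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="quadratic"):
    """Cohen's weighted kappa for two raters scoring the same items.

    `categories` is the full ordered scale; disagreements are penalised
    by how far apart the two scores sit on that scale.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty score lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    power = 2 if weighting == "quadratic" else 1  # otherwise linear weights

    # Observed joint counts: observed[i][j] = items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n

    if den == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - num / den


# Assumed scale: IELTS bands 0.0 to 9.0 in half-band steps.
BANDS = [x / 2 for x in range(19)]

# Hypothetical scores for six essays (NOT taken from the paper).
human = [6.0, 6.5, 7.0, 5.5, 6.0, 7.5]
model = [6.0, 6.0, 7.0, 6.0, 6.5, 7.5]

kappa = weighted_kappa(human, model, BANDS)
print(f"quadratic weighted kappa: {kappa:.3f}")
```

Because the weights grow with the distance between bands, a half-band disagreement costs far less than a two-band one, which is why weighted kappa is a common choice for ordinal scales such as IELTS bands. A value of 1 means perfect agreement and 0 means agreement no better than chance.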
Authors
Ebrahim Fakhri Alamdari
Department of Foreign Languages, QaS.C., Islamic Azad University, Qaemshahr, Iran
Shideh Nahavandi
Department of Foreign Languages, QaS.C., Islamic Azad University, Qaemshahr, Iran