Comprehensive Survey on Evaluation Methods for Large Language Models

Pouya Faridfar; Roshanak Ghasemian; Reza Foroughnia

Published in: 9th International Conference on Information Technology, Computer, and Telecommunications Engineering of Iran

Year of publication: 1404 (Solar Hijri calendar)

Document type: Conference paper

Language: English

The full text of this paper is available as a 6-page PDF.

Permanent link to this paper:

https://civilica.com/doc/2636710

National scientific document ID:

ICTBC09_043

Indexing date: 26 Khordad 1405 (Solar Hijri calendar)

Abstract:

Large Language Models (LLMs) have rapidly gained prominence in both research and industry due to their strong performance across diverse linguistic and cross-disciplinary tasks. This rapid progress highlights the need for systematic evaluation, not only of technical performance but also of ethical, social, and safety concerns. This survey provides a comprehensive overview of evaluation approaches for LLMs along three dimensions: what to evaluate, where to evaluate, and how to conduct evaluation. It reviews tasks and application domains including natural language processing, reasoning, medicine, education, social sciences, and agent-based applications. In addition, it summarizes common datasets, benchmarks, and metrics, and identifies both the success cases and the limitations of current models. Finally, the paper outlines future challenges, including the design of new evaluation protocols, handling bias, ensuring factual reliability, and improving trustworthiness, with the aim of guiding the development of more capable, safe, and reliable LLMs.

Keywords:

Large Language Models (LLMs), Evaluation Methods, Benchmarks, Trustworthiness, Bias and Robustness

Authors

Pouya Faridfar

Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.

Roshanak Ghasemian

Independent Researcher, Mashhad, Iran.

Reza Foroughnia

Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.