Exploiting Fine-Tuning Vulnerabilities in Large Language Models
Venue: International Conference on Artificial Intelligence and Future Civilization
Year of publication: 1403 SH (2024/25)
Document type: Conference paper
Language: English
Views: 169
The full text of this paper is available as a 9-page PDF.
National scientific document ID: ICAII01_074
Indexing date: 19 Esfand 1403 (9 March 2025)
Abstract:
Large Language Models (LLMs) have become the face of modern AI, finding applications across nearly every field. However, fine-tuning can introduce vulnerabilities that leave these models less secure and less robust. This work examines the security of LLMs, focusing on vulnerabilities introduced during the fine-tuning process. We show how such vulnerabilities can be exploited, implementing and testing existing attacks that compromise model integrity, bypass safety guardrails, and increase susceptibility to adversarial inputs. To mitigate these risks, we present practical techniques for hardening the fine-tuning process and evaluate their performance. Results show that these hardening techniques improve model security with only limited degradation in performance on downstream tasks; comparing the model's behavior before and after applying them shows considerable gains in resilience to exploits. This research underlines the importance of robust fine-tuning practices for mitigating emerging vulnerabilities and enabling safe deployment.
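The abstract does not detail the paper's specific hardening techniques. One commonly cited mitigation in this area is to interleave safety-alignment (refusal) demonstrations with the downstream task data during fine-tuning, so the model keeps rehearsing safe behavior while it adapts. The sketch below illustrates only that general idea; the function name, the example format, and the safety_ratio parameter are hypothetical and are not taken from the paper.

import random

def harden_finetune_dataset(task_examples, safety_examples,
                            safety_ratio=0.1, seed=0):
    """Blend safety/refusal demonstrations into a task fine-tuning set.

    Each example is assumed to be a dict such as
    {"prompt": "...", "response": "..."}; the format is illustrative only.
    """
    rng = random.Random(seed)
    # Number of safety examples to mix in, proportional to the task set size.
    n_safety = min(len(safety_examples),
                   max(1, int(len(task_examples) * safety_ratio)))
    mixed = list(task_examples) + rng.sample(safety_examples, n_safety)
    rng.shuffle(mixed)  # Interleave so safety examples appear throughout training.
    return mixed

# Toy usage: 20 task examples plus one safety demonstration.
task = [{"prompt": f"task {i}", "response": f"answer {i}"} for i in range(20)]
safety = [{"prompt": "harmful request", "response": "I can't help with that."}]
print(len(harden_finetune_dataset(task, safety)))  # 21

The ratio trades off safety retention against downstream performance, consistent with the abstract's observation that hardening costs only limited task accuracy.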
Keywords:
Authors:
Fatemeh Zahra Arshia
Faculty of Electrical and Computer Engineering, Malek Ashtar University of Technology
Saeedeh Sadat Sadidpour
Faculty of Electrical and Computer Engineering, Malek Ashtar University of Technology