ETwins: Enhanced Twins Transformer for Facial Expression Recognition
Publication year: 1403 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper is available as a 13-page PDF.
National scientific document ID:
EITCONF03_259
Indexing date: 18 Farvardin 1404
Abstract:
Facial expression recognition (FER) requires distinguishing between subtle visual differences, making it a challenging fine-grained classification task. This paper presents ETwins, an Enhanced Twins transformer architecture specifically tailored for facial expression recognition. ETwins builds on the Twins vision transformer and incorporates three key improvements: a weighted global average pooling layer that emphasizes critical facial regions, the integration of a class token for capturing both local and global features, and an enhanced positional encoding mechanism to embed essential contextual information. Our evaluations on widely used FER datasets demonstrate that ETwins achieves improved accuracy compared to larger baseline models while using fewer parameters. This result underscores the efficiency of custom vision transformers for fine-grained tasks, achieving competitive FER performance with a compact, parameter-efficient design.
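To make the first of the three improvements more concrete, the following is a minimal sketch of what a weighted global average pooling step could look like. The abstract does not say how the weights are computed, so everything here is an assumption for illustration: the function names are hypothetical, the per-token importance scores are passed in by hand (in a trained model they would presumably come from a learned layer), and a softmax is used to turn scores into weights. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_global_average_pool(tokens, scores):
    """Pool N token feature vectors into one vector.

    tokens: list of N feature vectors, each a list of D floats
            (one per image patch).
    scores: list of N importance scores, one per token.
            ASSUMPTION: supplied by the caller here; a real model
            would learn them.
    Returns a list of D floats: the sum of tokens weighted by
    softmax(scores).
    """
    if not tokens or len(tokens) != len(scores):
        raise ValueError("tokens and scores must be non-empty and equal in length")
    weights = softmax(scores)
    dim = len(tokens[0])
    return [
        sum(w * tok[d] for w, tok in zip(weights, tokens))
        for d in range(dim)
    ]


if __name__ == "__main__":
    patches = [[1.0, 2.0], [3.0, 4.0]]
    # Equal scores reduce to plain global average pooling.
    print(weighted_global_average_pool(patches, [0.0, 0.0]))
    # A much higher score on the second patch pulls the result toward it.
    print(weighted_global_average_pool(patches, [0.0, 10.0]))
```

With equal scores the function reduces to ordinary global average pooling, while unequal scores shift the pooled vector toward the higher-scored patches, which is the sense in which such a layer can emphasize particular facial regions.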
Authors
Ali Mohammad Pazandeh
Electrical Engineering Department, Sharif University of Technology, Azadi Ave., Tehran, Iran
Emad Fatemizadeh
Electrical Engineering Department, Sharif University of Technology, Azadi Ave., Tehran, Iran